[9fans] International Ispell in Plan9
trebol
2013-04-10 22:01:36 UTC
Hello everyone,

First of all, I'm just starting to learn programming, and I'm a complete
newbie to Plan9, so please be patient... I was unhappy with the English-only
spell checker, so I compiled International Ispell with APE:

-Installed pdcurses.
-Patched term.c for termios.h; I used a Linux patch (I don't really understand it).
-Modified correct.c:827 (I had type conflicts):
(void) fputs ((const char*)tok, logfile);
-Edited local.h.generic to make local.h:

#define MINIMENU /* Display a mini-menu at the bottom of the screen */
#define USG /* Define on System V or if term.c won't compile */
#undef NO_FCNTL_H /* Define if you get compile errors on fcntl.h */
#define NO_MKSTEMP /* Define if you get compile or link errors */
#define CFLAGS "-O -D_POSIX_SOURCE -D_BSD_EXTENSION"
#define TERMLIB "-lcurses"
#define REGLIB ""
#undef NO8BIT
#define WORDS "/usr/trebol/local/share/dict/words"

#define LANGUAGES "{american,MASTERDICTS=american.med,HASHFILES=americanmed.hash,EXTRADICT=} {español}"
/*
* Important directory paths. If you change MAN45DIR from man5 to
* something else, you probably also want to set MAN45SECT and
* MAN45EXT (but not if you keep the man pages in section 5 and just
* store them in a different place).
*/
#define BINDIR "/usr/trebol/local/bin"
#define LIBDIR "/usr/trebol/local/lib"
#define MAN1DIR "/usr/trebol/local/man/man1"
#define MAN45DIR "/usr/trebol/local/man/man5"

I used http://www.datsi.fi.upm.es/~coes/espa~nol-1.7.tar.gz for Spanish
spell checking, untarred it in ispell-3.3.02/languages/español and made
some changes:

-Changed all file and directory names to UTF-8 (acme doesn't work well with the ~)
-Added a utf8 formatter to the aff file:

altstringtype "utf8" "tex" ".txt"

altstringchar á \'a
altstringchar Á \'A
altstringchar é \'e
altstringchar É \'E
altstringchar í \'i
altstringchar Í \'I
altstringchar ñ \'n
altstringchar Ñ \'N
altstringchar ó \'o
altstringchar Ó \'O
altstringchar ú \'u
altstringchar Ú \'U
altstringchar ü \"u
altstringchar Ü \"U

-Edited Makefile, changed LANGUAGE to español and corrected paths:

...
PATHADDER = ../../
BUILDHASH = ../../buildhash
UNSQ = ../../unsq
FIX8BIT = ../../fix8bit
...
LANGUAGE = español
...
eñe:
sh eñes
...
../../munchlist -v -l ...

'make' works fine, but the deformatters must be compiled separately: 'make
install' expects executables, but 'make all' only creates .o files. After
'make install' the Spanish .aff and .hash files must be moved to the
correct directory.

Well, ispell's normal mode works, but the suggestions and the line containing
the misspelled word aren't shown (a curses problem?). Interactive use and the
-a and -l modes work fine.

The next problem was acme. I had to change spout.c:63 to handle non-ASCII characters:
if(isalpharune(c))

Then acme's aspell script. I made a version for ispell... well, this was a
nightmare for me, but I've learned a lot about rc.

#!/bin/rc

args=()
spellflags=()
for(x){
switch($x){
case -d*
spellflags=($spellflags $x)
case -p*
spellflags=($spellflags $x)
case -T*
spellflags=($spellflags $x)
case *
args = ($args $x)
}
}

dir = /mnt/wsys
if(! test -f $dir/cons)
dir = /mnt/term/$dir
id=`{cat $dir/new/ctl}
id=$id(1)

if(~ $#args 1 && ~ $args /*){
adir = `{basename -d $args}
args = `{basename $args}
echo 'name '^$adir^/-spell > $dir/$id/ctl
cd $adir
}
if not {
echo 'name '^`{pwd}^/-spell > $dir/$id/ctl
}

{
echo noscroll
if(~ $#args 0)
for(j in `{$home/local/bin/acme/spout | sort -t: -u +2 | sort -t: +1.1n}){if(test `{echo -n $j | $home/local/bin/ispell -l $spellflags}){echo -n $j; echo -n $j | $home/local/bin/ispell -a $spellflags | awk -F: '/^&/{ORS=""; print $2}'; echo}} > $dir/$id/body
if not for(i in $args)
cat $i | for(j in `{$home/local/bin/acme/spout | sort -t: -u +2 | sort -t: +1.1n}){if(test `{echo -n $j | $home/local/bin/ispell -l $spellflags}){echo -n $i; echo -n $j; echo -n $j | $home/local/bin/ispell -a $spellflags | awk -F: '/^&/{ORS=""; print $2}'; echo}} > $dir/$id/body
echo clean
}> $dir/$id/ctl


This works, and the output is like this:

test:#0,#7:centeya centena
test:#8,#14:camiño camilo, camino, cariño
test:#15,#21:camion camino, camión

So you can click an address to jump to the misspelled word in the file, and also see the suggestions.
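
The `name:#start,#end:word` lines above are acme addresses counted in characters (runes), not bytes. As a rough illustration only (this is not spout.c), the Python sketch below reproduces that format. `word_addresses` is a hypothetical helper, and `str.isalpha` stands in for the `isalpharune` test used in the patched spout.c.

```python
def word_addresses(text, name=""):
    """Yield 'name:#start,#end:word' for each run of letters in text.

    Offsets count characters (runes), not bytes, so an accented letter
    such as 'ñ' advances the position by one.
    """
    start = None
    # Appending a newline guarantees the last word is flushed even when
    # the input does not end with one.
    for pos, ch in enumerate(text + "\n"):
        if ch.isalpha():
            if start is None:
                start = pos          # first letter of a new word
        elif start is not None:
            yield f"{name}:#{start},#{pos}:{text[start:pos]}"
            start = None


if __name__ == "__main__":
    for line in word_addresses("centeya camiño camion", "test"):
        print(line)
```

Run on the three sample words, this yields the same addresses as the output shown above (#0,#7, #8,#14 and #15,#21).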

With -dlanguage, -ppersonaldictionary and -Tformatter you can use all of
International Ispell's dictionaries. I put functions in my profile
for easy use:

fn aispelles {$home/local/bin/acme/aispell -p$home/lib/pdict -Tutf8 -despañol $*}
fn aispellen {$home/local/bin/acme/aispell -p$home/lib/pdict_en -damerican $*}

And you can add '>> /personal/dictionary/path' to acme's tag (or to a window
holding a commands file) to add words to your personal dictionary quickly.

The problem is the 'for' statement. When the file grows a little
the script runs VERY slowly. And sometimes I get 'test: unexpected
operator/operand:' with this test expression.

I'm sure there is a faster (and more proper) way to do this,
so I'd appreciate any help.
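
One way to avoid starting ispell for every word is to run it once in -a mode and parse all of its replies. The sketch below is in Python rather than rc and covers only the parsing step. `parse_ispell` is a hypothetical helper, and it assumes the reply formats described in the ispell manual: `& word count offset: suggestions` for a miss with suggestions and `# word offset` for a miss without any.

```python
def parse_ispell(reply):
    """Parse the text printed by a single `ispell -a` run.

    Returns (word, offset, suggestions) for every '&' line (misspelled,
    with suggestions) and every '#' line (misspelled, no suggestions).
    All other lines (version banner, '*', '+', '-', '?', blank
    separators) are skipped in this sketch.
    """
    misses = []
    for line in reply.splitlines():
        if line.startswith("& "):
            head, _, tail = line.partition(": ")
            _, word, _count, offset = head.split()
            misses.append((word, int(offset), tail.split(", ")))
        elif line.startswith("# "):
            _, word, offset = line.split()
            misses.append((word, int(offset), []))
    return misses


if __name__ == "__main__":
    # Made-up reply text for illustration; the offsets are not real output.
    sample = "@(#) banner line\n& camion 2 0: camino, camión\n*\n\n"
    for word, offset, suggestions in parse_ispell(sample):
        print(word, offset, ", ".join(suggestions))
```

With the replies parsed in one pass, the per-word work reduces to matching each reported word against the addresses from spout, with no further ispell processes.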

Regards,
trebol.
Mark van Atten
2013-04-11 06:57:03 UTC
That's interesting work---thanks!

Mark.
Nemo
2013-04-11 17:57:13 UTC
you could put it in sources, if not yet there.
trebol
2013-04-12 12:21:53 UTC
Post by Nemo
you could put it in sources, if not yet there.
I want to tidy up this mess before putting it in sources.

I changed the for loop to iterate over the output of ispell instead, so
ispell now runs only once, in terse mode. The script is now much faster
thanks to the good design of awk and grep.

#!/bin/rc

rm -f /tmp/$pid^'.'aispell*

args=()
spellflags=()
for(x){
switch($x){
case -d*
spellflags=($spellflags $x)
case -p*
spellflags=($spellflags $x)
case -T*
spellflags=($spellflags $x)
case *
args = ($args $x)
}
}

dir = /mnt/wsys
if(! test -f $dir/cons)
dir = /mnt/term/$dir
id=`{cat $dir/new/ctl}
id=$id(1)

if(~ $#args 1 && ~ $args /*){
adir = `{basename -d $args}
args = `{basename $args}
echo 'name '^$adir^/-spell > $dir/$id/ctl
cd $adir
}
if not {
echo 'name '^`{pwd}^/-spell > $dir/$id/ctl
}

{
echo noscroll
if(~ $#args 0){
cat > /tmp/$pid^'.'aispell0; i = /tmp/$pid^'.'aispell0; winname = `{cat /mnt/acme/$winid/tag | awk '{print $1}'}; for(j in `{cat $i | $home/local/bin/ispell -a $spellflags | awk '/^[&#]/{gsub(/ /,"_"); print}'}){$home/local/bin/acme/spout $i | grep `{echo $j | awk -F_ '{print $2}'} | awk -F: '{OFS=":";$1 = "'$winname'"; print}' >> /tmp/$pid^'.'aispell } ; sort -u /tmp/$pid^'.'aispell > $dir/$id/body; rm -f /tmp/$pid^'.'aispell*
}
if not for(i in $args){
for(j in `{cat $i | $home/local/bin/ispell -a $spellflags | awk '/^[&#]/{gsub(/ /,"_"); print}'}){$home/local/bin/acme/spout $i | grep `{echo $j | awk -F_ '{print $2}'} >> /tmp/$pid^'.'aispell } ; sort -u /tmp/$pid^'.'aispell > $dir/$id/body; rm -f /tmp/$pid^'.'aispell
}
echo clean
}> $dir/$id/ctl


Now you can use it in the tag line to spell check the dot, and the
output begins with the name of the window, so if you select the whole
window body, you can spell check it without saving it, just as
conveniently. Of course, the addresses of the misspelled words don't work
with an ordinary (partial) selection, because they are counted from the
start of the selected text.

To make this work, I need to know how to get the dot address within
the script. Also, the functions don't work in acme, but an equivalent script
does. Why?

fn aispellen {$home/local/bin/acme/aispell -p$home/lib/pdict_en -damerican $*}

Any help?
Francisco J Ballesteros
2013-04-12 12:39:55 UTC
look under /acme for examples.
erik quanstrom
2013-04-12 12:56:21 UTC
Post by trebol
{
echo noscroll
if(~ $#args 0){
cat > /tmp/$pid^'.'aispell0; i = /tmp/$pid^'.'aispell0; winname = `{cat /mnt/acme/$winid/tag | awk '{print $1}'}; for(j in `{cat $i | $home/local/bin/ispell -a $spellflags | awk '/^[&#]/{gsub(/ /,"_"); print}'}){$home/local/bin/acme/spout $i | grep `{echo $j | awk -F_ '{print $2}'} | awk -F: '{OFS=":";$1 = "'$winname'"; print}' >> /tmp/$pid^'.'aispell } ; sort -u /tmp/$pid^'.'aispell > $dir/$id/body; rm -f /tmp/$pid^'.'aispell*
}
if not for(i in $args){
for(j in `{cat $i | $home/local/bin/ispell -a $spellflags | awk '/^[&#]/{gsub(/ /,"_"); print}'}){$home/local/bin/acme/spout $i | grep `{echo $j | awk -F_ '{print $2}'} >> /tmp/$pid^'.'aispell } ; sort -u /tmp/$pid^'.'aispell > $dir/$id/body; rm -f /tmp/$pid^'.'aispell
}
echo clean
}> $dir/$id/ctl
the rest of the script is nicely formatted. but it looks
like this bit could use some formatting. remember, you
get a free newline after { and |.

there is some opportunity to simplify, too. for example
cat /mnt/acme/$winid/tag | awk '{print $1}'
is more simply
sed 's/ .*//g' < /mnt/acme/$winid/tag
i see several places one could replace awk with sed and cat with
redirection.

and i think further simplification is possible by using
{} instead of the temporary file dance. and i'd imagine
that that's where the bug is.

- erik
trebol
2013-04-13 03:36:45 UTC
Thanks for the help erik!

This is the best I've managed so far...

/////////////////////////////////////
/////////////////////////////////////

#!/bin/rc

rm -f /tmp/$pid^'.'aispell*

args=()
spellflags=()
for(x){
switch($x){
case -d*
spellflags=($spellflags $x)
case -p*
spellflags=($spellflags $x)
case -T*
spellflags=($spellflags $x)
case *
args = ($args $x)
}
}

dir = /mnt/wsys
if(! test -f $dir/cons)
dir = /mnt/term/$dir
id=`{cat $dir/new/ctl}
id=$id(1)

if(~ $#args 1 && ~ $args /*){
adir = `{basename -d $args}
args = `{basename $args}
echo 'name '^$adir^/-spell > $dir/$id/ctl
cd $adir
}
if not {
echo 'name '^`{pwd}^/-spell > $dir/$id/ctl
}

{
echo noscroll
if(~ $#args 0){
cat > /tmp/$pid^'.'aispell0
i = /tmp/$pid^'.'aispell0
winname = `{sed 's/ .*//g' < /mnt/acme/$winid/tag}
if(~ $winname '') winname = nonamedwindow
for(j in `{$home/local/bin/ispell -a $spellflags < $i | grep '^[&#]' | sed 's/ /_/g'}){
{cat $i; echo} | $home/local/bin/acme/spout | # spout needs \n
grep `{echo $j |
awk -F_ '{print $2}'} |
awk -F: '{OFS=":";$1 = "'$winname'"; print}' >> /tmp/$pid^'.'aispell
}
sort -u /tmp/$pid^'.'aispell > $dir/$id/body
rm -f /tmp/$pid^'.'aispell*
}
if not for(i in $args){
for(j in `{ $home/local/bin/ispell -a $spellflags < $i | grep '^[&#]' | sed 's/ /_/g'}){
{cat $i; echo} | $home/local/bin/acme/spout |
grep `{echo $j | awk -F_ '{print $2}'} |
awk -F: '{OFS=":";$1 = "'$i'"; print}'>> /tmp/$pid^'.'aispell
}
sort -u /tmp/$pid^'.'aispell > $dir/$id/body
rm -f /tmp/$pid^'.'aispell
}
echo clean
}> $dir/$id/ctl


/////////////////////////////////////
/////////////////////////////////////

Is
grep '^[&#]' | sed 's/ /_/g'

better than

awk '/^[&#]/{gsub(/ /,"_"); print}'
?

I can use sed instead of awk in
awk -F: '{OFS=":";$1 = "'$i'"; print}'

but if the tag name contains the character used as the sed delimiter (such
as '/'), it will cause errors. While working on this script I discovered that
acme has problems with file names containing whitespace (I can only open such
a file at startup, and if you click in the tag strange things happen...), so
a solution would be to use a space as the delimiter:

sed 's ^[^:]*: '$winname': g'
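
The underlying difficulty is that the window name is pasted into a sed or awk program, where some of its characters are special. As an illustration of the same field replacement done with plain string operations, where no delimiter has to be chosen, here is a small Python sketch. `set_name` is a hypothetical helper and not part of the script.

```python
def set_name(line, name):
    """Replace everything before the first ':' in line with name.

    The name is treated as plain text, so '/', spaces, '&' or
    backslashes in it need no escaping.
    """
    _, sep, rest = line.partition(":")
    if not sep:
        # No ':' at all: leave the line untouched.
        return line
    return f"{name}:{rest}"


if __name__ == "__main__":
    print(set_name(":#0,#7:centeya", "/usr/trebol/my file"))
```

Only the first field changes; the address and the word after it are passed through as they are.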
Post by erik quanstrom
and i think further simplification is possible by using
{} instead of the temporary file dance.
I don't have the knowledge to make a better redirection work with the
standard input sent to the script, especially with the output of the
statements inside the for loop. The temporary file dance was the only way I
found to make it work.
Post by erik quanstrom
and i'd imagine that that's where the bug is.
The script works fine: 'aispell /some/file', 'aispell < /some/file' and
executing '> aispell' in an acme window with some text selected all work as
expected. But executing '> aispelles' in an acme window, where aispelles
is a function:

'fn aispelles {/aispell/path -despañol -Tutf8 -p/personal/dictionary/path $*}'

doesn't work. But this function works fine in win or outside acme
('aispelles < /some/file' and 'aispelles /some/file'), so I
don't know what the problem is.

Also, I fixed a bug related to spout: it needs a \n before EOF, so the last
line of an input without a trailing \n is no longer left out of the check.

I have looked through the acme scripts a lot and I can't find any example of
getting the dot address, only in C code. If anyone knows how to do it
in rc, I will be very grateful. Once selecting an address in the script's
output works for any selected text in a window, and puts the cursor in the
right position, I think the work will be finished.
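
Assuming the start of the dot can be obtained (acme(4) documents an `addr=dot` message for a window's ctl file and an addr file that reads back as two numbers; whether that is convenient from rc is not verified here), the remaining step is arithmetic: add the start of the selection to every offset that spout reports. A Python sketch, with `shift_address` as a hypothetical helper:

```python
import re

# Matches an acme character address such as '#8,#14'.
ADDR = re.compile(r"#(\d+),#(\d+)")


def shift_address(line, dot_start):
    """Shift the first '#m,#n' address in line by dot_start characters.

    spout counts from the start of the text it was given (the selection),
    so adding the selection's start offset gives an address that is valid
    in the whole window body.
    """
    def bump(match):
        m, n = (int(g) + dot_start for g in match.groups())
        return f"#{m},#{n}"

    return ADDR.sub(bump, line, count=1)


if __name__ == "__main__":
    # 100 is an arbitrary example value for the start of the dot.
    print(shift_address("test:#8,#14:camiño", 100))
```

Lines without an address are returned unchanged, so the helper can be applied to the whole output.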

I'm learning a lot about Plan9, acme, rc, awk, sed ... perhaps I should have waited
until I had more skill and knowledge of programming and Plan9 before
posting anything like this to the list, but I thought this could be useful
to someone.

Regards,
trebol.
