Diffstat (limited to 'index_old.html')
-rw-r--r-- | index_old.html | 298 |
1 files changed, 298 insertions, 0 deletions
diff --git a/index_old.html b/index_old.html
new file mode 100644
index 0000000..7267096
--- /dev/null
+++ b/index_old.html
@@ -0,0 +1,298 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
+
+<html>
+<head>
+<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
+<title>The Norwegian ispell-dictionary page</title>
+<style type="text/css">
+<!--
+  body { background-color: white }
+  h1, h2, h3, b { font-family: sans-serif }
+  .center { text-align: center }
+  .red { color: red }
+-->
+</style>
+
+<body>
+<h1 class="center">The Norwegian ispell-dictionary home page</h1>
+
+
+<hr>
+
+<p>The most important file available here contains a list of 750 000
+Norwegian words. Each word is marked with a number indicating the
+commonness of that word. Compound words are hyphenated at their
+compound points. Some words are marked as belonging to specific
+classes: mathematics, oil, conservative language, 'samnorsk', etc.
+Words marked with a star are allowed in Nynorsk.
+
+<p>This file is useful for several things:
+<ul>
+<li>Making dictionaries for the Ispell program of different sizes,
+choosing which words to include in a sensible way.
+
+<li>Making Norwegian dictionaries for word processors that don't
+have one, again with a sensible subset of words.
+
+<li>Making new and better hyphenation patterns for TeX.
+
+<li>Text-recognition (OCR) programs.
+
+<li>Encouraging the e-TeX team to implement multi-level hyphenation in
+TeX.
+
+<li>Encouraging people to use frequency information when they write
+programs making suggestions for replacements for misspelled words.
+
+</ul>
+
+<p>Routines for the first three items on the above list are included in
+the Makefiles. The last three were too hard to implement in Make.
+
+<H3>Requirements</H3>
+
+
+<ul>
+<li><b>Ispell</b><br>
+For the ispell-related stuff, you need the ispell program, and you can
+get it from the
+<A
+HREF="http://ficus-www.cs.ucla.edu/ficus-members/geoff/ispell.html">
+ispell home-page</A>. You can also find dictionaries for a lot of
+languages there.
+
+You also need the version of the look program in <a
+href="ftp://ftp.win.tue.nl/pub/linux/utils/util-linux/">util-linux-2.9</a>.
+Older versions have a bug which shows up when searching dictionaries
+with non-English characters. Ispell uses look to complete words
+(ispell-complete-word). If you don't plan to have a Norwegian words
+file for lookup, you don't need to worry about the look program.
+
+<li><b>Emacs</b><br>
+
+If you want to use ispell from Emacs, I recommend upgrading to the
+latest version of <A
+HREF="ftp://kdstevens.com/pub/stevens/ispell.el.gz"> ispell.el</A>.
+This version supports Norwegian, and it is now clean to include
+local dictionary definitions. It is almost like the version included
+in Emacs-20.4. There is also an add-on to ispell.el,
+<A HREF="http://kaolin.unice.fr/~serrano/emacs/flyspell">
+flyspell.el</A>, written by <A href="http://kaolin.unice.fr/~serrano">
+Manuel Serrano</A>, which offers better `on-the-fly'
+spell-checking. An old version is included in emacs-20.3, but you
+will want the new version, which has important speed improvements.
+
+<li><b>(La)TeX</b><br>
+
+If you want to make your own hyphenation patterns for TeX (you
+probably don't), you need a version of the patgen program with greater
+capacities than standard versions, i.e. you have to compile patgen
+with a different patgen.ch. See the patterns/Makefile for more
+information. Almost every TeX distribution contains the patgen
+program. I recommend <a
+href="ftp://ftp.rrzn.uni-hannover.de/pub/local/misc/teTeX-beta/">teTeX</a>.
+If you want both kinds of hyphenation in the same TeX format, you
+probably need to recompile TeX due to capacity problems. Again, this
+is easy with teTeX.
+
+</ul>
+
+
+<h3>Distribution</h3>
+
+<p>The distribution <a
+href="http://www.uio.no/~runekl/ispell-norsk-2.0.tar.gz">ispell-norsk-2.0.tar.gz</a>
+(2204k) is free in the GPL sense and contains these files:
+
+<ul>
+<li><b><a href="README">README</a></b><br>
+How to make ispell and emacs work with these dictionaries.
+
+<li><b>words.norsk.sq</b><br> This file contains the Norwegian words and
+the indication of their commonness, compressed with the sq program.
+
+<li><b>norsk.aff.in</b><br> A template for the affix file for the
+Norwegian language. This file is made for an ispell with 64 maskbits
+that understands HTML. Most pre-made versions of ispell support only
+32 maskbits and don't understand HTML. Use the patch and recompile
+ispell, or delete the html-related stuff.
+
+<li><b>Ispell-3.1.20.no.patch</b><br>A patch for ispell-3.1 that adds
+the amsmath and breqn environments to the skip-list and fixes a bug in
+buildhash. It also makes ispell html-aware, and tries to fix `the
+backslash bug'. In addition it makes ispell suggest "- as a compound
+word mark when seeing an unknown compound word, but only in TeX mode
+if the dictionary is named norsk. This is an ugly hack that works for
+me. It also implements the -r flag, which is like the -a flag, but the
+suggestions are printed even if the word is found in the dictionary.
+
+<li><b>norsk.single.tex</b><br>This is a set of hyphenation patterns
+for TeX that works well on non-compound words. It is used when making
+the new hyphenation patterns for TeX. This file is basically made
+from nohyph3.tex, a hyphenation file I released in May 1998.
But a lot of
+errors have been removed by comparing its action on the single words
+with the action of <a
+href="ftp://ftp.dante.de/tex-archive/language/hyphenation/nohyph.tex">nohyph.tex</a>
+(standard in teTeX), <a
+href="ftp://ftp.dante.de/tex-archive/language/hyphenation/nohyph2.tex">nohyph2.tex</a>,
+and the unreleased hyphenation patterns by Simen Gaure used at the <a
+href="http://www.math.uio.no/index.html">Department of
+Mathematics</a>, <a href="http://www.uio.no/index.html">University of
+Oslo</a>. I have tried to follow the rules given in <a
+href="ftp://ftp.dante.de/tex-archive/language/hyphenation/nohyph.tex">nohyph.tex</a>,
+at least where I find them reasonable. Bear in mind that there is no
+authoritative source for hyphenation in Norwegian. Please get in
+touch if you want to help improve the Norwegian hyphenation
+patterns.
+
+<li><b><a href="norsk.cfg">norsk.cfg</a></b><br> An
+addition to Babel-3.6 for LaTeX that makes the character " active and
+offers you many `different' hyphen signs. You can say o"ppussing in
+LaTeX to get the correct hyphenation opp-pussing! This functionality will
+appear in Babel-3.7 for Norwegian. Danish and Swedish have had it for
+several years.
+
+<li><b><a href="inorsk-compwordsmaybe">inorsk-compwordsmaybe</a></b><br>
+Search for words in a file or from standard input that should perhaps be
+written as one word, like `matematikk lærer'.
+
+<li><b><a href="inorsk-hyphenmaybe">inorsk-hyphenmaybe</a></b><br>
+Search for words in a file or from standard input that the Norwegian
+hyphenation patterns from this distribution might not hyphenate
+properly. Incorrect hyphenation of words not printed is considered to
+be a bug in the patterns. There is only a finite number of them.
+
+<li><b>Makefile</b><br> This file contains rules for making
+dictionaries for ispell and lists of the most common words for dumb
+word processors. There is also a Makefile in the patterns directory
+for making hyphenation patterns.
+
+<li><b><a href="nohyphbc.tex">nohyphbc.tex</a></b>, <b><a
+href="nohyphb.tex">nohyphb.tex</a></b><br> These
+are the hyphenation patterns for TeX. The file nohyphbc.tex hyphenates
+only at compound points. The file nohyphb.tex hyphenates each component of
+a word too, but avoids hyphenating 'near' compound points. I think
+'bar-nepsykologen' looks really bad. Too bad TeX doesn't support
+multi-level hyphenation yet.<br>
+
+
+The naming of the files follows the paradigm in Babel; if a replacement
+for a file foo.bar is offered, it is named foob.bar, where the b
+stands for big.
+
+These new patterns easily outperform those available before, mostly
+because of better compound word hyphenation. For reference I have
+made lists of about 2000 compound word errors made by previous
+patterns: <a href="err.nohyph">err.nohyph</a> and <a href="err.nohyph2">
+err.nohyph2</a>.
+<br>
+
+The size of the patterns can be argued over. The patterns are copied
+into each format file, thus occupying some disk space. They also
+limit the number of languages one can load hyphenation patterns for on
+most TeX systems. But size considerations have become less important
+in recent years, so I prefer to focus on getting things right, not small.
+It is also possible to recompile teTeX such that there is more room
+for hyphenation patterns, but the patterns take up more memory then.
+There is surely a lot of unnecessary structure within the hyphenation
+patterns, but it is very time-consuming to remove. The file
+patterns/Makefile can be configured, such that one can make smaller
+sets of patterns, taking only the most common words into
+consideration. Everyone is invited to play.
+
+<li><b>COPYING</b><br> The GNU General Public License.
+
+</ul>
+
+<p>
+
+
+<h3>Changes</h3>
+
+<p>There have been a lot of changes since version 1.1a. The quality has
+improved a lot, and the structure of the distribution is completely
+new.
Therefore I choose not to make the previous versions available
+from this site.
+
+<p>Here is a rough summary of the changes:
+
+<ul>
+
+<li>New distribution format
+
+<li>Support for Nynorsk
+
+<li>Commonness indicator for each word from Bokmål
+
+<li>Words are hyphenated at their compound points
+
+<li>A lot of common words added, especially compound words
+
+<li>Makefile completely rewritten. It is possible to configure the
+size of the dictionary for ispell without breaking the munching.
+
+<li>Makefile to make hyphenation patterns for TeX added
+
+<li>The pregenerated TeX patterns are included in the distribution.
+
+<li>Controlled compound-word support added. This includes affix file
+updates.
+
+<li>Some uncommon and misspelled words removed
+
+<li>Affix file updated for html. This will only work if you use the patch.
+
+</ul>
+
+<h3>Todo list</h3>
+
+<ul>
+
+<li>Remove/mark uncommon words that are close to common words. If you
+type 're' you probably meant 'er', even if 're' is a valid word.
+
+<li>There are too many words with commonness 0. Split this group in
+two.
+
+<li>Some words in the basic category belong in special categories.
+When making a small dictionary with all words from mathematics, many
+such words are missing, since they are in the basic category. They
+should be moved.
+
+<li>Make ispell sort the suggested replacements for misspelled words
+by commonness of the suggested words. One (easy) way to do this is to
+make an external file containing the most common words, and make
+ispell look into that file each time it has more than one suggestion.
+Or the file could be read into memory. (I don't think frequency
+information is representable within the root/affix structure, since
+one flag can represent multiple words.) This would slow ispell down a
+little bit, but only when it makes suggestions. If you would like to
+help with this, please get in touch.
+
+</ul>
+
+<p>Comments, suggestions and bug-reports to <a
+href="mailto:runekl@opoint.com">runekl@opoint.com</a>. If you have
+or want to make a correct dictionary from some field of knowledge, I
+would like to include it in the next release. See the <a
+href="README">README</a> file for some
+suggestions about how to get started. All you need is a large amount
+of Norwegian text from the field in question and some time to organize
+the dictionary.<p>
+
+<a target="_top" href="http://v.extreme-dm.com/?login=runekl">
+<img src="http://v1.extreme-dm.com/i.gif" height=38
+border=0 width=41 alt=""></a><script language="javascript"><!--
+an=navigator.appName;d=document;function
+pr(){d.write("<img src=\"http://v0.extreme-dm.com",
+"/0.gif?tag=runekl&j=y&srw="+srw+"&srb="+srb+"&",
+"rs="+r+"&l="+escape(d.referrer)+"\" height=1 ",
+"width=1>");}srb="na";srw="na";//-->
+</script><script language="javascript1.2"><!--
+s=screen;srw=s.width;an!="Netscape"?
+srb=s.colorDepth:srb=s.pixelDepth;//-->
+</script><script language="javascript"><!--
+r=41;d.images?r=d.im.width:z=0;pr();//-->
+</script><noscript><img height=1 width=1 alt=""
+src="http://v0.extreme-dm.com/0.gif?tag=runekl&j=n"></noscript>
+</html>