<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>The Norwegian ispell-dictionary page</title>
<style type="text/css">
<!--
  body { background-color: white }
  h1, h2, h3, b { font-family: sans-serif }
  .center { text-align: center }
  .red { color: red }
-->
</style>
</head>

<body>
<h1 class="center">The Norwegian ispell-dictionary home page</h1>


<hr>

<p>The most important file available here contains a list of 750 000
Norwegian words. Each word is marked with a number indicating the
commonness of that word.  Compound words are hyphenated at their
compound points.  Some words are marked as belonging to specific
classes: mathematics, oil, conservative language, 'samnorsk' etc.
Words marked with a star are allowed in Nynorsk.
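
<p>As a rough illustration of how such a list can be cut down to a
smaller dictionary, here is a minimal Python sketch.  It assumes a
simplified layout with one word and one commonness number per line,
separated by whitespace, and that a higher number means a more common
word.  Neither assumption is taken from the actual file, and the names
<code>select_words</code> and <code>min_commonness</code> are made up
for this example.  The Makefile in the distribution remains the real
reference.

```python
def select_words(lines, min_commonness):
    """Return the words whose commonness is at least min_commonness.

    Assumed (hypothetical) layout: "word number" on each line.
    Hyphens marking compound points are left in place.
    """
    selected = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        word, commonness = parts[0], int(parts[1])
        if commonness >= min_commonness:
            selected.append(word)
    return selected


if __name__ == "__main__":
    sample = ["og 9", "mate-matikk 5", "sjelden 0"]
    for word in select_words(sample, 5):
        print(word)
```

<p>Raising the threshold gives a smaller dictionary; lowering it gives
a larger one.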

<p>This file is useful for several things:
<ul>
<li>Making dictionaries of different sizes for the ispell program,
choosing which words to include in a sensible way.

<li>Making Norwegian dictionaries for word processors that don't
have one, again with a sensible subset of words.

<li>Making new and better hyphenation patterns for TeX.

<li>Text-recognition (OCR) programs.

<li>Encouraging the e-TeX team to implement multi-level hyphenation in
TeX.

<li>Encouraging people to use frequency information when they write
programs that suggest replacements for misspelled words.

</ul>

<p>Routines for the first three items on the above list are included in
the Makefiles.  The last three were too hard to implement in Make.

<H3>Requirements</H3>


<ul>
<li><b>Ispell</b><br>
For the ispell-related stuff, you need the ispell program, and you can
get it from the
<A
HREF="http://ficus-www.cs.ucla.edu/ficus-members/geoff/ispell.html">
ispell home-page</A>.  You can also find dictionaries for a lot of
languages there.

You also need the version of the look program in <a
href="ftp://ftp.win.tue.nl/pub/linux/utils/util-linux/">util-linux-2.9</a>.
Older versions have a bug which shows up when searching dictionaries
with non-English characters.  Ispell uses look to complete words
(ispell-complete-word).  If you don't plan to have a Norwegian words
file for lookup, you don't need to worry about the look program.

<li><b>Emacs</b><br>

If you want to use ispell from Emacs, I recommend upgrading to the
latest version of <A
HREF="ftp://kdstevens.com/pub/stevens/ispell.el.gz"> ispell.el</A>.
This version supports Norwegian, and it is now easy to include
local dictionary definitions.  It is almost like the version included
in Emacs-20.4.  There is also an add-on to ispell.el,
<A HREF="http://kaolin.unice.fr/~serrano/emacs/flyspell">
flyspell.el</A>, written by <A href="http://kaolin.unice.fr/~serrano">
Manuel Serrano</A>, which offers better `on-the-fly'
spell-checking.  An old version is included in emacs-20.3, but you
will want the new version for its important speed improvements.

<li><b>(La)TeX</b><br>

If you want to make your own hyphenation patterns for TeX (you
probably don't), you need a version of the patgen program with greater
capacities than standard versions, i.e. you have to compile patgen
with a different patgen.ch. See the patterns/Makefile for more
information.  Almost every TeX distribution contains the patgen
program.  I recommend <a
href="ftp://ftp.rrzn.uni-hannover.de/pub/local/misc/teTeX-beta/">teTeX</a>.
If you want both kinds of hyphenation in the same TeX format, you
probably need to recompile TeX due to capacity problems.  Again, this
is easy with teTeX.

</ul>


<h3>Distribution</h3>

<p>The distribution <a
href="http://www.uio.no/~runekl/ispell-norsk-2.0.tar.gz">ispell-norsk-2.0.tar.gz</a>
(2204k) is free in the GPL sense and contains these files:

<ul>
<li><b><a href="README">README</a></b><br>
How to make ispell and emacs work with these dictionaries.

<li><b>words.norsk.sq</b><br> This file contains the Norwegian words and
the indication of their commonness compressed with the sq program.

<li><b>norsk.aff.in</b><br> A template for the affix file for the
Norwegian language.  This file is made for an ispell with 64 maskbits
that understands HTML.  Most pre-built versions of ispell support only
32 maskbits and don't understand HTML.  Use the patch and recompile
ispell, or delete the HTML-related stuff.

<li><b>Ispell-3.1.20.no.patch</b><br>A patch for ispell-3.1 that adds
the amsmath and breqn environments to the skip-list and fixes a bug in
buildhash.  It also makes ispell html-aware, and tries to fix `the
backslash bug'.  In addition it makes ispell suggest "- as a compound
word mark when seeing an unknown compound word, but only in TeX mode
if the dictionary is named norsk.  This is an ugly hack that works for
me.  It also implements the -r flag, which is like the -a flag except that
suggestions are printed even if the word is found in the dictionary.

<li><b>norsk.single.tex</b><br>This is a set of hyphenation patterns
for TeX that works well on non-compound words.  It is used when making
the new hyphenation patterns for TeX.  This file is basically made
from nohyph3.tex, a hyphenation file I released in May 1998, but a lot of
errors have been removed by comparing its action on the single words
with the action of <a
href="ftp://ftp.dante.de/tex-archive/language/hyphenation/nohyph.tex">nohyph.tex</a>
(standard in teTeX), <a
href="ftp://ftp.dante.de/tex-archive/language/hyphenation/nohyph2.tex">nohyph2.tex</a>,
and the unreleased hyphenation patterns by Simen Gaure used at the <a
href="http://www.math.uio.no/index.html">Department of
Mathematics</a>, <a href="http://www.uio.no/index.html">University of
Oslo</a>.  I have tried to follow the rules given in <a
href="ftp://ftp.dante.de/tex-archive/language/hyphenation/nohyph.tex">nohyph.tex</a>,
at least where I find it reasonable.  Bear in mind that there is no
authoritative source for hyphenation in Norwegian.  Please get in
touch if you want to help improve the Norwegian hyphenation
patterns.

<li><b><a href="norsk.cfg">norsk.cfg</a></b><br> An
addition to Babel-3.6 for LaTeX that makes the character " active and
offers you many `different' hyphen signs.  You can say o"ppussing in
LaTeX to get correct hyphenation opp-pussing!  This functionality will
appear in Babel-3.7 for Norwegian.  Danish and Swedish have had it for
several years.

<li><b><a href="inorsk-compwordsmaybe">inorsk-compwordsmaybe</a></b><br>
Search for words in a file or from standard input that perhaps should be
written as one word, like `matematikk l&aelig;rer'.

<li><b><a href="inorsk-hyphenmaybe">inorsk-hyphenmaybe</a></b><br>
Search for words in a file or from standard input that the Norwegian
hyphenation patterns from this distribution might not hyphenate
properly.  Incorrect hyphenation of a word that is not printed is
considered to be a bug in the patterns.  There is only a finite number
of them.

<li><b>Makefile</b><br> This file contains rules for making
dictionaries for ispell and lists of the most common words for dumb
word processors.  There is also a Makefile in the patterns directory
for making hyphenation patterns.

<li><b><a href="nohyphbc.tex">nohyphbc.tex</a></b>, <b><a
href="nohyphb.tex">nohyphb.tex</a></b><br> These
are the hyphenation patterns for TeX.  The file nohyphbc.tex hyphenates
only at compound points.  The file nohyphb.tex also hyphenates within
each component of a word, but avoids hyphenating 'near' compound points.
I think 'bar-nepsykologen' looks really bad.  Too bad TeX doesn't support
multi-level hyphenation yet.<br>


The naming of the files follows the paradigm in Babel; if a replacement
for a file foo.bar is offered, it is named foob.bar, where the b
stands for big.

These new patterns easily outperform those available before, mostly
because of better compound word hyphenation.  For reference I have
made lists of about 2000 compound word errors made by previous
patterns: <a href="err.nohyph">err.nohyph</a> and <a href="err.nohyph2">
err.nohyph2</a>.
<br>

The size of the patterns can be argued over.  The patterns are copied
into each format file, thus occupying some disk space.  They also
limit the number of languages one can load hyphenation patterns for on
most TeX systems.  But size considerations have become less important
in recent years, so I prefer to focus on getting things right, not small.
It is also possible to recompile teTeX so that there is more room
for hyphenation patterns, but the patterns take up more memory then.
There is surely a lot of unnecessary structure within the hyphenation
patterns, but it is very time-consuming to remove.  The file
patterns/Makefile can be configured so that one can make smaller
sets of patterns, taking only the most common words into
consideration.  Everyone is invited to play.

<li><b>COPYING</b><br> The GNU general public license.

</ul>
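
<p>The rule described for nohyphb.tex above, hyphenating inside the
components of a word but not `near' a compound point, can be
illustrated on explicit break positions.  This is only a sketch of the
rule, not of how TeX patterns or patgen work internally.  The function
names and the <code>min_distance</code> parameter are hypothetical,
and the value 3 is an arbitrary choice for the example.

```python
def filter_hyphen_points(candidates, compound_points, min_distance=3):
    """Keep compound points, and keep other candidate break positions
    only if they are at least min_distance letters from every compound
    point.  Positions are counted in letters from the start of the word.
    """
    compound = set(compound_points)
    kept = []
    for pos in sorted(set(candidates) | compound):
        if pos in compound:
            kept.append(pos)
        elif all(abs(pos - c) >= min_distance for c in compound):
            kept.append(pos)
    return kept


def hyphenate(word, points):
    """Insert a hyphen at each position in points (ascending order)."""
    pieces, start = [], 0
    for pos in points:
        pieces.append(word[start:pos])
        start = pos
    pieces.append(word[start:])
    return "-".join(pieces)


if __name__ == "__main__":
    word = "barnepsykologen"      # barne + psykologen
    candidates = [3, 5, 8, 10, 12]
    points = filter_hyphen_points(candidates, [5])
    print(hyphenate(word, points))
```

<p>With these inputs the break after `bar' is dropped because it lies
two letters from the compound point, so 'bar-nepsykologen' cannot occur.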

<p>


<h3>Changes</h3>

<p>There have been a lot of changes since version 1.1a.  The quality has
improved a lot, and the structure of the distribution is completely
new.  Therefore I have chosen not to make the previous versions available
from this site.

<p>Here is a rough summary of the changes:

<ul>

<li>New distribution format

<li>Support for Nynorsk

<li>Commonness indicator for each word from Bokm&aring;l

<li>Words are hyphenated at their compound points

<li>A lot of common words added, especially compound words

<li>Makefile completely rewritten.  It is possible to configure the
size of the dictionary for ispell without breaking the munching.

<li>Makefile to make hyphenation patterns for TeX added

<li>The pregenerated TeX patterns are included in the distribution.

<li>Controlled compound-word support added.  This includes affix file
updates.

<li>Some uncommon and misspelled words removed

<li>Affix file updated for html.  This will only work if you use the patch.

</ul>

<h3>Todo list</h3>

<ul>

<li>Remove/mark uncommon words that are close to common words.  If you
type 're' you probably meant 'er', even if 're' is a valid word.

<li>There are too many words with commonness 0.  Split this group in
two.

<li>Some words in the basic category belong in special categories.
When making a small dictionary with all words from mathematics, many
such words are missing, since they are in the basic category.  They
should be moved.

<li>Make ispell sort the suggested replacements for misspelled words
by commonness of the suggested words.  One (easy) way to do this is to
make an external file containing the most common words, and make
ispell look into that file each time it has more than one suggestion.
Or the file could be read into memory.  (I don't think frequency
information is representable within the root/affix structure, since
one flag can represent multiple words.)  This would slow ispell down a
little bit, but only when it makes suggestions.  If you would like to
help with this, please get in touch.

</ul>
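
<p>The last item on the list above, sorting suggestions by commonness
using an external file read into memory, could look roughly like the
following Python sketch.  It works outside ispell on a plain list of
suggestions.  The `word number' file layout, the convention that a
higher number means more common, and the function names are all
assumptions made for this example.

```python
def load_commonness(lines):
    """Build a word -> commonness table from "word number" lines.

    The layout is an assumption; lines that do not match are skipped.
    """
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            table[parts[0]] = int(parts[1])
    return table


def sort_suggestions(suggestions, commonness):
    """Order suggestions from most to least common.

    Words missing from the table rank last; words with equal
    commonness keep the order in which they were suggested.
    """
    return sorted(suggestions,
                  key=lambda word: commonness.get(word, -1),
                  reverse=True)


if __name__ == "__main__":
    table = load_commonness(["er 9", "re 1"])
    print(sort_suggestions(["re", "er"], table))
```

<p>The lookup only happens when there is more than one suggestion to
order, which matches the expectation that the slowdown is limited to
the suggestion step.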

<p>Comments, suggestions and bug reports to <a
href="mailto:runekl@opoint.com">runekl@opoint.com</a>.  If you have
or want to make a correct dictionary from some field of knowledge, I
would like to include it in the next release.  See the <a
href="README">README</a> file for some
suggestions about how to get started.  All you need is a large amount
of Norwegian text from the field in question and some time to organize
the dictionary.<p>

</body>
</html>