diff options
Diffstat (limited to 'JLanguageTool/src/resource/README.txt')
-rw-r--r-- | JLanguageTool/src/resource/README.txt | 30 |
1 files changed, 30 insertions, 0 deletions
diff --git a/JLanguageTool/src/resource/README.txt b/JLanguageTool/src/resource/README.txt new file mode 100644 index 0000000..b6c5efa --- /dev/null +++ b/JLanguageTool/src/resource/README.txt @@ -0,0 +1,30 @@ + +The *.dict files in the sub directories are produced by the +fsa tools (http://www.eti.pg.gda.pl/katedry/kiw/pracownicy/Jan.Daciuk/personal/fsa.html). + +To export *.dict data to plain text: +export LANG=C +fsa_prefix -a -d ~/workspace/JLanguageTool/src/resource/de/german.dict >export + +Import exported data back again to *.dict: +export LANG=C +sort -u export | fsa_build -O -o output.dict + +If you want to edit the data in the tabbed format, use de_morph_data.awk script +from fsa: + +gawk -f de_morph_data.awk < export > export.txt + +To compile the dictionary into binary form, you will have to use +morph_data.awk again, i.e.: + +export LANG=C +gawk -f morph_data.awk export.txt | sort -u | fsa_build -O -o output.dic + +Note: the .dict files are accompanied with .info files that describe the +encoding used and if there was infix compression used as well. In the +case of infix compression, you need to use de_morph_infix.awk script +instead. Recompiling will require using morph_infix.awk + +See http://languagetool.wikidot.com/developing-a-tagger-dictionary +for more information |