Diffstat (limited to 'JLanguageTool/README.txt')
-rw-r--r-- | JLanguageTool/README.txt | 276 |
1 files changed, 276 insertions, 0 deletions
diff --git a/JLanguageTool/README.txt b/JLanguageTool/README.txt
new file mode 100644
index 0000000..dc9207f
--- /dev/null
+++ b/JLanguageTool/README.txt
@@ -0,0 +1,276 @@
+LanguageTool, a proof-reading tool for English, German, Polish,
+French, Dutch, Slovenian, Russian, Romanian, Italian, Danish, and Catalan with
+initial support for Belarusian, Esperanto, Galician, Icelandic, Lithuanian,
+Malayalam, Slovak, Spanish, Swedish, and Ukrainian
+
+Copyright (C) 2005-2011 Daniel Naber (naber at danielnaber de)
+Version ###VERSION###, ###DATE###
+Homepage: http://www.languagetool.org
+
+Requirements:
+ -Java 1.5 or later (Sun Java or IcedTea; GIJ is not supported)
+ -For OpenOffice.org integration, OpenOffice 3.0.1 or later.
+
+Usage:
+ -To integrate LanguageTool into OpenOffice.org, you
+  can use two methods:
+
+  1. Double-click LanguageTool-###VERSION###.oxt. If you
+     have OpenOffice.org 3.0.1 integrated into the environment,
+     the extension should start installing. Follow the on-screen
+     instructions.
+
+  2. If the above method doesn't work, call Tools > Extension
+     Manager > Add... in OpenOffice.org and browse for the
+     LanguageTool-###VERSION###.oxt file.
+
+  Close and restart OpenOffice.org Writer. Remember to close the
+  OpenOffice.org QuickStarter as well if you use it. Type text with
+  an error, e.g. "This is an test." - make sure the text language
+  is set to English for this example.
+  You should see a blue underline under the word "an". Opening
+  the context menu with the right mouse button offers you a
+  description of the error and, if available, a correction.
+
+  Note that there will also be a new menu item "LanguageTool"
+  under the "Tools" menu which you might need to use if
+  on-the-fly checking doesn't properly work. If the native
+  spelling and grammar dialog doesn't check grammar, make
+  sure that the check box "Check Grammar" is checked in it
+  (if the window closes because of no mistakes in the document,
+  simply make any spelling mistake to make it open for a longer
+  time, and check the box). Check also if LanguageTool is visible
+  under "Grammar" in Tools > Options > Language Settings > Spelling
+  for your language. Note: you can disable the grammar check without
+  uninstalling LanguageTool simply by clearing the check box next to
+  LanguageTool in the same dialog.
+
+  Please see http://www.languagetool.org/#commonproblems if you
+  experience problems
+
+ -To use the simple demo GUI, first rename the .oxt file
+  to zip, then unzip it to a new directory and double click on
+  the LanguageToolGUI.jar file or call
+    java -jar LanguageToolGUI.jar
+
+ -To check plain text files from the command line:
+    java -jar LanguageTool.jar <filename>
+
+Known bugs:
+ -OpenOffice.org integration:
+   -doesn't work correctly with documents that contain revisions
+ -general:
+   -for some rules there may be a lot of false alarms, i.e., LanguageTool complains
+    about text which is actually correct
+ -Java 1.5:
+   -you cannot display the configuration dialog box due to the bug
+    present in GridBagLayout that limits the number of items displayed;
+    the languages affected are currently French and Polish.
+
+TODO:
+ -see if java.text.RuleBasedBreakIterator would be better for word
+  tokenization than the current scheme (especially check performance)
+ -see http://papyr.com/hypertextbooks/grammar/gramchek.htm
+ -update languagetool.xml.update automatically (i.e. replace @version@)
+ -add more redundancy rules, see e.g.
+  http://grammar.about.com/od/words/a/redundancies.htm?p=1
+ -use hunspell via jni, see http://tkltrans.sourceforge.net/spell.htm and
+  http://tkltrans.sourceforge.net/magyar/huncheck.tar.gz
+ -finish "Add language..." work
+ -put licenses in extra subdir
+ -check if some rules from xml-copy-editor.sourceforge.net are useful for us
+  (see its source in the xmlcopyeditor-1.0.9.5/src/rulesets directory)
+ -make the dist-src work (= compile out of the box)
+ -add a layer to use the simple XML so the LanguageTool GUIs can use An Gramadoir?
+ -stand-alone GUI: mark errors in upper part of window
+ -Auto-reload rules if file timestamp has changed?
+ -enable style registers and/or rule classes
+ -clean up rule descriptions so that they coherently contain the error or the rule
+  (e.g., "did + baseform" vs. "did + non-baseform")
+ -add more rules, especially agreement stuff
+ -fix AvsAnRule: use 'an' before any abbreviation that begins with a vowel
+  sound (like 'an MSc').
+ -add a simple sentence/word complexity test like that: http://www.ooomacros.org/user.php#111318
+ -German rule: Vergleichs vs Vergleiches etc -> only one variant per document should be used
+ -create abstract SentenceRule and TextRule classes to get rid of reset() method?
+ -check if there's a nice design that lets us extend PatternRule and PatternRuleLoader
+  to make them more powerful, but without having all features in these classes
+ -add more docs and examples
+ -Make adding language possible without changing the LanguageTool core code:
+   -make rule loading dynamic by using reflection (in progress)
+   -create the list of languages using reflection (add a LanguageInformation
+    interface that each language needs to implement)
+   -Add an "Add language pack..." menu to both the stand-alone version and the
+    OpenOffice.org version
+ -create a general mechanism for setting and storing rule parameters (including
+  Java rules and XML rules) like sensitivity level
+ -create a Firefox/Thunderbird extension using some of SpellBound extension code
+ -German:
+   "*Ich kaufe den Hund einen Knochen" (den -> dem), aber:
+   "*Ich kaufe dem Hund." (dem -> den)
+ -see if it's feasible to check bitexts (especially looking for false friends),
+  for example for checking translation files in xliff format
+ -see "TODO" / "FIXME" in the source:
+    find . -iname "*.java" -exec egrep -H "TODO|FIXME" {} \;
+ -...
+
+------------------------------------------------
+
+Using LanguageTool from .NET:
+
+ Thanks to IKVM (http://www.ikvm.net/) you can easily turn LanguageTool
+ into a .NET exe or dll (without the GUI and the OpenOffice.org integration).
+ Just adapt these commands to your local path names (this example shows using mono):
+
+   export MONO_PATH=/path/to/ikvm/bin
+   mono /path/to/ikvm/bin/ikvmc.exe -target:library -r:/path/to/ikvm/bin/IKVM.GNU.Classpath.dll libs/morfologik-stemming-nodict-1.1.14.jar
+   mono /path/to/ikvm/bin/ikvmc.exe -target:library -r:/path/to/ikvm/bin/IKVM.GNU.Classpath.dll libs/jWordSplitter.jar
+   mono /path/to/ikvm/bin/ikvmc.exe -r:/path/to/ikvm/bin/IKVM.GNU.Classpath.dll -r:morfologik-stemming-nodict-1.1.14.dll -r:jWordSplitter.dll LanguageTool.jar
+
+ However, the resulting LanguageTool.exe has not been tested much yet. You can expect
+ problems with resource loading (path names are not recognized properly).
+
+------------------------------------------------
+
+License:
+
+ Unless otherwise noted, this software is distributed under
+ the LGPL, see file COPYING.txt
+
+ See README-license.txt for the copyright of the external libraries
+
+ German:
+   The German data for part-of-speech tagging is taken from Morphy
+   (http://www.wolfganglezius.de/doku.php?id=public:cl:morphy)
+   under Creative Commons Attribution-Share Alike 3.0
+
+ Polish:
+   The Polish data for part-of-speech tagging is from Morfologik project,
+   licensed as LGPL (see http://morfologik.blogspot.com).
+
+ Italian:
+   The Italian data for part-of-speech tagging is taken from Morph-it!,
+   licensed under the Creative Commons Attribution ShareAlike 2.0 License
+   and the GNU Lesser General Public License (LGPL)
+   (see http://sslmitdev-online.sslmit.unibo.it/linguistics/morph-it.php).
+
+ Romanian:
+   The Romanian data for part-of-speech tagging is developed by Ionuț Păduraru
+   (http://www.archeus.ro). It's being released here on LGPL license.
+
+ Slovak:
+   The Slovak data were created by Zdenko Podobný based on Slovak National
+   Corpus data (http://korpus.juls.savba.sk/). They are released here on
+   LGPL license.
+
+ Spanish:
+   The dictionary was mainly obtained from the Freeling project.
+   http://devel.cpl.upc.edu/freeling/svn/latest/freeling/data/es/dicc.src
+   http://garraf.epsevg.upc.es/freeling/
+   It is released under the GNU General Public License.
+
+ Dutch:
+   The Dutch data are based on Alpino parser for Dutch by Gertjan van
+   Noord and is released on LGPL license. Alpino is available at
+   http://www.let.rug.nl/~vannoord/alp/Alpino/.
+
+ Russian:
+   Russian dictionary originally developed by www.aot.ru and licensed under LGPL.
+   http://www.aot.ru/download.php file rus-src-morph.tar.gz
+   It was partially converted to fsa format in 2008 by Yakov.
+
+ Swedish:
+   The Swedish data are based on DSSO. The Initial Developer of the Original Code is Göran Andersson.
+   Contributor(s):
+     Tom Westerberg <tweg@welho.com>
+     Niklas Johansson <sleeping.pillow@gmail.com>
+   The Swedish Dictionary may be used under the terms of the GNU Lesser General Public License Version 2.1 or later
+   (the "LGPL").
+   http://dsso.se
+
+ French:
+   The French data for part-of-speech tagging are from the Dicollecte project.
+   They are made available here under LGPL. See detailed information in
+   resource/fr/README_lexique.txt
+
+ Galician:
+   The Galician data for part-of-speech tagging were created by Susana Sotelo
+   Docio based on Freeling dictionary and henceforth licensed under GPL.
+
+------------------------------------------------
+
+ English:
+   The English data for part-of-speech tagging are based on:
+
+   1) Automatically Generated Inflection Database (AGID) version 4,
+      Copyright 2000-2003 by Kevin Atkinson <kevina@gnu.org>
+      The part-of-speech database is taken from Alan Beale 2of12id
+      and the WordNet database which is under the following copyright:
+
+        This software and database is being provided to you, the LICENSEE, by
+        Princeton University under the following license. By obtaining, using
+        and/or copying this software and database, you agree that you have
+        read, understood, and will comply with these terms and conditions.:
+
+        Permission to use, copy, modify and distribute this software and
+        database and its documentation for any purpose and without fee or
+        royalty is hereby granted, provided that you agree to comply with
+        the following copyright notice and statements, including the disclaimer,
+        and that the same appear on ALL copies of the software, database and
+        documentation, including modifications that you make for internal
+        use or for distribution.
+
+        WordNet 1.6 Copyright 1997 by Princeton University. All rights reserved.
+
+        THIS SOFTWARE AND DATABASE IS PROVIDED "AS IS" AND PRINCETON
+        UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
+        IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PRINCETON
+        UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES OF MERCHANT-
+        ABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE
+        OF THE LICENSED SOFTWARE, DATABASE OR DOCUMENTATION WILL NOT
+        INFRINGE ANY THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR
+        OTHER RIGHTS.
+
+        The name of Princeton University or Princeton may not be used in
+        advertising or publicity pertaining to distribution of the software
+        and/or database. Title to copyright in this software, database and
+        any associated documentation shall at all times remain with
+        Princeton University and LICENSEE agrees to preserve same.
+
+      Alan Beale 2of12id.txt is indirectly derived from the Moby part-of-speech
+      database and the WordNet database. The Moby part-of-speech is in the
+      public domain:
+
+        The Moby lexicon project is complete and has
+        been place into the public domain. Use, sell,
+        rework, excerpt and use in any way on any platform.
+
+        Placing this material on internal or public servers is
+        also encouraged. The compiler is not aware of any
+        export restrictions so freely distribute world-wide.
+
+        You can verify the public domain status by contacting
+
+        Grady Ward
+        3449 Martha Ct.
+        Arcata, CA 95521-4884
+
+        grady@netcom.com
+        grady@northcoast.com
+
+      For more information on wordlists used, see agid-readme.txt.
+
+   2) Part Of Speech Database, compiled by Kevin Atkinson
+      <kevina@users.sourceforge.net>
+      The part-of-speech.txt file is a combination of
+      "Moby (tm) Part-of-Speech II" and the WordNet database (see above and
+      pos-readme.txt).
+
+   3) 2of12inf wordlist, released to public domain,
+      see 12dicts-readme.html.
+
+   4) Public domain Moby wordlists were used also for generating
+      POS tag information for common proper names.
+
+   For more information, see the scripts in the source directory
+   en/resource/.