summaryrefslogtreecommitdiffstats
path: root/JLanguageTool/README.txt
diff options
context:
space:
mode:
Diffstat (limited to 'JLanguageTool/README.txt')
-rw-r--r--JLanguageTool/README.txt276
1 files changed, 276 insertions, 0 deletions
diff --git a/JLanguageTool/README.txt b/JLanguageTool/README.txt
new file mode 100644
index 0000000..dc9207f
--- /dev/null
+++ b/JLanguageTool/README.txt
@@ -0,0 +1,276 @@
+LanguageTool, a proof-reading tool for English, German, Polish,
+French, Dutch, Slovenian, Russian, Romanian, Italian, Danish, and Catalan with
+initial support for Belarusian, Esperanto, Galician, Icelandic, Lithuanian,
+Malayalam, Slovak, Spanish, Swedish, and Ukrainian
+
+Copyright (C) 2005-2011 Daniel Naber (naber at danielnaber de)
+Version ###VERSION###, ###DATE###
+Homepage: http://www.languagetool.org
+
+Requirements:
+ -Java 1.5 or later (Sun Java or IcedTea; GIJ is not supported)
+ -For OpenOffice.org integration, OpenOffice 3.0.1 or later.
+
+Usage:
+ -To integrate LanguageTool into OpenOffice.org, you
+ can use two methods:
+
+ 1. Double-click LanguageTool-###VERSION###.oxt. If you
+ have OpenOffice.org 3.0.1 integrated into the environment,
+ the extension should start installing. Follow the on-screen
+ instructions.
+
+ 2. If the above method doesn't work, call Tools > Extension
+ Manager > Add... in OpenOffice.org and browse for the
+ LanguageTool-###VERSION###.oxt file.
+
+ Close and restart OpenOffice.org Writer. Remember to close the
+ OpenOffice.org QuickStarter as well if you use it. Type text with
+ an error, e.g. "This is an test." - make sure the text language
+ is set to English for this example.
+ You should see a blue underline under the word "an". Opening
+ the context menu with the right mouse button offers you a
+ description of the error and, if available, a correction.
+
+ Note that there will also be a new menu item "LanguageTool"
+ under the "Tools" menu which you might need to use if
+ on-the-fly checking doesn't properly work. If the native
+ spelling and grammar dialog doesn't check grammar, make
+ sure that the check box "Check Grammar" is checked in it
+ (if the window closes because of no mistakes in the document,
+ simply make any spelling mistake to make it open for a longer
+ time, and check the box). Check also if LanguageTool is visible
+ under "Grammar" in Tools > Options > Language Settings > Spelling
+ for your language. Note: you can disable the grammar check without
+ uninstalling LanguageTool simply by clearing the check box next to
+ LanguageTool in the same dialog.
+
+ Please see http://www.languagetool.org/#commonproblems if you
+ experience problems
+
+ -To use the simple demo GUI, first rename the .oxt file
+ to zip, then unzip it to a new directory and double click on
+ the LanguageToolGUI.jar file or call
+ java -jar LanguageToolGUI.jar
+
+ -To check plain text files from the command line:
+ java -jar LanguageTool.jar <filename>
+
+Known bugs:
+ -OpenOffice.org integration:
+ -doesn't work correctly with documents that contain revisions
+ -general:
+ -for some rules there may be a lot of false alarms, i.e., LanguageTool complains
+ about text which is actually correct
+ -Java 1.5:
+ -you cannot display the configuration dialog box due to the bug
+ present in GridBagLayout that limits the number of items displayed;
+ the languages affected are currently French and Polish.
+
+TODO:
+ -see if java.text.RuleBasedBreakIterator would be better for word
+ tokenization than the current scheme (especially check performance)
+ -see http://papyr.com/hypertextbooks/grammar/gramchek.htm
+ -update languagetool.xml.update automatically (i.e. replace @version@)
+ -add more redundancy rules, see e.g.
+ http://grammar.about.com/od/words/a/redundancies.htm?p=1
+ -use hunspell via jni, see http://tkltrans.sourceforge.net/spell.htm and
+ http://tkltrans.sourceforge.net/magyar/huncheck.tar.gz
+ -finish "Add language..." work
+ -put licenses in extra subdir
+ -check if some rules from xml-copy-editor.sourceforge.net are useful for us
+ (see its source in the xmlcopyeditor-1.0.9.5/src/rulesets directory)
+ -make the dist-src work (= compile out of the box)
+ -add a layer to use the simple XML so the LanguageTool GUIs can use An Gramadoir?
+ -stand-alone GUI: mark errors in upper part of window
+ -Auto-reload rules if file timestamp has changed?
+ -enable style registers and/or rule classes
+ -clean up rule descriptions so that they coherently contain the error or the rule
+ (e.g., "did + baseform" vs. "did + non-baseform")
+ -add more rules, especially agreement stuff
+ -fix AvsAnRule: use 'an' before any abbreviation that begins with a vowel
+ sound (like 'an MSc').
+ -add a simple sentence/word complexity test like that: http://www.ooomacros.org/user.php#111318
+ -German rule: Vergleichs vs Vergleiches etc -> only one variant per document should be used
+ -create abstract SentenceRule and TextRule classes to get rid of reset() method?
+ -check if there's a nice design that lets us extend PatternRule and PatternRuleLoader
+ to make them more powerful, but without having all features in these classes
+ -add more docs and examples
+ -Make adding language possible without changing the LanguageTool core code:
+ -make rule loading dynamic by using reflection (in progress)
+ -create the list of languages using reflection (add a LanguageInformation
+ interface that each language needs to implement)
+ -Add an "Add language pack..." menu to both the stand-alone version and the
+ OpenOffice.org version
+ -create a general mechanism for setting and storing rule parameters (including
+ Java rules and XML rules) like sensitivity level
+ -create a Firefox/Thunderbird extension using some of SpellBound extension code
+ -German:
+ "*Ich kaufe den Hund einen Knochen" (den -> dem), aber:
+ "*Ich kaufe dem Hund." (dem -> den)
+ -see if it's feasible to check bitexts (especially looking for false friends),
+ for example for checking translation files in xliff format
+ -see "TODO" / "FIXME" in the source:
+ find . -iname "*.java" -exec egrep -H "TODO|FIXME" {} \;
+ -...
+
+------------------------------------------------
+
+Using LanguageTool from .NET:
+
+ Thanks to IKVM (http://www.ikvm.net/) you can easily turn LanguageTool
+ into a .NET exe or dll (without the GUI and the OpenOffice.org integration).
+ Just adapt these commands to you local path names (this example shows using mono):
+
+ export MONO_PATH=/path/to/ikvm/bin
+ mono /path/to/ikvm/bin/ikvmc.exe -target:library -r:/path/to/ikvm/bin/IKVM.GNU.Classpath.dll libs/morfologik-stemming-nodict-1.1.14.jar
+ mono /path/to/ikvm/bin/ikvmc.exe -target:library -r:/path/to/ikvm/bin/IKVM.GNU.Classpath.dll libs/jWordSplitter.jar
+ mono /path/to/ikvm/bin/ikvmc.exe -r:/path/to/ikvm/bin/IKVM.GNU.Classpath.dll -r:morfologik-stemming-nodict-1.1.14.dll -r:jWordSplitter.dll LanguageTool.jar
+
+ However, the resulting LanguageTool.exe has not been tested much yet. You can expect
+ problems with resource loading (path names are not recognized properly).
+
+------------------------------------------------
+
+License:
+
+ Unless otherwise noted, this software is distributed under
+ the LGPL, see file COPYING.txt
+
+ See README-license.txt for the copyright of the external libraries
+
+ German:
+ The German data for part-of-speech tagging is taken from Morphy
+ (http://www.wolfganglezius.de/doku.php?id=public:cl:morphy)
+ under Creative Commons Attribution-Share Alike 3.0
+
+ Polish:
+ The Polish data for part-of-speech tagging is from Morfologik project,
+ licensed as LGPL (see http://morfologik.blogspot.com).
+
+ Italian:
+ The Italian data for part-of-speech tagging is taken from Morph-it!,
+ licensed under the Creative Commons Attribution ShareAlike 2.0 License
+ and the GNU Lesser General Public License (LGPL)
+ (see http://sslmitdev-online.sslmit.unibo.it/linguistics/morph-it.php).
+
+ Romanian:
+ The Romanian data for part-of-speech tagging is developed by Ionuț Păduraru
+ (http://www.archeus.ro). It's being released here on LGPL license.
+
+ Slovak:
+ The Slovak data were created by Zdenko Podobný based on Slovak National
+ Corpus data (http://korpus.juls.savba.sk/). They are released here on
+ LGPL license.
+
+ Spanish:
+ The dictionary was mainly obtained from the Freeling project.
+ http://devel.cpl.upc.edu/freeling/svn/latest/freeling/data/es/dicc.src
+ http://garraf.epsevg.upc.es/freeling/
+ It is released under the GNU General Public License.
+
+ Dutch:
+ The Dutch data are based on Alpino parser for Dutch by Gertjan van
+ Noord and is released on LGPL license. Alpino is available at
+ http://www.let.rug.nl/~vannoord/alp/Alpino/.
+
+ Russian:
+ Russian dictionary originally developed by www.aot.ru and licensed under LGPL.
+ http://www.aot.ru/download.php file rus-src-morph.tar.gz
+ It was partially converted to fsa format in 2008 by Yakov.
+
+ Swedish:
+ The Swedish data are based on DSSO. The Initial Developer of the Original Code is Göran Andersson.
+ Contributor(s):
+ Tom Westerberg <tweg@welho.com>
+ Niklas Johansson <sleeping.pillow@gmail.com>
+ The Swedish Dictionary may be used under the terms of the GNU Lesser General Public License Version 2.1 or later
+ (the "LGPL").
+ http://dsso.se
+
+ French:
+ The French data for part-of-speech tagging are from the Dicollecte project.
+ They are made available here under LGPL. See detailed information in
+ resource/fr/README_lexique.txt
+
+ Galician:
+ The Galician data for part-of-speech tagging were created by Susana Sotelo
+ Docio based on Freeling dictionary and henceforth licensed under GPL.
+
+------------------------------------------------
+
+ English:
+ The English data for part-of-speech tagging are based on:
+
+ 1) Automatically Generated Inflection Database (AGID) version 4,
+ Copyright 2000-2003 by Kevin Atkinson <kevina@gnu.org>
+ The part-of-speech database is taken from Alan Beale 2of12id
+ and the WordNet database which is under the following copyright:
+
+ This software and database is being provided to you, the LICENSEE, by
+ Princeton University under the following license. By obtaining, using
+ and/or copying this software and database, you agree that you have
+ read, understood, and will comply with these terms and conditions.:
+
+ Permission to use, copy, modify and distribute this software and
+ database and its documentation for any purpose and without fee or
+ royalty is hereby granted, provided that you agree to comply with
+ the following copyright notice and statements, including the disclaimer,
+ and that the same appear on ALL copies of the software, database and
+ documentation, including modifications that you make for internal
+ use or for distribution.
+
+ WordNet 1.6 Copyright 1997 by Princeton University. All rights reserved.
+
+ THIS SOFTWARE AND DATABASE IS PROVIDED "AS IS" AND PRINCETON
+ UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
+ IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PRINCETON
+ UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES OF MERCHANT-
+ ABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE
+ OF THE LICENSED SOFTWARE, DATABASE OR DOCUMENTATION WILL NOT
+ INFRINGE ANY THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR
+ OTHER RIGHTS.
+
+ The name of Princeton University or Princeton may not be used in
+ advertising or publicity pertaining to distribution of the software
+ and/or database. Title to copyright in this software, database and
+ any associated documentation shall at all times remain with
+ Princeton University and LICENSEE agrees to preserve same.
+
+ Alan Beale 2of12id.txt is indirectly derived from the Moby part-of-speech
+ database and the WordNet database. The Moby part-of-speech is in the
+ public domain:
+
+ The Moby lexicon project is complete and has
+ been place into the public domain. Use, sell,
+ rework, excerpt and use in any way on any platform.
+
+ Placing this material on internal or public servers is
+ also encouraged. The compiler is not aware of any
+ export restrictions so freely distribute world-wide.
+
+ You can verify the public domain status by contacting
+
+ Grady Ward
+ 3449 Martha Ct.
+ Arcata, CA 95521-4884
+
+ grady@netcom.com
+ grady@northcoast.com
+
+ For more information on wordlists used, see agid-readme.txt.
+
+ 2) Part Of Speech Database, compiled by Kevin Atkinson
+ <kevina@users.sourceforge.net>
+ The part-of-speech.txt file contains is a combination of
+ "Moby (tm) Part-of-Speech II" and the WordNet database (see above and
+ pos-readme.txt).
+
+ 3) 2of12inf wordlist, released to public domain,
+ see 12dicts-readme.html.
+
+ 4) Public domain Moby wordlists were used also for generating
+ POS tag information for common proper names.
+
+ For more information, see the scripts in the source directory
+ en/resource/.