summaryrefslogtreecommitdiffstats
path: root/JLanguageTool/src/resource/it/readme-morph-it.txt
diff options
context:
space:
mode:
Diffstat (limited to 'JLanguageTool/src/resource/it/readme-morph-it.txt')
-rw-r--r--JLanguageTool/src/resource/it/readme-morph-it.txt415
1 files changed, 415 insertions, 0 deletions
diff --git a/JLanguageTool/src/resource/it/readme-morph-it.txt b/JLanguageTool/src/resource/it/readme-morph-it.txt
new file mode 100644
index 0000000..009a42f
--- /dev/null
+++ b/JLanguageTool/src/resource/it/readme-morph-it.txt
@@ -0,0 +1,415 @@
+===================================================================
+ Morph-it!
+
+ A free morphological lexicon for the Italian Language
+===================================================================
+
+ version 0.4.8
+ February 23 2009
+
+*******************************************************************
+ THIS README IS NOT REALLY UP TO DATE
+ A NEW VERSION OF THIS
+ README FILE WILL BE
+ RELEASED (HOPEFULLY) SOON
+ (BUT I WOULDN'T COUNT ON THAT...)
+*******************************************************************
+
+ Copyright (c) 2004-2009
+ Marco Baroni (marco.baroni@unitn.it)
+ Eros Zanchetta (eros@sslmit.unibo.it)
+
+ http://sslmit.unibo.it/morphit
+
+
+Morph-it! is a free (as in free speech and in free beer) morphological
+resource for the Italian language.
+
+Morph-it! is a lexicon of inflected forms with their lemma and
+morphological features. For example:
+
+gattini gattino NOUN-M:p
+andarono andare VER:ind+past+3+p
+fastidiosetto fastidioso ADJ:dim+m+s
+
+As of version 0.4.7 the list contains 504,906 entries and 34,968
+lemmas.
+
+Morph-it! can be used as a data source for an Italian lemmatizer /
+morphological analyzer / morphological generator.
+
+As example applications, on the Morph-it! site you can download the
+lexicon compiled for the SFST [1] and Finite State Utilities [2]
+packages.
+
+The data for Morph-it! were prepared by Marco Baroni and Eros
+Zanchetta using a mixture of corpus-based methods,
+regular-expression-based rules and manual checking. We are currently
+writing a paper that describes the procedure we used to build the
+resource.
+
+Morph-it! is still under development and there may still be gaps,
+unlikely forms, etc. We will be very grateful if you let us know
+about missing forms, problems, and ideas/resources that can help
+us expanding or cleaning the list (sslmitdevonline@sslmit.unibo.it).
+
+Notice in particular that, since we extracted data from an Italian
+newspaper corpus (the la Repubblica corpus, also accessible from our
+site), we have many gaps in basic, every-day vocabulary.
+
+Also, the current version does not distinguish between coordinative
+and subordinative conjunctions. We plan to do this in the near
+future. More in general, we are not fully satisfied with our current
+features for function words, and we plan to revise them.
+
+A more ambitious plan we would like to pursue is the identification
+of derivational structure and derivationally related lemmas. Then, we
+will add full semantic representations. Then, we will take over the
+world and reign supreme for the next 100 years.
+
+The remainder of this document contains a commented list of the
+morphological features used in the lexicon, licensing information and
+aknowledgments.
+
+
+FEATURES
+========
+
+We distinguish between derivational features, that pertain to the
+lemma, and inflectional features, that pertain to the wordform.
+
+Derivational and inflectional features are separated by a colon.
+
+The derivational features are in upper case and they are
+dash-delimited. The inflectional features are in lower case and they
+are plus-sign-delimited.
+
+For example, we represent gender as a derivational feature of nouns
+(we take "cameriere" and "cameriera" to belong to different lemmas),
+whereas we treat number as an inflectional feature of nouns. Thus,
+gender and number are represented as in the following examples:
+
+cameriere cameriera NOUN-F:p
+cameriera cameriera NOUN-F:s
+camerieri cameriere NOUN-M:p
+cameriere cameriere NOUN-M:s
+
+For adjectives, gender is considered an inflectional feature. Thus,
+gender is represented differently in adjectives and nouns:
+
+azzurre azzurra NOUN-F:p
+azzurra azzurra NOUN-F:s
+azzurri azzurro NOUN-M:p
+azzurro azzurro NOUN-M:s
+
+azzurra azzurro ADJ:pos+f+s
+azzurri azzurro ADJ:pos+m+p
+azzurro azzurro ADJ:pos+m+s
+azzurre azzurro ADJ:pos+f+p
+
+Changes that are purely orthographical/phonological but do not affect
+morphology/syntax/meaning are not reflected in the features. For
+example, the following variants of "cento" share the same lemma and
+the same features:
+
+cent' cento DET-NUM-CARD
+cento cento DET-NUM-CARD
+
+We now present the full list of features we used, organized by major
+syntactic categories.
+
+ABL
+
+Abbreviated locutions, such as "a.C.", "ecc." and "i.e."
+
+ADJ
+
+Adjectives, with the following inflectional features:
+
+pos/comp/sup
+
+Thas is: positive, comparative, superlative. Although these are not
+true inflectional features, given their high productivity we decided
+to represent them as properties of inflected forms.
+
+f/m
+
+That is: feminine, masculine.
+
+s/p
+
+Thas is: singular, plural.
+
+ADV
+
+Adverbs.
+
+ART
+
+Articles, with gender as a derivational feature (F/M) and number as an
+inflectional feature (s/p).
+
+ARTPRE
+
+Preposition+article compounds ("col", "della", "nei"...), with gender
+as a derivational feature (F/M) and number as an inflectional feature
+(s/p).
+
+ASP
+
+Aspectuals ("stare" in "stare per"). Same inflectional features as VER
+(see below).
+
+AUX
+
+Auxiliaries ("essere", "avere", "venire"). Same inflectional features
+as VER (see below).
+
+CAU
+
+Causatives ("fare" in "far sapere"). Same inflectional features as VER
+(see below).
+
+CE
+
+Clitic "ce" as in "ce l'ho fatta".
+
+CI
+
+Clitic "ci" as in "ci prova".
+
+CON
+
+Conjunctions.
+
+DET-DEMO
+
+Demonstrative determiners (such as "questa" in "questa sera"), with
+inflectional gender (f/s) and number (s/p) features.
+
+DET-INDEF
+
+Indefinite determiners (such as "molti" in "molti amici") with
+inflectional gender (f/s) and number (s/p) features.
+
+DET-NUM-CARD
+
+Cardinal number determiners (e.g., "cinque" in "cinque
+amici"). Pure-digit numbers are not included (i.e., the list includes
+"100mila" but not "100000" nor "100,000", "100.000", etc.)
+
+DET-POSS
+
+Possessive determiners (e.g., "mio", "suo"), with inflectional gender
+(f/s) and number (s/p) features.
+
+DET-WH
+
+Wh determiners (e.g., quale in "quale amico"), with inflectional
+gender (f/s) and number (s/p) features.
+
+INT
+
+Interjections.
+
+MOD
+
+Modal verbs (e.g. "dover" in "dover ricostruire"). Same inflectional
+features as VER (see below).
+
+NE
+
+Clitic "ne" (as in: "ne hanno molte").
+
+NOUN
+
+Nouns, with gender as a derivational feature (F/M) and number as an
+inflectional feature (s/p).
+
+PON
+
+Non-sentential punctuation marks (e.g. , " $).
+
+PRE
+
+Prepositions.
+
+PRO-DEMO
+
+Demonstrative pronouns (e.g. "questa" in "voglio questa"), with both
+gender and number as derivational features (F/M, S/P).
+
+PRO-INDEF
+
+Indefinite pronouns (e.g., "molti" in "vengono molti"), with both
+gender and number as derivational features (F/M, S/P).
+
+PRO-NUM
+
+Numeral pronouns (e.g., "cinque" in "cinque sono
+sopravvissuti"). Pure-digit numbers are not included (e.g., the list
+includes "100mila" but not 100000 nor 100,000, 100.000, etc.)
+
+PRO-PERS
+
+Personal pronouns, such as "lui" and "loro". Clitic possessive
+pronouns (such as pronominal "lo" and "si") are marked by the
+derivational feature CLI. Person, gender and number are also encoded
+as derivational features (1/2/3, F/M, S/P).
+
+PRO-POSS
+
+Possessive pronouns, such as "loro" in "non era uno dei loro"), with
+gender and number encoded as derivational features (F/M, S/P).
+
+PRO-WH
+
+Wh-pronouns, such as "quale" in "quale e' venuto?"
+
+SENT
+
+End of sentence marker (! . ... : ?).
+
+SI
+
+Clitic "si" as in "di cui si discute".
+
+TALE
+
+"Tale" in constructions such as "una fortuna tale che...", "la tal
+cosa", "tali amici", ecc. Gender (f/m) and number (s/p) as
+inflectional features.
+
+VER
+
+Verbs, with the following inflectional features:
+
+cond/ger/impr/ind/inf/part/sub
+
+Conditional, gerundive, imperative, indicative, infinitive,
+participle, subjunctive.
+
+pre/past/impf/fut
+
+Present, past, imperfective, future.
+
+1/2/3
+
+Person.
+
+s/p
+
+Number.
+
+f/m
+
+Gender (only relevant for participles).
+
+cela/cele/celi/celo/cene/ci/gli/gliela/gliele/glieli/glielo/gliene/la/
+le/li/lo/mela/mele/meli/melo/mene/mi/ne/sela/sele/seli/selo/sene/si/
+tela/tele/teli/telo/tene/ti/vela/vele/veli/velo/vene/vi
+
+Clitics attached to the verb.
+
+WH
+
+Wh elements ("come", "qualora", "quando"...)
+
+WH-CHE
+
+"Che" as a wh element (e.g., "l'uomo che hai visto", "hai detto che").
+
+
+LICENSING INFORMATION
+======================
+
+This program is dual-licensed free software; you can redistribute it
+and/or modify it under the terms of the under the Creative Commons
+Attribution ShareAlike 2.0 License and the GNU Lesser General Public
+License.
+
+***********************************************
+* Creative Commons Attribution ShareAlike 2.0 *
+***********************************************
+
+Morph-it! is licensed under the Creative Commons Attribution
+ShareAlike 2.0 License.
+
+You are free:
+
+- to copy, distribute and display the resource;
+- to make derivative works;
+- to make commercial use of the resource;
+
+under the following conditions:
+
+- you must give the original authors credit;
+- if you alter, transform, or build upon this work, you may distribute
+ the resulting work only under a license identical to this one;
+- for any reuse or distribution, you must make clear to others the
+ license terms of this work;
+- any of these conditions can be waived if you get permission from the
+ copyright holders.
+
+Your fair use and other rights are in no way affected by the above.
+
+You can find a link to the full license from the Morph-it! website.
+
+Copyright (C) 2004-2007 Marco Baroni and Eros Zanchetta.
+
+*************************************
+* GNU Lesser General Public License *
+*************************************
+
+Morph-it! A free morphological lexicon for the Italian Language
+Copyright (C) 2004-2007 Marco Baroni and Eros Zanchetta
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License along
+with this program; if not, write to the Free Software Foundation, Inc.,
+51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+AKNOWLEDGMENTS
+==============
+
+The main data source for the Morph-it! lexicon was the "la Repubblica"
+corpus. Thus, we would like to thank the colleagues who developed this
+resource with us: Lorenzo Piccioni, Guy Aston, Silvia Bernardini,
+Federica Comastri, Alessandra Volpi, Marco Mazzoleni.
+
+We would like to thank the developers of the tools we used to tag,
+lemmatize and index the Repubblica corpus: the (Italian) TreeTagger
+(Helmut Schmid, Achim Stein), the ACOPOST taggers (Ingo Schroeder) and
+the IMS Corpus WorkBench (Oli Christ, Arne Fitschen and Stefan Evert).
+
+Thanks to Helmut Schmid also for converting the Morph-it! lexicon into
+a SFST transducer.
+
+We would like to thank Aldo Calpini, who developed the perl module
+Lingua:IT:Conjugate.
+
+We are also very grateful to Jan Daciuk for creating his finite-state
+utilities and for helping us learn to use them.
+
+Finally, a big thanks to the members of the FoLUG, SannioLUG and
+Scuola (software libero nella scuola) mailing lists, for advice about
+licensing and dissemination.
+
+...and kudos to Lorenzo for creating and maintaining the SSLMITDev
+site!
+
+
+FOOTNOTES
+=========
+
+[1] http://www.ims.uni-stuttgart.de/projekte/gramotron/SOFTWARE/SFST.html
+[2] http://juggernaut.eti.pg.gda.pl/~jandac/fsa.html