David Alfter
Jürgen Knauth
18 September 2015
Source: https://commons.wikimedia.org/wiki/File:BoreanLanguageTree.png
Morphological information added by affixation
No 1:1 correspondence
naccagītavāditavisūkadassanamālāgandhavilepanadhāraṇamaṇḍanavibhūsanaṭṭhānā
naccagītavāditavisūka-dassanamālāgandhavilepanadhāraṇamaṇḍanavibhūsana-ṭṭhānā
dancing singing music show-watching garland perfume cosmetics wearing decoration decoration
dancing, singing, music, going to see entertainments, wearing garlands, using perfumes, and beautifying the body with cosmetics
naccagītavāditavisūkadassanamālāgandhavilepanadhāraṇamaṇḍanavibhūsanaṭṭhānā veramaṇi sikkhāpadaṃ samādiyāmi
I adopt the precept of refraining from ...
evaṃ ca (and thus) → evañca
paca + ti → pacati (he cooks)
paca + mi → pacāmi (I cook)
canda (moon) + udayo (rising) → candodayo (rising of the moon)
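The two sandhi examples above (evaṃ ca → evañca, canda + udayo → candodayo) can be read as merge rules. The following is a minimal sketch that hard-codes only those two rules. The class name `SandhiMerger` and the method `merge` are made up for illustration and are not taken from the presented system. Stem lengthening as in paca + mi → pacāmi is not covered. Non-ASCII letters are written as Unicode escapes (`\u1e43` = ṃ, `\u00f1` = ñ) so that the source compiles under any file encoding.

```java
class SandhiMerger {
    // Joins two words, applying the first matching rule.
    // Falls back to plain concatenation when no rule applies.
    static String merge(String left, String right) {
        if (left.isEmpty() || right.isEmpty()) {
            return left + right;
        }
        String head = left.substring(0, left.length() - 1);
        // Rule 1: final a + initial u -> o (canda + udayo -> candodayo)
        if (left.endsWith("a") && right.startsWith("u")) {
            return head + "o" + right.substring(1);
        }
        // Rule 2: final m-with-dot-below + initial c -> n-with-tilde + c
        if (left.endsWith("\u1e43") && right.startsWith("c")) {
            return head + "\u00f1" + right;
        }
        return left + right;
    }

    public static void main(String[] args) {
        System.out.println(merge("canda", "udayo"));
        System.out.println(merge("eva\u1e43", "ca"));
        System.out.println(merge("paca", "ti"));
    }
}
```

A real rule set would be data-driven rather than hard-coded; the sketch only shows the shape of a single merge step.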
Credit: http://iflizwerequeen.com
Written in different scripts
Introduces variation!
Scarce and not exhaustive
and Overgeneration
Dictionary lookup
Rule based generation:
Lemma => Stem
Stem + Ending => Form
Dictionary lookup
Compiled Morphological Information
<paradigms>
<paradigm type="noun">
<number type="singular">
<declension type="a">
<gender type="masculine">
<case type="nominative">
<ending>o</ending>
<ending type="Drare">e</ending>
</case>
<case type="vocative">
<ending>a</ending>
<ending>ā</ending>
<ending type="Drare">e</ending>
<ending type="Drare">o</ending>
</case>
<case type="accusative">
<ending>aṃ</ending>
</case>
deva => dev-
dev- + -o => devo
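The two generation steps (Lemma => Stem, Stem + Ending => Form) can be sketched as follows. This is an illustration under assumptions, not the presented implementation: the class `NounFormGenerator` and its methods are hypothetical, the ending table contains only the singular masculine a-declension endings visible in the paradigm excerpt, and the rarity markers on endings are ignored. Non-ASCII letters are written as Unicode escapes (`\u0101` = ā, `\u1e43` = ṃ).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NounFormGenerator {
    // Case -> endings, limited to what the paradigm excerpt shows.
    static final Map<String, List<String>> ENDINGS = new LinkedHashMap<>();
    static {
        ENDINGS.put("nominative", List.of("o", "e"));
        ENDINGS.put("vocative", List.of("a", "\u0101", "e", "o"));
        ENDINGS.put("accusative", List.of("a\u1e43"));
    }

    // Lemma => Stem: strip the final -a of an a-declension lemma (deva => dev-).
    static String stem(String lemma) {
        if (!lemma.endsWith("a")) {
            throw new IllegalArgumentException("not an a-declension lemma: " + lemma);
        }
        return lemma.substring(0, lemma.length() - 1);
    }

    // Stem + Ending => Form, for every case in the table.
    static Map<String, List<String>> generate(String lemma) {
        String stem = stem(lemma);
        Map<String, List<String>> forms = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : ENDINGS.entrySet()) {
            List<String> caseForms = new ArrayList<>();
            for (String ending : entry.getValue()) {
                caseForms.add(stem + ending);
            }
            forms.put(entry.getKey(), caseForms);
        }
        return forms;
    }

    public static void main(String[] args) {
        System.out.println(generate("deva"));
    }
}
```

Because every listed ending is attached blindly, a generator of this kind produces all paradigmatic forms, whether or not they are attested.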
<declension type="ant">
<gender type="masculine">
<case type="nominative">
<ending>aṃ</ending>
<ending>ā</ending>
<ending type="Cm2">anto</ending>
<ending type="Drare">o</ending>
<ending>ato</ending>
</case>
I make
I cook
stem: bhav-
ending: -anto
form: bhavanto
bhanto
(to make)
(to make)
(to cook)
(to fight)
core-, coraya-
(to steal)
rundha-, rundhi-, rundhī-, rundhe-, rundho-
(to obstruct)
Full/Partial Irregularity
Key:Value pairs
Receiver can decide what information to use
{"lemma": "eka", "forms": {"numeral": [
{"gender": "masculine", "number": "singular", "word": "eko", "case": "nominative"},
{"gender": "masculine", "number": "singular", "word": "ekassa", "case": "genitive"}, ...
Dictionary/Table lookup
Identify paradigmatic ending
→ Morphological Analysis
→ Separation Stem-Ending
buddhe
<gender type="masculine">
<case type="nominative">
<ending>o</ending>
<ending type="Drare">e</ending>
</case>
<case type="vocative">
<ending>a</ending>
<ending>ā</ending>
<ending type="Drare">e</ending>
<ending type="Drare">o</ending>
</case>
<case type="accusative">
<ending>aṃ</ending>
</case>
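The analysis direction (identify a paradigmatic ending, then separate stem and ending) can be sketched like this for the example buddhe. The class `EndingAnalyzer` is hypothetical, and the table holds only the endings visible in the excerpt, so the candidates it returns are limited to those three cases. A fuller table would yield more readings. Non-ASCII letters are written as Unicode escapes (`\u0101` = ā, `\u1e43` = ṃ).

```java
import java.util.ArrayList;
import java.util.List;

class EndingAnalyzer {
    // {case, ending} pairs, limited to what the paradigm excerpt shows.
    static final String[][] ENDINGS = {
        {"nominative", "o"}, {"nominative", "e"},
        {"vocative", "a"}, {"vocative", "\u0101"},
        {"vocative", "e"}, {"vocative", "o"},
        {"accusative", "a\u1e43"},
    };

    // Returns one "stem- + -ending (case)" candidate per matching table entry.
    static List<String> analyze(String form) {
        List<String> candidates = new ArrayList<>();
        for (String[] entry : ENDINGS) {
            String ending = entry[1];
            // Require a non-empty stem in front of the ending.
            if (form.length() > ending.length() && form.endsWith(ending)) {
                String stem = form.substring(0, form.length() - ending.length());
                candidates.add(stem + "- + -" + ending + " (" + entry[0] + ")");
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        for (String candidate : analyze("buddhe")) {
            System.out.println(candidate);
        }
    }
}
```

The analyzer returns every matching reading and leaves disambiguation to the caller, which is why one surface form can map to several analyses.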
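// Guess possible word classes from the ending of the lemma.
// The checks are independent, so one lemma can collect several guesses.
if (ends(lemma, "a", "ā", "i", "ī", "u", "ū", "ant", "vā", "mā", "at")) {
    guesses.add("adjective");
}
if (ends(lemma, "a", "i", "aṃ", "ma", "ya")) {
    guesses.add("numeral");
}
if (ends(lemma, "uṃ")) {
    guesses.add("indeclinable");
}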
Code Excerpt
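The excerpt relies on an `ends` helper and a `guesses` collection that are not shown. A self-contained version might look like the sketch below. The class name `PosGuesser`, the varargs signature of `ends`, and the use of a `List<String>` for `guesses` are assumptions. Only the three checks and their suffix lists come from the excerpt. Non-ASCII letters are written as Unicode escapes (`\u0101` = ā, `\u012b` = ī, `\u016b` = ū, `\u1e43` = ṃ).

```java
import java.util.ArrayList;
import java.util.List;

class PosGuesser {
    // Assumed helper: true if the lemma ends with any of the given suffixes.
    static boolean ends(String lemma, String... suffixes) {
        for (String suffix : suffixes) {
            if (lemma.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    // Collects every word class whose suffix list matches the lemma.
    static List<String> guess(String lemma) {
        List<String> guesses = new ArrayList<>();
        if (ends(lemma, "a", "\u0101", "i", "\u012b", "u", "\u016b",
                 "ant", "v\u0101", "m\u0101", "at")) {
            guesses.add("adjective");
        }
        if (ends(lemma, "a", "i", "a\u1e43", "ma", "ya")) {
            guesses.add("numeral");
        }
        if (ends(lemma, "u\u1e43")) {
            guesses.add("indeclinable");
        }
        return guesses;
    }

    public static void main(String[] args) {
        System.out.println(guess("eka"));
        System.out.println(guess("bhavant"));
    }
}
```

Since the suffix lists overlap, a lemma such as eka receives more than one guess, and a later step has to pick among them.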
Word class | Accuracy |
---|---|
Nouns-Adjectives | 99.96% |
Pronouns | 88.57% |
Numerals | 76.62% |
Verbs | 63.37% |
Regular Expressions
Replacement rules, applied in order:
Pattern | Replacement |
---|---|
\bpañca\b | X |
ñca\b | ṃ ca |
X | pañca |
ñhi\b | ṃ hi |
ñpi\b | ṃ pi |
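The replacement rules can be applied one after another, as in the sketch below. The first and third rules work as a pair: the numeral pañca is shielded behind the placeholder X so that the ñca rule cannot split it, and is restored afterwards. The class `SandhiRegexSplitter` is hypothetical, and the sketch assumes that an uppercase X never occurs in the input text. `Pattern.UNICODE_CHARACTER_CLASS` is set so that `\b` treats letters such as ñ as word characters on all Java versions. Non-ASCII letters are written as Unicode escapes (`\u00f1` = ñ, `\u1e43` = ṃ).

```java
import java.util.regex.Pattern;

class SandhiRegexSplitter {
    // Ordered {pattern, replacement} pairs from the rule table.
    static final String[][] RULES = {
        {"\\bpa\u00f1ca\\b", "X"},      // shield the numeral from the next rule
        {"\u00f1ca\\b", "\u1e43 ca"},
        {"X", "pa\u00f1ca"},            // restore the shielded numeral
        {"\u00f1hi\\b", "\u1e43 hi"},
        {"\u00f1pi\\b", "\u1e43 pi"},
    };

    // Applies every rule in table order to the whole text.
    static String split(String text) {
        String result = text;
        for (String[] rule : RULES) {
            result = Pattern.compile(rule[0], Pattern.UNICODE_CHARACTER_CLASS)
                            .matcher(result)
                            .replaceAll(rule[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(split("eva\u00f1ca"));
        System.out.println(split("pa\u00f1ca"));
    }
}
```

Rule order matters here: moving the restore rule before the ñca rule would undo the shielding and split pañca as well.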
[n. ag. fr. abhijjhita in med. function] one who covets M <smallcaps>i.</smallcaps> 287 (T. abhijjhātar, v. l. °itar) = A <smallcaps>v.</smallcaps> 265 (T. °itar, v. l. °ātar).
Pacati,[Ved.pacati,Idg.*peqǔō,Av.pac-; Obulg.peka to fry,roast,Lith,kepū bake,Gr.pέssw cook,pέpwn ripe] to cook,boil,roast Vin.IV,264; fig.torment in purgatory (trs.and intrs.):Niraye pacitvā after roasting in N.S.II,225,PvA.10,14.-- ppr.pacanto tormenting,Gen.pacato (+Caus.pācayato) D.I,52 (expld at DA.I,159,where read pacato for paccato,by pare daṇḍena pīḷentassa).-- pp.pakka (q.v.).‹-› Caus.pacāpeti & pāceti (q.v.).-- Pass.paccati to be roasted or tormented (q.v.).(Page 382)
Attested forms only
Splitting Internal Sandhi
"When two vowels meet, one may be elided."
When two vowels meet:
8 vowels
n-vowel-word
(DENTAL) (CONSONANT) : duplicate($2)
224 possibilities
151 rules
Sandhi merge rules
Sandhi split rules
1103 rules
Morphological analyzer and generator
Dictionary
Server
Dictionary GUI
Data processor and scripting engine
Corpus management and processing tool
Questions?