Language Segmentation of Twitter Tweets using Weakly Supervised Language Model Induction
David Alfter
15 September 2015
@daalft
The Problem
[n. ag. fr. abhijjhita in med. function] one who covets M <smallcaps>i.</smallcaps> 287 (T. abhijjhātar, v. l. °itar) = A <smallcaps>v.</smallcaps> 265 (T. °itar, v. l. °ātar).
Pacati,[Ved.pacati,Idg.*peqǔō,Av.pac-; Obulg.peka to fry,roast,Lith,kepū bake,Gr.pέssw cook,pέpwn ripe] to cook,boil,roast Vin.IV,264; fig.torment in purgatory (trs.and intrs.):Niraye pacitvā after roasting in N.S.II,225,PvA.10,14.-- ppr.pacanto tormenting,Gen.pacato (+Caus.pācayato) D.I,52 (expld at DA.I,159,where read pacato for paccato,by pare daṇḍena pīḷentassa).-- pp.pakka (q.v.).‹-› Caus.pacāpeti & pāceti (q.v.).-- Pass.paccati to be roasted or tormented (q.v.).(Page 382)
Abbha, (nt.) [Vedic abhra nt. & later Sk. abhra m. \"dark cloud\"; Idg. *m̊bhro, cp. Gr. <at>a)fro\\s</at> scum, froth, Lat. imber rain; also Sk. ambha water, Gr. <at>o)/mbros</at> rain, Oir ambu water]. A (dense & dark) cloud, a cloudy mass A <smallcaps>ii.</smallcaps> 53 = Vin <smallcaps>ii.</smallcaps> 295 = Miln 273 in list of to things that obscure moon-- & sunshine, viz. <b>abbhaŋ mahikā</b> (mahiyā A) <b>dhū- marajo</b> (megho Miln), <b>Rāhu</b> . This list is referred to at SnA 487 & VvA 134. S <smallcaps>i.</smallcaps> 101 (°sama pabbata a mountain like a thunder--cloud); J <smallcaps>vi.</smallcaps> 581 (abbhaŋ rajo acchādesi); Pv <smallcaps>iv.</smallcaps> 3 <superscript>9</superscript> (nīl° = nīla--megha PvA 251). As f. <b>abbhā</b> at Dhs 617 & DhsA 317 (used in sense of adj. \"dull\"; DhsA expl <superscript>s.</superscript> by valāhaka); perhaps also in <b>abbhāmatta</b> . <br /><b>--kūṭa</b> the point or summit of a storm--cloud Th 1, 1064; J <smallcaps>vi.</smallcaps> 249, 250; Vv 1 <superscript>1</superscript> (= valāhaka--sikhara VvA 12). <b>--ghana</b> a mass of clouds, a thick cloud It 64; Sn 348 (cp. SnA 348). <b>--paṭala</b> a mass of clouds DhsA 239. <b>--mutta</b> free from clouds Sn 687 (also as abbhāmutta Dh 382). <b>--saŋvilāpa</b> thundering S <smallcaps>iv.</smallcaps> 289.
The Intuition
Figure: the example entry split across three language models (LM 1, LM 2, LM 3): one receives the markup, abbreviations and numbers ("[] M <smallcaps>i.</smallcaps> 287 (T. v. l.) = A <smallcaps>v.</smallcaps> 265 (T., v. l.)."), one the English gloss ("n. ag. fr. in med. function one who covets"), and one the Pali forms ("abhijjhita abhijjhātar °itar °ātar").
The Approach
N-gram language model: n-gram probabilities are combined into a word probability.
Figure: the example text "word word mot palabra word palabra" is processed word by word; each word receives a word probability under the current language models (example values between 0.01 and 0.89), and the words are grouped by model.
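The slide only names the step from n-gram probabilities to a word probability. A minimal sketch of one way to do it, assuming a character trigram model with add-one smoothing and the geometric mean of the trigram probabilities as the word probability (so long and short words stay comparable). The class name `CharNgramModel`, the boundary markers `<` and `>`, and these modeling choices are hypothetical, not taken from the talk.

```python
import math
from collections import Counter


class CharNgramModel:
    """Character n-gram model with add-one smoothing (hypothetical sketch)."""

    def __init__(self, n=3):
        self.n = n
        self.ngrams = Counter()    # counts of full n-grams
        self.contexts = Counter()  # counts of their (n-1)-character histories
        self.alphabet = set()      # characters seen during training

    def _grams(self, word):
        # Pad with boundary markers so word beginnings and endings form n-grams too.
        padded = "<" * (self.n - 1) + word + ">"
        return [padded[i:i + self.n] for i in range(len(padded) - self.n + 1)]

    def train(self, word):
        for gram in self._grams(word):
            self.alphabet.update(gram)
            self.ngrams[gram] += 1
            self.contexts[gram[:-1]] += 1

    def ngram_probability(self, gram):
        # Add-one smoothing; the extra 1 reserves probability mass for unseen characters.
        vocab = len(self.alphabet) + 1
        return (self.ngrams[gram] + 1) / (self.contexts[gram[:-1]] + vocab)

    def word_probability(self, word):
        # Geometric mean of the n-gram probabilities (length-normalized).
        grams = self._grams(word)
        return math.exp(sum(math.log(self.ngram_probability(g)) for g in grams) / len(grams))


english = CharNgramModel(n=3)
for w in ["word", "words", "world", "work"]:
    english.train(w)
print(english.word_probability("word"))     # relatively high: all its trigrams were seen
print(english.word_probability("palabra"))  # lower: none of its trigrams were seen
```

The geometric mean is one design choice among several; a plain product would penalize long words simply for having more n-grams.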
The Catch
mot palabra palabra palabra palabras...
Let's reverse it!
mot palabra palabra palabra palabras...
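A sketch of how models could be induced from the text and how reversing the word order fits in, under stated assumptions: each word joins the model that gives it the highest word probability if that probability reaches a threshold, otherwise it opens a new model. The stand-in `CharBigramModel`, the functions `induce_models` and `induce_both_directions`, and the threshold 0.18 are all hypothetical; a compact character bigram model is used only to keep the example self-contained.

```python
import math
from collections import Counter


class CharBigramModel:
    """Stand-in model (hypothetical): add-one smoothed character bigrams."""

    def __init__(self):
        self.bigrams, self.unigrams, self.alphabet = Counter(), Counter(), set()

    def train(self, word):
        padded = "<" + word + ">"
        self.alphabet.update(padded)
        for a, b in zip(padded, padded[1:]):
            self.bigrams[a + b] += 1
            self.unigrams[a] += 1

    def word_probability(self, word):
        # Geometric mean of the smoothed bigram probabilities.
        padded = "<" + word + ">"
        vocab = len(self.alphabet) + 1
        logs = [math.log((self.bigrams[a + b] + 1) / (self.unigrams[a] + vocab))
                for a, b in zip(padded, padded[1:])]
        return math.exp(sum(logs) / len(logs))


def induce_models(words, threshold=0.18):
    """Greedy induction: add each word to its best model, or open a new model."""
    models, groups = [], []
    for word in words:
        scores = [m.word_probability(word) for m in models]
        if scores and max(scores) >= threshold:
            best = scores.index(max(scores))
        else:
            models.append(CharBigramModel())
            groups.append([])
            best = len(models) - 1
        models[best].train(word)
        groups[best].append(word)
    return groups


def induce_both_directions(words, threshold=0.18):
    # Which word seeds a model depends on reading order, so induce models
    # once forward and once on the reversed word sequence, keeping both sets.
    return induce_models(words, threshold) + induce_models(words[::-1], threshold)


tokens = "word word mot palabra word palabra".split()
for group in induce_both_directions(tokens):
    print(group)
```

With these settings the forward pass yields the groups word/word/word, mot, and palabra/palabra; the backward pass finds the same groups but seeds its first model with "palabra".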
Forward/backward generation
Merge most similar models
Similarity measure: Unigram distribution
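The slides name the unigram distribution as the similarity measure for merging but do not say how two distributions are compared. A sketch, assuming each induced model is represented by the character unigram distribution of the words assigned to it, compared with cosine similarity, and that merging stops once no pair reaches a hypothetical `min_similarity` of 0.5; `unigram_distribution`, `cosine` and `merge_most_similar` are illustrative names.

```python
import math
from collections import Counter
from itertools import combinations


def unigram_distribution(words):
    """Relative character frequencies over all words assigned to one model."""
    counts = Counter("".join(words))
    total = sum(counts.values())
    return {ch: c / total for ch, c in counts.items()}


def cosine(p, q):
    """Cosine similarity between two sparse distributions."""
    dot = sum(v * q.get(ch, 0.0) for ch, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def merge_most_similar(groups, min_similarity=0.5):
    """Repeatedly merge the two most similar models until no pair reaches min_similarity."""
    groups = [list(g) for g in groups]
    while len(groups) > 1:
        dists = [unigram_distribution(g) for g in groups]
        (i, j), best = max(
            (((a, b), cosine(dists[a], dists[b]))
             for a, b in combinations(range(len(groups)), 2)),
            key=lambda pair: pair[1],
        )
        if best < min_similarity:
            break
        groups[i].extend(groups.pop(j))  # combinations yields i < j, so index i stays valid
    return groups


forward = [["word"] * 3, ["mot"], ["palabra"] * 2]
backward = [["palabra"] * 2, ["word"] * 3, ["mot"]]
for model in merge_most_similar(forward + backward):
    print(model)
```

Here the matching forward and backward models have identical distributions and are merged, while "word", "mot" and "palabra" share too few characters to pass the threshold and remain separate models.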
Final models
Figure: word-model assignment, with each word of the example text (word, mot, palabra) assigned to one of the final models.
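Once the final models are fixed, each word needs a label. A sketch, assuming each word is assigned to the final model under which it has the highest average per-character log-probability; the add-one smoothed character unigram stand-in, `train_unigram`, `word_score`, `assign` and `vocab_size=100` are hypothetical choices, not the talk's method.

```python
import math
from collections import Counter


def train_unigram(words):
    """Stand-in final model (hypothetical): character counts over the model's words."""
    counts = Counter("".join(words))
    return counts, sum(counts.values())


def word_score(model, word, vocab_size=100):
    """Average add-one smoothed log-probability per character (word must be non-empty)."""
    counts, total = model
    logs = [math.log((counts[ch] + 1) / (total + vocab_size)) for ch in word]
    return sum(logs) / len(logs)


def assign(words, final_models):
    """Label each word with the index of the final model that scores it highest."""
    return [max(range(len(final_models)), key=lambda k: word_score(final_models[k], w))
            for w in words]


final_models = [train_unigram(["word"] * 6),
                train_unigram(["mot"] * 2),
                train_unigram(["palabra"] * 4)]
print(assign(["word", "mot", "palabra", "words"], final_models))  # [0, 1, 2, 0]
```

Averaging per character keeps the score independent of word length, so an unseen variant such as "words" still lands in the model of its closest relatives.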
The Results
[n. ag. fr. abhijjhita in med. function] one who covets M <smallcaps>i.</smallcaps> 287 (T. abhijjhātar, v. l. °itar) = A <smallcaps>v.</smallcaps> 265 (T. °itar, v. l. °ātar).
Μόλις ψήφισα αυτή τη λύση Internet of Things, στο διαγωνισμό BUSINESS IT EXCELLENCE. (I just voted for this Internet of Things solution in the BUSINESS IT EXCELLENCE competition.)
Demain #dhiha6 Keynote 18h @dhiparis "The collective dynamics of science-publish or perish; is it all that counts?" par David (Demain: tomorrow; 18h: 6 pm; par: by)
Food and breuvages in Edmonton are ready to go, just waiting for the fans #FWWC2015 #bilingualism (breuvages: French for beverages)
Buna dabo naw (coffee is our bread).
Thank you for your attention
Questions?