Development of a Lexical Platform

for Distant Languages


#TCLT8

2014-jun-07

@edouard_lopez


  • French Web Engineer

                       ↖That's where I live and work


  • Background:
    • Cognitive Science,
    • NLP (Natural Language Processing),
    • Web Accessibility


  • Languages: French, English, Spanish, Chinese, Japanese


  • I'm not a linguist (sorry), but
    • I have taught FFL (French as a Foreign Language),
    • and coding to kids

INALCO

  • The best French university for studying foreign languages.
    • ~93 languages
    • data that is poorly structured and hard to access


  • Collaboration with one of their PhD students.
    • (actually my brother :)


  • Project at an early stage
    • using Chinese as the pilot language

Importance of Lexicon


  • Lexical competence
    • the basic building blocks;
    • critical in foreign language acquisition.


  • Chinese as a Foreign Language
    • can be opaque;
    • vocabulary growth is hard to sustain for
      advanced learners.


  • Great need to provide the right stimulus at the right moment.

But how?


  • Limited human resources
    • Classes of 30+ people
    • Each learner different from the others
    • Limited time
    • Limited knowledge (we are humans after all)


  • Using technology, right?
    • There is a lack of electronic resources in French
      • A dictionary as a .doc? Please don't; ask your IT people, they can help

  • Writing a dictionary: what format to use?
    • It does NOT matter to the end user
 

Platform Stack


  • Disclaimer
    • Work in progress;
    • Throwaway version;
    • we are exploring, testing, failing… to improve!


  • Some technical principles we want:
    • cross-platform;
    • flexibility;
    • openness;
    • plug-and-play;
    • user-tailored content.

Back-end


  • Database (MySQL)
    • pros: widespread, easy to deploy
    • cons: inconsistent behavior (types, UTF-8), no full-text search (FTS) on InnoDB before 5.6, slow (300k entries).

  • Server Application (PHP/CodeIgniter)
    • pros: widespread, light, fast, easy to deploy
    • cons: slow & dirty

  • API (JSON)
    • pros: loose coupling (REST), human- and machine-friendly, widespread, lightweight.
    • cons: v1.0
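
To make the loose coupling concrete, here is a minimal sketch of a client talking to a JSON API of this kind. The base URL, the `/entries?q=` route and the response shape are assumptions for illustration only, since the real routes are not shown in this talk; the field names `def`, `ort` and `pho` are the ones used on the DSL slide.

```javascript
// Hypothetical base URL and route: the real API is not documented in this talk.
const API_BASE = 'https://example.org/api/v1';

// Build the search URL for a query, escaping spaces and non-ASCII characters.
function buildSearchUrl(query) {
  return API_BASE + '/entries?q=' + encodeURIComponent(query);
}

// Turn a JSON response body into a list of plain entry objects.
// Assumption: the server answers with one entry object or an array of them.
function parseEntries(jsonText) {
  const payload = JSON.parse(jsonText);
  const entries = Array.isArray(payload) ? payload : [payload];
  return entries.map(function (entry) {
    return { def: entry.def, ort: entry.ort, pho: entry.pho };
  });
}

// Any client (web, mobile, script) can reuse the two helpers above.
// fetchImpl is injectable so the function can run without a network.
async function searchEntries(query, fetchImpl = fetch) {
  const response = await fetchImpl(buildSearchUrl(query));
  return parseEntries(await response.text());
}
```

Because the client depends only on the URL and the JSON shape, the back-end (MySQL, PHP) could be swapped without touching the front-end.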

Front-end


  • Web Browser
    • we use AngularJS, by Google
      • goal: provide a high-level DSL for end users

  • Mobile
    • Default webapp is mobile-friendly
      • but works online only
    • PhoneGap (see Hugo's presentation)
      • web → native
      • works offline

  • Others
    • Whatever you want: we've got an API!


DSL


 <span>{{entry.def}}</span>
  • chambre, salle
 <span>{{entry.ort}}</span>
 <span>{{entry.ortx1}}</span>
 <span>{{entry.pho}}</span>
  • jian1
 <span>{{entry.pho | pinyin}}</span>
  • jiān
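
The `pinyin` filter above turns numbered pinyin (jian1) into tone-marked pinyin (jiān). A minimal sketch of how such a filter could be written follows; it is not the platform's actual code. The function names and the `lexiconFilters` module name are hypothetical, and the placement rule is the standard one: the mark goes on "a" or "e" if present, else on the "o" of "ou", else on the last vowel.

```javascript
// Tone-marked forms of each vowel, indexed by tone number 1 to 4.
// '\u00fc' is the letter u with diaeresis, also typed "v" or "u:" in numbered pinyin.
const TONE_MARKS = {
  a: ['ā', 'á', 'ǎ', 'à'],
  e: ['ē', 'é', 'ě', 'è'],
  i: ['ī', 'í', 'ǐ', 'ì'],
  o: ['ō', 'ó', 'ǒ', 'ò'],
  u: ['ū', 'ú', 'ǔ', 'ù'],
  '\u00fc': ['ǖ', 'ǘ', 'ǚ', 'ǜ'],
};

// Convert one numbered syllable ("jian1") to its tone-marked form.
// Note: the result is lowercased; capitalization is not preserved in this sketch.
function markSyllable(syllable) {
  const match = /^([a-z\u00fc:]+?)([0-5])$/i.exec(syllable);
  if (!match) return syllable; // not a numbered syllable: leave untouched
  const letters = match[1].toLowerCase().replace(/u:|v/g, '\u00fc');
  const tone = Number(match[2]);
  if (tone === 0 || tone === 5) return letters; // neutral tone: no mark

  // Placement: 'a' or 'e' wins; else the 'o' of "ou"; else the last vowel.
  let index = letters.search(/[ae]/);
  if (index === -1) index = letters.indexOf('ou');
  if (index === -1) {
    for (let i = letters.length - 1; i >= 0; i--) {
      if (TONE_MARKS[letters[i]]) { index = i; break; }
    }
  }
  if (index === -1) return letters; // no vowel found
  const marked = TONE_MARKS[letters[index]][tone - 1];
  return letters.slice(0, index) + marked + letters.slice(index + 1);
}

// Convert a whole space-separated phonetic string, syllable by syllable.
function toPinyin(text) {
  return String(text).split(/\s+/).map(markSyllable).join(' ');
}

// Registration as an AngularJS filter (module name is hypothetical).
// Guarded so the functions above also run outside a browser.
if (typeof angular !== 'undefined') {
  angular.module('lexiconFilters', []).filter('pinyin', function () {
    return toPinyin;
  });
}
```

With the filter registered, a template author only writes `{{entry.pho | pinyin}}` and never touches the conversion logic, which is the point of offering a DSL.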




Web-app

Demo





Q&A





mail: dev+cfdict@edouard-lopez.com
Twitter: @edouard_lopez
