Maze of Frequent Questions

What do we consider “new languages?”

 

We seek scholars eager to contribute “new languages,” which include those with few existing resources, such as Mauritian Creole, Plains Cree, Gaelic, and Guadeloupean Creole. New languages also include domain-specific languages that currently lack models, such as early modern Portuguese or the literary Russian diction used by Leo Tolstoy.

 

Why is diversifying NLP important?

 

NLP has revolutionized our ability to analyze texts at scale. However, of the world's more than 7,500 languages, the major NLP resources support only eighty-five. While large linguistic datasets exist for high-resource languages such as English or German, text mining, topic modeling and other methods of computational text analysis are unavailable for the vast majority of languages, especially those that are minority, regional or endangered.

We are acutely aware of the risks to research—and to culture more broadly—if language technologies continue to lack diversity. The proliferation of data and tools in several dominant languages will perpetuate and deepen the existing structural inequalities on both local and global scales.

Who is eligible to apply?

 

Scholars with language or domain expertise from any field of the humanities and humanistic social sciences are eligible to apply. Applicants may be researchers in any professional role (e.g. faculty, graduate student, independent research scholar, librarian, curator, information professional). We especially welcome proposals from researchers from less-resourced institutions or from those in contingent or non-affiliated roles. Applicants must have a clear research question and an existing machine-readable corpus in their language.

Applicants may apply as individuals or pairs. A pair may combine a language or domain expert with a data scientist, computational linguist, or digital humanities expert. Team members may be from different institutions.

Non-US individuals and those based at non-US institutions are eligible to apply.

New Language for NLP

By Andrew Janco
