Daniel W. Hieber
University of California, Santa Barbara
May 24, 2019
Slides available at:
Digital Linguistics (DLx) is the science of digital data management for linguistics, including the digital storage, representation, manipulation, and dissemination of linguistic data. It concerns how to represent linguistic data in digital form and how best to work with that data, while remaining attentive to best practices and ethical concerns in language documentation, sociocultural linguistics, and language revitalization.
Types of things called "data" in linguistics:
Metadata: data that describes another set of data.
Back up and/or archive at every stage
Back up and/or archive at every version
Data are in "binary" format files (i.e. non-text files)
Must have specialized software to read
| Type | File extensions |
|---|---|
| Images | .jpg, .jpeg, .png, .svg |
| Scans / Documents | .pdf, .docx |
| Audio | .wav, .mp3, .wma |
| Video | .mpeg, .avi, .mov, .mp4 |
| Databases | .xlsx, .accdb, .fmp |
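The table above can be turned into a simple lookup, which may help when sorting a project folder. The following is a minimal sketch, not part of the original slides: the names `BINARY_EXTENSIONS`, `binary_category`, and `looks_like_text` are hypothetical, and the content check is only a rough heuristic (it assumes text files are UTF-8 and contain no NUL bytes).

```python
from pathlib import Path

# Extensions copied from the table above, grouped under the table's own labels.
BINARY_EXTENSIONS = {
    "Images": {".jpg", ".jpeg", ".png", ".svg"},
    "Scans / Documents": {".pdf", ".docx"},
    "Audio": {".wav", ".mp3", ".wma"},
    "Video": {".mpeg", ".avi", ".mov", ".mp4"},
    "Databases": {".xlsx", ".accdb", ".fmp"},
}


def binary_category(filename):
    """Return the table category for a filename, or None if its extension is not listed."""
    ext = Path(filename).suffix.lower()
    for category, extensions in BINARY_EXTENSIONS.items():
        if ext in extensions:
            return category
    return None


def looks_like_text(data: bytes) -> bool:
    """Heuristic (an assumption, not a standard): treat bytes as text
    if they contain no NUL byte and decode cleanly as UTF-8."""
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

For example, `binary_category("interview.WAV")` gives `"Audio"`, while a file with an unlisted extension such as `notes.txt` gives `None`. The extension lookup only reflects what a file claims to be; the byte-level check is a separate, cruder test of what the file actually contains.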