Daniel W. Hieber
University of California, Santa Barbara
May 24, 2019
Slides available at:
https://slides.com/dwhieb/digital-linguistics-for-language-documentation
Digital Linguistics (DLx) is the science of digital data management for linguistics, including the digital storage, representation, manipulation, and dissemination of linguistic data. It concerns how to represent linguistic data in digital form and how best to work with that data, while remaining attentive to best practices and ethical concerns in language documentation, sociocultural linguistics, and language revitalization.
Types of things called "data" in linguistics:
Metadata: data that describes another set of data.
Different tools utilize different metadata formats, or just use their own
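To make "data that describes another set of data" concrete, here is a minimal sketch of a metadata record for a single audio recording, written out as JSON. The field names (`file`, `speaker`, `language`, `recordedOn`) and all of the values are invented for illustration; they do not follow any particular tool's or archive's metadata format.

```python
import json


def build_metadata(audio_filename, speaker, language, recorded_on):
    """Return a dict describing a recording (data about the data)."""
    # All field names here are hypothetical, not a standard schema.
    return {
        "file": audio_filename,
        "speaker": speaker,
        "language": language,
        "recordedOn": recorded_on,
    }


# Placeholder values, made up for this example.
record = build_metadata("story-001.wav", "Speaker A", "Example Language", "2019-01-01")

# Plain-text JSON stays human-readable and can be opened by any tool.
print(json.dumps(record, indent=2, ensure_ascii=False))
```

Keeping the record in a plain-text format like JSON means it can be read without the tool that produced it, which helps when different tools expect different metadata formats.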
Back up and/or archive at every stage
Back up and/or archive at every version
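One way to act on this is to script the backup so that every version gets its own copy. The sketch below is an assumption about how that might look, not a prescribed workflow: the function names `sha256_of` and `back_up`, the timestamped naming scheme, and the checksum check are all illustrative choices.

```python
import hashlib
import shutil
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def back_up(source, backup_dir):
    """Copy `source` into `backup_dir` under a timestamped name and verify the copy."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # A UTC timestamp in the name keeps earlier versions from being overwritten.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    target = backup_dir / f"{source.stem}.{stamp}{source.suffix}"

    shutil.copy2(source, target)  # copy2 also preserves file timestamps

    # Compare checksums so a corrupted copy is noticed immediately.
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Backup of {source} does not match the original")
    return target
```

Calling `back_up("lexicon.json", "backups")` after each editing session would leave one dated copy per version. A copy on the same disk is not an archive, so this complements depositing with an archive rather than replacing it.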
Data are in "binary" format files (i.e. non-text files)
Must have specialized software to read
Not human-readable
| Data type | Common file extensions |
| --- | --- |
| Images | .jpg, .jpeg, .png, .svg |
| Scans / Documents | .pdf, .docx |
| Audio | .wav, .mp3, .wma |
| Video | .mpeg, .avi, .mov, .mp4 |
| Databases | .xlsx, .accdb, .fmp |
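The difference between binary and human-readable files can be checked roughly in code. The sketch below uses a simple heuristic chosen for illustration: bytes count as text if they contain no NUL byte and decode cleanly as UTF-8. The function name `looks_like_text` is hypothetical, and the heuristic will misclassify text saved in other encodings such as UTF-16.

```python
def looks_like_text(data: bytes) -> bool:
    """Heuristic: True if `data` contains no NUL bytes and is valid UTF-8."""
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# UTF-8 encoded text passes the check.
print(looks_like_text("naïve transcription".encode("utf-8")))  # True

# The first eight bytes of every PNG file are not valid UTF-8.
print(looks_like_text(b"\x89PNG\r\n\x1a\n"))  # False
```

In practice you would read the first few kilobytes of a file with `open(path, "rb")` and pass them to the function.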
Markup
Non-Proprietary
Proprietary
Problems
Recommendations