Open Richly Annotated Cuneiform Corpus
RSDG Team Meeting - 9th October 2018
Raquel Alegre
Metadata: project info, lang, protocols...
Transliteration and lemmatization
Translation
Comments
Descriptions: rulings, blank, ...
Sections: object, parts, ...
Lex breaks the input text into a stream of tokens by matching each piece against a regular expression (RE):
# -> t_HASH: r'\#'
project -> PROJECT
: -> t_COLON: r'\:'
cams/gkab -> t_ID: r'[a-zA-Z0-9]+[/]?[a-zA-Z0-9]+'
[new line] -> t_NEWLINE: r'\n'
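The tokenization step above can be approximated with the standard library alone. This is a minimal sketch, not the PLY lexer that pyoracc uses: the token names and regular expressions come from the slide, while `TOKEN_SPEC`, `tokenize`, and the whitespace-skipping `SKIP` rule are assumptions made for illustration.

```python
import re

# Token names and patterns follow the slide; order matters, so the
# PROJECT keyword is tried before the more general ID pattern.
TOKEN_SPEC = [
    ("HASH", r"\#"),
    ("PROJECT", r"project"),
    ("COLON", r":"),
    ("ID", r"[a-zA-Z0-9]+[/]?[a-zA-Z0-9]+"),
    ("NEWLINE", r"\n"),
    ("SKIP", r"[ \t]+"),  # assumption: spaces and tabs between tokens are ignored
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token_name, matched_text) pairs for the input."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"Illegal character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for name, value in tokenize("#project: cams/gkab\n"):
        print(name, repr(value))
```

Each alternative in the combined pattern is a named group, so `match.lastgroup` identifies which token rule matched at the current position.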
Yacc parses and does semantic processing on the stream of tokens produced by Lex, following a grammar description:
expression : expression + term
           | expression - term
           | expression * term
           | expression / term
           | term
Example input: 3 * 5 + 1
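To see how this grammar groups the example input, here is a small hand-written evaluator. It is a sketch only: Yacc generates a table-driven parser, whereas this code turns the left-recursive rule into a loop, and it assumes that a `term` is a plain integer literal and that the input is well formed. The names `OPS` and `evaluate` are hypothetical.

```python
import operator
import re

# One entry per operator that appears in the grammar.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(source):
    """Evaluate `source` according to the single-level grammar on the slide."""
    tokens = re.findall(r"\d+|[-+*/]", source)
    # expression : term
    value = int(tokens[0])
    # expression : expression OP term  (left recursion expressed as a loop)
    for i in range(1, len(tokens), 2):
        op, term = tokens[i], int(tokens[i + 1])
        value = OPS[op](value, term)
    return value


if __name__ == "__main__":
    print(evaluate("3 * 5 + 1"))  # (3 * 5) + 1
    print(evaluate("1 + 3 * 5"))  # (1 + 3) * 5 under this grammar
```

Because all four operators sit on the same grammar level, the grouping is strictly left to right. Giving `*` and `/` higher precedence would need an extra level of rules or explicit precedence declarations.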
def p_document(self, p):
    """document : text
                | object
                | composite"""

def p_text_language(self, p):
    "text : text language_protocol"
    p[0] = Text()
    # p[2] is the value produced by the language_protocol rule
    p[0].language = p[2]

def p_language_protocol(self, p):
    "language_protocol : ATF LANG ID newline"
    # p[3] is the ID token, the third symbol on the right-hand side
    p[0] = p[3]
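The indexing convention in these rules (`p[0]` is the result, `p[1]`, `p[2]`, ... are the right-hand-side symbols) can be tried out without a parser generator. The sketch below is an assumption-heavy stand-in: `Text` is a minimal placeholder class, `apply_rule` is a hypothetical helper that mimics how a production is handed to a rule function, and the token values are illustrative rather than taken from a real ATF file.

```python
class Text:
    """Placeholder for the Text model object used on the slide."""

    def __init__(self):
        self.language = None


def p_language_protocol(p):
    "language_protocol : ATF LANG ID newline"
    p[0] = p[3]  # keep only the ID value


def p_text_language(p):
    "text : text language_protocol"
    p[0] = Text()
    p[0].language = p[2]  # value produced by language_protocol


def apply_rule(action, symbols):
    """Hypothetical helper: slot 0 holds the result, slots 1..n the symbols."""
    p = [None] + list(symbols)
    action(p)
    return p[0]


# Illustrative token values for a language protocol line.
language = apply_rule(p_language_protocol, ["#atf:", "lang", "akk", "\n"])
text = apply_rule(p_text_language, [Text(), language])

if __name__ == "__main__":
    print(text.language)
```

The inner rule is reduced first and its `p[0]` becomes a symbol value for the outer rule, which is how the language code ends up on the `Text` object.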
> pip install pyoracc
Graphical User Interface for edition of ATF texts
Questions?