Language and/or domain experts evaluate a source version and provide feedback about potential problems and suggestions for better wording.
The result of that evaluation is a TA report, which is sent to the client and used to improve the source version.
>> The information in those TA reports can be (re)used to some extent to automate that kind of evaluation.
>> Automation is not meant to replace the experts, but to help them avoid manual rework, e.g. having the same issues flagged manually over and over.
DRY principle
DRY: Don't Repeat Yourself = duplication is waste
Originally formulated in software development, where duplication in the business logic of programs should be eliminated by means of abstraction, this principle can also be applied to any process where duplication (i.e. rework) should be eliminated via automation.
Creating rules
We evaluate each issue identified in manual TA reports to see whether a rule can be created to check for that issue automatically.
Rules created this way may be simple literal expressions or more complex patterns that entail some degree of abstraction, flagging not only that specific issue but a whole type of issue.
Each rule also includes the feedback that must appear in the report should that potential issue be found.
Rules are grouped in client- or project-specific rulesets, so that each report only includes checks that are relevant.
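A rule can thus be thought of as a pattern paired with the feedback it emits. A minimal sketch of what such a rule could look like, assuming a Python implementation (the rule format, field names and example patterns are hypothetical, not VF4TA's actual format):

```python
import re

# Hypothetical rule format: each rule pairs a regex pattern with the
# feedback that should appear in the automated TA report.
RULES = [
    {
        # Abstract pattern: flags any immediately repeated word,
        # i.e. a type of issue, not one literal occurrence.
        "pattern": re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE),
        "category": "Grammar",
        "comment": "Repeated word.",
        "suggestion": "Remove the duplicate word.",
    },
    {
        # Literal pattern: flags 'e.g.' not followed by a comma.
        "pattern": re.compile(r"\be\.g\.(?!,)"),
        "category": "Style",
        "comment": "'e.g.' should be followed by a comma.",
        "suggestion": "Write 'e.g.,' instead.",
    },
]

def check(text, rules=RULES):
    """Return a feedback entry for every rule match in the text."""
    findings = []
    for rule in rules:
        for match in rule["pattern"].finditer(text):
            findings.append({
                "match": match.group(0),
                "category": rule["category"],
                "comment": rule["comment"],
                "suggestion": rule["suggestion"],
            })
    return findings
```

Grouping such rules into client- or project-specific lists is then a matter of passing the relevant subset to the checker.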
How does it work
The project manager or the linguist submits the file with the source text to the VF4TA utility, or runs the utility in OmegaT.
The utility runs all checks available for that client or project and produces the automated TA report, including:
Translatability category
Comment (about what the problem is)
Suggestion for rewording
The project manager or linguist examines that TA report, removes what is not relevant, and adds (after possible edits) the feedback that is useful to the final TA report for the client.
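The report-generation step above can be sketched as follows, assuming rules are stored in per-client rulesets and the automated report is a simple tab-separated table with the three columns listed above (ruleset names, rule contents and output format are all illustrative assumptions):

```python
import csv
import io
import re

# Hypothetical per-client rulesets: only the checks relevant to a
# given client or project are run.
RULESETS = {
    "client-a": [
        (re.compile(r"\bplease\b", re.IGNORECASE),
         "Style", "Politeness formula.", "Drop 'please' in UI text."),
    ],
}

def make_report(source_lines, client):
    """Run the client's ruleset over the source text and emit a TSV
    report with the three columns of the automated TA report."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t")
    writer.writerow(["Translatability category", "Comment", "Suggestion"])
    for line in source_lines:
        for pattern, category, comment, suggestion in RULESETS[client]:
            if pattern.search(line):
                writer.writerow([category, comment, suggestion])
    return out.getvalue()
```

The resulting table is what the project manager or linguist would then prune and edit before sending the final report to the client.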
Limitations and future dev.
Manual TA reports do not aim at defining issue types that can be generalised to other texts; creating rules from them is not feasible for every issue and can be time-consuming.
The checks rely on regular expressions (i.e. pattern matching), which can be very powerful but are difficult for a non-technical user to write and maintain, so the system relies heavily on technical expertise. Machine learning could be explored to overcome this.
No language processing is done at the moment. Future development should include lemmatisation and POS tagging.
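As an illustration of what lemmatisation and POS tagging would buy: without them, a regex rule must enumerate every inflected form by hand. A minimal sketch (the target word and pattern are hypothetical examples, not actual VF4TA rules):

```python
import re

# Without lemmatisation, a rule targeting every form of the verb
# "to utilise" must spell out the inflections (and spelling variants)
# in the pattern itself:
UTILISE = re.compile(r"\butili[sz](e|es|ed|ing)\b", re.IGNORECASE)

def flags_utilise(sentence):
    """True if the sentence contains any enumerated form of 'utilise'."""
    return bool(UTILISE.search(sentence))

# With lemmatisation and POS tagging (future development), the same
# rule could simply state: lemma == "utilise" and POS == VERB,
# matching all forms without listing them.
```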
Results always include some degree of noise; reducing it relies on feedback from consumers of the TA report.