An OmegaT-based TMS for simple translation workflows
Powered by Bash, OmegaT, GitHub, Nextcloud and Python
Surpass Delta
Client: Prometric
Project: localization of the UI for Surpass Delta product
Periodicity: roughly once a quarter
Task: to translate new strings of text for new features that are being released on Delta
Languages: English to 59 languages (always the same)
Format: Excel (only some sheets/columns)
Hints
- Repetitive workflow
- Cumbersome management
- File format is constant
- Translation scope:
  - not all files for all language versions
  - not all parts of the (Excel) file
Challenges
- How each PM organizes things
- Sending emails
  - Booking subcontractors
  - Assigning jobs
- File management
  - Uploading files
  - Downloading files
  - Putting files in the right folder
- Reviewing deliverables
  - Checking completion
  - Checking tags
What are the manual steps that take most of PM's time?
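[Diagram: how OmegaT uses the project folders. Folder labels: /glossary, /tm, /source, /target, / (project root), /omegat. Content labels: terminology, reference TM(s), original docs, translated docs, master TM, working TM. Flow labels: user input, extraction (text, skeleton), concordances, leverage (matches), saved, bilingual, merge.]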
OmegaT project
PROJECT
├── dictionary
├── glossary
├── omegat
│   ├── filters.xml
│   └── filter@configuration.fprm
├── omegat.project
├── source
│   └── file.txt
├── target
│   └── file.txt
└── tm
    └── file.tmx
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<omegat>
  <project version="1.0">
    <source_dir>__DEFAULT__</source_dir>
    <source_dir_excludes>
      <mask>**/.svn/**</mask>
      <mask>**/.git/**</mask>
      <mask>**/.hg/**</mask>
      <mask>**/.repositories/**</mask>
      <mask>**/Thumbs.db</mask>
      <mask>**/.DS_Store</mask>
      <mask>**/~$*</mask>
    </source_dir_excludes>
    <target_dir>__DEFAULT__</target_dir>
    <tm_dir>__DEFAULT__</tm_dir>
    <glossary_dir>__DEFAULT__</glossary_dir>
    <glossary_file>__DEFAULT__</glossary_file>
    <dictionary_dir>__DEFAULT__</dictionary_dir>
    <source_lang>en</source_lang>
    <target_lang>bg-BG</target_lang>
    <source_tok>org.omegat.tokenizer.LuceneEnglishTokenizer</source_tok>
    <target_tok>org.omegat.tokenizer.LuceneBulgarianTokenizer</target_tok>
    <sentence_seg>true</sentence_seg>
    <support_default_translations>true</support_default_translations>
    <remove_tags>false</remove_tags>
  </project>
</omegat>
Project structure
- omegat: the working TM sits here (not displayed)
- tm: the reference TM(s) sit here
- project folder (root): the master TMs are generated here
- omegat.project: these are the project settings
Folder structure in server
~/02_Clients/[CLIENT]/01_PROJECTS/[PROJECT]/01_Translation$ tree -L 1
.
├── 00_Admin
├── 10_History
├── 20_Automation
├── 30_Incoming        ← input
├── 40_Jobs
├── 50_Repos
├── 80_Deliverables    ← output
└── 90_Assets
Application modules
- Initiation (common for all versions)
- Create repositories for each version
- Harvesting translations
1. Initiation
- Precondition: File format is constant and predictable
- The client drops a batch of files for translation (in file drop area, connected to 30_Incoming)
- A job folder is created for the batch of files under the PM folder (e.g. 40_Jobs > 2022_AUG01 > 01_Source) and the original files are moved there
- Pre-processing (convert Excel to JSON): extract translatable text and key columns
- A job folder is created for the batch in the common files repository (e.g. 50_Repos > 01_Common > PROJ_common_files > files > 2022_AUG01) and source (JSON) files are saved there
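The pre-processing step above could look roughly like the following. This is a minimal sketch, not the project's actual script: it assumes the sheet rows have already been read into lists of cell values (for example with a spreadsheet library, not shown here), and the function name `rows_to_json`, the column positions and the sample keys are all hypothetical.

```python
import json


def rows_to_json(rows, key_col, text_col, skip_header=True):
    """Return a JSON string mapping each key cell to its translatable text.

    rows     -- list of rows, each a list of cell values (assumed input shape)
    key_col  -- index of the column holding the string identifier
    text_col -- index of the column holding the English text
    """
    data = {}
    for row in rows[1:] if skip_header else rows:
        # Ignore rows too short to contain both columns.
        if len(row) <= max(key_col, text_col):
            continue
        key, text = row[key_col], row[text_col]
        # Keep only rows that actually carry translatable text.
        if key and isinstance(text, str) and text.strip():
            data[str(key)] = text
    return json.dumps(data, ensure_ascii=False, indent=2)


# Hypothetical sheet content: only columns 0 (key) and 2 (text) are in scope.
sample_rows = [
    ["Key", "Context", "English"],
    ["btn.save", "button", "Save"],
    ["btn.cancel", "button", "Cancel"],
    ["spacer", "layout", ""],
]
extracted = rows_to_json(sample_rows, key_col=0, text_col=2)
print(extracted)
```

Keeping the key column next to the text is what would allow the translation to be put back into the right Excel row later on.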
Job/batch folder
2022_AUG02 = current year (2022) + current month (AUG) + current job within the month (02)
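As an illustration of this naming convention, here is a small sketch that derives the next job folder name from a date and the job folders that already exist. The function name `job_folder_name` and the way existing folders are passed in are assumptions made for the example, not part of the actual automation.

```python
from datetime import date

# English month abbreviations, hard-coded so the result does not depend on the locale.
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def job_folder_name(day, existing):
    """Return the next free job folder name for the month of `day`."""
    prefix = f"{day.year}_{MONTHS[day.month - 1]}"
    # Sequence numbers already used this month (names like 2022_AUG01).
    used = [int(name[len(prefix):]) for name in existing
            if name.startswith(prefix) and name[len(prefix):].isdigit()]
    next_number = max(used, default=0) + 1
    return f"{prefix}{next_number:02d}"


name = job_folder_name(date(2022, 8, 19), ["2022_JUL01", "2022_AUG01"])
print(name)
```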
.
├── 30_Incoming
├── 40_Jobs
│   ├── 2022_AUG01
│   │   ├── 00_Admin
│   │   ├── 01_Source
│   │   │   ├── file1_en.xls
│   │   │   └── file2_en.xls
│   │   ├── 02_Target
│   │   └── 03_Review
│   │       ├── Clean_Files
│   │       └── Notes_Files
├── 50_Repos
├── 80_Deliverables
└── 90_Assets
Folder structure in server
.
├── 30_Incoming
├── 40_Jobs
├── 50_Repos
│   ├── 01_Common
│   │   └── PROJ_common_files
│   │       ├── files
│   │       │   ├── 2022_AUG01
│   │       │   │   ├── file1_en.xls.json
│   │       │   │   └── file2_en.xls.json
│   │       └── settings
│   ├── 02_Versions
│   ├── 03_Harvest
│   └── repo_urls.txt
├── 80_Deliverables
└── 90_Assets
Folder structure in server
> org="capstan-PROJ"
> common_repo="PROJ_common_files"
> team="translators"
> # ---
> cd /path/to/PROJ_common_files
> git init
> git add . && git commit -m "initial commit"
> gh repo create $org/$common_repo --private --source=. --remote=origin --team $team
> git push --set-upstream origin master
Create common files repo
> org="capstan-PROJ"
> common_repo="PROJ_common_files"
> team="translators"
> job_dname="2022_AUG01" # for example
> # add pre-processed json files
> cd /path/to/PROJ_common_files
> git add .
> git commit -m "New files added for job $job_dname"
> git push
Push new batch
2. Create version repos
- Required: Version specifications
- For each version:
  - Create GitHub repository (and clone it)
  - Initialize OmegaT project in the local clone
  - Add repository mappings
    - to source files
    - to settings
    - to TMs
  - Mask files that don't need to be translated
  - Push files to the repo
  - Write the repo's URL for the PM
50_Repos/
├── 01_Common
│   └── Delta_common_files
├── 02_Versions
│   ├── Delta_amh-ETH_OMT
│   ├── Delta_ara-ZZZ_OMT
│   ├── Delta_bul-BGR_OMT
│   └── _tech
└── repo_urls.txt
Folder structure in server
> org="capstan-PROJ"
> omtprj_dname="PROJ_VERSION_files"
> team="translators"
> cd /path/to/version/omegat_project_dir
> gh repo create $org/$omtprj_dname --private --clone --team $team
> # add repository mappings, mask files out of scope
> git add .
> git commit -m "Initial commit -- creating omegat team project repo"
> git push --set-upstream origin master
Create each version's repo
OmegaT project
PROJECT
├── dictionary
├── glossary
├── omegat
│   ├── filters.xml
│   └── filter@config.fprm
├── omegat.project
├── source
│   └── file.txt
├── target
└── tm
    └── file.tmx
Common for all language versions (through repository mappings to the common files repo): the source files and the filter settings under omegat.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<omegat>
  <project version="1.0">
    <source_dir>__DEFAULT__</source_dir>
    <source_dir_excludes>
      <mask>**/.svn/**</mask>
      <mask>**/.git/**</mask>
      <mask>**/.hg/**</mask>
      <mask>**/.repositories/**</mask>
      <mask>**/Thumbs.db</mask>
      <mask>**/.DS_Store</mask>
      <mask>**/~$*</mask>
    </source_dir_excludes>
    <target_dir>__DEFAULT__</target_dir>
    <tm_dir>__DEFAULT__</tm_dir>
    <glossary_dir>__DEFAULT__</glossary_dir>
    <glossary_file>__DEFAULT__</glossary_file>
    <dictionary_dir>__DEFAULT__</dictionary_dir>
    <source_lang>en</source_lang>
    <target_lang>bg-BG</target_lang>
    <source_tok>org.omegat.tokenizer.LuceneEnglishTokenizer</source_tok>
    <target_tok>org.omegat.tokenizer.LuceneBulgarianTokenizer</target_tok>
    <sentence_seg>true</sentence_seg>
    <support_default_translations>true</support_default_translations>
    <remove_tags>false</remove_tags>
    <repositories>
      <repository type="git" url="https://github.com/capstanlqc-delta/Delta_bul-BGR_OMT.git">
        <mapping local="/" repository="/"/>
      </repository>
      <repository type="git" url="https://github.com/capstanlqc-delta/Delta_common_files.git">
        <mapping local="source" repository="files"/>
        <mapping local="omegat/okf_json@delta.fprm" repository="settings/okf_json@delta.fprm"/>
        <mapping local="omegat/filters.xml" repository="settings/filters.xml"/>
      </repository>
    </repositories>
  </project>
</omegat>
Repository mappings
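Since the same block of mappings has to be written into the project file of every version, it lends itself to being generated. The sketch below builds a `<repositories>` element like the one in the project file above using Python's standard `xml.etree.ElementTree` module. The helper name `build_repositories` is hypothetical, and this is only one possible way to do it, not necessarily how the actual automation works.

```python
import xml.etree.ElementTree as ET


def build_repositories(version_repo_url, common_repo_url, common_mappings):
    """Build the <repositories> element of an OmegaT team project file."""
    repositories = ET.Element("repositories")

    # The version's own repo is mapped root to root.
    version_repo = ET.SubElement(repositories, "repository",
                                 {"type": "git", "url": version_repo_url})
    ET.SubElement(version_repo, "mapping", {"local": "/", "repository": "/"})

    # Shared files are mapped from the common files repo.
    common_repo = ET.SubElement(repositories, "repository",
                                {"type": "git", "url": common_repo_url})
    for local, remote in common_mappings:
        ET.SubElement(common_repo, "mapping",
                      {"local": local, "repository": remote})
    return repositories


base = "https://github.com/capstanlqc-delta"
element = build_repositories(
    f"{base}/Delta_bul-BGR_OMT.git",
    f"{base}/Delta_common_files.git",
    [("source", "files"),
     ("omegat/okf_json@delta.fprm", "settings/okf_json@delta.fprm"),
     ("omegat/filters.xml", "settings/filters.xml")],
)
print(ET.tostring(element, encoding="unicode"))
```

Only the first repository URL changes from one version to the next. The common files mappings stay the same for all of them.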
Version specs
Config
3. Harvest translations
- For each version:
  - Clone repo or fetch files from repo
  - Check if target files have been committed, and if so:
    - Run OmegaT on the project to get latest version of target files and get word counts
    - Check if all segments have been translated, and if so:
      - Post-process target files: extract translation from target JSON and put it in original Excel format
      - Put JSON and Excel in the PM folder (e.g. 40_Jobs > 2022_AUG01 > 02_Target > [VERSION] > 20220819-151217)
      - Put JSON and Excel in the deliverables folder (e.g. 80_Deliverables > 2022_AUG01 > [VERSION])
      - Remove target files from repository
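Two of the steps above can be illustrated with a short sketch: deciding whether a version is complete, and working out where its files should go. Both parts rest on assumptions. The completeness check simply compares flat key-to-text JSON data, as a stand-in for the OmegaT-based check described above. The function names, sample strings and root folder are hypothetical.

```python
from datetime import datetime
from pathlib import PurePosixPath


def untranslated_keys(source, target):
    """Return the keys whose translation is missing or empty in `target`."""
    return [key for key in source
            if not str(target.get(key, "")).strip()]


def harvest_destinations(root, job, version, when):
    """Return the PM folder and the deliverables folder for one version."""
    # Timestamp in the same shape as the example above (20220819-151217).
    stamp = when.strftime("%Y%m%d-%H%M%S")
    root = PurePosixPath(root)
    pm_dir = root / "40_Jobs" / job / "02_Target" / version / stamp
    delivery_dir = root / "80_Deliverables" / job / version
    return pm_dir, delivery_dir


# Hypothetical source and target content for one file.
source = {"btn.save": "Save", "btn.cancel": "Cancel"}
target = {"btn.save": "Запази", "btn.cancel": ""}
missing = untranslated_keys(source, target)

pm_dir, delivery_dir = harvest_destinations(
    "01_Translation", "2022_AUG01", "Delta_bul-BGR_OMT",
    datetime(2022, 8, 19, 15, 12, 17))
print(missing, pm_dir, delivery_dir, sep="\n")
```

The timestamped subfolder in the PM folder keeps every harvest of the same version, while the deliverables folder only holds the latest files.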
50_Repos/
├── 01_Common
│   └── Delta_common_files
├── 02_Versions
│   ├── Delta_amh-ETH_OMT
│   ├── Delta_ara-ZZZ_OMT
│   ├── Delta_bul-BGR_OMT
│   └── _tech
├── 03_Harvest
│   ├── Delta_amh-ETH_OMT
│   ├── Delta_ara-ZZZ_OMT
│   └── Delta_bul-BGR_OMT
└── repo_urls.txt
Folder structure in server
.
├── 40_Jobs
│   ├── 2022_AUG01
│   │   ├── 00_Admin
│   │   ├── 01_Source
│   │   ├── 02_Target
│   │   └── 03_Review
├── 50_Repos
│   ├── 01_Common
│   ├── 02_Versions
│   ├── 03_Harvest
│   │   ├── Delta_amh-ETH_OMT
│   │   ├── Delta_ara-ZZZ_OMT
│   │   └── Delta_bul-BGR_OMT
└── 80_Deliverables
Folder structure in server
[Diagram labels: target JSON files, done, XLS files]
> org="capstan-PROJ"
> omtprj_dname="PROJ_VERSION_files"
> team="translators"
> cd /path/to/harvest/folder
> # if never cloned:
> gh repo clone $org/$omtprj_dname
> # if already cloned:
> cd /path/to/harvest/folder/$omtprj_dname
> git fetch --all
> git reset --hard origin/master
Pull version's target files
@todo
Automated notifications:
- To each translator, when a new batch is available
- To the PM, when a translator commits files but not all segments are translated
- ... what else??
Automated comment handling:
- TBD with PM
Automated access management:
- Revoke rights from translators before revision starts
- Grant rights to revisers before revision starts
PM's manual actions
- Prepare the config file and version specs file (once)
- Share URL of the file drop area with client (once)
- Share URL of the file retrieval area with client (once)
- Book translators (one per version + backup?)
- Send instructions and repo URLs to translators (once)
- Review deliverables (per job, per version)
- If necessary, ask linguists to make changes and redeliver
- Notify client that they can fetch all deliverables (per job)
- Optional: Delete job folder from 01_Common after delivery
In other words: no file uploads/downloads
In-house review
- PM reviews files in 40_Jobs > etc.
- PM asks for changes
- Translator makes changes and commits target files again
- Translations are harvested, files are post-processed again
Work in progress
Outsourced revision
Work in progress
Three options:
- PM grants simultaneous access to translators and revisers to the same repository
  - PM notifies revisers when they can start
  - PM trusts translators to refrain from making changes after they have committed their work
- PM grants consecutive access to translators, then to revisers, for each version
  - Safer but requires more manual work
- Two repositories (one for translation, one for revision) accessible by different teams
Delivery to client
- PM notifies the client that they can fetch all target files for all versions
* Bash 5.1
* OmegaT 5.7
* GitHub
* Python 3.8
* Nextcloud
Questions?
An OmegaT-based TMS
By cApStAn LQC