PISA 2025

OmegaT team projects

* omegat
* version 5.7.2

* team projects

* infrastructure

Some key concepts

  • workflow: sequence of steps

  • step: each point in the workflow where a task can be done

  • project: the omegat project that is used to produce or edit translations at a specific step -- one project per step

  • git repository: the location online where the omegat project is hosted

  • batch: a collection of files that are handled together at each step and move together through the workflow (it's a folder)

  • file: each translation unit is released as a file, and there are several files in each batch.

Some key concepts

  • pull: action to obtain files from a remote location

  • commit/push: action to add/post files to a remote location

  • key: a unique alphanumeric identifier assigned to each label (hash based on unit, string and position in file)
  • label: each element that contains a string of text used in the instrument that is extracted for translation
  • segment: each fragment in which text is split for translation in OmegaT, which normally corresponds to a sentence
<label key="646ffe1bc8e8f4.28201432_7c1a9ea132510c11eaac895940e82f28_2">
  <text><em>Tyrannosaurus rex</em> (<em>T. rex</em>) was a type of large carnivorous dinosaur.  Our knowledge of <em>T. rex</em> comes from fossils.</text>
</label>
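As an illustration of how keys and labels relate, here is a minimal Python sketch that reads labels from a monolingual XML file into a key-to-text dictionary. The `<labels>` root element and the function name `labels_by_key` are assumptions made for the example; only the `<label key="...">` and `<text>` elements come from the sample above.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper: the real files may use a different root element.
SAMPLE = """<labels>
  <label key="646ffe1bc8e8f4.28201432_7c1a9ea132510c11eaac895940e82f28_2">
    <text><em>Tyrannosaurus rex</em> (<em>T. rex</em>) was a type of large carnivorous dinosaur.</text>
  </label>
</labels>"""


def labels_by_key(xml_string):
    """Return {key: plain text} for every <label> element found."""
    root = ET.fromstring(xml_string)
    result = {}
    for label in root.iter("label"):
        text_el = label.find("text")
        # itertext() flattens inline markup such as <em> into plain text.
        text = "".join(text_el.itertext()) if text_el is not None else ""
        result[label.get("key")] = text
    return result


labels = labels_by_key(SAMPLE)
```

Each key identifies exactly one label, so the same sentence appearing in two places still gets two distinct entries.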

New in PISA 2025: general

  • standard locale codes

    • users are not aware of the big improvement that this is, but the impact on technical tasks is huge

  • monolingual XML files, no bilingual XLIFF

    • this makes some things simpler (text extraction, match logic in omegat, key-binding rather than text-binding, etc.)

  • better file preparation (e.g. segmentation) in most cases

  • repository mappings

    • e.g. users always work on the released source version

  • storage of files and projects in git repositories

    • data is permanently saved in ACER's repos

    • secured transfer of files through git authentication

  • important enhancements and bug fixes in OmegaT

    • custom build 5.7.2 -- based on work done over the last 5 years

    • ongoing, it keeps improving

OmegaT project

  • It's a folder, containing:

    • a settings file (omegat.project),

    • some subfolders, containing

      • some more files

  • It can be packed for transfer purposes, but it must be unpacked for OmegaT to open it.

  • OmegaT can use URLs to fetch files dynamically when it opens a project.

project contents (offline)

  • All files are included in the project.

project contents (git)

  • Some files are included in the project when the project is created:

    • working TM

  • Some files are pulled from the common repository

    • source files

    • config files

    • some TMs

  • Some files are added to the project when the moment arrives:

    • prev/batch: when a batch is moved forward from the previous step

Team projects are online

The project is downloaded directly from OmegaT

OmegaT will sync translations directly with the repo

New in PISA 2025: projects

  • Previous cycles:

    • projects were a self-contained collection of files and folders

    • projects had to be packed and uploaded

    • projects had to be downloaded and unpacked

    • there was no authentication

  • This cycle:

    • projects are hosted in a git repository

    • re-created in the machine of the user when the user downloads or opens the project

    • translations are constantly sync'ed with the repository and with other terminals/clients

New in PISA 2025: repos

  • ACER has git repositories on AWS for different things:
    • One repository for each OmegaT project
      • OmegaT pulls (=downloads) files from it and pushes updates in working TM and target files
    • One common repository for all common files
      • source files, OmegaT configuration, language assets
      • all OmegaT projects can have links to those and OmegaT pulls the relevant files when the project loads
    • One final repository for the final versions of translated files

New in PISA 2025: preview

  • Previous cycles:

    • users generated target files locally and uploaded them one by one to the portal

    • preview looked for source text in the file and, if found, used the target text paired with it

  • This cycle:

    • users commit target files from OmegaT

    • preview looks for the key of each label in the file and, if found, uses the text associated with it
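The difference between text-binding and key-binding can be sketched in a few lines of Python. This is only an illustration of the idea, not the preview's actual code; the function names, the keys `k1`/`k2` and the sample strings are all made up.

```python
def preview_by_source_text(page_strings, target_by_source):
    """Previous cycles (sketch): replace a string when its source text is found."""
    return [target_by_source.get(s, s) for s in page_strings]


def preview_by_key(page_labels, target_by_key):
    """This cycle (sketch): replace a label when its key is found.

    page_labels is a list of (key, source_text) pairs.
    """
    return [target_by_key.get(key, source) for key, source in page_labels]


# The same source text appears twice but needs two different translations.
by_text = preview_by_source_text(["Right", "Right"], {"Right": "Derecha"})
by_key = preview_by_key(
    [("k1", "Right"), ("k2", "Right")],
    {"k1": "Derecha", "k2": "Correcto"},
)
```

With text-binding both occurrences necessarily get the same target text; with key-binding each occurrence can carry its own translation.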

New in PISA 2025: diff

  • Previous cycles:

    • I don't know how ETS' diff tool worked, but it was deficient

    • xDiff was used at some point
  • This cycle:

    • the diff tool compares between target files committed at two workflow steps

    • target files might not (!) necessarily reflect the latest version in the omegat project at that step
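A comparison between the target files committed at two steps could look roughly like the Python sketch below. It assumes each target file has already been read into a key-to-text dictionary; the function name `diff_by_key`, the keys and the sample strings are hypothetical, and the real diff tool may work differently.

```python
def diff_by_key(step_a, step_b):
    """Return {key: (text at step A, text at step B)} for every key that differs.

    A missing key is reported as None on the side where it is absent.
    """
    changes = {}
    for key in sorted(step_a.keys() | step_b.keys()):
        before, after = step_a.get(key), step_b.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes


reconciled = {"k1": "Hola", "k2": "Adios"}
verified = {"k1": "Hola", "k2": "Adiós", "k3": "Gracias"}
changes = diff_by_key(reconciled, verified)
```

Because the input is whatever was last committed, the result only reflects the state of the OmegaT project at the time of that commit.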

New in PISA 2025: fr-ZZ

  • Previous cycles:

    • there were two source versions

    • it was challenging to keep them in sync

    • it was problematic to have interchangeable TMs

  • Current cycle:

    • English is the single source version

    • fr-ZZ is a target-language version

    • fr-ZZ is added to projects as a reference (so-called second source)

  • Workflows are a sequence of tasks that happen along a chain of steps

    • Remember: there's one OmegaT project at each step
  • Batches travel through the workflow:
    • Each batch is a folder in the common repo that contains source files, which contain the source text (labels in English)
    • "traveling" here means:
      • when the task can start at a certain step, a repository mapping to the batch folder is added to the OmegaT project at that step
        • a TM with the translations of that batch is also added to this step (for reconciliation or editing)
      • when the task is completed, that repository mapping to the batch folder is removed from the OmegaT project at that step

New in PISA 2025: workflows

Workflows

  • Previous cycles:

    • one single omegat project for each batch

    • the whole project traveled through the workflow, from step/user to step/user

    • any changes in the target version traveled with the project
  • Current cycle:

    • omegat projects do not travel, it's the batch folders that travel through steps/projects/users
    • translations of a batch from other steps/workflows travel with that batch but are independent files

New in PISA 2025: workflows

Previous cycles:

project = batch

Current cycle:

project = step

2.3. batch1 released for reconciliation

When the reconciler re-opens the reconciliation project, it will now look like this:

├── source
│   └── batch1
│       ├── unit_A.xml
│       └── unit_B.xml
└── tm
    └── prev
        ├── T1
        │   └── batch1.tmx
        └── T2
            └── batch1.tmx

Match sorting revisited

  • Match sorting was a bit erratic in previous versions (4.x). Now it's much more predictable.

 

  • Criteria:
    • similarity score
    • auto-population tier
    • context binding
    • position of the file in the list of files
    • position of the match in the file

1. Similarity score

2. Auto-population tier

3. Context-binding

4. Path and position of file

5. Position of match in the TM
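Applying the five criteria in order amounts to sorting on a composite key. The Python sketch below illustrates that idea only: the `Match` class, its field names and the direction of each criterion (higher score first, lower tier number first, context-bound first, earlier file and earlier position first) are assumptions, not OmegaT's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Match:
    score: int           # similarity score, 0-100
    tier: int            # auto-population tier (assumed: lower number wins)
    context_bound: bool  # True if the match is bound to the segment's context
    file_index: int      # position of the TM file in the list of files
    position: int        # position of the match inside that file
    target: str


def sort_matches(matches):
    """Sort by the five criteria, each one breaking ties left by the previous."""
    return sorted(
        matches,
        key=lambda m: (-m.score, m.tier, not m.context_bound, m.file_index, m.position),
    )


matches = [
    Match(100, 1, False, 0, 5, "a"),
    Match(100, 1, True, 1, 0, "b"),
    Match(90, 0, True, 0, 0, "c"),
    Match(100, 0, False, 2, 9, "d"),
]
ordered = [m.target for m in sort_matches(matches)]
```

A later criterion only matters when all earlier ones are tied, which is what makes the resulting order predictable.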

Workflow transitions

Given consecutive steps M and N in a workflow,

  • when the task on batch1 is completed at step M,
    • batch1 is removed from step M
    • batch1 can be added to step N (it is added or queued)
    • batch1 TM with step M translations is added to step N
    • batch2 can be added to step M (it is added or queued)
  • when the task on batch2 is completed at step M,
    • batch2 is removed from step M
    • batch2 can be added to step N (it is added or queued)
    • batch2 TM with step M translations is added to step N
    • batch3 can be added to step M (it is added or queued)...
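The transition rules above can be sketched as a small Python simulation. It assumes that each step works on one batch at a time and queues the rest, which is one possible reading of "added or queued"; the `Step` class, the `complete` function and the `tm/prev/<step>/<batch>.tmx` naming are illustrative only.

```python
from collections import deque


class Step:
    """One workflow step, i.e. one OmegaT project (sketch)."""

    def __init__(self, name):
        self.name = name
        self.active = None    # batch currently mapped into the project
        self.queue = deque()  # batches waiting for their turn
        self.prev_tms = []    # TMs received from the previous step

    def offer(self, batch):
        """Add the batch if the step is free, otherwise queue it."""
        if self.active is None:
            self.active = batch
        else:
            self.queue.append(batch)


def complete(step, next_step):
    """Finish the task on the active batch of `step` and move it forward."""
    batch = step.active
    if batch is None:
        raise ValueError(f"no active batch at step {step.name}")
    # The batch is removed from this step; the next queued batch, if any, comes in.
    step.active = step.queue.popleft() if step.queue else None
    # The TM with this step's translations and the batch itself go to the next step.
    next_step.prev_tms.append(f"tm/prev/{step.name}/{batch}.tmx")
    next_step.offer(batch)
    return batch


step_m, step_n = Step("M"), Step("N")
step_m.offer("batch1")
step_m.offer("batch2")
complete(step_m, step_n)  # batch1 moves from M to N, batch2 becomes active at M
```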

demo

An alternative translation must be created in any segment which was translated differently elsewhere in the project, even if that was in another batch.

Workflow transitions

Advantage:

  • changes made by the previous user (e.g. reconciler) to an existing translation in subsequent batches do not change the decisions of the next user (e.g. verifier)

Disadvantage:

  • changes made by the previous user (e.g. verifier) to an existing translation in subsequent batches do not change the decisions of the next user (e.g. reviewer)

New criterion: recency

Short-term solution:

  • script that runs on project load:
    • compares all matches in TMs with the current translation of each segment
    • uses the most recent one as the new translation
    • updates the timestamp and authorship of the translation to the current user
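The logic of the short-term script can be sketched as follows. This is a Python illustration of the three bullets above, not the script itself; the `Entry` class, the `apply_recency` function and the rule of skipping matches whose text is identical to the current translation are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Entry:
    target: str
    changed: datetime
    author: str


def apply_recency(current, tm_matches, user, now):
    """Return a copy of `current` where newer TM matches replace older translations.

    current:    {source text: Entry} with the project's current translations
    tm_matches: {source text: [Entry, ...]} with the matches found in the TMs
    """
    updated = {}
    for source, entry in current.items():
        candidates = tm_matches.get(source, [])
        newest = max(candidates, key=lambda e: e.changed, default=None)
        # Assumption: identical text is left alone so its timestamp is not bumped.
        if newest and newest.changed > entry.changed and newest.target != entry.target:
            # The new translation is stamped with the current user and time.
            updated[source] = Entry(newest.target, now, user)
        else:
            updated[source] = entry
    return updated


now = datetime(2025, 1, 3)
current = {"Hello": Entry("Hola", datetime(2025, 1, 1), "translator")}
tms = {"Hello": [Entry("Buenas", datetime(2025, 1, 2), "reconciler"),
                 Entry("Hey", datetime(2024, 12, 1), "other")]}
result = apply_recency(current, tms, "verifier", now)
```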

Long-term solution:

  • new functionality in OmegaT:
    • /tm/update: matches from it will be enforced (but not locked) if they are more recent than the current translation

base TM precedence by step

tm
├── auto
│   └── trend
│       ├── PISA_es-AR_MAT_MS2022.tmx.zip
│       ├── PISA_es-AR_REA_MS2022.tmx.zip
│       ├── PISA_es-AR_SCI_MS2022.tmx.zip
│       ├── PISA_es-AR_UI_MS2022.tmx.zip
│       └── PISA_es-AR_XYZ_MS2022.tmx.zip
├── base
│   ├── 01_COS_SCI-A_N_es-CL.tmx
│   ├── 02_COS_SCI-B_N_es-CL.tmx
│   └── 03_COS_SCI-C_N_es-CL.tmx
└── enforce
    └── trend
        ├── PISA_es-AR_ICQ_MS2022.tmx.zip
        ├── PISA_es-AR_SCQ_MS2022.tmx.zip
        └── PISA_es-AR_STQ_MS2022.tmx.zip
es-AR@adaptation
 
tm
├── auto
│   ├── base
│   │   ├── 01_COS_SCI-A_N_es-CL.tmx
│   │   ├── 02_COS_SCI-B_N_es-CL.tmx
│   │   └── 03_COS_SCI-C_N_es-CL.tmx
│   └── trend
│       ├── PISA_es-AR_MAT_MS2022.tmx.zip
│       ├── PISA_es-AR_REA_MS2022.tmx.zip
│       ├── PISA_es-AR_SCI_MS2022.tmx.zip
│       ├── PISA_es-AR_UI_MS2022.tmx.zip
│       └── PISA_es-AR_XYZ_MS2022.tmx.zip
└── enforce
    └── trend
        ├── PISA_es-AR_ICQ_MS2022.tmx.zip
        ├── PISA_es-AR_SCQ_MS2022.tmx.zip
        └── PISA_es-AR_STQ_MS2022.tmx.zip

es-AR@verification
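The two trees differ only in where the base TMs sit: under tm/base at adaptation and under tm/auto/base at verification. A minimal Python sketch of how the location of a TM determines how it is treated, assuming the usual OmegaT convention that tm/auto auto-populates and tm/enforce enforces while other folders provide ordinary matches; the function name `tm_role` and the role labels are made up for the example.

```python
from pathlib import PurePosixPath


def tm_role(path):
    """Classify a TM by the first folder below tm/ (sketch)."""
    parts = PurePosixPath(path).parts
    if len(parts) < 2 or parts[0] != "tm":
        raise ValueError(f"not a path inside tm/: {path}")
    if parts[1] == "auto":
        return "auto-populate"
    if parts[1] == "enforce":
        return "enforce"
    return "reference"


at_adaptation = tm_role("tm/base/01_COS_SCI-A_N_es-CL.tmx")
at_verification = tm_role("tm/auto/base/01_COS_SCI-A_N_es-CL.tmx")
```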
 

Questions?