PISA 2025

repository mappings
(fr-ZZ only)

Some key concepts

  • workflow: sequence of steps

  • step: each point in the workflow where a task can be done

  • project: the omegat project that is used to produce or edit translations at a specific step -- one project per step

  • git repository: the location online where the omegat project is hosted

  • batch: a collection of files that are handled together at each step and move together through the workflow (it's a folder)

  • file: each translation unit is released as a file, and there are several files in each batch.

  • pull: action to obtain files from a remote location

  • commit/push: action to add/post files to a remote location

  • key: a unique alphanumeric identifier assigned to each label (hash based on unit, string and position in file)
  • label: each element that contains a string of text used in the instrument that is extracted for translation
  • segment: each fragment in which text is split for translation in OmegaT, which normally corresponds to a sentence
<label key="646ffe1bc8e8f4.28201432_7c1a9ea132510c11eaac895940e82f28_2">
  <text><em>Tyrannosaurus rex</em> (<em>T. rex</em>) was a type of large carnivorous dinosaur.  Our knowledge of <em>T. rex</em> comes from fossils.</text>
</label>
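
As a minimal sketch of how keys and labels relate, the snippet below reads labels from a monolingual XML string and maps each key to the plain text of its label. The `<labels>` wrapper element and the function name `labels_by_key` are assumptions made for illustration; the root element of the real files is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper: the root element of the real files is not shown in this deck.
SAMPLE = """<labels>
  <label key="646ffe1bc8e8f4.28201432_7c1a9ea132510c11eaac895940e82f28_2">
    <text><em>Tyrannosaurus rex</em> (<em>T. rex</em>) was a type of large carnivorous dinosaur.</text>
  </label>
</labels>"""


def labels_by_key(xml_text):
    """Map each label's key attribute to the plain text of its <text> child."""
    root = ET.fromstring(xml_text)
    result = {}
    for label in root.iter("label"):
        text_el = label.find("text")
        # itertext() flattens inline markup such as <em> into plain text
        result[label.get("key")] = "".join(text_el.itertext()) if text_el is not None else ""
    return result


if __name__ == "__main__":
    for key, text in labels_by_key(SAMPLE).items():
        print(key, "->", text)
```

In OmegaT the text of a label would be split further into segments; that step is not modelled here.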

New in PISA 2025: general

  • standard locale codes

    • users are not aware of the big improvement that this is, but the impact on technical tasks is huge

  • monolingual XML files, no bilingual XLIFF

    • this makes some things simpler (text extraction, match logic in omegat, key-binding rather than text-binding, etc.)

  • better file preparation (e.g. segmentation) in most cases

  • repository mappings

    • e.g. users always work on the released source version

  • storage of files and projects in git repositories

    • data is permanently saved in ACER's repos

    • secured transfer of files through git authentication

  • important enhancements and bug fixes in OmegaT

    • custom build 5.7.2 -- based on work done over the last 5y

    • ongoing, it keeps improving

OmegaT project

  • It's a folder, containing:

    • a settings file (omegat.project),

    • some subfolders, containing

      • some more files

  • It can be packed for transfer purposes, but it must be unpacked for OmegaT to open it.

  • OmegaT can use URLs to fetch files dynamically when it opens a project.

project contents (offline)

  • All files are included in the project.

project contents (git)

  • Some files are included in the project when the project is created:

    • working TM

  • Some files are pulled from the common repository

    • source files

    • config files

    • some TMs

  • Some files are added to the project when the moment arrives:

    • prev/batch: when a batch is moved forward from the previous step

Team projects are online

The project is downloaded directly from OmegaT

OmegaT will sync translations directly with the repo

New in PISA 2025: projects

  • Previous cycles:

    • projects were a self-contained collection of files and folders

    • projects had to be packed and uploaded

    • projects had to be downloaded and unpacked

    • there was no authentication

  • This cycle:

    • projects are hosted in a git repository

    • re-created in the machine of the user when the user downloads or opens the project

    • translations are constantly sync'ed with the repository and with other terminals/clients

New in PISA 2025: repos

  • ACER has git repositories on AWS for different things:
    • One repository for each OmegaT project
      • OmegaT pulls (=downloads) files from it and pushes updates in working TM and target files
    • One common repository for all common files
      • source files, OmegaT configuration, language assets
      • all OmegaT projects can have links to those and OmegaT pulls the relevant files when the project loads
    • One final repository for the final versions of translated files

New in PISA 2025: preview

  • Previous cycles:

    • users generated target files locally and uploaded them one by one to the portal

    • preview looked for source text in the file and, if found, used the target text paired with it

  • This cycle:

    • users commit target files from OmegaT

    • preview looks for the key in the file and, if found, uses the text associated with it
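
The difference between the two lookup strategies can be sketched as follows. This is only an illustration under assumptions: the keys `k1`/`k2`, the sample strings and both function names are hypothetical, and the real preview service is not shown here.

```python
# Two labels share the same source text but need different translations.
SOURCE = {"k1": "Next", "k2": "Next"}          # hypothetical keys and strings
TARGET = {"k1": "Suivant", "k2": "Suivante"}


def text_bound(source, target):
    """Previous cycles: pair source text with target text.

    Repeated source strings collide, so only one translation survives.
    """
    return {source[key]: target[key] for key in source}


def key_bound(target, key):
    """This cycle: look the translation up by the label's key."""
    return target.get(key)


if __name__ == "__main__":
    print(text_bound(SOURCE, TARGET))   # one entry for "Next"
    print(key_bound(TARGET, "k1"), key_bound(TARGET, "k2"))
```

With key binding, each label keeps its own translation even when the source text is identical.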

New in PISA 2025: diff

  • Previous cycles:

    • I don't know how ETS' diff tool worked, but it was deficient

    • xDiff was used at some point
  • This cycle:

    • the diff tool compares the target files committed at two workflow steps

    • target files might not (!) necessarily reflect the latest version in the omegat project at that step
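
A key-based comparison of this kind could look roughly like the sketch below. It assumes the target files of each step have already been read into `{key: text}` dictionaries; the function name `diff_targets` and the sample data are hypothetical, and this is not the actual diff tool.

```python
def diff_targets(step_a, step_b):
    """Compare two {key: target text} maps and report the labels that differ.

    Returns {key: (text at step A, text at step B)}; None marks a missing label.
    """
    report = {}
    for key in sorted(step_a.keys() | step_b.keys()):
        before, after = step_a.get(key), step_b.get(key)
        if before != after:
            report[key] = (before, after)
    return report


if __name__ == "__main__":
    reconciliation = {"k1": "Bonjour", "k2": "Merci"}
    verification = {"k1": "Bonjour", "k2": "Merci beaucoup", "k3": "Salut"}
    for key, (old, new) in diff_targets(reconciliation, verification).items():
        print(key, repr(old), "->", repr(new))
```

The result only reflects what was committed as target files, which is why it may lag behind the working TM of the project.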

New in PISA 2025: fr-ZZ

  • Previous cycles:

    • there were two source versions

    • it was challenging to keep them in sync

    • it was problematic to have interchangeable TMs

  • Current cycle:

    • English is the single source version

    • fr-ZZ is a target-language version

    • fr-ZZ is added to projects as a reference (the so-called second source)

TRA and ADA workflows

  • Workflows are a sequence of tasks that happen along a chain of steps

    • Remember: there's one OmegaT project at each step
  • Batches travel through the workflow:
    • Each batch is a folder in the common repo that contains source files, which contain the source text (labels in English)
    • "traveling" here means:
      • when the task can start at a certain step, a repository mapping to the batch folder is added to the OmegaT project at that step
        • a TM with the translations of that batch is also added to this step (for reconciliation or editing)
      • when the task is completed, that repository mapping to the batch folder is removed from the OmegaT project at that step
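
The idea that batches travel while projects stay put can be modelled in a few lines. This is a simplified sketch: the class `StepProject` and the functions below are hypothetical names, a plain set stands in for the repository mappings, and the double-translation rule (both translations must be done before reconciliation) is left out.

```python
from dataclasses import dataclass, field


@dataclass
class StepProject:
    """One OmegaT project per step; mapped_batches stands in for its repository mappings."""
    step: str
    mapped_batches: set = field(default_factory=set)


def release(project, batch):
    """The task can start: add the mapping to the batch folder."""
    project.mapped_batches.add(batch)


def complete(project, batch):
    """The task is completed: remove the mapping to the batch folder."""
    project.mapped_batches.discard(batch)


def move_forward(batch, current, following):
    """A batch 'travels' by leaving one step's project and entering the next one's."""
    complete(current, batch)
    release(following, batch)


if __name__ == "__main__":
    t1 = StepProject("translation1")
    rec = StepProject("reconciliation")
    release(t1, "batch1")
    move_forward("batch1", t1, rec)
    print(t1, rec)
```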

Workflows

  • Previous cycles:

    • one single omegat project for each batch

    • the whole project traveled through the workflow, from step/user to step/user

    • any changes in the target version traveled with the project
  • Current cycle:

    • omegat projects do not travel, it's the batch folders that travel through steps/projects/users
    • translations of a batch from other steps/workflows travel with that batch but are independent files

New in PISA 2025: workflows

  • Previous cycles:

    • project = batch

  • Current cycle:

    • project = step
0: Initial state

  • All OmegaT projects are empty:

    >  tree -L 2
    .
    ├── mapped
    ├── omegat
    │   ├── project_save.tmx
    │   └── step.txt
    ├── omegat.project
    ├── source
    ├── target
    └── tm
    

The initial project settings for all steps point to batch zero (empty) in pisa_2025ft_translation_common/source/ft/, which means no files are available at that step.

 
 
<repository type="git" url="&DOMAIN;/pisa_2025ft_translation_common.git">
    <!-- source files -->
    <mapping local="source/" repository="source/ft/batch0_empty/" />
    <!-- .... -->
</repository>

Specifically, the project settings of the reconciliation step will get translations from the mapped folders of the translation steps (which for the time being are empty) into /tm/rec/T?:

   
<!-- double translation -->
<repository type="git" url="&DOMAIN;/pisa_2025ft_translation_fr-ZZ_translation1.git">
    <mapping local="tm/rec/T1/" repository="mapped/"/>
</repository>
<repository type="git" url="&DOMAIN;/pisa_2025ft_translation_fr-ZZ_translation2.git">
    <mapping local="tm/rec/T2/" repository="mapped/"/>
</repository>

At all other steps after reconciliation, project settings will get translations from the mapped folder of the previous step (which for the time being is empty) into /tm/auto/<previous-step>/:

   
<repository type="git" url="&DOMAIN;/pisa_2025ft_translation_fr-ZZ_reconciliation.git">
    <!-- previous step -->
    <mapping local="tm/auto/<previous-step>/" repository="mapped/" />
</repository>  
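
The rule for where each step gets its translations from can be summarised in a short sketch. The step list `WORKFLOW`, the function names and the default arguments are assumptions for illustration; the repository naming pattern and the local folders follow the fragments above.

```python
# Hypothetical step list; the real workflows may contain other or more steps.
WORKFLOW = ["translation1", "translation2", "reconciliation", "verification"]


def repo_url(domain, locale, step):
    """Build a project repository URL following the naming pattern shown above."""
    return f"{domain}/pisa_2025ft_translation_{locale}_{step}.git"


def tm_mappings(step, locale="fr-ZZ", domain="&DOMAIN;"):
    """Return (repository URL, local folder) pairs from which `step` receives translations.

    In every case the remote side of the mapping is the other project's mapped/ folder.
    """
    if step in ("translation1", "translation2"):
        return []  # translation steps start from scratch
    if step == "reconciliation":
        return [
            (repo_url(domain, locale, "translation1"), "tm/rec/T1/"),
            (repo_url(domain, locale, "translation2"), "tm/rec/T2/"),
        ]
    previous = WORKFLOW[WORKFLOW.index(step) - 1]
    return [(repo_url(domain, locale, previous), f"tm/auto/{previous}/")]


if __name__ == "__main__":
    for step in WORKFLOW:
        print(step, tm_mappings(step))
```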

1: batch1 released at T1

Translation steps' project settings are updated so that the /source folder maps from the batch1 folder in pisa_2025ft_translation_common/source/ft/, which means that batch1 is now available for translation.

<repository type="git" url="&DOMAIN;/pisa_2025ft_translation_common.git">
    <mapping local="source/batch1" repository="source/ft/batch1/" />
</repository>

A notification is sent to translators to download their project. Batch batch1 will be downloaded into the /source/batch1 folder of the translation projects.
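
Releasing a batch therefore amounts to rewriting one mapping. A minimal sketch of that update, assuming the `<repository>` fragment is handled on its own: the function name `release_batch` and the `https://example.org` URL are placeholders (the `&DOMAIN;` entity is replaced by a literal URL because a standalone parser cannot resolve it), and the actual update mechanism is not shown here.

```python
import xml.etree.ElementTree as ET

# Placeholder URL instead of &DOMAIN;, which a standalone XML parser cannot resolve.
FRAGMENT = (
    '<repository type="git" url="https://example.org/pisa_2025ft_translation_common.git">'
    '<mapping local="source/" repository="source/ft/batch0_empty/" />'
    "</repository>"
)


def release_batch(repository_xml, batch):
    """Point the source mapping of a <repository> fragment at the given batch folder.

    Note: ElementTree's default parser drops XML comments from the fragment.
    """
    repo = ET.fromstring(repository_xml)
    for mapping in repo.findall("mapping"):
        if mapping.get("local", "").startswith("source"):
            mapping.set("local", f"source/{batch}")
            mapping.set("repository", f"source/ft/{batch}/")
    return ET.tostring(repo, encoding="unicode")


if __name__ == "__main__":
    print(release_batch(FRAGMENT, "batch1"))
```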

When translator1 downloads the project, it will look like this:

.
├── omegat
│   ├── project_save.tmx
│   └── step.txt
├── omegat.project
├── source
│   └── batch1
│       ├── unit_A.xml
│       └── unit_B.xml
├── target
└── tm

2: batch1 translation# done

When translator1 is done translating batch1, they must:

  1. close the translation1 project,
  2. finalize batch1 task in the workflow service

IFF all segments are translated in the batch (*), finalizing the task will trigger the following actions:

  1. batch1 translations are ready for reconciliation step
  2. batch1 is released at reconciliation step
  3. batch2 is released at translation1 step

The same applies for the translation2 step.
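
The completeness condition and the actions it triggers can be sketched like this. The function name, its parameters and the dictionary representation of translations are assumptions; the workflow service itself is not shown here.

```python
def finalize_translation(step, batch, next_batch, source_segments, translations):
    """Return the actions triggered by finalizing a translation task.

    `translations` maps each source segment to its translation (hypothetical
    representation). If any segment lacks a translation, nothing is triggered.
    """
    complete = all((translations.get(seg) or "").strip() for seg in source_segments)
    if not complete:
        return []
    return [
        f"{batch} translations are ready for reconciliation step",
        f"{batch} is released at reconciliation step",
        f"{next_batch} is released at {step} step",
    ]


if __name__ == "__main__":
    segments = ["Hello.", "Thank you."]
    print(finalize_translation("translation1", "batch1", "batch2", segments, {"Hello.": "Bonjour."}))
    print(finalize_translation("translation1", "batch1", "batch2", segments,
                               {"Hello.": "Bonjour.", "Thank you.": "Merci."}))
```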

2.1. batch1 available for reconciliation

  • The working TM (file /omegat/project_save.tmx) of the translation1 project is copied to /tm/prev/T1/batch1.tmx of the reconciliation step.
  • The working TM (file /omegat/project_save.tmx) of the translation2 project is copied to /tm/prev/T2/batch1.tmx of the reconciliation step.

tm
└── prev
    ├── T1/batch1.tmx
    └── T2/batch1.tmx
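
The hand-over of the working TM can be sketched as a plain file copy. This assumes both projects are available as local folders, which is a simplification, since in practice they live in separate git repositories; the function name `hand_over_tm` is hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def hand_over_tm(translation_project, reconciliation_project, translator, batch):
    """Copy a translation project's working TM into tm/prev/<translator>/<batch>.tmx."""
    src = Path(translation_project) / "omegat" / "project_save.tmx"
    dest = Path(reconciliation_project) / "tm" / "prev" / translator / f"{batch}.tmx"
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dest)
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        t1 = Path(tmp) / "translation1"
        (t1 / "omegat").mkdir(parents=True)
        (t1 / "omegat" / "project_save.tmx").write_text("<tmx/>", encoding="utf-8")
        print(hand_over_tm(t1, Path(tmp) / "reconciliation", "T1", "batch1"))
```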

2.2. batch2 released for translation1

The translation1 step's project settings are updated so that the /source folder maps from the next batch (batch2) folder in pisa_2025ft_translation_common/source/ft/, which means that batch2 is now available for translation.

<repository type="git" url="&DOMAIN;/pisa_2025ft_translation_common.git">
    <mapping local="source/batch2" repository="source/ft/batch2/" />
</repository>
  • When the user closes the project, a script will delete any source files in the project (i.e. batch1 files).
  • When translator1 re-opens the translation1 project, only files from batch2 will be available for translation.
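
A clean-up of this kind could look roughly as follows. This is a sketch only: the actual script that runs when the project is closed is not shown here, and the function name `clear_source` is hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def clear_source(project_dir):
    """Delete everything under <project>/source but keep the folder itself.

    Returns the names of the removed entries.
    """
    source = Path(project_dir) / "source"
    removed = []
    for child in sorted(source.iterdir()):
        if child.is_dir():
            shutil.rmtree(child)
        else:
            child.unlink()
        removed.append(child.name)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        batch = Path(tmp) / "source" / "batch1"
        batch.mkdir(parents=True)
        (batch / "unit_A.xml").write_text("<labels/>", encoding="utf-8")
        print(clear_source(tmp))
```

The next time the project is opened, only the files of the currently mapped batch would be pulled into /source again.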

When translator1 re-opens the translation1 project, it will now look like this:

├── mapped
│   └── batch1.tmx
├── omegat
│   ├── project_save.tmx
│   └── step.txt
├── omegat.project
├── source
│   └── batch2
│       ├── unit_C.xml
│       └── unit_D.xml
├── target
└── tm

2.3. batch1 released for reconciliation

When batch1 has been translated in both the translation1 and translation2 steps, the reconciliation step's project settings are updated so that the /source folder maps from the batch1 folder in pisa_2025ft_translation_common/source/ft/, which means that batch1 is now available for reconciliation.

<repository type="git" url="&DOMAIN;/pisa_2025ft_translation_common.git">
    <!-- source files -->
    <mapping local="source/batch1" repository="source/ft/batch1/" />
    <!-- .... -->
</repository>

A notification is sent to the reconciler to download the reconciliation project.

When the reconciler re-opens the reconciliation project, it will now look like this:

├── source
│   └── batch1
│       ├── unit_A.xml
│       └── unit_B.xml
└── tm
    └── prev
        ├── T1
        │   └── batch1.tmx
        └── T2
            └── batch1.tmx

3: batch1 reconciliation done

When reconciler is done reconciling batch1, they must:

  1. close the reconciliation project,
  2. finalize batch1 task in the workflow service, and then
  3. open the reconciliation project again.

IFF all segments in the batch have a (reconciled) translation, finalizing the task will trigger the following actions:

  1. batch1 translations are ready for verification step
  2. batch2 is released at reconciliation step
  3. batch1 is released at verification step

3.1. batch1 available for verification

  • The working TM (file /omegat/project_save.tmx) of the reconciliation project is copied to /mapped/batch1.tmx.
  • Files /omegat/project_save.tmx and /mapped/batch1.tmx are at this point identical in the reconciliation project.
  • The /mapped/batch1.tmx file could now be downloaded from the verification step.

3.2. batch2 released for reconciliation

The reconciliation step's project settings are updated so that the /source folder maps from the next batch (batch2) folder in pisa_2025ft_translation_common/source/ft/, which means that batch2 is now available for reconciliation.

<repository type="git" url="&DOMAIN;/pisa_2025ft_translation_common.git">
    <mapping local="source/batch2" repository="source/ft/batch2/" />
</repository>
  • When the reconciler closes the project, a script will delete any source files in the project (i.e. batch1 files).
  • When the reconciler re-opens the reconciliation project, only files from batch2 will be available for reconciliation.

When the reconciler re-opens the reconciliation project, it will now look like this:

├── mapped
│   └── batch1.tmx
├── omegat
│   ├── project_save.tmx
│   └── step.txt
├── omegat.project
├── source
│   └── batch2
│       ├── unit_C.xml
│       └── unit_D.xml
├── target
└── tm

3.3. batch1 released for verification

The verification step's project settings are updated so that the /source folder maps from the batch1 folder in pisa_2025ft_translation_common/source/ft/, which means that batch1 is now available for verification.

<repository type="git" url="&DOMAIN;/pisa_2025ft_translation_common.git">
    <!-- source files -->
    <mapping local="source/batch1" repository="source/ft/batch1/" />
    <!-- .... -->
</repository>

A notification is sent to the verifier to download the verification project.

When the verifier opens the verification project, it will now look like this:

├── mapped
├── omegat
│   ├── project_save.tmx
│   └── step.txt
├── omegat.project
├── source
│   └── batch1
│       ├── unit_A.xml
│       └── unit_B.xml
├── target
└── tm
    └── auto
        └── reconciliation
            └── batch1.tmx

4: batch2 translation# done

When translator1 is done translating batch2, they must:

  1. close the translation1 project,
  2. finalize batch2 task in the workflow service, and then
  3. open the translation1 project again.

IFF all segments are translated in the batch, finalizing the task will trigger the following actions:

  1. batch2 translations are ready for reconciliation step
  2. batch3 is released at translation1 step
  3. batch2 is released at reconciliation step

The same applies for the translation2 step.

4.1. batch2 available for reconciliation

  • The working TM (file /omegat/project_save.tmx) of the translation1 project is copied to /mapped/batch2.tmx.
  • Files /omegat/project_save.tmx and /mapped/batch2.tmx are at this point identical in the translation1 project.
  • The /mapped/batch2.tmx file could now be downloaded from the reconciliation step.

4.2. batch3 released for translation1

The translation1 step's project settings are updated so that the /source folder maps from the next batch (batch3) folder in pisa_2025ft_translation_common/source/ft/, which means that batch3 is now available for translation.

<repository type="git" url="&DOMAIN;/pisa_2025ft_translation_common.git">
    <mapping local="source/batch3" repository="source/ft/batch3/" />
</repository>
  • When the user closes the project, a script will delete any source files in the project (i.e. batch2 files).
  • When translator1 re-opens the translation1 project, only files from batch3 will be available for translation.

When translator1 re-opens the translation1 project, it will now look like this:

├── mapped
│   ├── batch1.tmx
│   └── batch2.tmx
├── omegat
│   ├── project_save.tmx
│   └── step.txt
├── omegat.project
├── source
│   └── batch3
│       ├── unit_E.xml
│       └── unit_F.xml
├── target
└── tm

4.3. batch2 released for reconciliation

When batch2 has been translated in both the translation1 and translation2 steps, the reconciliation step's project settings are updated so that the /source folder maps from the batch2 folder in pisa_2025ft_translation_common/source/ft/, which means that batch2 is now available for reconciliation.

<repository type="git" url="&DOMAIN;/pisa_2025ft_translation_common.git">
    <!-- source files -->
    <mapping local="source/batch2" repository="source/ft/batch2/" />
    <!-- .... -->
</repository>

When the reconciler re-opens the reconciliation project, it will now look like this:

├── mapped
├── omegat
│   ├── project_save.tmx
│   └── step.txt
├── omegat.project
├── source
│   └── batch2
│       ├── unit_C.xml
│       └── unit_D.xml
├── target
└── tm
    └── rec
        ├── T1
        │   ├── batch1.tmx
        │   └── batch2.tmx
        └── T2
            ├── batch1.tmx        
            └── batch2.tmx

Match sorting revisited

  • Match sorting was a bit erratic in previous versions (4.x). Now it's much more predictable.

 

  • Criteria:
    • similarity score
    • auto-population tier
    • context binding
    • position of the file in the list of files
    • position of the match in the file
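
One way to picture a multi-criteria ordering like this is a sort on a tuple key, as in the sketch below. The direction of each criterion is an assumption made for illustration (higher similarity first, lower tier number first, context-bound matches first, earlier file and earlier position first), as are the `Match` class and its field names; this is not OmegaT's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Match:
    score: int           # similarity score (0-100)
    tier: int            # auto-population tier; lower number ranks first (assumption)
    context_bound: bool  # whether the match is bound to its context
    file_pos: int        # position of the TM file in the list of files
    match_pos: int       # position of the match within its file
    target: str


def sort_matches(matches):
    """Order matches by the criteria in the order listed above (directions are assumed)."""
    return sorted(
        matches,
        key=lambda m: (-m.score, m.tier, not m.context_bound, m.file_pos, m.match_pos),
    )


if __name__ == "__main__":
    candidates = [
        Match(100, 2, False, 0, 0, "a"),
        Match(100, 1, False, 1, 5, "b"),
        Match(90, 1, True, 0, 0, "c"),
        Match(100, 1, True, 2, 0, "d"),
    ]
    print([m.target for m in sort_matches(candidates)])
```

Because the criteria are compared in order, a later criterion only matters when all earlier ones are tied.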

Workflow transitions

Given steps A and B in a workflow, when a batch

New criterion: recency

Given steps A and B in a workflow, when a batch

1. Similarity score

2. Auto-population tier

3. Context-binding

4. Path and position of file

5. Position of match in the TM

base TM precedence by step

tm
├── auto
│   └── trend
│       ├── PISA_es-AR_MAT_MS2022.tmx.zip
│       ├── PISA_es-AR_REA_MS2022.tmx.zip
│       ├── PISA_es-AR_SCI_MS2022.tmx.zip
│       ├── PISA_es-AR_UI_MS2022.tmx.zip
│       └── PISA_es-AR_XYZ_MS2022.tmx.zip
├── base
│   ├── 01_COS_SCI-A_N_es-CL.tmx
│   ├── 02_COS_SCI-B_N_es-CL.tmx
│   └── 03_COS_SCI-C_N_es-CL.tmx
└── enforce
    └── trend
        ├── PISA_es-AR_ICQ_MS2022.tmx.zip
        ├── PISA_es-AR_SCQ_MS2022.tmx.zip
        └── PISA_es-AR_STQ_MS2022.tmx.zip
es-AR@adaptation
 
tm
├── auto
│   ├── base
│   │   ├── 01_COS_SCI-A_N_es-CL.tmx
│   │   ├── 02_COS_SCI-B_N_es-CL.tmx
│   │   └── 03_COS_SCI-C_N_es-CL.tmx
│   └── trend
│       ├── PISA_es-AR_MAT_MS2022.tmx.zip
│       ├── PISA_es-AR_REA_MS2022.tmx.zip
│       ├── PISA_es-AR_SCI_MS2022.tmx.zip
│       ├── PISA_es-AR_UI_MS2022.tmx.zip
│       └── PISA_es-AR_XYZ_MS2022.tmx.zip
└── enforce
    └── trend
        ├── PISA_es-AR_ICQ_MS2022.tmx.zip
        ├── PISA_es-AR_SCQ_MS2022.tmx.zip
        └── PISA_es-AR_STQ_MS2022.tmx.zip

es-AR@verification
 

Batch transitions after verification are just like the transition from reconciliation to verification.

And so on and so forth :=)

Questions?

authoring

technical sign-off

A bit of back and forth...

PISA 2025 -- repository mappings (fr-ZZ only)

By cApStAn LQC
