Understanding bidirectionality in OmegaT

Some

Character

Permanent

 

Letter ع - U+0639
Letter ه - U+0647

 

 

 

[Hola!]
[أهلاً!‏]

Glyph

Context-dependent, font-dependent... examples

 

ه ههه

U+0647 U+0647U+0647U+0647

ع ععع

U+0639 U+0639U+0639U+0639
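
The character/glyph distinction above can be checked programmatically. The following is a minimal Python sketch using only the standard `unicodedata` module; the helper name `describe` is made up for illustration. It shows that the stored code points stay the same whatever shape the font draws for each position in the word.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# Three joined letters heh: the glyphs differ (initial, medial, final),
# but the stored characters are identical.
heh_word = "\u0647\u0647\u0647"
for code_point, name in describe(heh_word):
    print(code_point, name)

# The same holds for the letter ain.
ain_word = "\u0639\u0639\u0639"
print(describe(ain_word))
```
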

How bidirectionality works

Cursor

Arabic and Hebrew are written in bidirectional scripts, but the Arabic or Hebrew text itself always runs right-to-left (RTL).

Logical order

We should not try to tweak the visual appearance by changing the order of the text.

Numbers, mathematical expressions, foreign names in Latin script, etc. are left-to-right (LTR) blocks.

The cursor shows the directionality, either through its movement (Word) or the flag (OmegaT)

Right-to-left

Left-to-right
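
As an illustration of logical order and of why numbers and Latin names form LTR blocks, here is a small Python sketch. It relies on the standard `unicodedata.bidirectional()` function, which returns the bidirectional class that Unicode assigns to each character; the sample segment and the helper name `bidi_classes` are assumptions made up for this example.

```python
import unicodedata

def bidi_classes(text):
    """Return the Unicode bidirectional class of each character, in logical order."""
    return [unicodedata.bidirectional(ch) for ch in text]

# Logical (typing) order: Arabic word, space, number, space, Latin name.
segment = "\u0623\u0647\u0644\u0627 12 OmegaT"

print(bidi_classes("\u0623\u0647\u0644\u0627"))  # Arabic letters: class AL (strong RTL)
print(bidi_classes("12"))                          # European digits: class EN
print(bidi_classes("OmegaT"))                      # Latin letters: class L (strong LTR)

# The first stored character is the first one typed, wherever it is displayed.
print(unicodedata.name(segment[0]))
```

The string is never reordered in storage; only the display engine rearranges the runs, which is why editing the character order to "fix" the appearance corrupts the text.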

Levels and degrees of directionality

 

Handling tags in bidirectional segments

Using Unicode bidirectionality control characters
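
For reference, the sketch below lists the directional marks and embedding controls defined by Unicode and shows how an embedding wraps an LTR run. It is a plain Python illustration, not a description of how OmegaT inserts these characters; the helper `embed_ltr` and the sample formula are hypothetical.

```python
import unicodedata

# Invisible control characters defined by the Unicode Standard.
LRM = "\u200E"  # LEFT-TO-RIGHT MARK
RLM = "\u200F"  # RIGHT-TO-LEFT MARK
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING (opens an LTR embedding)
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING (opens an RTL embedding)
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING (closes the embedding)

def embed_ltr(run):
    """Wrap a run in an LTR embedding (hypothetical helper for illustration)."""
    return f"{LRE}{run}{PDF}"

for ch in (LRM, RLM, LRE, RLE, PDF):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

formula = embed_ltr("x + 2 = 5")
# The visible text is unchanged; only the first and last code points are new.
print([f"U+{ord(ch):04X}" for ch in (formula[0], formula[-1])])
```

Every opening embedding needs its closing character in the same segment, otherwise the embedding can affect the text that follows.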

Examples: embeddings

ara
formulas
https://recordit.co/0qIfYjW9Q2

Recommendations for formatting tags

In Math texts, don't place them within the text; insert them at the end, after a line break (e.g. XYZ_ara-ARE).

Examples: scopes and styles

https://recordit.co/qeoJ4en3Iv

We can only intervene at the segment level, not above.

Measurements

easily (see https://vimeo.com/387945710 and https://vimeo.com/387943549; they will hopefully help you understand what to do in these cases). See also these quick summaries: https://recordit.co/BX04TRcIuH and https://recordit.co/JV7fJkHR7G.

mail: Re[4]: PISA2021 FT COG  Math Batch 1 + XYZ [ara-ARE] Final Review

Font issues

The target text inherits the font settings of the source text; OmegaT does not modify the font.

Not all fonts can render all characters in all styles in all languages.

It can be a problem in the translation, or a technical glitch that messes up the directionality.

Watch out for paired punctuation symbols (parentheses, brackets, etc.)

The user wrote literally (but in Arabic):
"Code) 0 or (00"

Characters with neutral directionality might change their position and their glyph (representation).
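
The behaviour of neutral characters can be inspected with Python's standard `unicodedata` module, as in the sketch below. The helper name `is_neutral_and_mirrored` is made up for illustration. Parentheses and brackets have no strong direction of their own (class ON) and carry the "mirrored" property, which is why both their position and their glyph can change inside RTL text.

```python
import unicodedata

def is_neutral_and_mirrored(ch):
    """True if ch has no strong direction (class ON) and is mirrored in RTL runs."""
    return unicodedata.bidirectional(ch) == "ON" and unicodedata.mirrored(ch) == 1

for ch in "()[]":
    print(ch, unicodedata.bidirectional(ch), unicodedata.mirrored(ch))

# An exclamation mark is neutral too, but its glyph does not mirror.
print(unicodedata.bidirectional("!"), unicodedata.mirrored("!"))
```
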

References

  • https://www.w3.org/International/articles/inline-bidi-markup/uba-basics
  • http://www.i18nguy.com/markup/right-to-left.html
  • https://r12a.github.io/scripts/tutorial/part4#bidi