Cater for the next 1B users

with i18n and l10n

Tech talk 10/7/14

Pavel Bosin

Agenda

  • What are i18n and l10n?
  • Is it important?
  • What should be "globalized" in web apps?
  • Server side tools
  • Client side tools
  • Getting user's locale

What are i18n & L10n?

"i18n or Internationalization

is the process of designing a software application so that it can potentially be adapted to various languages and regions without engineering changes."

wikipedia

"L10n or Localization

is the process of adapting internationalized software for a specific region or language by adding locale-specific components and translating text."

wikipedia

Localization (which is potentially performed multiple times, for different locales) uses the infrastructure or flexibility provided by internationalization (which is ideally performed only once, or as an integral part of ongoing development).

The terms are frequently abbreviated to the numeronyms i18n (where 18 stands for the number of letters between the first i and last n in internationalization) and L10n respectively, due to the length of the words.

e.g. C5s.com
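
The numeronym rule above is simple enough to sketch in a few lines. This is only an illustration; the function name `numeronym` and the choice to leave very short words untouched are assumptions, not part of the talk.

```python
def numeronym(word):
    """Abbreviate a word as first letter + number of inner letters + last letter."""
    if len(word) <= 3:
        # Assumption: words this short are not worth abbreviating.
        return word
    return "%s%d%s" % (word[0], len(word) - 2, word[-1])


print(numeronym("internationalization"))  # i18n
print(numeronym("localization"))          # l10n
print(numeronym("globalization"))         # g11n
```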

Importance?

Is English Still the Language of the Internet?

CLDR languages

What to consider during i18n & l10n

  • Language Translation Issues
    • Words length
      • fifty six => sechsundfünfzig
      • shut-off => Abschaltung
    • Words order and words number
      • we will deliver tomorrow => завтра доставим
  • Gender related issues 
    • <p>Welcome, {{ username }}</p>

      In Spanish, one for each gender:

      <p>Bienvenido, {{ username }}</p>

      <p>Bienvenida, {{ username }}</p>

  • Capitalization, upper- and lower-casing

    • e.g. days of the week in English vs Russian
  • Sorting
    • can't be done at DB level
  • Quotes (", ', « »)
  • Fonts
    • Tofu characters
    • Some css becomes invalid in non-English
    • Font size may be way off in Chinese, etc.
  • Text direction LTR, RTL, Bidi
    • may require a different template layout
    • combination:
  • Cultural Issues
    • Personal names
      • How many
      • Gender dependent
    • Formal vs. informal
      • You => вы, ты
      • multiple names
    • Product names
      • be careful designing names and labels
    • Slang and specific cultural references
    • International laws
    • Images (cultural limitations)
      • check mark in Japanese 
  • UI Components Issues
    • Word wrap in messages, tooltips, non-table grids.
      • Description        Number of Offers      Value
        Savings            3                     15
        Specials           2                     22

        Description        Number of             Value
        Savings            Offers                15
        Specials           3                     22
                           2

    • Date & time pickers
      • weekday and month names, week start
      • time zones (names, rules, UTC offsets, format) 
  • Pagination
    • in the Japanese translation of "Page 1 of 34" all elements in the phrase would be in reverse order 
  • Layers
  • Keyboard shortcuts
  • Inputs:
    • Masked input fields
    • Text input length
    • Input validation
  • Formatting Issues
    • Dates & Time
      • date format (order, delimiters, calendar)
      • time format (12 vs. 24 hrs, delimiters)
    • Numbers
      • decimal separator, digit grouping (comma may have different meaning)
    • Currency
      • symbol, position, fractions
    • Units of measure
      • can't we all go metric?!!
  • Formatting addresses
    • order of 'lines'
    • order of the fields
    • capitalization of fields
    • name (e.g. prefix, F, L, M, suffix)
    • department, organization
    • street (line 1, 2, ...)
    • city, (town, village, etc.), 
    • region , state, province
    • zip postal code
    • country
    • 121069 Москва, Бол.Ржевский 8 - 6 Босину П.А.
  • Phone numbers
    • NANP: 10 digits with area code
      • (NPA)  NXX-XXXX
      • 1-NPA-NXX-XXXX
    • Europe has many different formats in groups of 2, 3, 4 digits
      • France: A BB BB BB BB
      • Germany has 8 formats
    • India mobile: AAAAA-BBBBB
    • China variable lengths, e.g. (0XXX) YYYY YYYY
  • Printing
    • Paper size
    • Orientation
  • Legal Issues
    • Privacy laws
    • Disclaimers
    • Package labels
    • Usage of encryption
    • Accessibility
    • Taxes
    • Sensitivity to border disputes
  • Getting User locale Issues
    • "Unfortunately, there is no 100% reliable way in the browser of getting the user's locale information - unless you ask the user explicitly." @marcosc
    • var date = new Date(); date.toLocaleDateString();

    • There are some unreliable hacks, like querying navigator.language in Chrome and Firefox, or navigator.browserLanguage in IE, or looking at the HTTP Accept-Language header using XHR.

      So, just ask if you need to. Or provide a way for the user to select their locale preferences.
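
If the Accept-Language header is used at all, it is best treated as a hint that the user's explicit choice overrides. The sketch below shows one way to do that on the server; `parse_accept_language` and `pick_locale` are hypothetical helper names, and the matching is deliberately naive (exact tag, then base language, with the supported set assumed to use the same casing as the header).

```python
def parse_accept_language(header):
    """Return the language tags from an Accept-Language header, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip()
        if not tag:
            continue
        quality = 1.0  # a tag without q= has the highest weight
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by original position.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def pick_locale(header, supported, explicit_choice=None, default="en"):
    """Prefer what the user selected; fall back to the header, then a default."""
    if explicit_choice in supported:
        return explicit_choice
    for tag in parse_accept_language(header or ""):
        if tag in supported:
            return tag
        base = tag.split("-")[0]  # e.g. "ru-RU" -> "ru"
        if base in supported:
            return base
    return default


print(pick_locale("ru-RU,ru;q=0.8,en;q=0.4", {"en", "ru"}))
print(pick_locale("ru-RU,ru;q=0.8", {"en", "ru", "de"}, explicit_choice="de"))
```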

Development Practices

  • Use UTF-8
  • Understand i18n support for your dev language
    • Java: good
    • PHP: poor
    • Python: good
    • JavaScript: poor
  • Research i18n tools provided by your framework
  • Use i18n library / plugin; don't write your own
  • Externalize strings
    • full phrases, avoid string concatenation
    • minimize slang and abbreviations
    • avoid leading and trailing spaces
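
A small sketch of why full phrases beat concatenation. The catalog `MESSAGES`, the key `delivery.date`, and the helper `message` are made up for illustration; a real app would load strings from message files through its i18n library.

```python
# Hypothetical in-memory catalogs standing in for externalized message files.
MESSAGES = {
    "en": {"delivery.date": "We will deliver your order on %(day)s."},
    "de": {"delivery.date": "Am %(day)s liefern wir Ihre Bestellung."},
}


def concatenated(day):
    # Anti-pattern: English word order is baked into the code, and the
    # translator only ever sees a fragment with a trailing space.
    return "We will deliver your order on " + day + "."


def message(locale, key, **params):
    """Look up a full phrase and fill in its named placeholders."""
    catalog = MESSAGES.get(locale, MESSAGES["en"])
    template = catalog.get(key, MESSAGES["en"][key])
    return template % params


print(message("en", "delivery.date", day="Monday"))
print(message("de", "delivery.date", day="Montag"))
```

With named placeholders the German phrase can move the day to the front of the sentence, which the concatenated version cannot express.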

i18n Coding Layers

  • Data layer
    • DB
    • Config files
  • Data API Layer ("back end")
    • Messages from API
    • Serving data as strings
  • Front end server side
  • Front end client side

Front End Server Side Tools Python Django Example

Django has full support for translation of text, formatting of dates, times and numbers, and time zones.

Essentially, Django does two things:

  • It allows developers and template authors to specify which parts of their apps should be translated or formatted for local languages and cultures.
  • It uses these hooks to localize Web apps for particular users according to their preferences.
  • Translated strings for a single language are placed in a plain-text "message file" with .po extension.

Text translation in Django python code

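from django.utils.translation import ugettext as _
from django.http import HttpResponse

def my_greeting(request, m, d):
    # Mark whole phrases for translation; named placeholders let
    # translators reorder month and day.
    greeting = _("Welcome to the site.")
    today = _('Today is %(month)s %(day)s.') % {'month': m, 'day': d}
    return HttpResponse('%s %s' % (greeting, today))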

 

Pluralization in Django python code

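from django.utils.translation import ungettext
from django.http import HttpResponse

def hello_world(request, count):
    # ungettext picks the singular or plural form based on count.
    page = ungettext(
        'there is %(count)d object',
        'there are %(count)d objects',
        count) % {
        'count': count,
    }
    return HttpResponse(page)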
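
English has two plural forms, but many languages have more, which is why the plural form is chosen by a per-language rule rather than by `count == 1`. As a rough sketch, here is the usual gettext rule for Russian written out in plain Python; the function names and the `FORMS` tuple are illustrative only and not Django API.

```python
def russian_plural_index(n):
    """Index of the Russian plural form, following the usual gettext rule."""
    if n % 10 == 1 and n % 100 != 11:
        return 0  # 1, 21, 31, ...
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1  # 2-4, 22-24, ...
    return 2      # 0, 5-20, 25-30, ...


# Hypothetical translations of "object" for the three Russian forms.
FORMS = ("%(count)d объект", "%(count)d объекта", "%(count)d объектов")


def count_objects(n):
    return FORMS[russian_plural_index(n)] % {"count": n}


for n in (1, 2, 5, 11, 21, 22):
    print(count_objects(n))
```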

Use lazy translation

... ungettext_lazy()

Text translation in Django templates

<title>{% trans "This is the title." %}</title>
<title>{% trans myvar %}</title>


{% trans "This is the title" as the_title %}
<title>{{ the_title }}</title>
<meta name="description" content="{{ the_title }}">


{% blocktrans %}This string will have {{ value }} inside.{% endblocktrans %}


{% blocktrans with amount=article.savings %}
Your savings value is $ {{ amount }}.
{% endblocktrans %}


{% blocktrans with amount=article.price count years=i.length %}
That will cost $ {{ amount }} per year.
{% plural %}
That will cost $ {{ amount }} per {{ years }} years.
{% endblocktrans %}

Front End Client Side Tools Python Django Example

Add i18n to url patterns (urls.py)

from django.conf.urls import patterns

js_info_dict = { 'domain': 'djangojs',  'packages': ('paiweb',),  }
urlpatterns = patterns('',
   (r'^jsi18n/$', 'django.views.i18n.javascript_catalog', js_info_dict),
)

Include i18n file in html template

<script src="{% url "django.views.i18n.javascript_catalog" %}"></script>

Use gettext(), ngettext(), in JavaScript code

document.write(gettext('this is to be translated'));
var object_cnt = 1; // or 0, or 2, or 3, ...
var s = ngettext('literal for the singular case',
        'literal for the plural case', object_cnt);
var fmts = ngettext('There is %s object. Remaining: %s',
        'There are %s objects. Remaining: %s', 11);
s = interpolate(fmts, [11, 20]);
// s is 'There are 11 objects. Remaining: 20'

 

Front End Tools

Python Django 

  • Comments to translators
  • Support for rtl
  • Switching language
  • Language prefix in url patterns
  • Admin tools to create language files
  • Using translations outside views and templates
  • Discovering language preference (via middleware)
    1. language prefix in the url
    2. language session key in user session
    3. language cookie
    4. Accept-Language HTTP header
    5. Global LANGUAGE_CODE setting
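
The discovery order above amounts to "first supported source wins". The sketch below mimics that precedence in plain Python; it is not Django's middleware code, and `discover_language`, its parameters, and the `supported` tuple are assumptions made for illustration.

```python
def discover_language(url_prefix=None, session_lang=None, cookie_lang=None,
                      accept_language=None, default="en-us",
                      supported=("en-us", "ru", "de")):
    """Return the first supported language, checking sources in priority order."""
    candidates = [url_prefix, session_lang, cookie_lang]
    # The header may list several languages; this sketch ignores q-values
    # and simply takes them in the order given.
    for item in (accept_language or "").split(","):
        candidates.append(item.split(";")[0].strip())
    for candidate in candidates:
        if candidate and candidate.lower() in supported:
            return candidate.lower()
    return default  # plays the role of the global LANGUAGE_CODE setting


print(discover_language(url_prefix="ru", session_lang="de"))
print(discover_language(cookie_lang="de", accept_language="ru"))
print(discover_language(accept_language="fr;q=0.9, ru;q=0.8"))
print(discover_language())
```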

Front End Client Side Tools

AngularJS $locale

It provides localization rules for various Angular components.

It works well for number systems, formatting, grouping and precision as well as decimal marks.

It also does datetime, currency formatting.

Used through the built-in angular filters.

Front End Client Side Tools

ES Internationalization API

Edition 1 provides most of the services offered by $locale and, most notably, adds collation for two scenarios: sorting a set of strings and searching within a set of strings. Collation is parameterized by locale and is Unicode-aware.

 

The Intl object is currently supported in IE11, Chrome, and Firefox Nightly builds (controlled via a flag); see the compatibility table.

 

The upcoming Edition 2 will support more common use cases such as message formatting: formatting a string with placeholders, including plural and gender support. However, it will be based on ES6 and will no longer be backwards compatible.

Front End Client Side Tools

Community-driven projects

angular-translate uses ICU's MessageFormat, which has a different kind of "interpolation": it uses single curly braces instead of double curly braces, so one loses all the features that Angular's built-in interpolation brings (filters, etc.).

 

angular-gettext is too much magic, with not enough control in the hands of the app developer. It also bypasses certain useful AngularJS functionality, like the ngPluralize directive.

 

Front End Client Side Tools

Angular-localization

by Rahul Doshi

  • Message files in JSON {"no": "No",...}
  • directory structure: lang->locale->view
  • Lazy loading in SPA using views
  • Main methods for strings, pluralization, gender
  • Use also in angular filters {{ 'page.msg' | i18n }}
  • Use a Directive <p data-i18n="page.msg"></p>

Development Checklists

  • Microsoft:
    • http://msdn.microsoft.com/en-us/library/cc194756.aspx
  • Community
    • http://www.emreakkas.com/internationalization/10-internationalization-tips-for-developers-i18n-checklist

Testing i18n & L10n

  • Pseudolocalization:
    • Instead of translating the text of the software into a foreign language, as in the process of l10n, the textual elements of an application are replaced with an altered version of the original language.
    • Example:

      Account Settings [!!! Àççôûñţ Šéţţîñĝš !!!]

  • Pseudo-translation.

    • There are utilities for this.

    • Example: “Bad Command” can be translated in Japanese as [JA XXXXX Bad Command XXXXXX JA]. 

  • Testing L10n is mostly manual.

    It is done by a person fluent in the given language.

  • White box testing:

    • Starts with looking for all strings in the code at all levels
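
Pseudolocalization is easy to prototype. The sketch below reproduces the "Account Settings" example from the slide; the `ACCENTS` mapping and the `pseudolocalize` name are illustrative choices rather than any standard, and this version does not protect format placeholders such as %(count)d.

```python
# Accented look-alikes for a few ASCII letters; extend as needed.
ACCENTS = {
    "A": "À", "S": "Š",
    "c": "ç", "e": "é", "g": "ĝ", "i": "î",
    "n": "ñ", "o": "ô", "s": "š", "t": "ţ", "u": "û",
}
TABLE = str.maketrans(ACCENTS)


def pseudolocalize(text):
    """Swap letters for accented variants and add visible markers."""
    # The brackets make truncated or hard-coded strings easy to spot in the UI.
    return "[!!! %s !!!]" % text.translate(TABLE)


print(pseudolocalize("Account Settings"))  # [!!! Àççôûñţ Šéţţîñĝš !!!]
```

Any string that shows up in the UI without the markers and accents was not externalized.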

Q & A

These slides are available at

http://slides.com/pbosin/g11n/
