Szymon Teżewski
*I will only answer the ones I am able to
Legend:
🐍 - Python arcana
🎸 - Django arcana
i.e. everything we do so that a program can be adapted to different groups of users
i.e. adapting a previously internationalized program to a specific group
e.g. Arabization, Russification
locale aspects
LANGUAGE
LC_ALL
LC_XX (LC_COLLATE, LC_MONETARY, etc.)
LANG
For example, assume you are a Swedish user in Spain, and you want your programs to handle numbers and dates according to Spanish conventions, and only the messages should be in Swedish.
Then you could create a locale named ‘sv_ES’ or ‘sv_ES.UTF-8’ by use of the localedef program.
But it is simpler, and achieves the same effect, to set the LANG variable to es_ES.UTF-8 and the LC_MESSAGES variable to sv_SE.UTF-8; these two locales come already preinstalled with the operating system.
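A minimal sketch of how the variables listed above combine for a single category, assuming the usual POSIX precedence (LC_ALL, then the category's own variable, then LANG). The function name `effective_locale` is made up for this illustration, and LANGUAGE is left out because it is a gettext-specific priority list that only affects messages.

```python
def effective_locale(category, env):
    """Return the locale name that applies to one category, e.g. 'LC_NUMERIC'."""
    # LC_ALL overrides everything, then the category's own variable, then LANG.
    for variable in ("LC_ALL", category, "LANG"):
        value = env.get(variable)
        if value:  # unset and empty variables are skipped
            return value
    return "C"  # the default locale when nothing is set


# The Swedish user in Spain from the quote above.
env = {"LANG": "es_ES.UTF-8", "LC_MESSAGES": "sv_SE.UTF-8"}
print(effective_locale("LC_MESSAGES", env))  # sv_SE.UTF-8
print(effective_locale("LC_NUMERIC", env))   # es_ES.UTF-8
```

In a real program `os.environ` would be passed as `env`; a plain dict keeps the sketch testable.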
pl_PL.ISO8859-2
wa_BE.iso885915@euro
ca_ES.utf8@valencia
tt_RU.utf8@iqtelif
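The names above follow the pattern `language[_territory][.codeset][@modifier]`. A rough sketch of splitting such a name into its parts; the regular expression and the helper name `parse_posix_locale` are assumptions made for this example, not part of any standard library API.

```python
import re

# language[_territory][.codeset][@modifier]
LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z0-9]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def parse_posix_locale(name):
    """Split a POSIX-style locale name into a dict of its four parts."""
    match = LOCALE_RE.match(name)
    if match is None:
        raise ValueError(f"not a POSIX-style locale name: {name!r}")
    return match.groupdict()  # missing parts are None


for name in ("pl_PL.ISO8859-2", "wa_BE.iso885915@euro", "ca_ES.utf8@valencia"):
    print(name, parse_posix_locale(name))
```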
pl-Cyrl-151-kociewie
ja-Latn-030-hepburn-heploc
sl-Cyrl-155-rozaj-biske-1994
sl-Cyrl-155-nedis-rozaj-biske-lipaw-njiva-osojs-solba-bohoric-dajnko-metelko-1994*
*violates only SHOULD requirements, no MUST; its correctness cannot be verified
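A simplified sketch of how tags like the ones above break down into language, script, region and variants. It deliberately ignores extended language subtags, extensions and private-use subtags, and it does not validate subtags against the registry; `parse_language_tag` is a name invented for this example.

```python
def parse_language_tag(tag):
    """Split a simple BCP 47 tag: language[-Script][-REGION][-variant...]."""
    subtags = tag.split("-")
    result = {"language": subtags[0].lower(), "script": None,
              "region": None, "variants": []}
    rest = subtags[1:]
    # A script is exactly four letters, e.g. Cyrl or Latn.
    if rest and len(rest[0]) == 4 and rest[0].isalpha():
        result["script"] = rest.pop(0).title()
    # A region is two letters (PL) or three digits (151, a UN M.49 code).
    if rest and ((len(rest[0]) == 2 and rest[0].isalpha())
                 or (len(rest[0]) == 3 and rest[0].isdigit())):
        result["region"] = rest.pop(0).upper()
    # Whatever is left is treated as variants in this sketch.
    result["variants"] = [subtag.lower() for subtag in rest]
    return result


print(parse_language_tag("pl-Cyrl-151-kociewie"))
print(parse_language_tag("sl-Cyrl-155-rozaj-biske-1994"))
```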
Unicode Common Locale Data Repository
a huge collection of localization data of many kinds
uses a slightly modified BCP 47 and defines the -u and -t extensions
🐍 this is what Babel uses
sl-155-rozaj-biske-1994-u-ca-islamic-civil-co-gb2312han-cu-pln-em-default-fw-sun-hc-h11-ka-noignore-kb-false-kc-false-kf-lower-kh-false-kk-false-kn-false-ks-identic-kv-currency-lb-strict-lw-breakall-nu-mathbold-ms-ussystem-ss-standard-tz-gldkshvn-t-googlevk...
part of Content Negotiation in HTTP (RFC 7231)
values and matching are described in RFC 4647:
pl-PL,pl;q=0.8,en-US;q=0.6,en;q=0.4
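A small sketch of reading such a header into a preference list ordered by quality value. It is intentionally lenient and is not a full RFC-compliant parser; the function name `parse_accept_language` is an assumption of this example.

```python
def parse_accept_language(header):
    """Return [(tag, quality), ...] sorted from most to least preferred."""
    entries = []
    for position, item in enumerate(header.split(",")):
        parts = [part.strip() for part in item.split(";")]
        tag = parts[0]
        if not tag:
            continue
        quality = 1.0  # q defaults to 1 when it is omitted
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q as "not wanted"
        entries.append((tag, quality, position))
    # Highest quality first; ties keep the order used in the header.
    entries.sort(key=lambda entry: (-entry[1], entry[2]))
    return [(tag, quality) for tag, quality, _ in entries]


print(parse_accept_language("pl-PL,pl;q=0.8,en-US;q=0.6,en;q=0.4"))
```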
uses BCP 47
In [1]: from django.utils.translation.trans_real import to_locale, to_language
In [2]: to_locale('sr-Latn-RS') # sr_RS.UTF-8@latin ?
Out[2]: u'sr_Latn-rs'
In [3]: to_language('ca_ES.utf8@valencia') # ca-ES-valencia ?
Out[3]: u'ca-es.utf8@valencia'
*later on: why we marry the two at all
The HTTP Accept-Language header was originally only intended to specify the user's language. However, since many applications need to know the locale of the user, common practice has used Accept-Language to determine this information. It is not a good idea to use the HTTP Accept-Language header alone to determine the locale of the user.
(...)
A language preference of es-MX doesn't necessarily mean that a postal address form should be formatted or validated for Mexican addresses. The user might still live in the USA (or elsewhere).
https://www.w3.org/International/questions/qa-accept-lang-locales
This is what sites usually do:
URL > user's choice > Accept-Language
+ GeoIP/location somewhere, for currencies and time
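The priority order above can be sketched as a single lookup. This is an illustration under stated assumptions, not how any particular framework does it: `pick_language`, its parameters and the fallback to the primary language subtag are all invented for the example.

```python
def pick_language(url_language, user_choice, accept_languages, supported, default="en"):
    """Return the first supported language: URL, then user choice, then header."""
    candidates = [url_language, user_choice, *accept_languages]
    for candidate in candidates:
        if not candidate:
            continue  # this source gave no preference
        tag = candidate.lower()
        if tag in supported:
            return tag
        primary = tag.split("-")[0]  # e.g. "es-mx" falls back to "es"
        if primary in supported:
            return primary
    return default


supported = {"en", "pl", "de"}
print(pick_language(None, "pl", ["en-US", "en"], supported))  # pl
print(pick_language("de", "pl", ["en-US", "en"], supported))  # de
print(pick_language(None, None, ["es-MX", "en-US"], supported))  # en
```

Note that this only picks the language of the messages; as the W3C quote says, currencies, addresses and time need other signals.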
# django/conf/locale/zh_Hans/LC_MESSAGES/django.po
msgid "Select a valid choice. That choice is not one of the available choices."
msgstr "选择一个有效的选项: 该选择不在可用的选项中。"
# example from the gettext documentation
msgid "One file removed"
msgid_plural "%d files removed"
msgstr[0] "%d slika je uklonjena"
msgstr[1] "%d datoteke uklonjenih"
msgstr[2] "%d slika uklonjenih"
The letters PO in .po files means Portable Object, to distinguish it from .mo files, where MO stands for Machine Object.
This paradigm, as well as the PO file format, is inspired by the NLS standard developed by Uniforum, and first implemented by Sun in their Solaris system.
https://www.gnu.org/software/gettext/manual/html_node/Files.html
{gender_of_host, select,
female {
{num_guests, plural, offset:1
=0 {{host} does not give a party.}
=1 {{host} invites {guest} to her party.}
=2 {{host} invites {guest} and one other person to her party.}
other {{host} invites {guest} and # other people to her party.}}}
male {
{num_guests, plural, offset:1
=0 {{host} does not give a party.}
=1 {{host} invites {guest} to his party.}
=2 {{host} invites {guest} and one other person to his party.}
other {{host} invites {guest} and # other people to his party.}}}
other {
{num_guests, plural, offset:1
=0 {{host} does not give a party.}
=1 {{host} invites {guest} to their party.}
=2 {{host} invites {guest} and one other person to their party.}
other {{host} invites {guest} and # other people to their party.}}}}
International Components for Unicode
not only for translations; something like Babel + gettext + more
<brandShortName {
*nominative: "Aurora",
genitive: "Aurore",
dative: "Aurori",
accusative: "Auroro",
locative: "Aurori",
instrumental: "Auroro"
}>
<aboutOld "O brskalniku {{ brandShortName }}">
<about "O {{ brandShortName.locative }}">
we have a built-in gettext module
🐧
xgettext --keyword=_ --output=messages.pot `find html/ -name "*.html"`
msginit --input=messages.pot --locale=zh_TW -o locale/zh_TW/LC_MESSAGES/messages.po
msgfmt locale/zh_TW/LC_MESSAGES/messages.po -o locale/zh_TW/LC_MESSAGES/messages.mo
🐍
pybabel extract -F babel.cfg -o messages.pot .
pybabel init -i messages.pot -d locale -l zh_TW
pybabel compile -i locale/zh_TW/LC_MESSAGES/messages.po -d locale -l zh_TW
🎸
django-admin makemessages -l zh_TW
django-admin compilemessages
And what about the database?
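from django.utils.text import format_lazy
from django.utils.translation import gettext_lazy

# string_concat and ugettext_lazy were removed from later Django releases;
# format_lazy and gettext_lazy are their replacements.
name = gettext_lazy('John Lennon')
instrument = gettext_lazy('guitar')
# The ': ' separator is hard-coded, so translators cannot adjust it.
result = format_lazy('{}: {}', name, instrument)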
John Lennon: gitara
John Lennon : guitare
# django/conf/locale/fr/LC_MESSAGES/django.po
#. Translators: This is the default suffix added to form field labels
msgid ":"
msgstr " :"
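One way around the hard-coded separator is to make the whole pattern a translatable message, so each language can supply its own punctuation. A plain-Python sketch with a made-up in-memory catalog; `CATALOG`, `translate` and `label` are hypothetical names, and a real project would use gettext catalogs instead of a dict.

```python
# Hypothetical in-memory catalog standing in for .po files.
CATALOG = {
    "pl": {"%(name)s: %(value)s": "%(name)s: %(value)s", "guitar": "gitara"},
    "fr": {"%(name)s: %(value)s": "%(name)s : %(value)s", "guitar": "guitare"},
}


def translate(language, message):
    """Look a message up, falling back to the original text."""
    return CATALOG.get(language, {}).get(message, message)


def label(language, name, value):
    """Format 'name: value' using the pattern translated for the language."""
    pattern = translate(language, "%(name)s: %(value)s")
    return pattern % {"name": name, "value": translate(language, value)}


print(label("pl", "John Lennon", "guitar"))  # John Lennon: gitara
print(label("fr", "John Lennon", "guitar"))  # John Lennon : guitare
```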
# django/conf/locale/zh_Hans/LC_MESSAGES/django.po
msgid "Select a valid choice. That choice is not one of the available choices."
msgstr "选择一个有效的选项: 该选择不在可用的选项中。"
Now, how do these functions solve the problem of the plural forms? Without the input of linguists (which was not available) it was not possible to determine whether there are only a few different forms in which plural forms are formed or whether the number can increase with every new supported language.
# English
nplurals=2;
plural=(n != 1);
# Polish
nplurals=3;
plural=(n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);
# Russian
nplurals=3;
plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);
# Arabic
nplurals=6;
plural=(n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 : n%100>=11 ? 4 : 5);
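The C-style expressions above map a number to the index of the plural form to use. A sketch of the English and Polish rules transcribed into Python; the function names and the sample word forms for "file" are chosen for this example.

```python
def plural_english(n):
    # nplurals=2; plural=(n != 1)
    return int(n != 1)


def plural_polish(n):
    # nplurals=3; plural=(n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2)
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


FORMS = ["plik", "pliki", "plików"]  # the three Polish forms of "file"
for n in (1, 2, 5, 12, 22, 25):
    print(n, FORMS[plural_polish(n)])
```

The interesting cases are 12 and 22: both end in 2, but only 22 takes the second form.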
Django does not support custom plural equations in po files. As all translation catalogs are merged, only the plural form for the main Django po file (in django/conf/locale/<lang_code>/LC_MESSAGES/django.po) is considered. Plural forms in all other po files are ignored. Therefore, you should not use different plural equations in your project or application po files.
nplurals=4;
plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<12 || n%100>14) ? 1 \
: n%10==0 || (n%10>=5 && n%10<=9) || (n%100>=11 && n%100<=14)? 2 : 3);
android
Borowinka
kalafiory w sosie
Złoża uranu
Älbercik
łódź w ogniu
Entry.objects.order_by(Lower('headline'))
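A plain-Python sketch of why lowercasing alone is not enough, using the word list above. Comparing lowercased strings still compares code points, so Ä and ł land after z; a Polish reader would expect łódź between kalafiory and Złoża. What a database actually does depends on its collation settings, so this only illustrates the code-point case.

```python
import unicodedata

words = [
    unicodedata.normalize("NFC", word)  # make sure accented letters are single code points
    for word in ["łódź w ogniu", "Złoża uranu", "kalafiory w sosie",
                 "android", "Älbercik", "Borowinka"]
]

# Raw code point order: uppercase ASCII first, non-ASCII letters last.
by_codepoint = sorted(words)
# Case-insensitive, but still code point order for everything else.
by_lowercase = sorted(words, key=str.lower)

print(by_codepoint)
print(by_lowercase)
```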
You cannot sort "well" without knowing the user's language.
You cannot sort "well" without knowing the usage within that language.
*but you can always try to offend the smallest possible group of them
Unicode collation algorithm
Default Unicode Collation Element Table
Defined in Unicode Technical Standard #10
CLDR adds further tailorings per language and variant.
The most popular implementation of UCA + CLDR data.
Problem solved! (sic!)
http://demo.icu-project.org/icu-bin/locexp?_=pl_PL&d_=en&x=col
https://wiki.postgresql.org/wiki/Todo:ICU
https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu-collation.html
٠ | ١ | ٢ | ٣ | ٤ | ٥ | ٦ | ٧ | ٨ | ٩
零 一 二 三 四 五 六 七 八 九 十
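A short sketch of how Python treats the two digit systems above. Arabic-Indic digits are Unicode decimal digits, so `int()` accepts them directly; the Chinese numerals carry a numeric value but are not decimal digits, so only `unicodedata.numeric()` understands them, one character at a time.

```python
import unicodedata

arabic_indic = "\u0664\u0662"  # the digits four and two, written as escapes
print(int(arabic_indic))  # 42

print(unicodedata.numeric("五"))  # 5.0
print("五".isdigit(), "五".isnumeric())  # False True

try:
    int("五")
except ValueError as error:
    print("int() refuses it:", error)
```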
It depends on where and whom you ask
9 March
AD 2016
19 Esfand
1394 SH
29 Jumada al-awwal
1437 AH
Time zones drive me crazy.
Especially the UTC+05:45 zone in Nepal.
And UTC+12:45 on the Chatham Islands.
Accept-Datetime is an extension for fetching historical versions of pages
People use GeoIP, ugly JavaScript, or
ask the user.
# Poland
# The 1919 dates and times can be found in Tygodnik Urzędowy nr 1 (1919-03-20),
# <http://www.wbc.poznan.pl/publication/32156> pp 1-2.
# Rule NAME FROM TO TYPE IN ON AT SAVE LETTER/S
Rule Poland 1918 1919 - Sep 16 2:00s 0 -
Rule Poland 1919 only - Apr 15 2:00s 1:00 S
Rule Poland 1944 only - Apr 3 2:00s 1:00 S
# Whitman gives 1944 Nov 30; go with Shanks & Pottenger.
Rule Poland 1944 only - Oct 4 2:00 0 -
# For 1944-1948 Whitman gives the previous day; go with Shanks & Pottenger.
Rule Poland 1945 only - Apr 29 0:00 1:00 S
Rule Poland 1945 only - Nov 1 0:00 0 -
# For 1946 on the source is Kazimierz Borkowski,
# Toruń Center for Astronomy, Dept. of Radio Astronomy, Nicolaus Copernicus U.,
# http://www.astro.uni.torun.pl/~kb/Artykuly/U-PA/Czas2.htm#tth_tAb1
# Thanks to Przemysław Augustyniak (2005-05-28) for this reference.
# He also gives these further references:
# Mon Pol nr 13, poz 162 (1995) <http://www.abc.com.pl/serwis/mp/1995/0162.htm>
# Druk nr 2180 (2003) <http://www.senat.gov.pl/k5/dok/sejm/053/2180.pdf>
Rule Poland 1946 only - Apr 14 0:00s 1:00 S
Rule Poland 1946 only - Oct 7 2:00s 0 -
Rule Poland 1947 only - May 4 2:00s 1:00 S
Rule Poland 1947 1949 - Oct Sun>=1 2:00s 0 -
Rule Poland 1948 only - Apr 18 2:00s 1:00 S
Rule Poland 1949 only - Apr 10 2:00s 1:00 S
Rule Poland 1957 only - Jun 2 1:00s 1:00 S
Rule Poland 1957 1958 - Sep lastSun 1:00s 0 -
Rule Poland 1958 only - Mar 30 1:00s 1:00 S
Rule Poland 1959 only - May 31 1:00s 1:00 S
Rule Poland 1959 1961 - Oct Sun>=1 1:00s 0 -
Rule Poland 1960 only - Apr 3 1:00s 1:00 S
Rule Poland 1961 1964 - May lastSun 1:00s 1:00 S
Rule Poland 1962 1964 - Sep lastSun 1:00s 0 -
# Zone NAME GMTOFF RULES FORMAT [UNTIL]
Zone Europe/Warsaw 1:24:00 - LMT 1880
1:24:00 - WMT 1915 Aug 5 # Warsaw Mean Time
1:00 C-Eur CE%sT 1918 Sep 16 3:00
2:00 Poland EE%sT 1922 Jun
1:00 Poland CE%sT 1940 Jun 23 2:00
1:00 C-Eur CE%sT 1944 Oct
1:00 Poland CE%sT 1977
1:00 W-Eur CE%sT 1988
1:00 EU CE%sT
from datetime import datetime, timedelta
import pytz
warsaw = pytz.timezone('Europe/Warsaw')
# per the documentation, this does not work
datetime(2002, 10, 27, 12, 0, 0, tzinfo=warsaw).isoformat()
# 2002-10-27T12:00:00+01:24 LOL
utc_dt = datetime(2002, 10, 27, 12, 0, 0, tzinfo=pytz.utc)
loc_dt = utc_dt.astimezone(warsaw)
loc_dt.isoformat()
# 2002-10-27T13:00:00+01:00 great success!!!
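With pytz, the documented way to attach a zone to a naive local time is `warsaw.localize(naive_dt)`. As an alternative sketch, the `zoneinfo` module in the standard library (Python 3.9+, and it needs time zone data available on the system) accepts the zone directly in the constructor, which avoids the +01:24 LMT surprise above.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

warsaw = ZoneInfo("Europe/Warsaw")

# Passing tzinfo directly is fine with zoneinfo.
local_noon = datetime(2002, 10, 27, 12, 0, 0, tzinfo=warsaw)
print(local_noon.isoformat())  # 2002-10-27T12:00:00+01:00

# Converting from UTC gives the same result as the pytz example.
utc_noon = datetime(2002, 10, 27, 12, 0, 0, tzinfo=timezone.utc)
print(utc_noon.astimezone(warsaw).isoformat())  # 2002-10-27T13:00:00+01:00

# In summer the offset follows the DST rules.
summer_noon = datetime(2002, 7, 1, 12, 0, 0, tzinfo=warsaw)
print(summer_noon.isoformat())  # 2002-07-01T12:00:00+02:00
```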
🇨🇳 QQ/QZone - 700+ million
🇨🇳 Sina Weibo - 400+ million
🇷🇺 VK - 350+ million
🇷🇺 Odnoklassniki - 200+ million
🇨🇳 Tencent Weibo, WeiXin, Douban, Renren - 100+ million
🇦🇷 Taringa! - 27 million
🇱🇻 Draugiem - 2.4 million
🇮🇷 Facenama - ?
Names, Japanese blood types, Polish village addresses, and many more.
small details such as the musical notes H and B