Tokenize Japanese Address Using Machine Learning
GitHub: lulalala
Twitter: lulalala_it
I also draw doujinshi, because I am an otaku.
From:
"東京都西多摩郡秋多町"
To:
{
"prefecture"=>"東京都",
"gun"=>"西多摩郡",
"municipality"=>"秋多町"
}
I think this is difficult to do with if statements and regular expressions.
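To illustrate the point, here is a deliberately naive regex splitter. It was not taken from the talk, and NAIVE_ADDRESS_RE and naive_parse are hypothetical names. The regex cuts the address after the first 都/道/府/県, an optional 郡, and the first 市/区/町/村. It handles 東京都西多摩郡秋多町. It fails on 京都府京都市, though: 京都府 contains 都, so the prefecture comes out as "京都". It also fails on 三重県四日市市, where 市 appears in the middle of the city name, so the municipality comes out as "四日市".

```ruby
# Naive splitter, for illustration only: each component ends at the first
# occurrence of its suffix character, which is exactly what goes wrong.
NAIVE_ADDRESS_RE = /\A(?<prefecture>.+?[都道府県])(?<gun>.+?郡)?(?<municipality>.+?[市区町村])(?<other>.*)\z/

# Returns a Hash of named captures (gun is nil when absent), or nil on no match.
def naive_parse(address)
  m = NAIVE_ADDRESS_RE.match(address)
  m && m.named_captures
end

p naive_parse("東京都西多摩郡秋多町") # works as intended
p naive_parse("京都府京都市")         # prefecture wrongly cut at 京都
p naive_parse("三重県四日市市")       # municipality wrongly cut at 四日市
```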
gem install pbf_parser
台北市 (city) 南港區 (suburb) 研究院路二段 (street) 128號 (housenumber)
["台 city", "北 city", "市 city",
"南 suburb", "港 suburb", "區 suburb",
"研 street", "究 street", "院 street", "路 street"...]
Input --> CRF training --> model file
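The talk does not say which CRF tool it uses. This sketch assumes a CRF++-style training file: one token per line, the label in the last whitespace-separated column, and a blank line after each address. Under that assumption, labeled characters could be written out as below. to_crf_training_text is a hypothetical helper.

```ruby
# Serializes labeled sequences ([[char, label], ...] per address) into
# CRF++-style text: "char<TAB>label" per line, blank line after each address.
def to_crf_training_text(sequences)
  sequences.map { |seq|
    seq.map { |token, label| "#{token}\t#{label}\n" }.join + "\n"
  }.join
end

seqs = [
  [["台", "city"], ["北", "city"], ["市", "city"]],
  [["南", "suburb"], ["港", "suburb"]]
]
print to_crf_training_text(seqs)
```

With CRF++, `crf_learn template_file train.txt model` would then produce the model file (a feature template file is required), and `crf_test -m model test.txt` would label new text.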
Address text --> model file --> parsed result
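Once the model has labeled each character, consecutive characters that share a label can be joined back into components to form the parsed result. This is a minimal sketch. group_labels is a hypothetical name, and the labels in the example are invented to reproduce the 兵庫県 result that follows. It also shows that a single 南 labeled as gun is enough to produce "gun"=>"南".

```ruby
# Joins runs of consecutive characters with the same label into one string.
# If a label appears in several separate runs, the runs are concatenated.
def group_labels(chars, labels)
  raise ArgumentError, "length mismatch" unless chars.size == labels.size

  result = {}
  runs = chars.zip(labels).chunk_while { |(_, a), (_, b)| a == b }
  runs.each do |run|
    label = run.first[1]
    result[label] = (result[label] || "") + run.map(&:first).join
  end
  result
end

address = "兵庫県南あわじ市5丁目69-25"
# Invented per-character labels, standing in for model output.
labels = ["prefecture"] * 3 + ["gun"] + ["municipality"] * 4 + ["other"] * 8
p group_labels(address.chars, labels)
```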
兵庫県南あわじ市5丁目69-25
{"prefecture"=>"兵庫県", "gun"=>"南", "municipality"=>"あわじ市", "other"=>"5丁目69-25"}
鹿児島県南九州市1丁目65-89
{"prefecture"=>"鹿児島県", "gun"=>"南", "municipality"=>"九州市", "other"=>"1丁目65-89"}
岐阜県中津川市4丁目32-100
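{"prefecture"=>"岐阜県", "gun"=>"中", "municipality"=>"津川市", "other"=>"4丁目32-100"}
These results are wrong: 南あわじ市, 南九州市 and 中津川市 are each a single city, but the model split off the first character as the gun (郡).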
If you are an anime otaku, please chat with me :D