Unihandecode is a fork of unidecode that provides transliteration of Unicode text according to its readings in each native language, in a Python environment.
I am now planning to move source code hosting to github.com (5 Aug 2012). See: http://
The original unidecode gives the following description (http://
"It often happens that you have non-Roman text data in Unicode, but you can't display it -- usually because you're trying to show it to a user via an application that doesn't support Unicode, or because the fonts you need aren't accessible. You could represent the Unicode characters as "???????" or "\15BA\
What unihandecode provides is a decode(...) function that takes Unicode data and tries to represent it in US-ASCII characters. There is a simple but big problem with Chinese, Japanese and Korean characters. Because of an unfortunate history, CJK characters in Unicode share the same code points for characters that are similar but not identical in shape, pronunciation and meaning.
This is why I want to add a feature to unidecode that recognizes the user's preferred language and transliterates text based on its readings in that language.
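The idea of language-dependent readings can be illustrated with a minimal sketch. This is not unihandecode's actual implementation or data: the `READINGS` table, the language keys and this standalone `decode` function are hypothetical, and cover only two characters chosen for illustration. It shows how the same Unicode code points can map to different ASCII output depending on the preferred language.

```python
# Hypothetical toy tables: the same Han code points, read differently
# in each language. Not the library's real data or API.
READINGS = {
    "zh": {"明": "Ming", "天": "Tian"},
    "ja": {"明": "Mei", "天": "Ten"},
    "kr": {"明": "Myeong", "天": "Cheon"},
}


def decode(text, lang="zh"):
    """Return an ASCII rendering of text using the readings for lang."""
    table = READINGS[lang]
    out = []
    for ch in text:
        if ord(ch) < 128:
            # ASCII characters pass through unchanged.
            out.append(ch)
        else:
            # Unknown characters become "?"; each reading is followed
            # by a space so adjacent readings stay separated.
            out.append(table.get(ch, "?") + " ")
    return "".join(out).strip()


if __name__ == "__main__":
    for lang in ("zh", "ja", "kr"):
        print(lang, decode("明天", lang))
```

A real table would need many thousands of entries per language, plus handling of context-dependent readings, which this sketch ignores.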
Sean M. Burke, the original unidecode author, said:
"Unidecode, in other words, is quick and dirty. Sometimes the output is not so dirty at all... But sometimes the output is very dirty: Unidecode does quite badly on Japanese and Thai."
I am Japanese, and I am unhappy with the output of unidecode because of the limitations Sean describes.
Unihandecode provides good functionality on top of the unidecode code base, even for Japanese, Korean, Thai and more.
Only Python bindings exist for now. It is based on the Python port of unidecode (http://
The first target application is 'calibre' (http://
Release 0.2x is licensed under the GPLv3/Perl license. Starting with release 0.3, it is licensed under GPLv3 because it includes logic from KAKASI (GPLv2 or later).
trunk series is the current focus of development.
Registered blueprints: Thai support; Vietnamese characters native support.