Unihandecode - US-ASCII transliterations of Unicode text

Registered 2010-12-19 by Hiroshi Miura

Unihandecode is a fork of unidecode that provides transliterations of Unicode text based on its readings in each native language, for Python environments.

I am now planning to move source code hosting to github.com (5 Aug 2012). See: http://miurahr.github.io/unihandecode

The documentation of the original unidecode (http://search.cpan.org/~sburke/Text-Unidecode-0.04/lib/Text/Unidecode.pm) describes the motivation as follows:

"It often happens that you have non-Roman text data in Unicode, but you can't display it -- usually because you're trying to show it to a user via an application that doesn't support Unicode, or because the fonts you need aren't accessible. You could represent the Unicode characters as "???????" or "\15BA\15A0\1610...", but that's nearly useless to the user who actually wants to read what the text says."

What unihandecode provides is a decode(...) function that takes Unicode data and tries to represent it in US-ASCII characters. There is a simple but big problem with Chinese, Japanese and Korean characters. For unfortunate historical reasons, CJK characters in Unicode share the same code points for similar characters, even though their shapes, pronunciations and meanings differ between languages.
This is why I want to add a feature to unidecode that recognizes the user's preferred language and transliterates text based on its readings in that language.
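
The idea of language-dependent readings can be sketched in a few lines of Python. This is only a toy illustration and not unihandecode's actual implementation or API: the READINGS tables, the language codes and the decode signature below are all assumptions made for the example, and the tables cover just two characters.

```python
# Toy per-language reading tables (hypothetical, for illustration only).
# The same code points get a different reading depending on the language.
READINGS = {
    "zh": {"日": "Ri ", "本": "Ben "},
    "ja": {"日": "Nichi ", "本": "Hon "},
    "kr": {"日": "Il ", "本": "Bon "},
}


def decode(text, lang="zh"):
    """Return an ASCII approximation of text using the readings for lang."""
    table = READINGS[lang]
    out = []
    for ch in text:
        if ord(ch) < 128:
            # ASCII characters pass through unchanged.
            out.append(ch)
        else:
            # Unknown characters become "?" in this toy version.
            out.append(table.get(ch, "?"))
    return "".join(out).strip()


print(decode("日本", "ja"))
print(decode("日本", "zh"))
print(decode("日本", "kr"))
```

The same two code points come out as "Nichi Hon", "Ri Ben" or "Il Bon" depending on the language requested, which is the behaviour a single language-neutral table cannot provide.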

Sean M. Burke, the original author of unidecode, wrote:

"Unidecode, in other words, is quick and dirty. Sometimes the output is not so dirty at all... But sometimes the output is very dirty: Unidecode does quite badly on Japanese and Thai."

I am Japanese, and I am unhappy with unidecode's output because of the limitations Sean describes.
Unihandecode builds on the unidecode code base to provide better output for Japanese, Korean, Thai and more.

There are only Python bindings for now. It is based on the Python port of unidecode (http://pypi.python.org/pypi/Unidecode).

The first target application is 'calibre' (http://calibre-ebook.com), which uses unidecode to generate filenames from an ebook's title and author.

Release 0.2x is licensed under the GPLv3/Perl license. From release 0.3 onward, it is licensed under GPLv3 because it includes KAKASI (GPLv2 or later) logic.

Project information

Maintainer: Hiroshi Miura
Driver: Hiroshi Miura
Development focus: trunk series (lp:unihandecode)
Programming Languages: Python
Licences: GNU GPL v3

Unihandecode trunk series is the current focus of development

Downloads

Latest version is release-0.3, released on 2011-02-19

Announcements

  • release 0.31 on 2011-08-18
    The 0.3x release no longer depends on the kakasi library. Unihandecode now has...
  • release 0.20 on 2010-12-29
    Unihandecode version 0.20 has been released. Now it works well on Windows an...