Rosetta import fails with POInvalidInputError regarding an unknown charset

Bug #2892 reported by Christian Reis
Affects           Status        Importance  Assigned to          Milestone
Launchpad itself  Fix Released  High        Jeroen T. Vermeulen

Bug Description

10:36:05 WARNING Error importing Indonesian (id) translation of faqguide in Ubuntu
Breezy Badger package "ubuntu-docs":

Traceback (most recent call last):
  File "/srv/launchpad.net/production/launchpad/cronscripts/../lib/canonical/launchpad/database/pofile.py", line 579, in doRawImport
    errors = import_po(self, file, self.rawfilepublished)
  File "/srv/launchpad.net/production/launchpad/cronscripts/../lib/canonical/launchpad/components/poimport.py", line 76, in import_po
    parser.write(file.read())
  File "/srv/launchpad.net/production/launchpad/cronscripts/../lib/canonical/launchpad/components/poparser.py", line 521, in write
    self.parse_line(l)
  File "/srv/launchpad.net/production/launchpad/cronscripts/../lib/canonical/launchpad/components/poparser.py", line 606, in parse_line
    self._make_header()
  File "/srv/launchpad.net/production/launchpad/cronscripts/../lib/canonical/launchpad/components/poparser.py", line 558, in _make_header
    self.header.finish()
  File "/srv/launchpad.net/production/launchpad/cronscripts/../lib/canonical/launchpad/components/poparser.py", line 319, in finish
    self.__setitem__(field, value, False)
  File "/srv/launchpad.net/production/launchpad/cronscripts/../lib/canonical/launchpad/components/poparser.py", line 388, in __setitem__
    v = self._decode(v)
  File "/srv/launchpad.net/production/launchpad/cronscripts/../lib/canonical/launchpad/components/poparser.py", line 335, in _decode
    raise POInvalidInputError(msg='Unknown charset %s' % self.charset)
POInvalidInputError: Unknown charset

That %s should be a %r (I'm fixing it now), but this is a rare and nasty little bug.
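
To illustrate why %r matters here, a minimal sketch in Python 3 syntax. The `charset` variable is a stand-in for `self.charset`, and the empty string is an assumption about what the file declared. With %s an empty value renders as nothing, which would match the truncated message in the log, while %r makes the empty value visible:

```python
# Assumption: the header declared an empty charset.
charset = ''

# %s inserts str(charset), so an empty string leaves nothing after the label.
msg_s = 'Unknown charset %s' % charset

# %r inserts repr(charset), so the quotes of the empty string are shown.
msg_r = 'Unknown charset %r' % charset

print(repr(msg_s))
print(repr(msg_r))
```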

Changed in rosetta:
assignee: nobody → carlos
Changed in rosetta:
assignee: carlos → nobody
status: New → Accepted
Dafydd Harries (daf) wrote :

I don't understand why the log doesn't contain the message from the exception raised. We can only confirm that this is a bug if we know what charset the file claimed to be using.

Changed in rosetta:
status: Accepted → Needs Info
Carlos Perelló Marín (carlos) wrote :

As far as I know, the charset seems to be null and that's why it doesn't appear.

Changed in rosetta:
status: Needs Info → Confirmed
Christian Reis (kiko) wrote :

Yes, I think that's what it is too. But how can the charset be empty?

Carlos Perelló Marín (carlos) wrote :

It is as easy as someone without a clue editing the .po header by hand outside Rosetta, leaving the charset empty, and later importing the file into Rosetta.

Christian Reis (kiko) wrote : Re: [Bug 2892] Re: Rosetta import fails with POInvalidInputError regarding an unknown charset

Could we ignore the problem and try using a sane default?

Carlos Perelló Marín (carlos) wrote :

Only if there is a way to guess the encoding of a stream of bytes with a low error rate (1%). If the error rate is high, we could pollute our translation memory database.

Данило Шеган (danilo) wrote :

We can easily accept ASCII and UTF-8 as the only fallbacks, since they are both easily detected.
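
A rough sketch of what such detection could look like, in Python 3 syntax. The function name `guess_fallback_charset` is hypothetical and not part of Rosetta; the idea is simply that strict decoding either succeeds or raises, so anything that is neither ASCII nor valid UTF-8 is rejected rather than guessed at:

```python
from typing import Optional


def guess_fallback_charset(data: bytes) -> Optional[str]:
    """Return 'ascii' or 'utf-8' if the bytes decode cleanly, else None.

    Hypothetical helper for illustration only.
    """
    # ASCII is tried first because every ASCII stream is also valid UTF-8.
    for charset in ('ascii', 'utf-8'):
        try:
            data.decode(charset)  # strict decoding raises on invalid bytes
        except UnicodeDecodeError:
            continue
        return charset
    # Neither fallback fits; the caller should refuse the import.
    return None
```

For example, `guess_fallback_charset(b'\xe9t\xe9')` (Latin-1 bytes) returns None, so such a file would still be rejected instead of being imported with a wrong guess.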

Changed in rosetta:
assignee: nobody → jtv
milestone: none → 1.2.2
Jeroen T. Vermeulen (jtv) wrote :

The code in question has changed drastically since this report was filed. I believe the code works sanely now:

 * The POHeader constructor fills out the charset field. There's no way to skip this.

 * A default is provided if charset is missing or empty: UTF-8. This means we already have the "try ASCII or UTF-8" feature.

 * The error happens if the Python library does not recognize the chosen charset, which is probably the right thing to do.

 * In 1.2.2, more useful errors will be reported to the uploader.
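
The behaviour described in the points above could be sketched roughly as follows, in Python 3 syntax. This is not the actual POHeader code: `resolve_charset`, `DEFAULT_CHARSET`, and `UnknownCharsetError` are hypothetical names used only for illustration. A missing or empty charset falls back to UTF-8, and an error is raised only when Python's codec registry does not know the charset:

```python
import codecs

DEFAULT_CHARSET = 'UTF-8'  # assumed default, as described above


class UnknownCharsetError(ValueError):
    """Hypothetical stand-in for the parser's own error type."""


def resolve_charset(declared):
    """Return a usable charset name for a declared header value."""
    # A missing (None), empty, or whitespace-only value gets the default.
    charset = (declared or '').strip() or DEFAULT_CHARSET
    try:
        # codecs.lookup raises LookupError for names Python does not know.
        codecs.lookup(charset)
    except LookupError:
        # %r keeps odd or empty values visible in the message.
        raise UnknownCharsetError('Unknown charset %r' % charset)
    return charset
```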

Changed in rosetta:
status: Confirmed → Fix Released