UnicodeDecodeError in +filebug: unexpected code byte

Bug #453203 reported by Ursula Junque
This bug affects 1 person
Affects           Status        Importance  Assigned to  Milestone
Launchpad itself  Won't Fix     Undecided   Unassigned
apport (Ubuntu)   Fix Released  High        Martin Pitt
Nominated for Karmic by Gavin Panella

Bug Description

As seen in OOPS-1384C2001:
  UnicodeDecodeError: 'utf8' codec can't decode byte <h> in position <n>: unexpected code byte

According to salgado, this is a different issue from bug 61171.

More occurrences: OOPS-1384H2543, OOPS-1384F286

Tags: lp-bugs oops
Ursula Junque (ursinha) wrote :

I've asked allenap to take a look at this one.

Gavin Panella (allenap) wrote :

I would guess there is a problem with the data that's been uploaded
by apport, though I don't know if it's apport's fault; it may well
not be.

The code that's breaking is:

    if disposition == 'inline':
        assert part_headers.get_content_type() == 'text/plain', (
            "Inline parts have to be plain text.")
        charset = part_headers.get_content_charset()
        assert charset, (
            "A charset has to be specified for text parts.")
        inline_content = part_file.read().rstrip()
        part_file.close()
-->     inline_content = inline_content.decode(charset)

The RFC822 data that apport uploaded is being parsed. This part is
declared as text/plain with a charset of "utf8" (seen in the OOPS
report), but inline_content does not contain valid utf8 data. My guess
is that there is binary data in there, and the disposition should have
been 'attachment'.

We need to get one of these blobs, or a repeatable test case.
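
Until a real blob turns up, the failure mode itself is easy to reproduce. The following is a minimal sketch in Python 3 (the Launchpad code above is Python 2, where `decode` is called on a byte string); `decode_inline_part` and both payloads are made up for illustration and are not taken from Launchpad or from an apport upload. The exact wording of the error message also differs between Python versions.

```python
def decode_inline_part(raw_bytes, charset):
    """Strictly decode an inline text part, mirroring the failing line."""
    return raw_bytes.rstrip().decode(charset)


# Made-up payloads for illustration; neither comes from a real apport blob.
valid_part = "Field: caf\u00e9\n".encode("utf8")
invalid_part = b"Field: \xff\xff\n"

# A part that really is UTF-8 decodes without complaint.
decoded = decode_inline_part(valid_part, "utf8")

# A part that is declared as UTF-8 but carries other bytes raises.
try:
    decode_inline_part(invalid_part, "utf8")
    failure = None
except UnicodeDecodeError as exc:
    failure = exc  # exc.start is the offset of the first undecodable byte

print(repr(decoded))
print(failure)
```

The exception's `start` attribute gives the offset of the offending byte, which helps when hunting for the bad field in an actual blob.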

Gavin Panella (allenap) wrote :

All three example OOPses are from karmic, filing a bug against /ubuntu/+source/linux.

Ursula Junque (ursinha) wrote :

We had 53 occurrences on 11/01 and 71 on 11/02, on lpnet. Is there something we can do about it?

Deryck Hodge (deryck) wrote :

Gavin, can you comment further on this bug? Last we discussed, I think there was uncertainty about whether we can do anything about this or whether the problem lies in what apport is sending, IIRC?

Changed in malone:
status: New → Incomplete
Gavin Panella (allenap) wrote :

I am still to talk to Martin Pitt about this. I've been preoccupied with test suite homunculation, er, parallelisation and have let this one slide. Next week, promise.

Gavin Panella (allenap) wrote :

Okay, the cause is that the blob uploaded has invalid data in it. Here's a fragment of one part of the report:

  ...
  dmi.board.name: LENOVO
  dmi.board.vendor: LENOVO
  dmi.chassis.asset.tag: <<< gibberish here >>>
  dmi.chassis.type: 6
  dmi.chassis.vendor: LENOVO
  ...

The Python repr of characters 790 to 840 of this part of the report is:

  'asset.tag: \xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\ndmi.chassis.t'

In other words, the dmi.chassis.asset.tag value is 25 bytes of ff.
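
That count can be checked mechanically. Here is a small Python 3 sketch that takes the repr above as a bytes literal (split into groups of five for readability) and inspects the value; the variable names are only illustrative. The byte 0xFF can never occur anywhere in well-formed UTF-8, so every one of these bytes is invalid on its own.

```python
# The 50-byte fragment quoted above, written as a Python 3 bytes literal.
fragment = (
    b'asset.tag: '
    b'\xff\xff\xff\xff\xff'
    b'\xff\xff\xff\xff\xff'
    b'\xff\xff\xff\xff\xff'
    b'\xff\xff\xff\xff\xff'
    b'\xff\xff\xff\xff\xff'
    b'\ndmi.chassis.t'
)

# Take the first line and keep everything after the "key: " separator.
value = fragment.split(b"\n", 1)[0].split(b": ", 1)[1]

print(len(fragment))  # total size of the quoted slice
print(len(value))     # size of the asset tag value
print(set(value))     # distinct byte values found in it
```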

We can work around this on Launchpad by doing:

  inline_content.decode(charset, 'ignore')

However, apport should probably be fixed to ensure the report contains
only content that's valid for the declared charset. It may even be
interesting to record that the value for a field in the report is
invalid.
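
To make the trade-off concrete, here is a rough Python 3 sketch comparing the 'ignore' workaround with the 'replace' error handler, plus a hypothetical helper that records whether a field was valid. `sanitize_field` is an invented name and is not part of Launchpad or apport; the sample bytes are modelled on the fragment above rather than copied from the blob.

```python
# Sample modelled on the report fragment; not the actual uploaded blob.
raw = b"dmi.chassis.asset.tag: " + b"\xff" * 25 + b"\ndmi.chassis.type: 6"

# 'ignore' silently drops the undecodable bytes, so the field looks empty.
ignored = raw.decode("utf8", "ignore")

# 'replace' substitutes U+FFFD for each bad byte, leaving a visible trace.
replaced = raw.decode("utf8", "replace")


def sanitize_field(value, charset="utf8"):
    """Hypothetical helper: return (text, was_valid) for one field value."""
    try:
        return value.decode(charset), True
    except UnicodeDecodeError:
        return value.decode(charset, "replace"), False


text, was_valid = sanitize_field(b"\xff" * 25)
print(repr(ignored))
print(replaced.count("\ufffd"), was_valid)
```

With 'ignore' the reader of the bug report cannot tell that anything was wrong with the asset tag, whereas 'replace' or an explicit validity flag keeps that information.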

Gavin Panella (allenap) wrote :

The blob used for my last comment was found from OOPS-1409N1340.

Martin Pitt (pitti)
Changed in apport (Ubuntu):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Martin Pitt (pitti)
Diogo Matsubara (matsubara) wrote :

A user emailed me about this bug. OOPS-1417N2600

Gavin Panella (allenap) wrote :

Assuming I understood correctly, Martin said he was going to fix this in apport. There are things we could do in Launchpad to work around this, but it seems likely that Martin will have a fix ready before the next Launchpad release - because he's superman :) - so it doesn't make sense for Bugs to spend time on this.

Changed in malone:
status: Incomplete → Won't Fix
Martin Pitt (pitti) wrote :

I have now added a test to apport's test suite which reproduces this and potentially other situations: invalid UTF-8 in the description and in attachments, both at initial bug filing (+storeblob) and at later update (launchpadlib addAttachment()).
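
For readers without the apport tree at hand, the shape of such a test might look roughly like the Python 3 sketch below. This is not apport's actual test and it does not talk to Launchpad; `to_valid_utf8` is a hypothetical stand-in for whatever cleaning the client does before upload, and the payloads are invented.

```python
import unittest


def to_valid_utf8(data):
    """Hypothetical helper: return bytes that always decode as UTF-8."""
    return data.decode("utf-8", "replace").encode("utf-8")


class InvalidUtf8Test(unittest.TestCase):
    def test_description(self):
        # Invented description text containing two invalid bytes.
        description = b"Title: broken \xff\xfe report"
        to_valid_utf8(description).decode("utf-8")  # must not raise

    def test_attachment(self):
        # Invented attachment modelled on the DMI fragment in this bug.
        attachment = b"dmi.chassis.asset.tag: " + b"\xff" * 25 + b"\n"
        to_valid_utf8(attachment).decode("utf-8")  # must not raise

    def test_valid_input_unchanged(self):
        data = "dmi.board.vendor: LENOVO".encode("utf-8")
        self.assertEqual(to_valid_utf8(data), data)


# Run the suite explicitly so the result object can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(InvalidUtf8Test)
result = unittest.TextTestRunner().run(suite)
```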

Martin Pitt (pitti) wrote :

trunk r1648

Changed in apport (Ubuntu):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package apport - 1.9.6-0ubuntu1

---------------
apport (1.9.6-0ubuntu1) lucid; urgency=low

  [ Brian Murray ]
  * debian/local/apport-collect: Strongly encourage collectors who are not
    the bug reporter to file a new bug report.

  [ Marco Rodrigues ]
  * debian/control: Fix lintian warnings. Move python-distutils-extra
    to b-d-i and add misc:Depends to apport-qt.

  [ Martin Pitt ]
  * New upstream version 1.9.5 and 1.9.6:
    - apport-retrace: Fix crash if InterpreterPath/ExecutablePath do not
      exist.
    - hookutils.py, attach_alsa(): Attach /proc/cpuinfo too, for CPU flags.
    - Fix crash if InterpreterPath does not exist any more at the time of
      reporting. (LP: #428289)
    - apport-gtk: Connect signals properly, to repair cancel/window close
      buttons. (LP: #427814)
    - Update German translations and fix "konnre" typo. (LP: #484119)
    - launchpad.py: Ensure that text attachments on initial bug filing are
      valid UTF-8. (LP: #453203)
    - man/apport-retrace.1: Document -R option.
    - Add pm-utils hook to record current operation, so that apportcheckresume
      can check it. Before this was kept in Ubuntu's pm-utils package.
    - general-hooks/generic.py: Check if using ecryptfs, and which directory.
      (LP: #444656)
  * data/general-hooks/ubuntu.py: Add distro release codename tag.
    (LP: #404250)
  * debian/local/apport-chroot: Fix last occurrence of "--no-dpkg" to be
    "--no-pkg". (LP: #487056)
  * debian/local/apport-collect: Use "apport-collect data" as comment for the
    apport-collect attachments to enable bug mail filtering. Thanks to Bryce
    Harrington for the suggestion.
 -- Martin Pitt <email address hidden> Wed, 02 Dec 2009 00:01:06 +0100

Changed in apport (Ubuntu):
status: Fix Committed → Fix Released