qrunner crashes on invalid unicode sequence

Bug #1462755 reported by Thijs Kinkhorst
This bug affects 2 people
Affects            Status        Importance  Assigned to  Milestone
GNU Mailman        Fix Released  Low         Mark Sapiro
mailman (Ubuntu)   Fix Released  Wishlist    Unassigned

Bug Description

When a message contains an invalid Unicode sequence in its header, qrunner flat out crashes on it:

May 17 15:32:20 2015 (981) Uncaught runner exception: 'utf8' codec can't decode byte 0xe9 in position 18: invalid continuation byte
May 17 15:32:20 2015 (981) Traceback (most recent call last):
  File "/var/lib/mailman/Mailman/Queue/Runner.py", line 119, in _oneloop
    self._onefile(msg, msgdata)
  File "/var/lib/mailman/Mailman/Queue/Runner.py", line 190, in _onefile
    keepqueued = self._dispose(mlist, msg, msgdata)
  File "/var/lib/mailman/Mailman/Queue/IncomingRunner.py", line 130, in _dispose
    more = self._dopipeline(mlist, msg, msgdata, pipeline)
  File "/var/lib/mailman/Mailman/Queue/IncomingRunner.py", line 153, in _dopipeline
    sys.modules[modname].process(mlist, msg, msgdata)
  File "/var/lib/mailman/Mailman/Handlers/CookHeaders.py", line 239, in process
    i18ndesc = uheader(mlist, mlist.description, 'List-Id', maxlinelen=998)
  File "/var/lib/mailman/Mailman/Handlers/CookHeaders.py", line 65, in uheader
    return Header(s, charset, maxlinelen, header_name, continuation_ws)
  File "/usr/lib/python2.7/email/header.py", line 183, in __init__
    self.append(s, charset, errors)
  File "/usr/lib/python2.7/email/header.py", line 267, in append
    ustr = unicode(s, incodec, errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 18: invalid continuation byte

May 17 15:32:20 2015 (981) SHUNTING: 1431869540.389822+156779307d54473d0eb732994bb67eee95733285

A solution for this specific case is to have Mailman/Handlers/CookHeaders.py pass the errors='replace' parameter to Header().

I would say that this is actually a bug in the Python email package, since I don't think it makes sense to set errors to "strict" rather than something like "replace" when the intention is to parse something as free-form, underspecified and user-controlled as email. Nonetheless, Mailman already sets errors='replace' in some places, so it might as well add it here.
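The reporter's suggestion can be reproduced outside Mailman with a minimal Python 3 sketch. The traceback above comes from Python 2.7, where Header takes a byte str; in Python 3 the equivalent input is a bytes object. The sketch assumes the value is the raw description bytes that turn up later in this thread and that the declared charset is utf-8; the helper name list_id_header() is hypothetical. The errors argument of email.header.Header is forwarded to the decode step, so "strict" raises the same UnicodeDecodeError at position 18, while "replace" substitutes U+FFFD.

```python
from email.header import Header

# Bytes as stored in the list's description attribute: 0xe9 is "é" in
# iso-8859-1, but here they are decoded as UTF-8.
RAW = b'Lijst voor Caljent\xe9-leden'


def list_id_header(value, errors='strict'):
    """Build a List-Id header from a byte string declared to be UTF-8.

    `errors` is forwarded by Header to the bytes.decode() call it makes.
    """
    return Header(value, 'utf-8', maxlinelen=998, header_name='List-Id',
                  errors=errors)


if __name__ == '__main__':
    try:
        list_id_header(RAW)  # the default errors='strict' raises here
    except UnicodeDecodeError as exc:
        print('strict:', exc)
    # With errors='replace' the bad byte becomes U+FFFD instead of raising.
    print('replace:', str(list_id_header(RAW, errors='replace')))
```

Note that "replace" silently alters the text that ends up in the header, which is one reason the fix proposed later in this thread logs the failure instead of guessing.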

Mark Sapiro (msapiro) wrote :

Actually, the traceback shows that the exception happens while CookHeaders is creating the List-Id: header to be added to the message, not while processing a header of the incoming message.

It tries to create a header of the form:

List-Id: list description <list.example.com>

And the exception occurs when trying to RFC 2047-encode the list's description in the charset of the list's preferred language. If so, this exception should be occurring on every list post. Is that the case?
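To illustrate what RFC 2047-encoding the description means, here is a minimal Python 3 sketch. It assumes the description is already text, i.e. decoded with the charset it was really stored in; the helper name encode_description() and the example value are illustrative only, and the maxlinelen and continuation_ws arguments that Mailman's uheader() also passes are omitted. Header produces an encoded word in the requested charset, and decode_header() recovers the original.

```python
from email.header import Header, decode_header

# Assumed example: the description as text (not bytes).
DESCRIPTION = 'Lijst voor Caljent\u00e9-leden'


def encode_description(description, charset):
    """RFC 2047-encode a list description in the given charset."""
    return Header(description, charset, header_name='List-Id').encode()


if __name__ == '__main__':
    for cs in ('iso-8859-1', 'utf-8'):
        encoded = encode_description(DESCRIPTION, cs)
        print(cs, '->', encoded)
        # decode_header() returns (bytes, charset) pairs for encoded words.
        raw, found_cs = decode_header(encoded)[0]
        print('   round trip:', raw.decode(found_cs))
```

The failure in this report happens one step earlier: the stored description is bytes, and turning those bytes into text with the wrong charset is what raises.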

Also, what are the list's preferred_language and the raw value of its description attribute? You can obtain this info with something like:

$ bin/withlist list1
Loading list list1 (unlocked)
The variable `m' is the list1 MailList instance
>>> m.preferred_language
'en'
>>> m.description
'My List one'
>>>

(Of course, the list name and responses will be different in your case.)

Changed in mailman:
assignee: nobody → Mark Sapiro (msapiro)
importance: Undecided → Medium
milestone: none → 2.1.21
status: New → Incomplete
Thijs Kinkhorst (kink) wrote :

I received this response:

root@barbershop:~# /usr/lib/mailman/bin/withlist caljente
Loading list caljente (unlocked)
The variable `m' is the caljente MailList instance
>>> m.preferred_language
'nl'
>>> m.description
'Lijst voor Caljent\xe9-leden'
>>>

Not sure what encoding that is. I've changed it to "Caljente" for now, which should be a reasonable workaround.

Mark Sapiro (msapiro) wrote :

It appears the underlying issue is that someone has changed Mailman's character set for 'nl' (Dutch) from iso-8859-1 to utf-8. The byte \xe9 is 'é' in iso-8859-1, but in UTF-8 it can only start a three-byte sequence, so the '-' that follows it is an invalid continuation byte, exactly as the traceback reports. Possibly whoever did this did the appropriate things, such as recoding the message catalog and templates to utf-8, but in any case the strings in the attributes of this list weren't recoded. This is one of the major problems that make it difficult to change Mailman's encoding for a language. See the definitions of the recode(), doitem() and convert() functions in Mailman/versions.py in Mailman 2.1.19 or later.

So basically, this issue appears to be a 'shot oneself in the foot' thing and probably could be fixed by setting the list's description to 'Lijst voor Caljent\xc3\xa9-leden', although I would be concerned that there are other iso-8859-1 strings in list attributes.
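The suggested repair amounts to reinterpreting the stored bytes as iso-8859-1 and re-encoding them as UTF-8. A minimal Python 3 sketch of that step, assuming those two charsets; recode_if_needed() is a hypothetical helper, not part of Mailman. It leaves values that already decode as UTF-8 untouched so that a second run does not double-encode, which is only a heuristic, since some iso-8859-1 byte strings also happen to be valid UTF-8.

```python
def recode_if_needed(value, from_charset='iso-8859-1', to_charset='utf-8'):
    """Re-encode bytes stored in from_charset as to_charset.

    Values that already decode in to_charset are returned unchanged, so a
    second run is harmless (a heuristic: some iso-8859-1 byte strings are
    also valid UTF-8).
    """
    try:
        value.decode(to_charset)
        return value
    except UnicodeDecodeError:
        return value.decode(from_charset).encode(to_charset)


if __name__ == '__main__':
    old = b'Lijst voor Caljent\xe9-leden'
    new = recode_if_needed(old)
    print(new)                    # b'Lijst voor Caljent\xc3\xa9-leden'
    print(recode_if_needed(new))  # unchanged on a second pass
```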

Anyway, I see this as an issue worth fixing. The fix I would propose is, in Mailman/Handlers/CookHeaders.py, to replace the line at the end of the definition of uheader(), which is currently

    return Header(s, charset, maxlinelen, header_name, continuation_ws)

with

    try:
        return Header(s, charset, maxlinelen, header_name, continuation_ws)
    except UnicodeError:
        syslog('error', 'list: %s: can\'t decode "%s" as %s', mlist.internal_name(), s, charset)
        return Header('', charset, maxlinelen, header_name, continuation_ws)
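For readers on Python 3, the same fallback pattern can be sketched as a standalone function. This is an illustration, not the committed Mailman code: safe_header() and its list_name argument are hypothetical, and the standard logging module stands in for Mailman's syslog(). Catching UnicodeError covers both failure modes, because Python 3 raises UnicodeDecodeError for undecodable bytes and UnicodeEncodeError for text the charset cannot represent, and both subclass UnicodeError.

```python
import logging
from email.header import Header

# Stand-in for Mailman's syslog('error', ...); the logger name is arbitrary.
log = logging.getLogger('mailman.error')


def safe_header(list_name, s, charset, maxlinelen=998, header_name=None,
                continuation_ws=' '):
    """Return Header(s, charset, ...), or an empty Header if that fails.

    Mirrors the proposed uheader() fallback: log the bad value instead of
    letting the UnicodeError escape and cause the message to be shunted.
    """
    try:
        return Header(s, charset, maxlinelen, header_name, continuation_ws)
    except UnicodeError:
        log.error("list: %s: can't decode %r as %s", list_name, s, charset)
        return Header('', charset, maxlinelen, header_name, continuation_ws)


if __name__ == '__main__':
    logging.basicConfig()
    h = safe_header('caljente', b'Lijst voor Caljent\xe9-leden', 'utf-8',
                    header_name='List-Id')
    print(repr(str(h)))  # an empty header; the error was logged instead
```

With the description from this report, the call above logs one error and returns an empty header rather than raising, so processing of the message can continue.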

Changed in mailman:
importance: Medium → Low
status: Incomplete → In Progress
Mark Sapiro (msapiro)
Changed in mailman:
status: In Progress → Fix Committed
Thijs Kinkhorst (kink) wrote :

Thanks for the fix! Although arguably a misconfiguration, it's good that it doesn't crash the qrunner.

Mark Sapiro (msapiro) wrote :

Strictly speaking, IncomingRunner doesn't "crash"; it encounters an unanticipated exception, which causes it to log the exception and shunt the message. And yes, the underlying issue is definitely a "misconfiguration", but catching the exception and handling it more gracefully, without shunting the message, wasn't hard, so I thought it worthwhile.

Mark Sapiro (msapiro)
Changed in mailman:
milestone: 2.1.21 → 2.1.21rc1
status: Fix Committed → Fix Released
Mark Sapiro (msapiro) wrote :

For more information on the causes of this issue and the fallout from what turns out to be Debian's changing of the character set for several languages, see the thread "Encoding problem with 2.15 to 2.18 upgrade with Finnish" beginning at <https://mail.python.org/pipermail/mailman-users/2015-December/080221.html> and continuing at <https://mail.python.org/pipermail/mailman-users/2016-January/080275.html>. There is a script mentioned in that thread at <https://www.msapiro.net/scripts/recode_list> (mirrored at <http://fog.ccsf.edu/~msapiro/scripts/recode_list>) that can programmatically recode the strings in a list's configuration to "fix" this issue.
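The recode_list script is not reproduced here. As a rough, hypothetical sketch of the same idea in Python 3, the functions below walk a plain dict standing in for a list's configuration (a real MailList object would need Mailman's own APIs), recode every byte string that is not valid UTF-8 from iso-8859-1, recurse into lists, tuples and dicts, and report which top-level keys changed. The charsets, the function names and the validity heuristic are all assumptions.

```python
def _recode_value(value, from_cs, to_cs):
    """Recursively recode byte strings that are not valid in to_cs."""
    if isinstance(value, bytes):
        try:
            value.decode(to_cs)
            return value  # already valid (pure ASCII always is, for UTF-8)
        except UnicodeDecodeError:
            return value.decode(from_cs).encode(to_cs)
    if isinstance(value, list):
        return [_recode_value(v, from_cs, to_cs) for v in value]
    if isinstance(value, tuple):
        return tuple(_recode_value(v, from_cs, to_cs) for v in value)
    if isinstance(value, dict):
        return {k: _recode_value(v, from_cs, to_cs) for k, v in value.items()}
    return value  # ints, bools, None, ... are left alone


def recode_config(config, from_cs='iso-8859-1', to_cs='utf-8'):
    """Return (recoded_copy, changed_keys) without modifying config."""
    recoded = {}
    changed = []
    for key, value in config.items():
        recoded[key] = _recode_value(value, from_cs, to_cs)
        if recoded[key] != value:
            changed.append(key)
    return recoded, changed


if __name__ == '__main__':
    # Hypothetical configuration; only 'description' comes from this report.
    config = {
        'description': b'Lijst voor Caljent\xe9-leden',
        'info': b'Welkom!',
        'max_message_size': 40,
    }
    recoded, changed = recode_config(config)
    print(changed)                 # ['description']
    print(recoded['description'])  # b'Lijst voor Caljent\xc3\xa9-leden'
```

Because the input dict is not modified, the returned key list can serve as a dry run before anything is written back.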

Paul Collins (pjdc) wrote :

I just ran into this problem following an upgrade from 12.04 LTS to 16.04 LTS. recode_list fixed the problem (thank you, Mark!) but this seems like something Ubuntu should detect and offer to correct during the upgrade.

Christian Ehrhardt  (paelzer) wrote :

Setting the task to Wishlist, since it is essentially a configuration change that breaks it (as Mark said, 'shot oneself in the foot').

I'm personally not so keen on on-upgrade detection and warnings, since in general they have a history of too many false positives, leading people to break their configuration for no reason.
Then again, as I read Mark, this is due to Debian intentionally changing some encodings, so maybe it should be done.

Changed in mailman (Ubuntu):
status: New → Confirmed
importance: Undecided → Wishlist
Athos Ribeiro (athos-ribeiro) wrote :

This has been fixed upstream since 2.1.21rc1.

bionic ships 2.1.26. I am closing this one.

Changed in mailman (Ubuntu):
status: Confirmed → Fix Released