Topics don't work with national (Russian) languages

Bug #891676 reported by Roman Sokolkov
This bug affects 3 people
Affects: GNU Mailman
Status: Fix Released
Importance: Medium
Assigned to: Mark Sapiro

Bug Description

Hi

I use mailman-2.1.14-6 on Fedora 15.

When I use English regexps for topics, everything works.

If I use a Russian regexp, subscribers to the topic don't receive e-mails, because the subject doesn't match the regexp.

In the source code (/usr/lib/mailman/Mailman/Handlers/Tagger.py) I found that the subject is not decoded from its MIME-encoded form before it is matched.

With English subjects everything seems fine, but in my case I saw:

pattern is "<C8><D5><CA>"
line is "=?KOI8-R?Q?=C8=D5=CA?="

I created a simple patch that uses the decode_header function from the email module.
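
The mismatch described above can be reproduced outside Mailman. The following is a minimal sketch, assuming Python 3's email.header module (Mailman 2.1 itself runs on Python 2, so this is an illustration and not the attached patch). The variable names are made up for the example; the pattern bytes and the header value are the ones quoted in the report.

```python
import re
from email.header import decode_header

# Values taken from the report: the topic regexp as raw KOI8-R bytes
# and the Subject header exactly as it appears in the message.
pattern = b"\xc8\xd5\xca"
raw_subject = "=?KOI8-R?Q?=C8=D5=CA?="

# The raw header is pure ASCII, so the KOI8-R bytes never occur in it.
raw_match = re.search(pattern, raw_subject.encode("ascii"))

# decode_header returns a list of (payload, charset) pairs. This header
# consists of a single encoded word, so only the first pair is inspected.
payload, charset = decode_header(raw_subject)[0]
decoded_match = re.search(pattern, payload)

print("raw header matches:", raw_match is not None)
print("decoded header matches:", decoded_match is not None)
```

Matching against the raw header fails, while matching against the decoded payload succeeds.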

Roman Sokolkov (rsokolkov) wrote :
Mark Sapiro (msapiro) wrote :

Thanks for your report.

This issue affects messages with RFC 2047 encoded Subject: and/or Keywords: headers as well as messages with encoded bodies containing Subject: and/or Keywords: pseudo-headers. The patch at comment #1 is incomplete as it doesn't take into account pseudo-headers and only decodes the first encoded word in an RFC 2047 encoded header.
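
To illustrate why decoding only the first encoded word is not enough, here is a small sketch, again assuming Python 3's email.header. The header value is invented for the example (two encoded words in different charsets around a plain-text tag); it is not taken from a real message.

```python
from email.header import decode_header, make_header

# Hypothetical header: a KOI8-R encoded word, a plain ASCII tag,
# and a UTF-8 encoded word.
header = (
    "=?koi8-r?Q?=D0=D2=C9=D7=C5=D4?= [Networking] "
    "=?utf-8?Q?=D0=BC=D0=B8=D1=80?="
)

parts = decode_header(header)

# Looking at the first pair only loses everything after the first word.
first_payload, first_charset = parts[0]
first_only = first_payload.decode(first_charset)

# make_header combines every part, each decoded with its own charset.
full_text = str(make_header(parts))

print(repr(first_only))
print(repr(full_text))
```

A regexp for [Networking] can only match the fully decoded text, never the first part alone.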

Changed in mailman:
assignee: nobody → Mark Sapiro (msapiro)
importance: Undecided → Medium
milestone: none → 2.1.15
status: New → In Progress
Mark Sapiro (msapiro) wrote :

Attached is a tentative patch for this issue. There will still be problems if the encoding of the regexp doesn't match that of the message header, e.g., if the characters in the regexp are encoded as koi8-r and the Subject header is encoded as utf-8.
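
The remaining charset problem can be shown with a short sketch, assuming Python 3. The sample word and header are invented for the example: the same Russian word is stored as koi8-r bytes in the regexp and as utf-8 in the Subject header.

```python
import re
from email.header import decode_header

word = "тест"

# The topic regexp stored as KOI8-R bytes.
pattern_koi8 = word.encode("koi8-r")

# The same word sent by a client that encodes headers as UTF-8.
subject = "=?utf-8?Q?=D1=82=D0=B5=D1=81=D1=82?="
payload, charset = decode_header(subject)[0]

# Byte-level comparison fails: same text, different byte sequences.
bytes_match = re.search(re.escape(pattern_koi8), payload)

# Comparison succeeds once both sides are brought to a common form.
text_match = re.search(pattern_koi8.decode("koi8-r"), payload.decode(charset))

print("bytes match:", bytes_match is not None)
print("text match:", text_match is not None)
```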

What do you think of this patch?

Mark Sapiro (msapiro) wrote :

I have revised the patch to decode the headers in the character set of the list's preferred language, which should match the character set of the topics regexps.
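
The idea of decoding into the list's character set might look roughly like the sketch below. This assumes Python 3 and is not the committed patch; header_in_list_charset is a hypothetical helper name, and the fallback for undecodable headers is a choice made for the example.

```python
import re
from email.errors import HeaderParseError
from email.header import decode_header, make_header


def header_in_list_charset(raw_header, list_charset):
    """Hypothetical helper: return the header as bytes in list_charset.

    Characters that cannot be represented in list_charset are replaced
    rather than raising an error.
    """
    try:
        text = str(make_header(decode_header(raw_header)))
    except (LookupError, UnicodeError, HeaderParseError):
        # Unknown charset or malformed encoded word: keep the raw value.
        text = raw_header
    return text.encode(list_charset, "replace")


# Regexp stored in the list's charset, header sent as UTF-8.
topic_regexp = re.escape("тест".encode("koi8-r"))
subject = "=?utf-8?Q?=D1=82=D0=B5=D1=81=D1=82?="

converted = header_in_list_charset(subject, "koi8-r")
print("match:", re.search(topic_regexp, converted) is not None)
```

Because both the regexp and the converted header are now in the same charset, the byte-level match succeeds even though the message used a different encoding.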

Mark Sapiro (msapiro) wrote :

I've committed the fix in Comment #4. Note that this still does not address the issue with pseudo-headers in the message body if the charset of the message body is not compatible with that of the list's preferred language. I don't intend to fix that unless it proves to be an issue in practice.

Changed in mailman:
status: In Progress → Fix Committed
Roman Sokolkov (rsokolkov) wrote :

Thanks for the patch, Mark! Everything works.

Steffen Petersen (spet) wrote :

May I add that this affects non-national characters as well, notably brackets as used in the documentation (http://www.gnu.org/software/mailman/mailman-member/node30.html). Depending on the mail client, topics such as [Networking] (as taken from the documentation) are downright impossible to define if the subject as a whole contains any characters the client decides to encode.
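
The bracket case can be sketched as follows, assuming Python 3's email.header. The subject text is invented, and Header(...).encode() stands in for a mail client that encodes the whole subject as soon as it contains a non-ASCII character; real clients may split the header differently.

```python
import re
from email.header import Header, decode_header, make_header

# Stand-in for a client that encodes the entire subject, tag included.
wire_subject = Header("[Networking] привет", "utf-8").encode()

topic_regexp = r"\[Networking\]"

# On the wire the brackets are hidden inside the encoded word.
raw_match = re.search(topic_regexp, wire_subject)

# After decoding, the tag is visible again.
decoded_subject = str(make_header(decode_header(wire_subject)))
decoded_match = re.search(topic_regexp, decoded_subject)

print(wire_subject)
print("raw match:", raw_match is not None)
print("decoded match:", decoded_match is not None)
```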

I don't know the roadmap for 2.1.15, but I think this bug calls for a speedy release of a patched version.

Mark Sapiro (msapiro)
Changed in mailman:
status: Fix Committed → Fix Released