'from_is_list' does not RFC2047 encode correctly when translation contains non-ascii char

Bug #1643210 reported by Yasuhito FUTATSUKI at POEM
Affects: GNU Mailman
Status: Fix Released
Importance: High
Assigned to: Mark Sapiro
Milestone: 2.1.24

Bug Description

If the from_is_list feature is used, the From: header's `realname' field is composed of the original realname
and the translation of '%(realname)s via %(lrn)s', which may contain non-ASCII characters.

The realname field is encoded before composing if necessary, but the translated part is not.
So the From: header may contain raw non-ASCII characters.

To fix this, do the RFC 2047 encoding after composing.

(There is another bug: if the server's language setting and the mailing list's preferred language differ,
the translation is taken from the server's language, not from the mailing list's. The attached patch
contains a fix for it.)
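
A minimal Python 3 sketch of the ordering problem described above. The template text, the list name, and the helper names are assumptions made for illustration; this is not Mailman's code (Mailman 2.1 runs on Python 2).

```python
from email.header import Header

# Hypothetical translated template that contains non-ASCII text.
TEMPLATE = '%(realname)s (%(lrn)s 経由)'


def encode_then_compose(realname, lrn):
    # Problematic order: only the realname is RFC 2047 encoded, so the
    # translated text is copied into the header value as raw characters.
    encoded = Header(realname, 'utf-8').encode()
    return TEMPLATE % {'realname': encoded, 'lrn': lrn}


def compose_then_encode(realname, lrn):
    # Proposed order: build the whole display name first, then encode once.
    composed = TEMPLATE % {'realname': realname, 'lrn': lrn}
    return Header(composed, 'utf-8').encode()


def is_ascii(text):
    return all(ord(ch) < 128 for ch in text)


before = encode_then_compose('二木 靖仁', 'Mailman-test')
after = compose_then_encode('二木 靖仁', 'Mailman-test')
print(is_ascii(before))  # False: the translated part is still raw
print(is_ascii(after))   # True: the whole display name is encoded
```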


Yasuhito FUTATSUKI at POEM (futatuki) wrote :
Yasuhito FUTATSUKI at POEM (futatuki) wrote :

My patch in #1 tries to adjust to the charset/encoding of the list's preferred language. But after realizing my misunderstanding, I think it is better to adjust to the sender's preference. In any case, the charset of the original sender's `realname', the charset of the translation of '%(realname)s via %(lrn)s', and the charset of the list's preferred language can all differ from each other.

Mark Sapiro (msapiro) wrote :

I'm having trouble understanding what the problem is. There are 3 possible sources of realname. In order of preference, the display name in the message's From: header if any; if not and the From: address is a list member, the list member's username if any, and if not, the local part of the sender's email address.
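
The order of preference described above could be sketched roughly as follows in Python 3. The function name and the `member_names` dictionary are hypothetical stand-ins for Mailman's membership lookup, not its actual API.

```python
from email.utils import parseaddr


def pick_realname(from_header, member_names):
    """Choose a real name using the three sources in order of preference.

    member_names is a hypothetical mapping of lowercased member address
    to the name stored for that member (possibly empty).
    """
    display_name, address = parseaddr(from_header)
    # 1. The display name in the message's From: header, if any.
    if display_name:
        return display_name
    # 2. The list member's name, if the address is a member and has one.
    member_name = member_names.get(address.lower())
    if member_name:
        return member_name
    # 3. The local part of the sender's email address.
    return address.split('@', 1)[0]


members = {'bob@example.com': 'Bob B'}
print(pick_realname('Alice <alice@example.com>', members))
print(pick_realname('bob@example.com', members))
print(pick_realname('carol@example.com', members))
```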

In the first case, the real name should already be RFC 2047 encoded in the incoming message and that will be the resultant value in the munged From: header. In the other cases if the name contains non-ascii, it will be RFC 2047 encoded in the character set of the list's preferred language (or maybe utf-8 if the real name is a unicode).

It seems all this should be OK.

Please provide an actual From: header and possibly relevant list settings that illustrate the problem.

Or is the issue that the list's real_name (lrn in the code) is not RFC 2047 encoded? I see that, but I think the fix is simply to replace the one line

lrn = mlist.real_name

with

lrn = str(uheader(mlist, mlist.real_name))

Changed in mailman:
status: New → Incomplete
Mark Sapiro (msapiro) wrote :

I see some issues with the simple 'fix' I suggested above. Namely the translation of 'via' is not RFC 2047 encoded and there would probably be missing whitespace issues due to mixing of text and RFC 2047 encoded words, but it still seems to me that something like the attached should do.
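
One way whitespace can go missing, which may be the kind of issue meant here: RFC 2047 says that whitespace between two adjacent encoded-words is ignored when decoding. The Python 3 sketch below illustrates this with made-up header values; `decode_display_name` is a helper name chosen for this example.

```python
from email.header import decode_header, make_header


def decode_display_name(value):
    # Decode any RFC 2047 encoded-words in value into a single str.
    return str(make_header(decode_header(value)))


# The space sits between two encoded-words, so a decoder drops it.
separate = '=?iso-8859-1?q?G=E9n=E9rales?= =?iso-8859-1?q?via?='
# The space is carried inside the second encoded-word (as "_"), so it survives.
inside = '=?iso-8859-1?q?G=E9n=E9rales?= =?iso-8859-1?q?_via?='

print(decode_display_name(separate))  # Généralesvia
print(decode_display_name(inside))    # Générales via
```

Encoding the fully composed display name as one unit avoids having to track where such spaces belong.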

Mark Sapiro (msapiro) wrote :

Further testing of my suggested patch shows it doesn't work in all cases. I'll continue to look at this.

Yasuhito FUTATSUKI at POEM (futatuki) wrote :

Suppose the sender is a member whose preferred language is ja, the list language is French, and the original From: is 'From: =?UTF-8?B?5LqM5pyoIOmdluS7gQ==?= <email address hidden>'.
The translation of '%(realname)s via %(lrn)s' is taken from the sender's language context, ja, and its charset is euc-jp. The display name of the original From: cannot be encoded entirely in iso-8859-1, and the translated 'via' part is misinterpreted as iso-8859-1. For the latter problem, we should encode it in the charset of the sender's preferred language (at least the translated 'via' part).

I don't want to break the realname even if it cannot be encoded in the charset of the sender's preferred language, so for a simple fix I chose to give up translating the 'via' part. This applies to language settings whose charset/encoding is not UTF-8.

Yasuhito FUTATSUKI at POEM (futatuki) wrote :

The lrn part is always us-ascii, so it is a valid string in all encodings currently supported by Mailman.

Yasuhito FUTATSUKI at POEM (futatuki) wrote :

It seems my patch in #2 also doesn't work if the sender is not a list member.

Mark Sapiro (msapiro) wrote :

I need to look at this further, and I won't have time for a day or two, but I will. Note however that the sender's display name in the message is already RFC 2047 encoded in some character set which may not be either of the character sets of the list's preferred language or the sender's preferred language if the sender is even a list member. I will be looking at this further when I have time. I appreciate your input and we will get it right.

Mark Sapiro (msapiro) wrote :

I have attached the results of my latest effort. For the From:, this gets the i18n translation of the '%(realname)s via %(lrn)s' string with dummy substitutions, converts it to unicode and substitutes unicode values for the substitutions and encodes it all as utf-8. I have tested this to some extent and I think it's good, but I would appreciate additional testing. Please try this and see if it works in your environment or report any issues.
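
The steps described above can be sketched roughly as follows in Python 3. This is only an illustration under stated assumptions: `translate_ja` is a hypothetical stand-in for the i18n lookup (returning bytes in euc-jp, with a template shaped like the Japanese output seen in this thread), and the `REALNAME`/`LISTNAME` tokens are invented dummy values. The attached patch itself is Python 2 code and is not reproduced here.

```python
from email.header import Header, decode_header, make_header


def translate_ja(realname, lrn):
    # Hypothetical catalog lookup: returns the translated, interpolated
    # string as bytes in the language's charset (euc-jp).
    return ('%s (%s 経由)' % (realname, lrn)).encode('euc-jp')


def via_display_name(encoded_realname, list_name):
    # 1. Get the translation with ASCII dummy substitutions.
    via = translate_ja('REALNAME', 'LISTNAME')
    # 2. Convert the translated bytes to unicode.
    uvia = via.decode('euc-jp')
    # 3. Substitute unicode values for the dummies.
    urealname = str(make_header(decode_header(encoded_realname)))
    uvia = uvia.replace('REALNAME', urealname).replace('LISTNAME', list_name)
    # 4. Encode the whole display name as utf-8.
    return Header(uvia, 'utf-8').encode()


dn = via_display_name('=?UTF-8?B?5LqM5pyoIOmdluS7gQ==?=', 'Mailman-test')
print(dn)
# Decoding the result recovers the composed display name.
print(str(make_header(decode_header(dn))))
```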

As far as the encoding of _('(no subject)') is concerned, my change ensures this is translated in the list's language, not the poster's.

Again, thanks for your help with this.

Changed in mailman:
status: Incomplete → In Progress
assignee: nobody → Mark Sapiro (msapiro)
Yasuhito FUTATSUKI at POEM (futatuki) wrote :

Thank you for the better fix. The fix in #10 also works fine in my environment, except for one small issue: it always encodes non-ASCII text as UTF-8, even if the sender's preferred language is the same as the list's and its encoding is not UTF-8.

A test case.
  list's language : fr (iso-8859-1)
  sender's language : fr (iso-8859-1)
  sender's display name : =?iso-8859-1?q?G=E9n=E9rales?=
(results)
  From: =?utf-8?q?G=C3=A9n=C3=A9rales_via_Mailman-test?= <...>

Another case.
  list's language : ja (euc-jp, outgoing messages are encoded to iso-2022-jp)
  sender's language : ja (euc-jp, outgoing messages are encoded to iso-2022-jp)
  sender's display name : =?ISO-2022-JP?B?GyRCRnNMWkx3P04bKEI=?=
(results)
  From: =?utf-8?b?5LqM5pyo6Z2W5LuBIChNYWlsbWFuLXRlc3Qg57WM55SxKQ==?= <...>

This seems to be no problem for almost all MUAs nowadays, except some l10n MUAs (and those MUAs will treat such encoded strings as raw ASCII strings, as described in the RFC, so I think the problem is small).

Yasuhito FUTATSUKI at POEM (futatuki) wrote :

How about using "dn = str(Header(uvia, lcs))" instead of "dn = str(Header(uvia, 'utf-8'))"?
Since the variable uvia is always unicode, there is no risk of mistaking its encoding. Header() treats the charset parameter only as a hint, so it falls back to 'utf-8' if it fails to encode to lcs.

test case 1.
  list's language : fr (iso-8859-1)
  sender's language : fr (iso-8859-1)
  sender's display name : =?iso-8859-1?q?G=E9n=E9rales?=
(results)
  From: =?iso-8859-1?q?G=E9n=E9rales_via_Mailman-test?= <...>

test case 2.
  list's language : ja (euc-jp, outgoing messages are encoded to iso-2022-jp)
  sender's language : ja (euc-jp, outgoing messages are encoded to iso-2022-jp)
  sender's display name : =?ISO-2022-JP?B?GyRCRnNMWkx3P04bKEI=?=
(results)
  From: =?iso-2022-jp?b?GyRCRnNMWkx3P04bKEIgKE1haWxtYW4tdGVzdCAbJEI3UE0zGyhCKQ==?= <...>

test case 3.
  list's language : en (us-ascii)
  sender's language : en (us-ascii)
  sender's display name : Yasuhito FUTATSUKI
(results)
  From: Yasuhito FUTATSUKI via Mailman-test <...>

test case 4.
  list's language : fr (iso-8859-1)
  sender's language : ja (euc-jp, outgoing messages are encoded to iso-2022-jp)
  sender's display name : =?UTF-8?B?5LqM5pyoIOmdluS7gQ==?=
(results)
  From: =?utf-8?b?5LqM5pyoIOmdluS7gSB2aWEgTWFpbG1hbi10ZXN0?= <...>

In all of the above, the result looks fine.
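
The fallback order relied on in this suggestion (us-ascii, then the charset hint, then utf-8) is the Python 2 behaviour of Header() for unicode input. In Python 3, Header() raises UnicodeEncodeError when the text does not fit a non-ASCII hinted charset, so the sketch below spells the fallback out explicitly. `encode_with_hint` is a name invented for this example, not part of Mailman or the standard library.

```python
from email.charset import Charset
from email.header import Header, decode_header, make_header


def encode_with_hint(text, charset_hint):
    """RFC 2047 encode text, trying us-ascii, the hint, then utf-8."""
    for name in ('us-ascii', charset_hint, 'utf-8'):
        # Use the codec the email package would emit for this charset,
        # e.g. iso-2022-jp when the hint is euc-jp.
        codec = Charset(name).output_codec or 'us-ascii'
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue  # does not fit, try the next candidate
        return Header(text, name).encode()
    raise ValueError('text cannot be encoded in any candidate charset')


plain = encode_with_hint('Yasuhito FUTATSUKI via Mailman-test', 'us-ascii')
latin = encode_with_hint('Générales via Mailman-test', 'iso-8859-1')
mixed = encode_with_hint('二木 靖仁 via Mailman-test', 'iso-8859-1')
print(plain)
print(latin)
print(mixed)
```

The three calls mirror the shape of test cases 3, 1 and 4: ASCII text stays unencoded, text that fits the hinted charset uses that charset, and anything else falls back to utf-8.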

Mark Sapiro (msapiro) wrote :

This is an area where there is no one right answer. I have the display name as a unicode, so how do I encode it for the header? I don't think it should ever be encoded in the character set of the poster's language because this is for a message to be sent to all list members. Plus, there is no guarantee that the poster's display name as encoded by the sending MUA was even encoded in Mailman's charset for the poster's language, if the poster is even a member, and there is no guarantee that the translation of the 'via' can even be properly encoded in the charset of the poster's language.

Further, there is no guarantee that the poster's display name can be properly encoded in the charset of the list's preferred language either.

The most reasonable encoding of unicode that guarantees no loss of information is utf-8, and any MUA that recognizes RFC 2047 encodings at all should be able to handle utf-8 encodings.

Even if there are MUAs that can properly decode RFC 2047 encodings in, e.g., iso-2022-jp but not utf-8, I think there are as many problems with trying to encode the original display name in the list's charset as there are with utf-8 encoding. I recognize that what I've done is a compromise, but I think it's as good as any.

Mark Sapiro (msapiro) wrote :

I wrote https://bugs.launchpad.net/mailman/+bug/1643210/comments/13 before I saw https://bugs.launchpad.net/mailman/+bug/1643210/comments/12. It looks like your suggestion is good. I'll investigate that.

Yasuhito FUTATSUKI at POEM (futatuki) wrote :

test case 5.
  list's language : fr (iso-8859-1)
  sender's language : ja (euc-jp, outgoing messages are encoded to iso-2022-jp)
  sender's display name: =?ISO-2022-JP?B?GyRCRnNMWkx3P04bKEI=?=
(results)
  From: =?utf-8?b?5LqM5pyo6Z2W5LuBIHZpYSBNYWlsbWFuLXRlc3Q=?= <...>

This is a reasonable result, I think.

Mark Sapiro (msapiro) wrote :

I have committed a fix which is essentially https://bugs.launchpad.net/mailman/+bug/1643210/+attachment/4782228/+files/CookHeaders.py.diff.txt with "dn = str(Header(uvia, lcs))" as suggested at https://bugs.launchpad.net/mailman/+bug/1643210/comments/12.

Thanks very much to Yasuhito FUTATSUKI for the report and all the helpful suggestions.

Changed in mailman:
importance: Undecided → High
milestone: none → 2.1.24
status: In Progress → Fix Committed
Mark Sapiro (msapiro)
Changed in mailman:
status: Fix Committed → Fix Released