GNU Mailman

Bug #1779445
Comment #5

Yasuhito FUTATSUKI at POEM (futatuki) wrote on 2018-07-08:

I understand that your fix preserves character entity references in the TextArea text across the POST method, and I have confirmed that it is fixed in Rev 1788. Thank you.

I think there is one more problem, concerning the charset of query strings from Text or TextArea fields, whose content is not restricted to ASCII text in any language. If the text contains a raw non-ASCII character, its charset depends on the browser implementation, even though the HTML 4.01 specification says the default value of the form's accept-charset attribute is "UNKNOWN", which means "User agents may interpret this value as the character encoding that was used to transmit the document containing this FORM element." (https://www.w3.org/TR/html401/interact/forms.html)

This does not seem to be a problem in most cases with current browsers that respect the specification, but it is still a problem in some cases. At least, when I entered a non-breaking space character ('\xa0' in iso-8859-1) into a Text field of a us-ascii form using Firefox 61 on FreeBSD, it was encoded as '%A0' in the query string, although other Unicode characters are encoded as numeric character references. The code that handles this special case for 'us-ascii' is in Utils.canonstr(), so it may need to be used in some places, including the TextArea in edithtml.py. (Using non-ASCII characters in a us-ascii form is irregular, of course.)
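
To illustrate the two encodings described above, here is a minimal Python 3 sketch. It is not Mailman's code and not what Utils.canonstr() does; decode_form_value is a hypothetical name, and falling back to iso-8859-1 for bytes outside the declared charset is my assumption, chosen only because '%A0' is the iso-8859-1 non-breaking space.

```python
import html
from urllib.parse import unquote_to_bytes


def decode_form_value(raw, charset="us-ascii", fallback="iso-8859-1"):
    """Hypothetical helper: decode one percent-encoded form value.

    Assumption: bytes that are invalid in the declared charset are
    interpreted as `fallback` (iso-8859-1 here).
    """
    # Form encoding uses '+' for spaces; then undo the %XX escapes.
    data = unquote_to_bytes(raw.replace("+", " "))
    try:
        text = data.decode(charset)
    except UnicodeDecodeError:
        # e.g. the raw byte 0xA0 submitted from a us-ascii form
        text = data.decode(fallback)
    # html.unescape resolves numeric references such as &#160;
    # (it also resolves named references such as &amp;).
    return html.unescape(text)


# Raw byte form: '%A0' is not valid us-ascii, so the fallback applies.
print(repr(decode_form_value("a%A0b")))
# Numeric character reference form: '&#160;' percent-encoded.
print(repr(decode_form_value("%26%23160%3B")))
```

Both calls yield the same non-breaking space character, which is the point: the server has to accept either form. Because html.unescape also resolves named references, a field that must preserve entity references as typed would need a narrower step there.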