Bad characters in rsyslog output

Bug #1029116 reported by Aimon Bustardo
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
openstack-common
Invalid
Undecided
Unassigned

Bug Description

In latest nova (and earlier versions) when logging to syslog using rsyslog on Ubuintu there are bad characters entered. This causes problems when trying to pattern match the logs for parsing. Example output:

Jul 25 13:36:03 mc 2012-07-25 13:36:03 INFO nova.api.openstack.wsgi [req-48a555a5-6d2a-4a38-8384-3b4684357e72 19f932a5b0b34655989f4cb761522bb3 2617e657fdf84569a6be7977318e46c8] http://mc.la-1-11.morphlabs.net:8774/v1.1/2617e657fdf84569a6be7977318e46c8/os-hosts/cn32.la-1-11.morphcloud.net.json?ignore_awful_caching1343248563 returned with HTTP 200

Note the ' ' before the date field.

Revision history for this message
Aimon Bustardo (aimonb) wrote :

Version currently used: 2012.1+stable~20120612-3ee026e-0ubuntu1.2

tags: added: nova
tags: removed: nova
Revision history for this message
Aimon Bustardo (aimonb) wrote :

Note that I have changed order of fields by using the log formatting nova.conf options. The bad data is always at the start of the message coming from openstack.. Note that the python logger adds everything prior to the bad characters.

Revision history for this message
Aimon Bustardo (aimonb) wrote :

Interesting find on issues from another site:

"Yes, "" is the Byte Order Mark (BOM) of the Unicode Standard. Specifically it is the hex bytes EF BB BF, which form the UTF-8 representation of the BOM, misinterpreted as ISO 8859/1 text instead of UTF-8.

Probably what it means is that you are using a text editor that is saving files in UTF-8 with the BOM, when it should be saving without the BOM. It could be PHP files that have the BOM, in which case they'd appear as literal text on your page. Or it could be translated text you pasted into Joomla! edit windows.

The Unicode Consortium's FAQ on the Byte Order Mark is at http://www.unicode.org/faq/utf_bom.html#BOM ."

Revision history for this message
Aimon Bustardo (aimonb) wrote :

More digging shows in /usr/lib/python2.7/logging/handlers.py:

        # Message is a string. Convert to bytes as required by RFC 5424
        if type(msg) is unicode:
            # Morph
            msg = msg.encode('utf-8')
            if codecs:
                msg = codecs.BOM_UTF8 + msg
        msg = prio + msg

I will try removing the BOM addition.

Revision history for this message
Aimon Bustardo (aimonb) wrote :

ok that worked.. Diff:

--- /usr/lib/python2.7/logging/handlers.py 2012-07-25 20:25:24.545451192 -0700
+++ handlers.py 2012-07-25 20:25:09.237569831 -0700
@@ -797,9 +797,10 @@
                                             self.mapPriority(record.levelname))
         # Message is a string. Convert to bytes as required by RFC 5424
         if type(msg) is unicode:
+ # Morph
            msg = msg.encode('utf-8')
- if codecs:
- msg = codecs.BOM_UTF8 + msg
+ #if codecs:
+ # msg = codecs.BOM_UTF8 + msg
         msg = prio + msg
         try:
             if self.unixsocket:

Revision history for this message
Jason Kölker (jason-koelker) wrote :

Sounds like whatever is parsing the logs isn't able to understand UTF-8. You might want to look into using the JSONFormatter.

Revision history for this message
Aimon Bustardo (aimonb) wrote :

It's:

ii rsyslog 5.8.6-1ubuntu8 reliable system and kernel logging daemon

Revision history for this message
Aimon Bustardo (aimonb) wrote :

Rsyslog expects UTF8. However it does not expect the BOM prepended.

Revision history for this message
Jason Kölker (jason-koelker) wrote :

Hrm, sounds like a bug in rsyslog then. RFC5424 Appendix A.8 is pretty clear:

If the syslog application has a good indication that the content of the message is encoded in UTF-8, then it should include the BOM.

You might also want to use a custom handler if rsyslog does not follow the RFC.

Changed in openstack-common:
status: New → Invalid
Revision history for this message
Aimon Bustardo (aimonb) wrote :

Jason, I would have to agree. Or a bug in Python logger (perhaps its sending in wrong format.. or sending BOM twice.).. I am going to pursue this with rsyslog and Python and see what I can come up wit. In mean ime the total Kludge of a patch above works. I will leave this ticket open and update it with my finding to benefit other users who are having same issue.

summary: - Bad characters in syslog output
+ Bad characters in rsyslog output
description: updated
Revision history for this message
Aimon Bustardo (aimonb) wrote :
Revision history for this message
booi (booi) wrote :

Closed in python 2.7, 3.2 and 3.3.

Unfortunately something we have to live with in python 2.6 (the recommended python for Openstack)

http://bugs.python.org/issue14452

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.