lp.services.apachelogparser.base.get_files_to_parse doesn't like gzipped files over 4GiB

Bug #1020785 reported by William Grant
This bug affects 1 person
Affects           Status   Importance  Assigned to  Milestone
Launchpad itself  Triaged  High        Unassigned

Bug Description

lp.services.apachelogparser.base.get_files_to_parse fails to correctly handle gzipped logs with an original size of more than 4GiB. The relevant bits:

        fd, file_size = get_fd_and_file_size(file_path)
        [...]
        if parsed_file.bytes_read >= file_size:
            # There's nothing new in it for us to parse, so just skip it.
            fd.close()
            continue

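get_fd_and_file_size uses the ISIZE field from the gzip trailer, but RFC 1952 defines that field as "the size of the original (uncompressed) input data modulo 2^32." So for any log larger than 4GiB uncompressed, file_size wraps around to a smaller value, and the file is skipped as soon as bytes_read reaches that value, even though part of it has never been parsed.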
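A minimal Python sketch of the failure mode, written outside Launchpad. It is not the Launchpad code: read_isize, streamed_size and would_skip are hypothetical helpers, and would_skip only mirrors the comparison quoted above. The sketch reads ISIZE from a small gzip file, then works through the arithmetic for a hypothetical 10GiB log, where the wrapped value (2GiB) makes the check skip a file that still has 8GiB unparsed. streamed_size shows one possible way to get a size that does not wrap, by decompressing the whole file, at the cost of reading every byte.

```python
import gzip
import os
import struct
import tempfile

ISIZE_MODULUS = 2 ** 32  # RFC 1952: ISIZE is the input size modulo 2^32


def read_isize(path):
    """Return the ISIZE field: the last 4 bytes of a gzip file, little-endian."""
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)
        (isize,) = struct.unpack("<I", f.read(4))
    return isize


def streamed_size(path, chunk_size=1 << 20):
    """Count uncompressed bytes by decompressing the whole file.

    Hypothetical alternative: slower than reading ISIZE, but never wraps.
    """
    total = 0
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def would_skip(bytes_read, file_size):
    """Same comparison as the quoted check in get_files_to_parse."""
    return bytes_read >= file_size


if __name__ == "__main__":
    # Small file: ISIZE and the streamed count agree.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "access.log.gz")
        with gzip.open(path, "wb") as f:
            f.write(b"x" * 1000)
        print(read_isize(path), streamed_size(path))  # 1000 1000

    # A hypothetical 10GiB log, computed arithmetically rather than written out.
    real_size = 10 * 2 ** 30
    reported = real_size % ISIZE_MODULUS  # what ISIZE would hold: 2GiB
    bytes_read = 2 ** 31  # the parser has consumed the first 2GiB
    print(reported, would_skip(bytes_read, reported), would_skip(bytes_read, real_size))
    # 2147483648 True False
```

With the wrapped size the check reports "nothing new", while the same comparison against the real size would keep parsing.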

This affects lots of our logs, as they're well into the tens of gigabytes uncompressed. This is preventing the logs missed during the outage described in bug #1006323 from being fully parsed.

Tags: ppa
Curtis Hovey (sinzui) wrote:

Is this related to bug #1006323?
