Comment 8 for bug 1442674

Bill Cole (ubuntu-20150819) wrote :

Note that the "-L" flag is hardcoded in the calls that sa1 makes to sadc, so whatever locking it does is not preventing the issue. Could this be an upstream bug?

I've had this issue with sysstat data files (n=9 out of 48 sa?? files from 3 different machines) generated on machines where the sysstat resolution was changed to every 2 minutes by changing this line in /etc/cron.d/sysstat:

    5-55/10 * * * * root command -v debian-sa1 > /dev/null && debian-sa1 1 1

To:

    */2 * * * * root command -v debian-sa1 > /dev/null && debian-sa1 1 1

I've been able to salvage the files by looking for repeating binary patterns to work out the size and offsets of the sadc records, finding the one that is a runt, and rebuilding the file by splicing together the sections before and after it. I then check that the resulting file is identical in size to the other full-day files that parse correctly. This is not an easily documented process, as it requires eyeballing hexdump output and making educated guesses, but a few tips help:

1. With "-S XALL" in SADC_OPTIONS, each record is 8-9KB, and for some reason the records seem to alternate between 2 sizes (!) that differ by 16 bytes, e.g. 8528 bytes & 8544 bytes.
2. The file begins with a header of 850-900 bytes, so a good place to start looking for patterns at the end of records (useful!) is around the 9K mark.
3. EVERY time I've had this problem, the 2nd record has been a runt, 500-1500 bytes shorter than the normal records.
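
The mechanical part of this (measuring the distance between occurrences of a repeating pattern, then cutting out a byte range and checking the size) could be sketched in Python roughly as below. This is only an illustration under assumptions: the function names are made up, the byte pattern and the runt's start/end offsets still have to be found by hand in hexdump output, and nothing here parses the actual sadc file format.

```python
from pathlib import Path


def find_offsets(data: bytes, pattern: bytes) -> list[int]:
    """Return every offset at which pattern occurs in data (non-overlapping)."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    offsets = []
    pos = data.find(pattern)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(pattern, pos + len(pattern))
    return offsets


def gaps(offsets: list[int]) -> list[int]:
    """Distances between consecutive offsets; an unusually short gap hints at a runt."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


def splice_out(data: bytes, start: int, end: int) -> bytes:
    """Return data with the byte range [start, end) removed."""
    if not 0 <= start <= end <= len(data):
        raise ValueError("byte range lies outside the data")
    return data[:start] + data[end:]


def repair(src: Path, dst: Path, start: int, end: int, expected_size: int) -> None:
    """Write src minus [start, end) to dst, refusing if the size looks wrong.

    expected_size is assumed to be the size of a known-good full-day file.
    """
    fixed = splice_out(src.read_bytes(), start, end)
    if len(fixed) != expected_size:
        raise ValueError(f"result is {len(fixed)} bytes, expected {expected_size}")
    dst.write_bytes(fixed)
```

In use, one would run find_offsets() with a pattern seen at the end of every record, look at gaps() for the odd one out, and pass the runt's boundaries to repair() along with the size of a parseable file from another day. Always write to a new file and keep the original.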

Because excising the runt record yields what seems to be a perfectly good file, my guess is that the root cause is a collision between the 23:59 run and the 00:00 run, with the sadc -L flag for some reason failing to do its job.