sysstat log corruption, time isn't changing

Bug #1442674 reported by Dale Harris
16
This bug affects 3 people
Affects Status Importance Assigned to Milestone
sysstat (Ubuntu)
Invalid
Undecided
Unassigned

Bug Description

I'm seeing some log file corruption in sysstat, the time stamp occasionally just stops changing. So the output I typically see is like:

# sar

12:00:00 AM all 0.00 0.00 0.00 0.00 0.00 0.00
12:00:00 AM all 0.00 0.00 0.00 0.00 0.00 0.00
12:00:00 AM all 0.00 0.00 0.00 0.00 0.00 0.00
End of system activity file unexpected

Typical output should be something like:

11:58:01 PM all 0.11 0.00 0.02 0.00 0.00 99.87
11:59:01 PM all 0.06 0.00 0.02 0.00 0.00 99.92
12:00:01 AM all 0.03 0.00 0.02 0.00 0.00 99.95
Average: all 0.10 0.00 0.04 0.00 0.00 99.86

I'm attaching some example logs files, sa09 is good, sa10 is corrupted.

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: sysstat 10.2.0-1
ProcVersionSignature: Ubuntu 3.13.0-46.77-generic 3.13.11-ckt15
Uname: Linux 3.13.0-46-generic x86_64
ApportVersion: 2.14.1-0ubuntu3.8
Architecture: amd64
Date: Fri Apr 10 14:44:10 2015
SourcePackage: sysstat
UpgradeStatus: No upgrade log present (probably fresh install)
mtime.conffile..etc.cron.d.sysstat: 2015-03-25T18:44:29.274292
mtime.conffile..etc.sysstat.sysstat: 2015-03-25T18:44:29.198292

Revision history for this message
Dale Harris (rodmur-u) wrote :
Revision history for this message
Dale Harris (rodmur-u) wrote :

This continues to happen, I did increase the interval of the "debian-sa1 1 1" line in /etc/cron.d/sysstat to */5, but this problem still pops up occasionally.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in sysstat (Ubuntu):
status: New → Confirmed
Revision history for this message
Dale Harris (rodmur-u) wrote :

I finally noticed that sadc has optional file locking with the -L flag, so I suspect that will likely fix my problems. Just going to add -L to the debian-sa1 lines in the /etc/cron.d/sysstat.

Revision history for this message
Robie Basak (racb) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better.

> I finally noticed that sadc has optional file locking with the -L flag, so I suspect that will likely fix my problems.

So it turns out that this is a configuration error rather than a bug in Ubuntu? Marking as Incomplete on that basis. If this is incorrect then please explain and change the bug status back to New. Thanks!

Changed in sysstat (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Dale Harris (rodmur-u) wrote :

Yeah, I think is mostly a misconfiguration on my part. You can't overlap the execution of those two crontab lines without having problems, even with the -L flag. I saw some weirdness even with the file locking in place on some of my systems.

Revision history for this message
Robie Basak (racb) wrote :

OK, thanks. Marking this as Invalid then, but feel free to change it back to New if your understanding changes.

Changed in sysstat (Ubuntu):
status: Incomplete → New
status: New → Invalid
Revision history for this message
Bill Cole (ubuntu-20150819) wrote :

Note that the "-L" flag is hardcoded in the sa1 calls of sadc, so whatever locking that does is not fixing the issue. Seems like a possible upstream bug?

I've had this issue in sysstat data files generated on machines ( n=9 out of 48 sa?? files from 3 different machines) where the sysstat resolution has been changed to every 2 minutes by changing this line in /etc/cron.d/sysstat:

     5-55/10 * * * * root command -v debian-sa1 > /dev/null && debian-sa1 1 1

To:

    */2 * * * * root command -v debian-sa1 > /dev/null && debian-sa1 1 1

I've been able to salvage the files by looking for repeating binary patterns to figure out the size and offsets of sadc records, finding the one which is a runt, and rebuilding the file by splicing together the sections before and after it, making sure the resulting file is identical in size to the other full-day files that are parseable. This is not an easily documented process, as it requires eyeballing hexdump output and making educated guesses, but there are useful tips:

1. With "-S XALL" in SADC_OPTIONS, each record is 8-9KB, and for some reason they seem to alternate between 2 sizes (!) 16 bytes different, e.g. 8528 bytes & 8544 bytes.
2. There is a header in the file between 850-900 bytes, so a good place to start looking for patterns at the end of records (useful!) is ~9K.
3. EVERY time I've had this problem, the 2nd record has been a runt, 500-1500 bytes shorter than the normal records.

Because excising the runt record yields what seems to be a perfectly good file, my guess is that the root cause is collision between the 23:59 run and the 00:00 run, with the sadc -L flag for some reason failing to do its job.

Revision history for this message
Dale Harris (rodmur-u) wrote : Re: [Bug 1442674] Re: sysstat log corruption, time isn't changing
Download full text (4.3 KiB)

Oh okay, I didn't know about -L being hard coded in there, missed
that. I just ended up making sure the two cron jobs won't conflict
with each other, which is why I was saying it was a configuration
error on my part.

I did have:

*/5 * * * * root command -v debian-sa1 > /dev/null && debian-sa1 -L 1 1
59 23 * * * root command -v debian-sa1 > /dev/null && debian-sa1 -L 60 2

But I've gone to:

2-58/2 * * * * root command -v debian-sa1 > /dev/null && debian-sa1 -L 1 1
59 23 * * * root command -v debian-sa1 > /dev/null && debian-sa1 -L 60 2

I'm pretty sure the two were conflicting. I haven't seen any problems
since making that change.

On Wed, Aug 19, 2015 at 8:39 PM, Bill Cole
<email address hidden> wrote:
> Note that the "-L" flag is hardcoded in the sa1 calls of sadc, so
> whatever locking that does is not fixing the issue. Seems like a
> possible upstream bug?
>
> I've had this issue in sysstat data files generated on machines ( n=9
> out of 48 sa?? files from 3 different machines) where the sysstat
> resolution has been changed to every 2 minutes by changing this line in
> /etc/cron.d/sysstat:
>
> 5-55/10 * * * * root command -v debian-sa1 > /dev/null && debian-
> sa1 1 1
>
> To:
>
> */2 * * * * root command -v debian-sa1 > /dev/null && debian-sa1 1 1
>
> I've been able to salvage the files by looking for repeating binary
> patterns to figure out the size and offsets of sadc records, finding the
> one which is a runt, and rebuilding the file by splicing together the
> sections before and after it, making sure the resulting file is
> identical in size to the other full-day files that are parseable. This
> is not an easily documented process, as it requires eyeballing hexdump
> output and making educated guesses, but there are useful tips:
>
> 1. With "-S XALL" in SADC_OPTIONS, each record is 8-9KB, and for some reason they seem to alternate between 2 sizes (!) 16 bytes different, e.g. 8528 bytes & 8544 bytes.
> 2. There is a header in the file between 850-900 bytes, so a good place to start looking for patterns at the end of records (useful!) is ~9K.
> 3. EVERY time I've had this problem, the 2nd record has been a runt, 500-1500 bytes shorter than the normal records.
>
> Because excising the runt record yields what seems to be a perfectly
> good file, my guess is that the root cause is collision between the
> 23:59 run and the 00:00 run, with the sadc -L flag for some reason
> failing to do its job.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1442674
>
> Title:
> sysstat log corruption, time isn't changing
>
> Status in sysstat package in Ubuntu:
> Invalid
>
> Bug description:
> I'm seeing some log file corruption in sysstat, the time stamp
> occasionally just stops changing. So the output I typically see is
> like:
>
>
> # sar
>
> 12:00:00 AM all 0.00 0.00 0.00 0.00 0.00 0.00
> 12:00:00 AM all 0.00 0.00 0.00 0.00 0.00 0.00
> 12:00:00 AM all 0.00 0.00 0.00 0.00 0.00 0.00
> End of system activity file unexpected
>...

Read more...

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.