Errors

Bug #1046269
Comment #3

Matthew Paul Thomas (mpt) wrote on 2012-11-15:

A machine that never reboots would have been running all of the past 24 hours, so its error count over those 24 hours would be multiplied by 1. A machine that had been running for 6 of the past 24 hours would have its error count multiplied by 6/24 -- regardless of how many reboots were required to reach that total of 6 hours. So just knowing uptime since the most recent reboot wouldn't be enough; we need to know uptime over the past day.
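As a rough illustration of that weighting, here is a minimal Python sketch. The function names, the (boot, shutdown) session representation, and the assumption that sessions do not overlap are all hypothetical; none of it is taken from the actual error tracker code. It sums the parts of each session that fall inside the past 24 hours, so the number of reboots does not matter, and then scales the error count by that fraction as described above.

```python
from datetime import datetime, timedelta


def uptime_fraction(sessions, now, window=timedelta(hours=24)):
    """Fraction of the window ending at `now` during which the machine was up.

    `sessions` is a hypothetical list of (boot, shutdown) datetime pairs,
    assumed not to overlap. A session still running ends at `now`.
    """
    window_start = now - window
    total = timedelta(0)
    for boot, shutdown in sessions:
        # Clip each session to the window so older uptime is ignored.
        start = max(boot, window_start)
        end = min(shutdown, now)
        if end > start:
            total += end - start
    return total / window


def weighted_error_count(error_count, sessions, now):
    """Scale a machine's error count by its uptime fraction over the past day."""
    return error_count * uptime_fraction(sessions, now)


now = datetime(2012, 11, 15, 12, 0)
hours = lambda n: timedelta(hours=n)

# Always on: the whole window is covered, so the multiplier is 1.
always_on = [(now - hours(100), now)]

# Three sessions that add up to 6 hours inside the past 24 hours.
# The first one started before the window, so only 2 of its 8 hours count.
three_reboots = [
    (now - hours(30), now - hours(22)),
    (now - hours(10), now - hours(8)),
    (now - hours(2), now),
]

always_on_fraction = uptime_fraction(always_on, now)
three_reboots_fraction = uptime_fraction(three_reboots, now)
```

Knowing only the last boot time in the second case would give 2 hours rather than 6, which is why uptime since the most recent reboot is not enough.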

That we can't get data from machines that never crash is indeed a difficulty. Currently we assume that "all machines that would report an error if they had one" is roughly equal to "all machines that have reported any errors in the past 90 days". So there would be a selection bias if the error rate ever declined to anywhere near one error per machine per 90 days, or if errors were distributed so unevenly amongst machines that a substantial proportion of reporting machines wouldn't have reported in the past 90 days. Fixing bug 1077122 should fix that. A bigger problem at the moment is a sudden increase in the number of reporting machines, which is bug 1069827.
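To put a rough number on that selection bias, here is a small sketch under a simplifying assumption that is not from the tracker itself: every machine has the same, independent chance of reporting an error on any given day. The function name and parameters are made up for illustration.

```python
def fraction_seen(daily_error_probability, window_days=90):
    """Chance that a machine reports at least once within the window.

    Assumes each day is an independent trial with the same probability,
    which is a simplification made only for this illustration.
    """
    return 1 - (1 - daily_error_probability) ** window_days


# Share of would-be reporters that the 90-day window actually captures
# when the daily error probability is 1/90.
seen_at_one_in_ninety = fraction_seen(1 / 90)

# For comparison, a machine with one error every 10 days on average.
seen_at_one_in_ten = fraction_seen(1 / 10)
```

With a daily probability of 1/90 this comes to about 0.63, so under that assumption more than a third of the machines that would report an error would be missing from the denominator. At 1/10 nearly every machine is captured, which is why the approximation only breaks down as the rate falls.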