Appear to be failing to record OOPS reports for all +translate HTTP 503 errors
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Launchpad itself | Invalid | High | Unassigned |
Bug Description
Over the last two days we've seen a large increase in 5xx errors reported against the production LP app servers:
https:/
Analysis of the logs on vanadium suggests that the vast majority of these are for URLs containing "+translate".
A quick breakdown of the type and quantity of 5xx errors for 1 and 2 September, measured from the Apache logs on vanadium (at the time of writing, this covers a full 24 hours of the 1st but only 3.5 hours of the 2nd):
9912 x 503
78 x 502
2 x 500
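As a rough illustration of how a breakdown like this can be produced, here is a minimal Python sketch that tallies status codes for "+translate" requests. It assumes the Apache logs on vanadium use the standard common/combined log format, which this report does not state; the function name `tally_statuses` and its parameters are hypothetical, not an existing Launchpad tool.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: vanadium's Apache logs use the standard "common" or
# "combined" format: host ident user [timestamp] "METHOD PATH PROTO" status ...
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '                    # host, ident, user, [timestamp]
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '    # quoted request line
    r'(?P<status>\d{3}) '                          # three-digit HTTP status
)


def tally_statuses(lines: Iterable[str], url_fragment: str = "+translate",
                   min_status: int = 500) -> Counter:
    """Count status codes >= min_status for requests whose path contains url_fragment.

    Hypothetical helper for illustration; not an existing Launchpad tool.
    """
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines in an unexpected format (e.g. a "-" request line)
        if url_fragment not in match.group("path"):
            continue  # not a request for the pages we're interested in
        status = int(match.group("status"))
        if status >= min_status:
            counts[status] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        '10.0.0.1 - - [01/Jan/2000:00:00:00 +0000] "GET /a/fr/+translate HTTP/1.1" 503 512 "-" "UA"',
        '10.0.0.2 - - [01/Jan/2000:00:00:01 +0000] "POST /a/de/+translate HTTP/1.1" 502 0 "-" "UA"',
        '10.0.0.3 - - [01/Jan/2000:00:00:02 +0000] "GET /a/fr/+translate HTTP/1.1" 404 100 "-" "UA"',
    ]
    for status, n in sorted(tally_statuses(sample).items()):
        print(f"{n} x {status}")  # prints "1 x 502" then "1 x 503"
```

Passing an open log file as `lines` would count one day's 5xx responses, and `min_status=400` would also pick up 404s like those mentioned below.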
A quick-and-dirty search through the OOPS directories over a similar period suggests an order-of-magnitude difference in the number of OOPS reports recorded:
spm@devpad:
==> 1015
As a result, when we look at the OOPS reports, this class of failure appears far less significant than it actually is.
Even HTTP 404 errors are under-reported: we logged 1709 404s against translations on vanadium.
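Taking the figures above at face value, the gap works out roughly as follows. This assumes the Apache counts and the 1015 OOPS reports cover comparable windows, which the report only describes as similar, so the ratio is approximate:

```latex
\begin{aligned}
\text{5xx responses in the Apache logs} &= 9912 + 78 + 2 = 9992 \\
\frac{9992}{1015} &\approx 9.8
\end{aligned}
```

That is roughly ten Apache-level 5xx responses per recorded OOPS, in line with the order-of-magnitude gap described above.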
affects: | oops-tools → launchpad-foundations |
Changed in launchpad-foundations: | |
status: | New → Triaged |
importance: | Undecided → Low |
tags: | added: oops |
tags: | added: canonical-losa-lp |
Changed in launchpad: | |
importance: | Low → Critical |
tags: | removed: oops |
Changed in launchpad: | |
importance: | Critical → High |
It looks like those 500s are requests failing at the Apache level, so it makes sense that they don't generate an OOPS report. We need to debug why Apache is returning the 500. This looks like a duplicate of bug 193062.