Connection reaper killed connection due to a SoftRequestTimeout
Bug #335172 reported by Diogo Matsubara
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Launchpad itself | Won't Fix | Critical | Unassigned |
Bug Description
As seen in OOPS-1152EA162, a DisconnectionError was logged because a previous query took too long to execute.
OOPS-1152EA161 shows that the request preceding the DisconnectionError took 174946 ms of SQL time.
https:/
Changed in launchpad-foundations:
status: New → Triaged
importance: Undecided → Low
Changed in launchpad:
importance: Low → Critical
From today's meeting:
<matsubara> herb, anything happened to the DB during the time of this OOPS-1152EA162? lp-errors list/
<matsubara> or maybe stub might know ^
<herb> matsubara: nothing in the incident log.
<stub> matsubara: That is one of the connection reaper scripts kicking in
<herb> matsubara: I think that's also on the void between LOSAs.
<herb> ah, there we go.
<stub> We kill connections idle in a transaction more than a few hours (and should be more aggressive), and appserver connections that have been in a transaction for more than 2 minutes.
<Ursinha> stub, I see
<matsubara> stub, ok. so if we start seeing too many of those, we have a problem somewhere and a few is kinda normal?
<stub> The notification gets sent to the error-reports list (where we can confirm that this is indeed what happened)
<matsubara> stub, aha. that's better. I'll chase the lp-errors for that one
<matsubara> s/lp-errors/
<stub> If we see many of them, we have a problem. One is probably a problem - appserver requests taking two minutes on the db means we need to investigate why the normal timeout mechanisms didn't work.
<matsubara> right. thanks for the explanation
<stub> -1 second non-sql time, 0 seconds total time indicates a problem at the appserver? The request never got started?
<matsubara> I'll file a bug about that one and we can discuss there
<stub> hmm... might be a reconnection bug - perhaps the previous request handled by that thread got killed?
<stub> I don't know if we Retry on DisconnectionError exceptions, or if it is a good idea in all cases.
<matsubara> ok
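The reaper policy stub describes in the log above can be sketched roughly as follows. This is only an illustration, not the actual Launchpad reaper script: the `ConnectionInfo` type, the function names, and the 3-hour idle limit (standing in for "a few hours") are assumptions made for the example. Only the 2-minute appserver limit comes from the discussion.

```python
from dataclasses import dataclass
from datetime import timedelta

# "2 minutes" is taken from the log; the 3-hour value is an assumed
# placeholder for "a few hours".
APPSERVER_TRANSACTION_LIMIT = timedelta(minutes=2)
IDLE_IN_TRANSACTION_LIMIT = timedelta(hours=3)


@dataclass
class ConnectionInfo:
    """Hypothetical snapshot of one database connection."""
    pid: int
    is_appserver: bool
    transaction_age: timedelta  # time since the transaction began
    idle_in_transaction: timedelta = timedelta(0)  # time idle inside it


def should_reap(conn: ConnectionInfo) -> bool:
    """Return True if the connection exceeds either limit."""
    if conn.is_appserver and conn.transaction_age > APPSERVER_TRANSACTION_LIMIT:
        return True
    return conn.idle_in_transaction > IDLE_IN_TRANSACTION_LIMIT


def connections_to_reap(connections):
    """Return the pids of all connections the policy would kill."""
    return [c.pid for c in connections if should_reap(c)]


if __name__ == "__main__":
    snapshot = [
        # The request from OOPS-1152EA161: 174946 ms of SQL time.
        ConnectionInfo(1, True, timedelta(milliseconds=174946)),
        ConnectionInfo(2, True, timedelta(seconds=30)),
    ]
    print(connections_to_reap(snapshot))
```

Under these assumptions, a request with 174946 ms of SQL time (about 175 seconds) is past the 2-minute appserver limit, which is consistent with the reaper killing that connection.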