So, to state what may be obvious: the problem here was that the conch process ran out of file handles, and then recovered from that poorly?
In which case there are really two questions: 1) why did conch run out of file handles? and 2) can we handle this situation better?
For 1), is there simply one extra fd open per connection, or is there a leak? If the former, then we're presumably close to hitting the fd limit in production from time to time!
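One rough way to tell the two cases apart is to sample the process's open-fd count alongside the number of live connections. A minimal sketch, assuming a Linux host (it reads `/proc/self/fd`) and that it can be called from inside the conch process (e.g. from a periodic debug hook). The helper names are hypothetical:

```python
import os
import resource


def open_fd_count():
    """Count fds currently open in this process (Linux only: reads /proc/self/fd)."""
    # os.listdir opens a directory fd of its own, which shows up in the
    # listing, so subtract one to exclude it.
    return len(os.listdir("/proc/self/fd")) - 1


def fd_headroom():
    """Return (open_fds, soft_limit) where soft_limit is RLIMIT_NOFILE's soft value."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return open_fd_count(), soft


def fds_per_connection(baseline_fds, current_fds, n_connections):
    """Hypothetical helper: average extra fds per live connection vs. an idle baseline."""
    if n_connections == 0:
        return 0.0
    return (current_fds - baseline_fds) / n_connections
```

If the per-connection figure stays roughly constant as connections come and go, that's the "extra fd per connection" case; if the open count keeps climbing after connections close, that points to a leak.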