daisy is causing swift tracebacks

Bug #1298683 reported by David Ames
This bug affects 2 people
Affects: Daisy
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Digging through our swift logs we found tracebacks that only occur with daisy:

https://pastebin.canonical.com/107314/

ValueError: invalid literal for int() with base 16: '' (txn: tx9894b9bb7e3448f0bdc146a6f6fe7baf)

This seems like it is a software problem on daisy's part. Please take a look.
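
For readers unfamiliar with the message: it is the text Python's int() produces when asked to parse an empty string as hexadecimal. One plausible reading, which is an assumption and is not confirmed anywhere in this report, is that the server was parsing the size line of an HTTP chunked upload and the client stopped sending mid-body. The helper name parse_chunk_size below is hypothetical and is not taken from swift or daisy.

```python
def parse_chunk_size(line):
    """Parse the hexadecimal size field that starts each HTTP chunk.

    Hypothetical helper for illustration only; not swift's actual code.
    """
    # Drop any chunk extension after ';' and surrounding whitespace/CRLF.
    size_field = line.split(";", 1)[0].strip()
    return int(size_field, 16)


# A well-formed size line parses as hexadecimal.
normal_size = parse_chunk_size("1a\r\n")

# An empty size line (for example, the peer went away) raises ValueError.
try:
    parse_chunk_size("")
except ValueError as exc:
    message = str(exc)
```

If that reading is right, the traceback would point at uploads that were interrupted or malformed rather than at swift itself.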

David Ames (thedac)
tags: added: canonical-webops-eng
James Troup (elmo) wrote :

And for the avoidance of doubt, this has been happening for days/weeks; it's unrelated to today's swift outage.

Brian Murray (brian-murray) wrote :

I'm not sure this is related, but the graph of swift_client_exception_count is growing steadily for the daisy production machines. Additionally, the daisy log files show multiple occurrences of the following error, which is what increments that counter.

Exception when trying to add to bucket: put_object('daisy-production-cores', 'a3ed9184-ca49-11e3-bd0a-fa163e373683', ...) failure and no ability to reset contents for reupload.
Exception when trying to add to bucket: put_object('daisy-production-cores', 'a4202914-ca49-11e3-bd0a-fa163e373683', ...) failure and no ability to reset contents for reupload.

Brian Murray (brian-murray) wrote :

Looking again at the daisy-error log files I noticed the following.

swift token: fc7c7c1a825b4c15ae02110f3d19bcd5
Exception when trying to add to bucket: put_object('daisy-production-cores', 'eb67b504-ca7a-11e3-b23f-fa163e373683', ...) failure and no ability to reset contents for reupload.
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/gevent/pywsgi.py", line 438, in handle_one_response
    self.run_application()
  File "/usr/lib/python2.7/dist-packages/gevent/pywsgi.py", line 425, in run_application
    self.process_result()
  File "/usr/lib/python2.7/dist-packages/gevent/pywsgi.py", line 418, in process_result
    self.write('')
  File "/usr/lib/python2.7/dist-packages/gevent/pywsgi.py", line 373, in write
    self.socket.sendall(msg)
  File "/usr/lib/python2.7/dist-packages/gevent/socket.py", line 509, in sendall
    data_sent += self.send(_get_memory(data, data_sent), flags)
  File "/usr/lib/python2.7/dist-packages/gevent/socket.py", line 483, in send
    return sock.send(data, flags)
error: [Errno 32] Broken pipe
<PyWSGIServer fileno=7 address=0.0.0.0:8080>: Failed to handle request:
  request = POST /eb67b504-ca7a-11e3-b23f-fa163e373683/submit-core/i386/101e2f7fd822eecf90db5b01253243180b330b543441c96e8cb0839b90f78c07795ca4a3a9c621d40dba57797608607c11a771bbeeeea519de87576d007ce1fe HTTP/1.1 from ('10.33.17.130', 33425)
  application = <function oops_middleware at 0x308fc80>

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 390, in run
    result = self._run(*self.args, **self.kwargs)
  File "/usr/lib/python2.7/dist-packages/gevent/pywsgi.py", line 571, in handle
    handler.handle()
  File "/usr/lib/python2.7/dist-packages/gevent/pywsgi.py", line 186, in handle
    self.socket.sendall(response_body)
  File "/usr/lib/python2.7/dist-packages/gevent/socket.py", line 509, in sendall
    data_sent += self.send(_get_memory(data, data_sent), flags)
  File "/usr/lib/python2.7/dist-packages/gevent/socket.py", line 483, in send
    return sock.send(data, flags)
error: [Errno 32] Broken pipe
<Greenlet at 0x2fb6af0: <bound method PyWSGIServer.handle of <PyWSGIServer at 0x308c510 fileno=7 address=0.0.0.0:8080>>(<socket at 0x32d2e10 fileno=[Errno 9] Bad file des, ('10.33.17.130', 33425))> failed with error

I'm not certain this is actually an error with daisy.

Brian Murray (brian-murray) wrote :

Looking at the rate of swift_client_exception, it seems to drop during the night and then spike again during the day. However, the oopses rate follows the same pattern, so it is surely just related to the volume of oopses received.

seelaman updated client_timeout for the swift proxy from 60 to 90, but that seems to have had no effect on the quantity of swift_client_exceptions.

Brian Murray (brian-murray) wrote :

The error in comment #3 was resolved by changing daisy to write the core file to a temporary file on disk, so that we get retry behavior from swift and know the content length to send to swift. This has landed in production and has stopped the swift_client_exception counter from incrementing. I'm not certain this resolves the original issue seen in the swift log files, though.

------------------------------------------------------------
revno: 452
committer: Brian Murray <email address hidden>
branch nick: trunk
timestamp: Thu 2014-06-12 10:56:33 -0700
message:
  submit_core.py: use a temporary file for the core dump to get retry behavior, also log core file size
------------------------------------------------------------
revno: 455
committer: Brian Murray <email address hidden>
branch nick: trunk
timestamp: Fri 2014-06-13 10:25:41 -0700
message:
  submit_core.py: pass the content_length to swift's put_object
------------------------------------------------------------
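
The idea behind those two revisions can be sketched roughly as follows. This is a minimal illustration, not the actual submit_core.py code: the function name upload_core and its put_object parameter are hypothetical stand-ins, and the real call is assumed to be swift's put_object, which is not imported here. The point is that a file on disk can be rewound for a retry and its size is known before the upload starts.

```python
import tempfile


def upload_core(put_object, container, name, chunks):
    """Spool a core dump to a temporary file, then upload it.

    put_object is a hypothetical callable with the shape
    put_object(container, name, contents, content_length=...).
    """
    with tempfile.TemporaryFile() as spool:
        # Write the incoming request body to disk instead of streaming it.
        for chunk in chunks:
            spool.write(chunk)
        # The size is now known, so it can be passed as the content length.
        size = spool.tell()
        # Rewind so the uploader reads from the start; a seekable file
        # can also be rewound again if the upload has to be retried.
        spool.seek(0)
        put_object(container, name, spool, content_length=size)
    return size
```

A caller would pass the container name, the object name, and an iterable of byte chunks read from the request; the returned size can be logged, as the first revision mentions.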

Brian Murray (brian-murray) wrote :

07:38 <brian> Could you comment on bug 1298683? Is there
              still more work to be done?
07:44 <thedac> yes, and yes. I will double check but last
               I checked that specific traceback still
               occurred in the swift logs

David Ames (thedac) wrote :

We are still seeing the original problem on the swift side:

https://pastebin.canonical.com/107314/

ValueError: invalid literal for int() with base 16: '' (txn: tx9894b9bb7e3448f0bdc146a6f6fe7baf)

Brian Murray (brian-murray) wrote :

David, I believe we ended up resolving this sometime in July. Is the error still being seen in swift?

David Ames (thedac) wrote :

Just checked the logs and there are no occurrences in the last week of logs.
This actually surprises me, since at the time of the 2014-06-19 post we still saw the issue.
However, the logs do not lie.

Brian Murray (brian-murray) wrote :

Setting to Fixed based on the veracity of the logs.

Changed in daisy:
status: New → Fix Released