pycurl reports "select/poll returned error"

Bug #225020 reported by Martin Pool
6
Affects Status Importance Assigned to Milestone
Bazaar
Fix Released
High
Vincent Ladeuil

Bug Description

I just got this backtrace. It's currently occurring in all recent branches on my Hardy laptop. On my desktop machine (which does not have pycurl at present) it does not occur.

ERROR: test_open_containing (bzrlib.tests.test_bzrdir.ChrootedTests)

vvvv[log from bzrlib.tests.test_bzrdir.ChrootedTests.test_open_containing]----
46.869 got pycurl error: 55, select/poll returned error, (55, 'select/poll returned error'), url: http://localhost:34823/.bzr/smart
46.878 opening working tree '/tmp/testbzr-EsYveJ.tmp'

^^^^[log from bzrlib.tests.test_bzrdir.ChrootedTests.test_open_containing]----
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/mbp/bzr/controlfiles/bzrlib/tests/test_bzrdir.py", line 484, in test_open_containing
    self.get_readonly_url(''))
  File "/home/mbp/bzr/controlfiles/bzrlib/tests/__init__.py", line 959, in assertRaises
    callableObj(*args, **kwargs)
  File "/home/mbp/bzr/controlfiles/bzrlib/bzrdir.py", line 753, in open_containing
    return BzrDir.open_containing_from_transport(transport)
  File "/home/mbp/bzr/controlfiles/bzrlib/bzrdir.py", line 774, in open_containing_from_transport
    result = BzrDir.open_from_transport(a_transport)
  File "/home/mbp/bzr/controlfiles/bzrlib/bzrdir.py", line 728, in open_from_transport
    redirected)
  File "/home/mbp/bzr/controlfiles/bzrlib/transport/__init__.py", line 1658, in do_catching_redirections
    return action(transport)
  File "/home/mbp/bzr/controlfiles/bzrlib/bzrdir.py", line 705, in find_format
    transport, _server_formats=_server_formats)
  File "/home/mbp/bzr/controlfiles/bzrlib/bzrdir.py", line 1466, in find_format
    return format.probe_transport(transport)
  File "/home/mbp/bzr/controlfiles/bzrlib/bzrdir.py", line 2450, in probe_transport
    server_version = medium.protocol_version()
  File "/home/mbp/bzr/controlfiles/bzrlib/smart/medium.py", line 419, in protocol_version
    self._protocol_version = client_protocol.query_version()
  File "/home/mbp/bzr/controlfiles/bzrlib/smart/protocol.py", line 613, in query_version
    self.call('hello')
  File "/home/mbp/bzr/controlfiles/bzrlib/smart/protocol.py", line 481, in call
    self._request.finished_writing()
  File "/home/mbp/bzr/controlfiles/bzrlib/smart/medium.py", line 346, in finished_writing
    self._finished_writing()
  File "/home/mbp/bzr/controlfiles/bzrlib/transport/http/__init__.py", line 539, in _finished_writing
    data = self._medium.send_http_smart_request(self._buffer)
  File "/home/mbp/bzr/controlfiles/bzrlib/transport/http/__init__.py", line 518, in send_http_smart_request
    code, body_filelike = self._post(bytes)
  File "/home/mbp/bzr/controlfiles/bzrlib/transport/http/_pycurl.py", line 268, in _post
    self._curl_perform(curl, header, ['Expect: '])
  File "/home/mbp/bzr/controlfiles/bzrlib/transport/http/_pycurl.py", line 325, in _curl_perform
    curl.perform()
error: (55, 'select/poll returned error')

Revision history for this message
Martin Pool (mbp) wrote :

At one point I could reproduce this reliably on current trunk on my laptop. Now it's not recurring and I'm not sure what changed.

12:07 <spiv> poolie: I think I saw the pycurl problem without -Werror. I used strace and didn't see any sign of a problem on the server side, IIRC.
12:08 <spiv> Ah, I think I still have the straces somewhere.
12:08 <spiv> The straces also showed no error return from poll(2) either, IIRC.
12:09 <spiv> poolie: http://rafb.net/p/dlbvkD76.html
12:11 <spiv> poolie: that paste is part of an strace of a test run. Straight after that it starts re-reading .py files to build the traceback for the exception raised by pycurl.
12:12 <spiv> poolie: but the server behaved correctly (and as expected), that test server does not allow POST

Revision history for this message
Igor Zubkov (biow0lf) wrote :
Download full text (3.8 KiB)

I have the same problem with bzr.

make check fail with:

test_bzrdir.ChrootedTests.test_find_bzrdirs_list_current OK 276ms
test_bzrdir.ChrootedTests.test_open_containing ERROR 263ms
    (55, 'select/poll returned error')

======================================================================
ERROR: test_open_containing (bzrlib.tests.test_bzrdir.ChrootedTests)

vvvv[log from bzrlib.tests.test_bzrdir.ChrootedTests.test_open_containing]----
170.944 got pycurl error: 55, select/poll returned error, (55, 'select/poll returned error'), url: http://localhost:40608/.bzr/smart
170.976 opening working tree '/tmp/.private/icesik/testbzr-aFKDIS.tmp'

^^^^[log from bzrlib.tests.test_bzrdir.ChrootedTests.test_open_containing]----
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/icesik/RPM/BUILD/bzr-1.4/bzrlib/tests/test_bzrdir.py", line 484, in test_open_containing
    self.get_readonly_url(''))
  File "/home/icesik/RPM/BUILD/bzr-1.4/bzrlib/tests/__init__.py", line 959, in assertRaises
    callableObj(*args, **kwargs)
  File "/home/icesik/RPM/BUILD/bzr-1.4/bzrlib/bzrdir.py", line 753, in open_containing
    return BzrDir.open_containing_from_transport(transport)
  File "/home/icesik/RPM/BUILD/bzr-1.4/bzrlib/bzrdir.py", line 774, in open_containing_from_transport
    result = BzrDir.open_from_transport(a_transport)
  File "/home/icesik/RPM/BUILD/bzr-1.4/bzrlib/bzrdir.py", line 728, in open_from_transport
    redirected)
  File "/home/icesik/RPM/BUILD/bzr-1.4/bzrlib/transport/__init__.py", line 1658, in do_catching_redirections
    return action(transport)
  File "/home/icesik/RPM/BUILD/bzr-1.4/bzrlib/bzrdir.py", line 705, in find_format
    transport, _server_formats=_server_formats)
  File "/home/icesik/RPM/BUILD/bzr-1.4/bzrlib/bzrdir.py", line 1466, in find_format
    return format.probe_transport(transport)
  File "/home/icesik/RPM/BUILD/bzr-1.4/bzrlib/bzrdir.py", line 2450, in probe_transport
    server_version = medium.protocol_version()
  File "/home/icesik/RPM/BUILD/bzr-1.4/bzrlib/smart/medium.py", line 419, in protocol_version
    self._protocol_version = client_protocol.query_version()
  File "/home/icesik/RPM/BUILD/bzr-1.4/bzrlib/smart/protocol.py", line 613, in query_version
    self.call('hello')
  File "/home/icesik/RPM/BUILD/bzr-1.4/bzrlib/smart/protocol.py", line 481, in call
    self._request.finished_writing()
  File "/home/icesik/RPM/BUILD/bzr-1.4/bzrlib/smart/medium.py", line 346, in finished_writing
    self._finished_writing()
  File "/home/icesik/RPM/BUILD/bzr-1.4/bzrlib/transport/http/__init__.py", line 539, in _finished_writing
    data = self._medium.send_http_smart_request(self._buffer)
  File "/home/icesik/RPM/BUILD/bzr-1.4/bzrlib/transport/http/__init__.py", line 518, in send_http_smart_request
    code, body_filelike = self._post(bytes)
  File "/home/icesik/RPM/BUILD/bzr-1.4/bzrlib/transport/http/_pycurl.py", line 268, in _post
    self._curl_perform(curl, header, ['Expect: '])
  File "/home/icesik/RPM/BUILD/bzr-1.4/bzrlib/transport/htt...

Read more...

Changed in bzr:
status: New → Confirmed
Revision history for this message
Andrew Bennetts (spiv) wrote :

This is now preventing me from landing any changes to bzr.dev. I get this error all the time from PQM now. Bumping the priority up to Critical.

Changed in bzr:
importance: Undecided → Critical
milestone: none → 1.6
Revision history for this message
Robert Collins (lifeless) wrote :

I have asked for pycurl to be removed from the pqm chroot as an interim measure to prevent blocking everything while this is debugged.

We should still fix this as soon as possible so that pycurl support doesn't bitrot.

Revision history for this message
Martin von Gagern (gagern) wrote :

I get similar error messages here, for that test, and it seems to depend on the current locale.
I ran "./bzr selftest bzrlib.tests.test_bzrdir.ChrootedTests.test_open_containing" on bzr.dev r3490.
No locale specified => success
LANG or LC_CTYPE set to "de_DE" => fails
LANG or LC_CTYPE set to "de_DE.utf8" => success
Maybe this helps reproducing and investigating the issue.

Revision history for this message
Martin Pool (mbp) wrote : Re: [Bug 225020] Re: pycurl reports "select/poll returned error"

On Tue, Jun 17, 2008 at 12:24 AM, Martin von Gagern
<email address hidden> wrote:
> I get similar error messages here, for that test, and it seems to depend on the current locale.
> I ran "./bzr selftest bzrlib.tests.test_bzrdir.ChrootedTests.test_open_containing" on bzr.dev r3490.
> No locale specified => success
> LANG or LC_CTYPE set to "de_DE" => fails
> LANG or LC_CTYPE set to "de_DE.utf8" => success
> Maybe this helps reproducing and investigating the issue.

Is it consistently reproducible with those settings? That's a bit
amazing. If so thanks very much for tracking it down!

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Andrew Bennetts (spiv) wrote :

Martin Pool wrote:
> On Tue, Jun 17, 2008 at 12:24 AM, Martin von Gagern
> <email address hidden> wrote:
> > I get similar error messages here, for that test, and it seems to depend on the current locale.
> > I ran "./bzr selftest bzrlib.tests.test_bzrdir.ChrootedTests.test_open_containing" on bzr.dev r3490.
> > No locale specified => success
> > LANG or LC_CTYPE set to "de_DE" => fails
> > LANG or LC_CTYPE set to "de_DE.utf8" => success
> > Maybe this helps reproducing and investigating the issue.
>
> Is it consistently reproducible with those settings? That's a bit
> amazing. If so thanks very much for tracking it down!

Changing LANG and/or LC_CTYPE doesn't help for me. Running "./bzr --no-plugins
selftest -s bzrlib.tests.test_bzrdir.ChrootedTests.test_open_containing" runs 3
tests, and regardless of environment variables I get one or two errors
everytime.

It's definitely a race condition somewhere. Perhaps on Martin von Gagern's
system changing the locale perturbs things just enough to miss the race, or
(more likely I think) those runs were just lucky, and the locale is a red
herring.

Martin: did you try running that command several times with each enviroment
setting?

-Andrew.

Revision history for this message
Martin von Gagern (gagern) wrote :

> Martin: did you try running that command several times with each enviroment
> setting?

I had repeated the combinations twice, but it seems that was just luck; repeating more often gives me different results for the same settings. Sorry.

Adding some printf to the curl code, more specifically to Curl_socket_ready in select.c, indicates this scenario:

readfd == CURL_SOCKET_BAD
writefd != CURL_SOCKET_BAD
pfd[0].revents == POLLERR | POLLHUP

http://cool.haxx.se/cvs.cgi/curl/lib/select.c?only_with_tag=curl-7_18_2 has a browsable version of the code, if you want to have a look without fetching the sources.

I haven't yet investigated what this actually signifies. Looks to me like curl considers sending some more data, while the remote side has already closed the connection.

I think it strange that there are no bug reports about this out there. Maybe something exotic in the way bzr uses curl?

P.S.: I had mailed this comment ages ago, but it seems launchpad ate it. :-(

Martin Pool (mbp)
Changed in bzr:
importance: Critical → High
Revision history for this message
Vincent Ladeuil (vila) wrote :

Reproduced here while reviewing a patch.

Occurs on every branch I've tried, including bzr trunk :-/

First test to fail is different in different branches.

Vincent Ladeuil (vila)
Changed in bzr:
assignee: nobody → vila
milestone: 1.6 → 1.7
Revision history for this message
Vincent Ladeuil (vila) wrote :

So, all the occurrences I get are related to the smart server probing.

More precisely against HTTP/1.0 returning 403 errors.

The attached http stream decoded with wireshark reveals that the body request is sent after receiving the response from the server.

So it looks like pycurl is trying to send the body when the server has already closed the connection.

I suspect that it's recent modification to curl that introduced this bug (which would explain why I encounter it only now, I upgraded to hardy a few days ago).

The most reasonable fix seems to just trap the error and simply consider that the smart server is not there. I'll send a patch implementing that.

Revision history for this message
Vincent Ladeuil (vila) wrote :

<cough>
The attachment above contains html, don't try to look at it with a browser.

Here is the corresponding capture.

Vincent Ladeuil (vila)
Changed in bzr:
status: Confirmed → Fix Committed
Revision history for this message
Vincent Ladeuil (vila) wrote :
Vincent Ladeuil (vila)
Changed in bzr:
status: Fix Committed → Fix Released
Revision history for this message
Vincent Ladeuil (vila) wrote :

Reopened, it's back under different circumstances.

Upstream bug has been diagnosed, it can occur with other error codes than 403. Better fix in progress.

Changed in bzr:
status: Fix Released → In Progress
Revision history for this message
Daniel Stenberg (daniel-haxx) wrote :

I have submitted a (first) suggested patch in the upstream bug report and I'd appreciate feedback from people you can repeat this problem:

http://sourceforge.net/tracker/index.php?func=detail&aid=2080222&group_id=976&atid=100976

I think the fix should be enough to kill the most serious bug. There's also another fix to do to make this thing behave nicer when the >= 400 code is returned early on so that libcurl can stop sending data (in vain) but that fix is somewhat larger and I don't want to do that right now as I hope to be able to do a new libcurl release in just a few days.

I'm hoping on your feedback to get this particular patch included in the release.

Revision history for this message
Vincent Ladeuil (vila) wrote :

Looks like we were both tracking the issue in both bug trackers :-)

So to summarize here, the upstream patch will be available in curl-7.19.0.

I confirmed that it fixed our use case in the test suite and introduced no regressions.

Changed in bzr:
status: In Progress → Fix Committed
Vincent Ladeuil (vila)
Changed in bzr:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.