selftest fails with "can't start new thread"

Bug #392127 reported by pva
This bug affects 6 people
Affects       Status        Importance  Assigned to      Milestone
Bazaar        Fix Released  High        Vincent Ladeuil
Gentoo Linux  Invalid       Undecided   Unassigned

Bug Description

bzr-1.16_rc1 builds and passes its tests as expected, but the test run of bzr-1.16 produces lots of errors like the following:

ERROR: test_http.TestProxyAuth.test_user_from_auth_conf(urllib,HTTP/1.1,basicdigest)
    Connection error: while sending GET http://localhost:54573/a: [Errno 104] Connection reset by peer

ERROR: test_http.TestProxyAuth.test_user_from_auth_conf(pycurl,HTTP/1.0,basic)
    can't start new thread

ERROR: test_http.TestProxyAuth.test_user_from_auth_conf(pycurl,HTTP/1.0,digest)
    can't start new thread

ERROR: test_http.TestProxyAuth.test_user_from_auth_conf(pycurl,HTTP/1.0,basicdigest)
    can't start new thread

ERROR: test_http.TestProxyAuth.test_user_from_auth_conf(pycurl,HTTP/1.1,basic)
    can't start new thread

ERROR: test_http.TestProxyAuth.test_user_from_auth_conf(pycurl,HTTP/1.1,digest)
    can't start new thread

...

and sometimes it even hangs with:

Exception in thread server-bzr://127.0.0.1:44440/extra/:le_files.TestLockableFiles_LockDir.test_unlock_after_lock_write_with_tok
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 525, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/threading.py", line 477, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/vt/portage/tmp/portage/dev-util/bzr-1.16/work/bzr-1.16/bzrlib/smart/server.py", line 130, in serve
    self.serve_conn(conn, thread_name_suffix)
  File "/vt/portage/tmp/portage/dev-util/bzr-1.16/work/bzr-1.16/bzrlib/smart/server.py", line 164, in serve_conn
    connection_thread.start()
  File "/usr/lib/python2.6/threading.py", line 471, in start
    _start_new_thread(self.__bootstrap, ())
error: can't start new thread

Is this a known failure, and can we ignore these errors?

pva (pva) wrote :

This error still exists in 1.16.1. Could anybody check whether these failures are critical? Thanks.

Vincent Ladeuil (vila) wrote :

@pva: thanks for shepherding bzr on gentoo.

This is likely caused by our tests leaking threads and python failing to garbage collect them fast enough.
Can you try:

   bzr selftest --starting-with bt.test_http

If that passes, you can safely ignore the failures from the full test suite run.

The test suite hanging when a thread can't start may be related to bug #392402
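As a rough illustration of what "tests leaking threads" means, the sketch below compares the set of live threads before and after a test runs. It is not bzr's test framework; `run_and_report_leaks` and `leaky_test` are hypothetical names, and the code targets a current Python 3 rather than the python2.6 used in this report.

```python
import threading


def run_and_report_leaks(test_func):
    """Run test_func and return the threads it left running."""
    before = set(threading.enumerate())
    test_func()
    # Any thread alive now that was not alive before was leaked by the test.
    return [t for t in threading.enumerate() if t not in before]


def leaky_test(stop_event):
    # Hypothetical test: starts a helper thread and never joins it.
    threading.Thread(target=stop_event.wait, name="leaked-server").start()


stop = threading.Event()
leaked = run_and_report_leaks(lambda: leaky_test(stop))
print([t.name for t in leaked])  # prints ['leaked-server']

# Clean up so the example itself does not leak.
stop.set()
for t in leaked:
    t.join()
```

Each leaked thread stays alive until something stops it, so a long test run can plausibly accumulate enough of them to hit the limit behind "can't start new thread".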

Changed in bzr:
importance: Undecided → Medium
status: New → Incomplete
pva (pva) wrote :

bzr selftest --starting-with bt.test_http
hangs on
[pva in /vt/portage/tmp/portage/dev-util/bzr-1.16.1/work/bzr-1.16.1]
 $=> bzr selftest --starting-with bt.test_http
testing: /usr/bin/bzr
   /usr/lib/python2.6/site-packages/bzrlib (1.16.1 python2.6.2)

XFAIL: test_http.TestInvalidStatusServer.test_http_get(pycurl,HTTP/1.1)
pycurl hangs if the server send back garbage

XFAIL: test_http.TestInvalidStatusServer.test_http_has(pycurl,HTTP/1.1)
pycurl hangs if the server send back garbage

ERROR: test_http.TestWallServer.test_http_get(pycurl,HTTP/1.1)
    (56, 'Failure when receiving data from the peer')

ERROR: test_http.TestWallServer.test_http_has(pycurl,HTTP/1.1)
    (56, 'Failure when receiving data from the peer')

XFAIL: test_http.TestProxyAuth.test_changing_nonce(pycurl,HTTP/1.0,digest)
pycurl does not handle a nonce change

XFAIL: test_http.TestProxyAuth.test_changing_nonce(pycurl,HTTP/1.1,digest)
pycurl does not handle a nonce change

XFAIL: test_http.TestAuth.test_changing_nonce(pycurl,HTTP/1.0,digest)
pycurl does not handle a nonce change

XFAIL: test_http.TestAuth.test_changing_nonce(pycurl,HTTP/1.1,digest)
pycurl does not handle a nonce change

[574/709 in 26s, 2 err] test_http.TestAuth.test_wrong_pass(pycurl,HTTP/1.1,basicdigest)

I ran strace on that process, and it is blocked trying to read something:

camobap efence # strace -p 15809
Process 15809 attached - interrupt to quit
read(19,

This is also reproducible with the current bzr.dev branch.

Vincent Ladeuil (vila) wrote :

That's a different failure; can you please file a new bug for it?

Also, when it hangs, can you try pressing C-\ and report the backtrace? If you can't interrupt it
or don't get a pdb prompt, that is useful to know too, as it would indicate a different source
of problems.

Vincent Ladeuil (vila) wrote :

Reproduced on a gentoo vm.

A workaround is to run with --parallel=fork which allows the test suite to complete without any errors.

Changed in bzr:
assignee: nobody → Vincent Ladeuil (vila)
assignee: Vincent Ladeuil (vila) → nobody
status: Incomplete → Confirmed
summary: - bzr-1.16 error: can't start new thread
+ bzr >= 1.16 on gentoo: can't start new thread
Changed in bzr:
importance: Medium → High
Vincent Ladeuil (vila) wrote : Re: bzr >= 1.16 on gentoo: can't start new thread

@pva: AFAICS this bug is the only one that should still be mentioned in the ebuild files. Can you confirm and update them?

Vincent Ladeuil (vila)
Changed in bzr:
assignee: nobody → Vincent Ladeuil (vila)
pva (pva) wrote :

Vincent, yes, this is the last one (well, bzrlib.tests.test_osutils.TestWalkDirs fails too, but that problem is not reproducible with trunk). I've updated the ebuild.

Vincent Ladeuil (vila) wrote :

\o/

In addition, I'll have a fix shortly, based on the fix for bug #405745. I should be able to avoid
using shutdown(), and since I will have a clean way to stop the server, it will become possible
to properly collect the sockets and the threads that were leaking and causing *this* bug...
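To make the idea of "a clean way to stop the server" concrete, here is a minimal sketch of an accept loop that polls a stop flag, then joins its threads and closes its socket. It is not the fix that landed in bzr; `StoppableServer` and its methods are hypothetical, the polling timeout is an assumption chosen for illustration, and the code is written for Python 3.

```python
import socket
import threading


class StoppableServer:
    """Hypothetical echo server that can be stopped and fully cleaned up."""

    def __init__(self):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        self.sock.listen(5)
        self.sock.settimeout(0.1)  # wake up regularly to check the stop flag
        self.port = self.sock.getsockname()[1]
        self._stopping = threading.Event()
        self._workers = []
        self._thread = threading.Thread(target=self._serve, name="server")

    def start(self):
        self._thread.start()

    def _serve(self):
        while not self._stopping.is_set():
            try:
                conn, _ = self.sock.accept()
            except socket.timeout:
                continue  # nothing to accept; re-check the stop flag
            worker = threading.Thread(target=self._handle, args=(conn,))
            self._workers.append(worker)  # remember it so stop() can join it
            worker.start()

    def _handle(self, conn):
        with conn:  # the connection socket is closed when the handler returns
            conn.settimeout(1.0)  # never block forever on a silent client
            try:
                conn.sendall(conn.recv(1024))  # echo one message back
            except socket.timeout:
                pass

    def stop(self):
        self._stopping.set()
        self._thread.join()        # accept loop has exited
        for worker in self._workers:
            worker.join()          # every connection thread has exited
        self.sock.close()          # listening socket is released


server = StoppableServer()
server.start()
with socket.create_connection(("127.0.0.1", server.port)) as client:
    client.sendall(b"ping")
    reply = client.recv(1024)
server.stop()
print(reply)
```

Because `stop()` joins the accept thread and every connection thread before closing the listening socket, nothing from this server is left behind for the next test.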

Vincent Ladeuil (vila) wrote :

\o_
I have a fix allowing the full test suite to pass, but it requires python-2.6.

Changed in bzr:
status: Confirmed → In Progress
pva (pva) wrote :

In Gentoo python-2.6 is stable, so this is not a problem.

Vincent Ladeuil (vila) wrote :

So it turns out this bug is really about SocketServer objects that are *not* intended to be used with a client
*thread* in the same process.

While I've made significant progress in reducing both the number of leaking tests and the number of threads
left around by a full test suite run, I now experience a new kind of hang on FreeBSD and OSX.

I therefore can't release such a fix without digging further :-/

The status is that running the full test suite with '--parallel=fork' is the only way to make it pass on all
platforms known to our test botnet (babune).

Vincent Ladeuil (vila) wrote :

To further clarify:
- I have a fix that allows the test suite to complete with or without --parallel=fork on Ubuntu (hardy, jaunty, karmic)
  and gentoo;
- that fix hangs when running with or without --parallel=fork on FreeBSD8 and OSX (tiger and leopard).

But we are already hanging or failing because we use either too many open files or too many threads
(on OSX and Windows at least) when running without --parallel=fork.

Since we know the problem is for our test framework only (i.e. the whole test suite currently passes
when using --parallel=fork), we need a deeper fix than the one already proposed in the related branch.
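For readers who want to see how close a process is to those limits, the following sketch reads the per-process limits on a POSIX system. It uses Python 3's standard `resource` module, which is not available on Windows; `describe_limits` is a hypothetical helper, and the assumption that threads count against `RLIMIT_NPROC` holds on Linux but may not elsewhere.

```python
import resource
import threading


def describe_limits():
    """Return the current (soft, hard) limits relevant to this bug."""
    limits = {"open_files": resource.getrlimit(resource.RLIMIT_NOFILE)}
    # RLIMIT_NPROC is not defined on every platform; on Linux, threads
    # count against it as well as processes.
    if hasattr(resource, "RLIMIT_NPROC"):
        limits["processes"] = resource.getrlimit(resource.RLIMIT_NPROC)
    limits["live_threads"] = threading.active_count()
    return limits


for name, value in describe_limits().items():
    print(name, value)
```

Running with --parallel=fork spreads the tests over several processes, each with its own file descriptor table, which is consistent with the workaround described in this report.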

Martin Pool (mbp)
summary: - bzr >= 1.16 on gentoo: can't start new thread
+ selftest fails with "can't start new thread"
tags: added: selftest
Martin Packman (gz)
tags: added: babune
Vincent Ladeuil (vila)
Changed in bzr:
milestone: none → 2.3b1
status: In Progress → Fix Released
Martin Packman (gz)
Changed in gentoo:
status: New → Invalid