zookeeper connection is not using exponential backoff

Bug #1078217 reported by Zygmunt Krynicki
This bug affects 2 people
Affects        Status     Importance  Assigned to  Milestone
txzookeeper    New        Undecided   Unassigned
juju (Ubuntu)  Confirmed  Medium      Unassigned

Bug Description

My juju cluster had some trouble connecting to ZooKeeper. While reading the charm.log of my jenkins-slave unit, I noticed that juju had logged many thousands of exceptions such as this one:

2012-11-09 06:51:07,514: twisted@ERROR: Traceback (most recent call last):
2012-11-09 06:51:07,514: twisted@ERROR: File "/usr/lib/python2.7/dist-packages/txzookeeper/managed.py", line 319, in _cb_created
2012-11-09 06:51:07,514: twisted@ERROR: if self._check_result(result_code, d):
2012-11-09 06:51:07,514: twisted@ERROR: File "/usr/lib/python2.7/dist-packages/txzookeeper/client.py", line 219, in _check_result
2012-11-09 06:51:07,514: twisted@ERROR: self, error)
2012-11-09 06:51:07,515: twisted@ERROR: File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 134, in maybeDeferred
2012-11-09 06:51:07,515: twisted@ERROR: result = f(*args, **kw)
2012-11-09 06:51:07,515: twisted@ERROR: File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1181, in unwindGenerator
2012-11-09 06:51:07,515: twisted@ERROR: return _inlineCallbacks(None, gen, Deferred())
2012-11-09 06:51:07,516: twisted@ERROR: --- <exception caught here> ---
2012-11-09 06:51:07,516: twisted@ERROR: File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1039, in _inlineCallbacks
2012-11-09 06:51:07,516: twisted@ERROR: result = g.send(result)
2012-11-09 06:51:07,516: twisted@ERROR: File "/usr/lib/python2.7/dist-packages/txzookeeper/managed.py", line 257, in _cb_connection_error
2012-11-09 06:51:07,517: twisted@ERROR: raise error
2012-11-09 06:51:07,517: twisted@ERROR: zookeeper.ConnectionLossException: connection loss

I can see about 300 such exceptions _every second_. This is very bad on two levels:

1) It quickly fills the log with pointless exceptions, wasting disk space and saturating slow virtual IO
2) It goes against the proven networking practice of using exponential backoff when retrying failed communication
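
For illustration only, here is a minimal sketch of the exponential backoff meant in point 2. It is not txzookeeper or juju code: `backoff_delays`, `retry_with_backoff`, the `connect` callable, and all default values are hypothetical names and numbers chosen for this example.

```python
import random
import time


def backoff_delays(initial=0.5, factor=2.0, max_delay=60.0, jitter=0.0):
    """Yield an endless series of retry delays that grow geometrically up to max_delay."""
    delay = initial
    while True:
        # Optional random jitter keeps many clients from retrying in lockstep.
        yield delay + random.uniform(0, jitter * delay)
        delay = min(delay * factor, max_delay)


def retry_with_backoff(connect, max_attempts=8, sleep=time.sleep, **backoff_kwargs):
    """Call connect() until it succeeds, sleeping between failed attempts.

    `connect` is a hypothetical zero-argument callable that raises on failure.
    """
    delays = backoff_delays(**backoff_kwargs)
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except Exception:
            if attempt == max_attempts:
                raise  # Give up and surface the last error once.
            sleep(next(delays))
```

With these assumed defaults, a client that keeps failing would wait 0.5, 1, 2, 4, ... seconds between attempts instead of retrying hundreds of times per second, and would log at most one exception per attempt. In a Twisted-based program such as juju, a blocking `time.sleep` would not be appropriate; the delay would presumably have to be scheduled through the reactor (for example with `reactor.callLater`) instead.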

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: juju 0.5+bzr531-0ubuntu1.3
ProcVersionSignature: User Name 3.2.0-32.51-virtual 3.2.30
Uname: Linux 3.2.0-32-virtual x86_64
ApportVersion: 2.0.1-0ubuntu14
Architecture: amd64
Date: Tue Nov 13 09:26:31 2012
Ec2AMI: ami-000000bf
Ec2AMIManifest: FIXME
Ec2AvailabilityZone: nova
Ec2InstanceType: m1.small
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
PackageArchitecture: all
ProcEnviron:
 TERM=xterm-256color
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: juju
UpgradeStatus: No upgrade log present (probably fresh install)

Zygmunt Krynicki (zyga) wrote :
Changed in juju (Ubuntu):
importance: Undecided → Medium
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in juju (Ubuntu):
status: New → Confirmed