juju cannot connect to zookeeper - too many connections (max 10)

Bug #1078242 reported by Zygmunt Krynicki
This bug affects 1 person
Affects         Status  Importance  Assigned to  Milestone
juju (Ubuntu)   New     Medium      Unassigned

Bug Description

My juju cluster running on precise had severe problems connecting to zookeeper. Looking at the zookeeper logs, I can see that my juju nodes cannot connect because they already have pending connections.

The zookeeper log is full of lines like this (a quick count gives me around 30,000):

2012-11-08 06:46:48,777 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@247] - Too many connections from /10.55.60.51 - max is 10

There are also session expiration messages (around 1,000):

2012-11-08 06:46:58,000 - INFO [SessionTracker:ZooKeeperServer@316] - Expiring session 0x13ad043e9460400, timeout of 10000ms exceeded

And exceptions (about 4,000):

2012-11-08 06:47:05,663 - ERROR [SyncThread:0:NIOServerCnxn@445] - Unexpected Exception:
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:162)
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:135)
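The counts above came from a quick pass over the log. A minimal Python sketch of the same tally, which also breaks the "Too many connections" warnings down by client IP, might look like the following. The regular expressions are written against the three sample lines quoted above and are an assumption about the rest of the log; the function name `tally` is hypothetical.

```python
import re
from collections import Counter

# Patterns written against the three kinds of log lines quoted above
# (assumed to be representative of the rest of the log).
TOO_MANY = re.compile(r"Too many connections from /(\S+) - max is")
EXPIRING = re.compile(r"Expiring session 0x[0-9a-f]+")
CANCELLED = re.compile(r"java\.nio\.channels\.CancelledKeyException")


def tally(lines):
    """Count each message type, plus how often each client IP hit the cap."""
    kinds = Counter()
    by_ip = Counter()
    for line in lines:
        m = TOO_MANY.search(line)
        if m:
            kinds["too_many_connections"] += 1
            by_ip[m.group(1)] += 1  # the refused client's IP address
        elif EXPIRING.search(line):
            kinds["session_expired"] += 1
        elif CANCELLED.search(line):
            kinds["cancelled_key_exception"] += 1
    return kinds, by_ip
```

For example, `kinds, by_ip = tally(open(path, errors="replace"))`, where `path` points at the zookeeper log on the bootstrap node (its location depends on the installation). If one address dominates `by_ip`, that node is the one repeatedly being refused because it already holds its maximum number of connections.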

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: juju 0.5+bzr531-0ubuntu1.3
ProcVersionSignature: User Name 3.2.0-32.51-virtual 3.2.30
Uname: Linux 3.2.0-32-virtual x86_64
ApportVersion: 2.0.1-0ubuntu14
Architecture: amd64
Date: Tue Nov 13 11:11:30 2012
Ec2AMI: ami-000000bf
Ec2AMIManifest: FIXME
Ec2AvailabilityZone: nova
Ec2InstanceType: m1.small
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
PackageArchitecture: all
ProcEnviron:
 TERM=xterm-256color
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: juju
UpgradeStatus: No upgrade log present (probably fresh install)

Zygmunt Krynicki (zyga) wrote:
James Page (james-page) wrote:

Zookeeper enforces a default limit of 10 concurrent connections per client (identified by IP address); this ensures that a single rogue client can't bring down a zookeeper cluster.

Hitting that limit points to something else going wrong in your environment.

If juju genuinely needs more than ten connections per client, maxClientCnxns can be raised in the zookeeper config file on the bootstrap node, although I don't think this should be done.
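To make the mechanism concrete, here is a toy Python model of a per-IP connection cap. It is an illustration only, not zookeeper's implementation, and the class and method names are hypothetical. In zookeeper itself the cap is the `maxClientCnxns=N` setting in zoo.cfg, where 0 removes the limit; the model mirrors that.

```python
from collections import defaultdict


class PerClientLimiter:
    """Toy model of a per-IP connection cap like zookeeper's maxClientCnxns.

    Hypothetical illustration: a new connection is refused once the client
    IP already holds `max_client_cnxns` open ones. A limit of 0 means
    "no limit", matching the documented meaning of maxClientCnxns=0.
    """

    def __init__(self, max_client_cnxns=10):
        self.max_client_cnxns = max_client_cnxns
        self.open = defaultdict(int)  # client IP -> currently open connections

    def accept(self, ip):
        """Return True if the connection is accepted, False if refused."""
        limit = self.max_client_cnxns
        if limit > 0 and self.open[ip] >= limit:
            return False  # the point where zookeeper logs "Too many connections"
        self.open[ip] += 1
        return True

    def close(self, ip):
        """Release one connection slot held by this IP."""
        if self.open[ip] > 0:
            self.open[ip] -= 1
```

A client that keeps opening connections without closing them uses up all ten slots, after which every further attempt is refused; if the agent keeps retrying, that would produce a steady stream of warnings like the ones in the report. Raising the limit would only postpone this if the underlying leak remains.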

Serge Hallyn (serge-hallyn) wrote:

Could you please post the charms you were using, your sanitized environments.yaml, and the set of juju commands you used, to help us reproduce and debug the issue you had?

Changed in juju (Ubuntu):
importance: Undecided → Medium
status: New → Incomplete
Zygmunt Krynicki (zyga) wrote:

Sure. The cluster is still up on canonistack if you want to poke at it.

I used two charms, unmodified, on precise: jenkins (1 unit) and jenkins-slave (2 units).

My environments.yaml looks like this:

environments:
  canonistack:
    type: ec2
    ec2-uri: http://91.189.93.65:8773/services/Cloud
    s3-uri: http://91.189.93.65:3333
    access-key: <redacted>
    secret-key: <redacted>
    default-image-id: ami-000000bf #64-bit precise, or see euca-describe-images and pick one.
    default-series: precise
    control-bucket: <redacted>
    admin-secret: <redacted>
    authorized-keys-path: ~/.ssh/authorized_keys

Changed in juju (Ubuntu):
assignee: nobody → Serge Hallyn (serge-hallyn)
status: Incomplete → New
Serge Hallyn (serge-hallyn) wrote:

I'm rapidly running out of time this year, so I may not be able to look at this. (I might ask on #juju for someone else to take it.)

Serge Hallyn (serge-hallyn) wrote:

Assigning James Page as he is the maintainer of the jenkins charms.

Changed in juju (Ubuntu):
assignee: Serge Hallyn (serge-hallyn) → James Page (james-page)
James Page (james-page) wrote:

This is a juju issue, not a charm issue; I suspect something bad happened and caused the juju agent to go wild.

Are you still seeing this issue?

Changed in juju (Ubuntu):
assignee: James Page (james-page) → nobody