Zookeeper errors in local provider cause strange status view and possibly broken topology

Bug #875903 reported by Clint Byrum
This bug affects 1 person
Affects   Status     Importance  Assigned to  Milestone
pyjuju    Confirmed  Medium      Unassigned

Bug Description

I saw these errors in unit.log, and after that the status output changed in a most unexpected way:

clint@clint-MacBookPro:~$ juju destroy-service ceph
2011-10-16 11:34:51,764 INFO Service 'ceph' destroyed.
2011-10-16 11:34:51,764 INFO 'destroy_service' command finished successfully
clint@clint-MacBookPro:~$ juju deploy --repository charms local:ceph
2011-10-16 11:35:06,764 INFO Charm deployed as service: 'ceph'
2011-10-16 11:35:06,764 INFO 'deploy' command finished successfully
clint@clint-MacBookPro:~$ juju add-unit ceph
2011-10-16 11:36:27,898 INFO Unit 'ceph/13' added to service 'ceph'
2011-10-16 11:36:27,900 INFO 'add_unit' command finished successfully
clint@clint-MacBookPro:~$ juju add-unit ceph
2011-10-16 11:36:29,621 INFO Unit 'ceph/14' added to service 'ceph'
2011-10-16 11:36:29,623 INFO 'add_unit' command finished successfully
clint@clint-MacBookPro:~$ juju status
machines:
  0: {dns-name: localhost, instance-id: local}
services:
  ceph:
    charm: local:oneiric/ceph-39
    relations: {mds: ceph, mon: ceph, osd: ceph, ssh: ceph}
    units:
      ceph/12:
        machine: 0
        public-address: 192.168.122.72
        relations:
          mds: {state: up}
          mon: {state: up}
          osd: {state: up}
          ssh: {state: up}
        state: started
      ceph/13:
        machine: 0
        public-address: null
        relations: {}
        state: null
      ceph/14:
        machine: 0
        public-address: null
        relations: {}
        state: null
2011-10-16 11:36:35,776 INFO 'status' command finished successfully
clint@clint-MacBookPro:~$ sudo tail -f .juju/data/clint-local/units/ceph-12/unit.log
2011-10-16 18:36:26,134: unit.relation.lifecycle@DEBUG: started relation:mon lifecycle
2011-10-16 18:36:26,142: statemachine@DEBUG: relationworkflowstate: transition complete start (state up) {}
2011-10-16 18:36:26,143: unit.lifecycle@DEBUG: started unit lifecycle
2011-10-16 18:36:26,176: statemachine@DEBUG: unitworkflowstate: transition complete start (state started) {}
2011-10-16 18:36:36,291:469(0x7f5d29462700):ZOO_ERROR@handle_socket_error_msg@1528: Socket [192.168.122.1:56037] zk retcode=-7, errno=110(Connection timed out): connection timed out (exceeded timeout by 5ms)
2011-10-16 18:36:39,784:469(0x7f5d29462700):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 3496ms
2011-10-16 18:36:46,453:469(0x7f5d29462700):ZOO_ERROR@handle_socket_error_msg@1528: Socket [192.168.122.1:56037] zk retcode=-7, errno=110(Connection timed out): connection timed out (exceeded timeout by 3ms)
2011-10-16 18:36:49,788:469(0x7f5d29462700):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 3338ms
2011-10-16 18:36:49,788:469(0x7f5d29462700):ZOO_INFO@check_events@1585: initiated connection to server [192.168.122.1:56037]
2011-10-16 18:36:49,789:469(0x7f5d29462700):ZOO_ERROR@handle_socket_error_msg@1621: Socket [192.168.122.1:56037] zk retcode=-112, errno=116(Stale NFS file handle): sessionId=0x1330dd27c6e002f has expired.
^Cclint@clint-MacBookPro:~$ juju status
machines:
  0: {dns-name: localhost, instance-id: local}
services:
  ceph:
    charm: local:oneiric/ceph-39
    relations: {mds: ceph, mon: ceph, osd: ceph, ssh: ceph}
    units:
      ceph/12:
        machine: 0
        public-address: 192.168.122.72
        relations: {}
        state: started
      ceph/13:
        machine: 0
        public-address: 192.168.122.94
        relations:
          mds: {state: up}
          mon: {state: up}
          osd: {state: up}
          ssh: {state: up}
        state: started
      ceph/14:
        machine: 0
        public-address: 192.168.122.208
        relations:
          mds: {state: up}
          mon: {state: up}
          osd: {state: up}
          ssh: {state: up}
        state: started
2011-10-16 11:39:31,541 INFO 'status' command finished successfully
clint@clint-MacBookPro:~$
clint@clint-MacBookPro:~$
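One plausible reading of the two status outputs is sketched here as a toy in-memory model, not as pyjuju code or a real ZooKeeper client. ZooKeeper deletes every ephemeral node owned by a session once that session expires. If the ceph/12 agent kept its relation presence in ephemeral nodes while its unit state lived in a persistent node (an assumption, as are the /units/... paths and every class and function name), then the "sessionId=0x1330dd27c6e002f has expired" line would explain why ceph/12 still reports state: started but relations: {}.

```python
# Toy model only: nothing here is pyjuju code or a real ZooKeeper client API.
# Assumption: relation presence lives in ephemeral nodes and unit state in a
# persistent node. All paths and names are hypothetical.

RELATIONS = ["mds", "mon", "osd", "ssh"]


class ToyZooKeeper:
    """In-memory stand-in for a ZooKeeper ensemble (hypothetical)."""

    def __init__(self):
        self.nodes = {}            # path -> owning session id, or None if persistent
        self.live_sessions = set()
        self._next_session = 1

    def connect(self):
        sid = self._next_session
        self._next_session += 1
        self.live_sessions.add(sid)
        return sid

    def create(self, sid, path, ephemeral=False):
        if sid not in self.live_sessions:
            raise RuntimeError("session %d has expired" % sid)
        self.nodes[path] = sid if ephemeral else None

    def expire(self, sid):
        # Mirrors real ZooKeeper: an expired session's ephemeral nodes are deleted.
        self.live_sessions.discard(sid)
        self.nodes = {p: owner for p, owner in self.nodes.items() if owner != sid}

    def children(self, prefix):
        return sorted(p[len(prefix):] for p in self.nodes if p.startswith(prefix))


def unit_key(unit):
    return unit.replace("/", "-")          # "ceph/12" -> "ceph-12"


def register_unit(zk, sid, unit):
    key = unit_key(unit)
    zk.create(sid, "/units/%s/state" % key)  # persistent: survives expiry
    for rel in RELATIONS:
        zk.create(sid, "/units/%s/relations/%s" % (key, rel), ephemeral=True)


def relation_view(zk, unit):
    # Roughly what the "relations:" block of juju status shows under this model.
    prefix = "/units/%s/relations/" % unit_key(unit)
    return {rel: {"state": "up"} for rel in zk.children(prefix)}


zk = ToyZooKeeper()
sessions = {u: zk.connect() for u in ("ceph/12", "ceph/13", "ceph/14")}
for unit, sid in sessions.items():
    register_unit(zk, sid, unit)

zk.expire(sessions["ceph/12"])             # cf. "sessionId=... has expired"
print(relation_view(zk, "ceph/12"))        # {}
print("/units/ceph-12/state" in zk.nodes)  # True: the state node survives
```

Under this model the other units are unaffected because each agent holds its own session, which matches ceph/13 and ceph/14 showing all relations up in the second status.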

I will attach the unit.log files from all three units, plus the charm, as a zip.

Clint Byrum (clint-fewbar) wrote:

Attaching the unit logs, the machine-agent log, and the bundled charm from datadir/files/...

Changed in juju:
milestone: none → florence
Changed in juju:
importance: Undecided → Medium
Scott Moser (smoser) wrote:

Any status on this? Any workarounds? It seems to me that it may occur under load, but I see it randomly.

Is there any way to fix this?
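A general ZooKeeper pattern for this failure mode, offered as a toy sketch and not as pyjuju's actual or planned fix: an expired ZooKeeper session cannot be resumed, so the client has to open a new session and re-create whatever ephemeral nodes (and watches) it owned. The SessionExpired, ToyEnsemble and ResilientAgent names, the "expired" event string, and the node path are all hypothetical.

```python
# Hypothetical recovery pattern: not pyjuju code, not a real ZooKeeper client API.

class SessionExpired(Exception):
    pass


class ToyEnsemble:
    """In-memory stand-in that tracks only ephemeral nodes (hypothetical)."""

    def __init__(self):
        self.ephemeral = {}    # path -> owning session id
        self.live = set()
        self._next = 1

    def new_session(self):
        sid = self._next
        self._next += 1
        self.live.add(sid)
        return sid

    def create_ephemeral(self, sid, path):
        if sid not in self.live:
            raise SessionExpired(sid)
        self.ephemeral[path] = sid

    def expire(self, sid):
        # Expiry removes every ephemeral node the session owned.
        self.live.discard(sid)
        self.ephemeral = {p: o for p, o in self.ephemeral.items() if o != sid}


class ResilientAgent:
    """Remembers the ephemeral nodes it owns so it can restore them after expiry."""

    def __init__(self, ensemble):
        self.ensemble = ensemble
        self.wanted = []                  # paths that should always exist
        self.sid = ensemble.new_session()

    def declare(self, path):
        self.wanted.append(path)
        self.ensemble.create_ephemeral(self.sid, path)

    def handle_event(self, event):
        # An expired session cannot be revived: open a fresh session and
        # re-create every ephemeral node (real code would also re-set watches).
        if event == "expired":
            self.sid = self.ensemble.new_session()
            for path in self.wanted:
                self.ensemble.create_ephemeral(self.sid, path)


ensemble = ToyEnsemble()
agent = ResilientAgent(ensemble)
agent.declare("/units/ceph-12/relations/mon")
old_sid = agent.sid
ensemble.expire(old_sid)              # presence node disappears
agent.handle_event("expired")         # agent re-registers under a new session
print(sorted(ensemble.ephemeral))     # ['/units/ceph-12/relations/mon']
```

The agent keeps its own list of wanted nodes because ZooKeeper retains nothing for an expired session; any status view built from those nodes is only correct again once every one of them has been re-created.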

Changed in juju:
status: New → Confirmed