ceph-radosgw restart fails
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
ceph (Ubuntu) | Fix Released | High | James Page |
Trusty | Fix Released | High | Liam Young |
Vivid | Fix Released | High | James Page |
Wily | Fix Released | High | James Page |
Bug Description
Upstream Bug: http://
[Impact]
On Ubuntu 14.04, the restart target of the sysvinit script brings the service down but sometimes fails to bring it back up. There is a race between stop and start: in the failure case, the start step runs before the old radosgw process has exited, so the script sees it as already running and never issues the start command.
The proposed fix updates /etc/init.d/radosgw so that the stop target waits up to 30 seconds for the service to stop cleanly.
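A minimal sketch of the kind of bounded wait the stop target needs, assuming the script already knows the PID of the running radosgw (for example from a pid file). The function name `wait_for_exit` and its arguments are hypothetical; this is not the content of debian/patches/init-script-stop.patch.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the ceph packaging.
# wait_for_exit PID [TIMEOUT]
# Polls once per second until PID no longer exists, for at most TIMEOUT
# seconds (default 30). Returns 0 once the process is gone, 1 on timeout.
wait_for_exit() {
    pid="$1"
    timeout="${2:-30}"
    waited=0
    # kill -0 sends no signal; it only tests whether the process exists
    # (the caller must be allowed to signal it, e.g. run as root).
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In a stop target this would run right after the existing command that signals radosgw, so that `restart` only moves on to its start step once the old process is gone (or the 30 s limit has passed).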
[Test Case]
Bundle:
openstack-services:
  services:
    mysql:
      branch: lp:~openstack-charmers/charms/trusty/percona-cluster/next
      constraints: mem=1G
      options:
    ceph:
      branch: lp:~openstack-charmers/charms/trusty/ceph/next
      num_units: 3
      constraints: mem=1G
      options:
        fsid: 6547bd3e-
    keystone:
      branch: lp:~openstack-charmers/charms/trusty/keystone/next
      constraints: mem=1G
      options:
    ceph-radosgw:
      branch: lp:~openstack-charmers/charms/trusty/ceph-radosgw/next
      options:
  relations:
    - [ keystone, mysql ]
    - [ ceph-radosgw, keystone ]
    - [ ceph-radosgw, ceph ]
# kilo
trusty-kilo:
  inherits: openstack-services
  series: trusty
  overrides:
    openstack-
    source: cloud:trusty-kilo
trusty-icehouse:
  inherits: openstack-services
  series: trusty
$ juju-deployer -c next.yaml trusty-icehouse
$ juju ssh ceph-radosgw/0
$ sudo su -
# service radosgw status
/usr/bin/radosgw is running.
# service radosgw restart
Starting client.
/usr/bin/radosgw already running.
/usr/bin/radosgw is running.
# service radosgw status
/usr/bin/radosgw is not running.
# apt-cache policy radosgw
radosgw:
  Installed: 0.80.10-
  Candidate: 0.80.10-
  Version table:
 *** 0.80.10-
        500 http://
        100 /var/lib/
     0.79-0ubuntu1 0
        500 http://
root@juju-
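Because the failure is intermittent, the restart/status check in the transcript above can be looped. A minimal sketch, assuming it runs as root on the radosgw unit and that `pidof /usr/bin/radosgw` succeeds only while the daemon is up; `count_restart_failures`, `RESTART_CMD`, `CHECK_CMD` and `SETTLE` are hypothetical names, not part of the charm or the package.

```shell
#!/bin/sh
# Hypothetical stress loop: restart radosgw repeatedly and count how often
# it is not running afterwards. All knobs below are illustrative.
RESTART_CMD="${RESTART_CMD:-service radosgw restart}"
CHECK_CMD="${CHECK_CMD:-pidof /usr/bin/radosgw}"
SETTLE="${SETTLE:-2}"   # seconds to wait after each restart before checking

count_restart_failures() {
    iterations="$1"
    failures=0
    i=0
    while [ "$i" -lt "$iterations" ]; do
        # Unquoted on purpose so the command string splits into words;
        # the init script's own output is discarded, only the end state matters.
        $RESTART_CMD >/dev/null 2>&1
        sleep "$SETTLE"
        if ! $CHECK_CMD >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}
```

`count_restart_failures 20` prints how many of 20 restarts left radosgw stopped; with the fixed init script it should print 0.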
[Regression Potential]
* The only behaviour change is that running the stop target of the init
script will wait up to 30 s for radosgw to exit instead of returning
immediately. I cannot think of any use case where this would be an issue.
[Original Bug Report]
job handler:
Jul 22 16:03:44 job-handler-1 ERR Failed to execute job: PUT request for http://
Other logs attached.
Related branches
- James Page: Pending requested
- Diff: 395 lines (+320/-1), 9 files modified:
  .pc/1cca0c1.patch/src/init-radosgw (+98/-0)
  .pc/1cca0c1.patch/src/init-radosgw.sysv (+112/-0)
  .pc/applied-patches (+1/-0)
  debian/changelog (+7/-0)
  debian/patches/1cca0c1.patch (+60/-0)
  debian/patches/init-script-stop.patch (+21/-0)
  debian/patches/series (+1/-0)
  src/init-radosgw (+9/-1)
  src/init-radosgw.sysv (+11/-0)
tags: removed: kanban
tags: added: cpec
summary:
  - ceph-radosgw died during deployment
  + ceph-radosgw restart fails
affects: ceph-radosgw (Ubuntu) → ceph (Ubuntu)
Changed in ceph (Ubuntu):
  status: New → Confirmed
description: updated
description: updated
description: updated
Changed in ceph (Ubuntu Wily):
  status: Confirmed → Fix Released
Changed in ceph (Ubuntu Trusty):
  status: New → Triaged
  importance: Undecided → High
Changed in ceph (Ubuntu Wily):
  importance: Undecided → High
description: updated
description: updated
description: updated
Changed in ceph (Ubuntu Wily):
  status: Fix Released → Triaged
Changed in ceph (Ubuntu Vivid):
  status: New → Triaged
  importance: Undecided → High
Changed in ceph (Ubuntu Wily):
  assignee: nobody → James Page (james-page)
Changed in ceph (Ubuntu Vivid):
  assignee: nobody → James Page (james-page)
Changed in ceph (Ubuntu Trusty):
  assignee: nobody → Liam Young (gnuoy)
  status: Triaged → In Progress
Changed in ceph (Ubuntu Vivid):
  status: Triaged → In Progress
Changed in ceph (Ubuntu Wily):
  status: Triaged → In Progress
tags: added: landscape-release-29
tags: added: kanban-cross-team
tags: removed: landscape-release-29
Changed in ceph (Ubuntu Vivid):
  status: Fix Released → Fix Committed
affects: ceph-radosgw (Juju Charms Collection) → ubuntu-translations
no longer affects: ubuntu-translations
ceph-radosgw just died. Last log entries from /var/log/ceph/radosgw.log:
2015-07-22 15:01:33.303237 7f46bd7fa700 1 handle_sigterm
2015-07-22 15:01:33.396803 7f46e14aa7c0 1 final shutdown
And nothing after that. Landscape got the first error at 15:03:57, and failed continuously until the end.
I logged in on the unit, and there was no radosgw process running. I started one by running the contents of /var/www/s3gw.fcgi:
exec /usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway
And then it worked.
The attached object-internal-error.tar.xz has the inner logs in landscape-0-inner-logs/. For example, the /var/log contents from the ceph-radosgw/0 unit are in landscape-0-inner-logs/ceph-radosgw-0/var/log/.