ceph-radosgw restart fails

Bug #1477225 reported by Andreas Hasenack
This bug affects 4 people
Affects                 Status         Importance   Assigned to   Milestone
ceph (Ubuntu)           Fix Released   High         James Page
  Trusty                Fix Released   High         Liam Young
  Vivid                 Fix Released   High         James Page
  Wily                  Fix Released   High         James Page

Bug Description

Upstream Bug: http://tracker.ceph.com/issues/11140

[Impact]

On 14.04 the restart target of the sysvinit script brings the service down
but sometimes fails to bring it back up again. There is a race between stop and start: in the failure case the attempt to bring the service back up runs before the old daemon has finished stopping, so the start sees the service as already running and the start command is never actually issued.

The proposed fix updates /etc/init.d/radosgw so that the stop target
waits for up to 30 seconds for the service to stop cleanly.
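
For illustration, a minimal sketch of the kind of wait loop the fix adds to the
stop path of /etc/init.d/radosgw; the pidfile location and variable names here
are assumptions made for the sketch, not the exact upstream patch:

    # Sketch only: wait up to 30s for the old radosgw daemon to exit before
    # the stop target returns, so a following start cannot race against it.
    pid=$(cat /var/run/ceph/radosgw.pid 2>/dev/null)   # assumed pidfile path
    if [ -n "$pid" ]; then
        kill "$pid" 2>/dev/null
        timeout=30
        while [ "$timeout" -gt 0 ] && kill -0 "$pid" 2>/dev/null; do
            sleep 1
            timeout=$((timeout - 1))
        done
    fi

With something like this in place, 'service radosgw restart' only reaches the
start step once the old process is really gone (or the 30s grace period has
elapsed), which is what the test case below exercises.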

[Test Case]

Bundle:

openstack-services:
  services:
    mysql:
      branch: lp:~openstack-charmers/charms/trusty/percona-cluster/next
      constraints: mem=1G
      options:
        dataset-size: 50%
    ceph:
      branch: lp:~openstack-charmers/charms/trusty/ceph/next
      num_units: 3
      constraints: mem=1G
      options:
        monitor-count: 3
        fsid: 6547bd3e-1397-11e2-82e5-53567c8d32dc
        monitor-secret: AQCXrnZQwI7KGBAAiPofmKEXKxu5bUzoYLVkbQ==
        osd-devices: /dev/vdb
        osd-reformat: "yes"
        ephemeral-unmount: /mnt
    keystone:
      branch: lp:~openstack-charmers/charms/trusty/keystone/next
      constraints: mem=1G
      options:
        admin-password: openstack
        admin-token: ubuntutesting
    ceph-radosgw:
      branch: lp:~openstack-charmers/charms/trusty/ceph-radosgw/next
      options:
        use-embedded-webserver: True
  relations:
    - [ keystone, mysql ]
    - [ ceph-radosgw, keystone ]
    - [ ceph-radosgw, ceph ]
# kilo
trusty-kilo:
  inherits: openstack-services
  series: trusty
  overrides:
    openstack-origin: cloud:trusty-kilo
    source: cloud:trusty-kilo
trusty-icehouse:
  inherits: openstack-services
  series: trusty

$ juju-deployer -c next.yaml trusty-icehouse
$ juju ssh ceph-radosgw/0
$ sudo su -
# service radosgw status
/usr/bin/radosgw is running.
# service radosgw restart
Starting client.radosgw.gateway...
/usr/bin/radosgw already running.
/usr/bin/radosgw is running.
# service radosgw status
/usr/bin/radosgw is not running.
# apt-cache policy radosgw
radosgw:
  Installed: 0.80.10-0ubuntu0.14.04.1
  Candidate: 0.80.10-0ubuntu0.14.04.1
  Version table:
 *** 0.80.10-0ubuntu0.14.04.1 0
        500 http://nova.clouds.archive.ubuntu.com/ubuntu/ trusty-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     0.79-0ubuntu1 0
        500 http://nova.clouds.archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages
root@juju-lytrusty-machine-4:~#

[Regression Potential]

 * The only change in behaviour introduced by this fix is that running the
   stop target in the init script will wait for up to 30s before exiting
   rather than returning immediately. I cannot think of any use case where
   this would be an issue.

[Original Bug Report]
job handler:
Jul 22 16:03:44 job-handler-1 ERR Failed to execute job: PUT request for http://10.96.4.129:80/swift/v1/simplestreams failed with code 500 Internal Server Error: '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>500 Internal Server Error</title>\n</head><body>\n<h1>Internal Server Error</h1>\n<p>The server encountered an internal error or\nmisconfiguration and was unable to complete\nyour request.</p>\n<p>Please contact the server administrator at \n <email address hidden> to inform them of the time this error occurred,\n and the actions you performed just before this error.</p>\n<p>More information about this error may be available\nin the server error log.</p>\n</body></html>\n'#012Traceback (most recent call last):#012 File "/opt/canonical/landscape/canonical/landscape/model/activity/jobrunner.py", line 38, in run#012 yield self._run_activity(account_id, activity_id)#012HTTPError: PUT request for http://10.96.4.129:80/swift/v1/simplestreams failed with code 500 Internal Server Error: '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>500 Internal Server Error</title>\n</head><body>\n<h1>Internal Server Error</h1>\n<p>The server encountered an internal error or\nmisconfiguration and was unable to complete\nyour request.</p>\n<p>Please contact the server administrator at \n <email address hidden> to inform them of the time this error occurred,\n and the actions you performed just before this error.</p>\n<p>More information about this error may be available\nin the server error log.</p>\n</body></html>\n'

Other logs attached.

Andreas Hasenack (ahasenack)
tags: removed: kanban
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

ceph-radosgw just died. Last log entries from /var/log/ceph/radosgw.log:
2015-07-22 15:01:33.303237 7f46bd7fa700 1 handle_sigterm
2015-07-22 15:01:33.396803 7f46e14aa7c0 1 final shutdown

And nothing after that. Landscape got the first error at 15:03:57, and failed continuously until the end.

I logged in on the unit, and there was no radosgw process running. I started one by running the contents of /var/www/s3gw.fcgi:
exec /usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway

And then it worked.

The object-internal-error.tar.xz has the inner logs in landscape-0-inner-logs/. You can find the /var/log contents from the ceph-radosgw/0 unit in landscape-0-inner-logs/ceph-radosgw-0/var/log/ for example.

summary: - Internal server error when uploading to object store (ceph-radosgw)
+ ceph-radosgw died during deployment
information type: Proprietary → Public
Revision history for this message
Andreas Hasenack (ahasenack) wrote : Re: ceph-radosgw died during deployment

Changing project to the ceph-radosgw charm

affects: landscape → ceph-radosgw (Juju Charms Collection)
Revision history for this message
Nobuto Murata (nobuto) wrote :

FWIW, I'm also getting frequent 500s with 'FastCGI: incomplete headers (0 bytes) received from server "/var/www/s3gw.fcgi"'. After doing `juju set ceph-radosgw use-embedded-webserver=true` (i.e. bypassing Apache + mod-fastcgi), the issue went away.

I'm using cloud:trusty-kilo.

Revision history for this message
Alberto Donato (ack) wrote :

I had a similar issue with a ceph/ceph OSA deploy using current stable charms (specifically, cs:trusty/ceph-radosgw-15).

The autopilot fails while trying to upload simplestreams:

Aug 13 16:02:32 job-handler-1 INFO PUT http://10.1.48.88:80/swift/v1/simplestreams headers={'X-Container-Read': '.r:*'} auth_retry_attempts=0 blind_retry_attempts=0

Last entry in radosgw.log shows the server was stopped:

2015-08-13 15:39:21.500670 7f09d10b47c0 0 ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3), process radosgw, pid 9937
2015-08-13 15:39:24.407231 7f09d10b47c0 0 framework: civetweb
2015-08-13 15:39:24.407246 7f09d10b47c0 0 framework conf key: port, val: 70
2015-08-13 15:39:24.407270 7f09d10b47c0 0 starting handler: civetweb
2015-08-13 15:39:28.187979 7f09ad7fa700 -1 failed to list objects pool_iterate returned r=-2
2015-08-13 15:39:28.187990 7f09ad7fa700 0 ERROR: lists_keys_next(): ret=-2
2015-08-13 15:39:28.187995 7f09ad7fa700 0 ERROR: sync_all_users() returned ret=-2
2015-08-13 15:40:19.341212 7f09acff9700 1 handle_sigterm
2015-08-13 15:40:19.341248 7f09acff9700 1 handle_sigterm set alarm for 120
2015-08-13 15:40:19.341251 7f09d10b47c0 -1 shutting down
2015-08-13 15:40:19.458224 7f09acff9700 1 handle_sigterm
2015-08-13 15:40:19.458252 7f09acff9700 1 handle_sigterm set alarm for 120
2015-08-13 15:40:20.046138 7f09d10b47c0 1 final shutdown

Nobuto Murata (nobuto)
tags: added: cpec
Revision history for this message
Liam Young (gnuoy) wrote :

This is not a charm bug. It looks like an issue in the radosgw init script:

# service radosgw status
/usr/bin/radosgw is not running.
# service radosgw start
Starting client.radosgw.gateway...
/usr/bin/radosgw is running.
# service radosgw status
/usr/bin/radosgw is running.
# service radosgw restart
Starting client.radosgw.gateway...
/usr/bin/radosgw already running.
/usr/bin/radosgw is running.
# service radosgw status
/usr/bin/radosgw is not running.

Changed in ceph-radosgw (Juju Charms Collection):
status: New → Invalid
Liam Young (gnuoy)
summary: - ceph-radosgw died during deployment
+ ceph-radosgw restart fails
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ceph (Ubuntu):
status: New → Confirmed
Liam Young (gnuoy)
affects: ceph-radosgw (Ubuntu) → ceph (Ubuntu)
Changed in ceph (Ubuntu):
status: New → Confirmed
Liam Young (gnuoy)
description: updated
description: updated
description: updated
James Page (james-page)
Changed in ceph (Ubuntu Wily):
status: Confirmed → Fix Released
Changed in ceph (Ubuntu Trusty):
status: New → Triaged
importance: Undecided → High
Changed in ceph (Ubuntu Wily):
importance: Undecided → High
Liam Young (gnuoy)
description: updated
Liam Young (gnuoy)
description: updated
Liam Young (gnuoy)
description: updated
James Page (james-page)
Changed in ceph (Ubuntu Wily):
status: Fix Released → Triaged
Changed in ceph (Ubuntu Vivid):
status: New → Triaged
importance: Undecided → High
James Page (james-page)
Changed in ceph (Ubuntu Wily):
assignee: nobody → James Page (james-page)
Changed in ceph (Ubuntu Vivid):
assignee: nobody → James Page (james-page)
Changed in ceph (Ubuntu Trusty):
assignee: nobody → Liam Young (gnuoy)
status: Triaged → In Progress
Changed in ceph (Ubuntu Vivid):
status: Triaged → In Progress
Changed in ceph (Ubuntu Wily):
status: Triaged → In Progress
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 0.94.3-0ubuntu2

---------------
ceph (0.94.3-0ubuntu2) wily; urgency=medium

  * d/ceph.install: Drop ceph-deploy manpage from packaging, provided
    by ceph-deploy itself (LP: #1475910).

 -- James Page <email address hidden> Mon, 07 Sep 2015 14:42:03 +0100

Changed in ceph (Ubuntu Wily):
status: In Progress → Fix Released
tags: added: landscape-release-29
Revision history for this message
Chad Smith (chad.smith) wrote :

Will need to confirm once we have 0.94.3-0ubuntu2 available for deployment.

lp:1468335 seems very likely related.

Revision history for this message
Chris J Arges (arges) wrote :

This is blocked in the unapproved queue because bug 1475247 and bug 1477174 have not yet been verified. Please test those bugs first.

Revision history for this message
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Andreas, or anyone else affected,

Accepted ceph into vivid-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ceph/0.94.3-0ubuntu0.15.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in ceph (Ubuntu Vivid):
status: In Progress → Fix Committed
tags: added: verification-needed
David Britton (dpb)
tags: added: kanban-cross-team
David Britton (dpb)
tags: removed: landscape-release-29
Revision history for this message
Chris J Arges (arges) wrote :

Hello Andreas, or anyone else affected,

Accepted ceph into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ceph/0.80.10-0ubuntu1.14.04.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!
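
For anyone verifying on trusty, a rough sketch of one way to enable -proposed and exercise the failing restart path (the mirror URL, file name and loop count below are illustrative only, not part of the official SRU procedure):

    # Illustrative only: enable trusty-proposed, pull in the fixed radosgw,
    # then restart it repeatedly to check the stop/start race is gone.
    echo "deb http://archive.ubuntu.com/ubuntu trusty-proposed main" | \
        sudo tee /etc/apt/sources.list.d/trusty-proposed.list
    sudo apt-get update
    sudo apt-get install radosgw
    for i in 1 2 3 4 5; do
        sudo service radosgw restart
        sleep 2
        sudo service radosgw status
    done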

Changed in ceph (Ubuntu Trusty):
status: In Progress → Fix Committed
Changed in ceph (Ubuntu Vivid):
status: Fix Committed → Fix Released
James Page (james-page)
Changed in ceph (Ubuntu Vivid):
status: Fix Released → Fix Committed
Revision history for this message
James Page (james-page) wrote :

Tested from trusty proposed - restarts of radosgw are reliable post upgrade.

tags: added: verification-done verification-needed-vivid
removed: verification-needed
Revision history for this message
James Page (james-page) wrote :

Also verified OK on vivid - restarts under systemd are now consistent.

tags: removed: verification-needed-vivid
Revision history for this message
Free Ekanayaka (free.ekanayaka) wrote :

@James: is there a plan to upload the fix to the kilo/liberty trusty cloud archive too? That'd be the only way the Landscape OpenStack Autopilot could get it, I think.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 0.80.10-0ubuntu1.14.04.3

---------------
ceph (0.80.10-0ubuntu1.14.04.3) trusty; urgency=medium

  * d/p/ceph-radosgw-init.patch: Cherry pick patch from upstream VCS to
    ensure that restarts of the radosgw wait an appropriate amount of time
    for the existing daemon to shutdown (LP: #1477225).

ceph (0.80.10-0ubuntu1.14.04.2) trusty; urgency=medium

  * Switch to two step 'zapping' of disks, ensuring that disks with invalid
    metadata don't cause hangs and are fully cleaned and initialized prior
    to use (LP: #1475247).

 -- Liam Young <email address hidden> Mon, 07 Sep 2015 16:00:31 +0100

Changed in ceph (Ubuntu Trusty):
status: Fix Committed → Fix Released
Revision history for this message
Chris J Arges (arges) wrote : Update Released

The verification of the Stable Release Update for ceph has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 0.94.3-0ubuntu0.15.04.1

---------------
ceph (0.94.3-0ubuntu0.15.04.1) vivid; urgency=medium

  [ James Page ]
  * New upstream point release (LP: #1492227).
  * d/ceph.install: Drop ceph-deploy manpage from packaging, provided
    by ceph-deploy itself (LP: #1475910).

  [ Liam Young ]
  * d/p/ceph-radosgw-init.patch: Cherry pick patch from upstream VCS to
    ensure that restarts of the radosgw wait an appropriate amount of time
    for the existing daemon to shutdown (LP: #1477225).

 -- James Page <email address hidden> Mon, 07 Sep 2015 16:01:46 +0100

Changed in ceph (Ubuntu Vivid):
status: Fix Committed → Fix Released
Mathew Hodson (mhodson)
affects: ceph-radosgw (Juju Charms Collection) → ubuntu-translations
no longer affects: ubuntu-translations