ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-NN/: (11) Resource temporarily unavailable

Bug #1903221 reported by Przemyslaw Hausman
This bug affects 1 person
Affects          Status        Importance  Assigned to  Milestone
Ceph OSD Charm   Won't Fix     Low         Unassigned
ceph (Ubuntu)    Fix Released  Medium      Unassigned

Bug Description

ussuri-focal, charms revision 20.10.

ceph-osd fails to initialize 5 out of 36 OSDs on each storage node every time I redeploy.

I have 3 storage nodes. Each node has 36x 4TB disks used as OSDs, with bcache set up in front of them. Every time I redeploy the bundle, each storage node ends up with only 31 OSDs initialized. During the initialization of the remaining 5 OSDs, the following errors occur:

unit-ceph-osd-0: 22:03:20 WARNING unit.ceph-osd/0.mon-relation-changed Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 94 --monmap /var/lib/ceph/osd/ceph-94/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-94/ --osd-uuid fa372321-a0bc-4995-9936-227d2f663dd2 --setuser ceph --setgroup ceph
unit-ceph-osd-0: 22:03:20 WARNING unit.ceph-osd/0.mon-relation-changed stderr: 2020-11-05T22:03:20.311+0000 7fc1d846dd80 -1 bluestore(/var/lib/ceph/osd/ceph-94/) _read_fsid unparsable uuid
unit-ceph-osd-0: 22:03:20 WARNING unit.ceph-osd/0.mon-relation-changed stderr: 2020-11-05T22:03:20.319+0000 7fc1d846dd80 -1 bdev(0x5556cac96700 /var/lib/ceph/osd/ceph-94//block) _aio_start io_setup(2) failed with EAGAIN; try increasing /proc/sys/fs/aio-max-nr
unit-ceph-osd-0: 22:03:20 WARNING unit.ceph-osd/0.mon-relation-changed stderr: 2020-11-05T22:03:20.319+0000 7fc1d846dd80 -1 bluestore(/var/lib/ceph/osd/ceph-94/) _minimal_open_bluefs add block device(/var/lib/ceph/osd/ceph-94//block) returned: (11) Resource temporarily unavailable
unit-ceph-osd-0: 22:03:20 WARNING unit.ceph-osd/0.mon-relation-changed stderr: 2020-11-05T22:03:20.603+0000 7fc1d846dd80 -1 bluestore(/var/lib/ceph/osd/ceph-94/) mkfs failed, (11) Resource temporarily unavailable
unit-ceph-osd-0: 22:03:20 WARNING unit.ceph-osd/0.mon-relation-changed stderr: 2020-11-05T22:03:20.603+0000 7fc1d846dd80 -1 OSD::mkfs: ObjectStore::mkfs failed with error (11) Resource temporarily unavailable
unit-ceph-osd-0: 22:03:20 WARNING unit.ceph-osd/0.mon-relation-changed stderr: 2020-11-05T22:03:20.603+0000 7fc1d846dd80 -1 ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-94/: (11) Resource temporarily unavailable

I'm attaching the logs for both failed and successful OSD initialization.

Revision history for this message
Przemyslaw Hausman (phausman) wrote :

I tried deploying with a limited number of OSDs initially configured in the bundle. To start, I configured only 6 disks in the osd-devices config option.

Once the first 6 OSDs had been initialized, I added the next 10 disks with `juju config ceph-osd osd-devices=...`. The additional 10 OSDs were successfully initialized.

I repeated this process two more times to reach 36 OSDs in total, but during the last batch the error occurred again: once more, 5 OSDs on each storage node failed to initialize.
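
For reference, growing the list incrementally looked roughly like this (the device paths here are made up for illustration; the real bundle points at bcache devices):

# hypothetical device paths, first batch
juju config ceph-osd osd-devices="/dev/sdb /dev/sdc /dev/sdd"
# later batches append to the same space-separated list; existing OSDs are kept
# and only the newly listed devices are initialized
juju config ceph-osd osd-devices="/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf"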

Revision history for this message
Billy Olsen (billy-olsen) wrote :

Is it always the same OSDs that are having problems?

Revision history for this message
Przemyslaw Hausman (phausman) wrote :

> Is it always the same OSDs that are having problems?

No. In the test described in comment #3 I started with 6 disks, including the 5 that had failed to initialize earlier. This time they initialized correctly.

This makes me think the limit is on the total number of disks that can be brought up, not on any particular disks.

Could it have something to do with bcache? The caching partition is relatively small -- only ~630GB for 36x 4TB of backing devices. Maybe I should disable bcache for testing and try to onboard the OSD disks directly.

Revision history for this message
Przemyslaw Hausman (phausman) wrote :

I removed bcache entirely and configured the raw disks as osd-devices for ceph-osd. The issue still occurs: the last 5 OSDs fail to initialize.

This is the juju-crashdump from the model: https://drive.google.com/file/d/17PHLgi_Ps4BWDU9R4tL9rUrFA3eGRQP5/

Revision history for this message
Przemyslaw Hausman (phausman) wrote :

Subscribing ~field-high as this issue occurs on a customer deployment.

Revision history for this message
Andrew McLeod (admcleod) wrote :

As discussed in chat: if you could run ceph-volume manually and capture an strace of the failure, that might be very useful.
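
In case it helps, a sketch of what that could look like (the ceph-volume subcommand and device path are assumptions; adjust them to match how the charm invokes ceph-volume on this deployment):

root@storage-1:~# strace -f -o /tmp/ceph-volume.strace ceph-volume lvm create --bluestore --data /dev/sdX
# then look for the failing io_setup() syscalls in the trace:
root@storage-1:~# grep io_setup /tmp/ceph-volume.strace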

Revision history for this message
Andrew McLeod (admcleod) wrote :

This bug may also be related, but it seems to occur right at the start of the creation: https://tracker.ceph.com/issues/46124

Revision history for this message
Przemyslaw Hausman (phausman) wrote :

I think I found the issue. The log for the failed OSD initialization contains "_aio_start io_setup(2) failed with EAGAIN; try increasing /proc/sys/fs/aio-max-nr". So I checked both aio-max-nr and aio-nr and realized the system was close to the limit. When I increased the limit and manually ran ceph-volume, it succeeded.

# Read the limit
root@storage-1:~# cat /proc/sys/fs/aio-max-nr
65536

# Read the current value
root@storage-1:~# cat /proc/sys/fs/aio-nr
63490

# Then I increased /proc/sys/fs/aio-max-nr to 1048576
root@storage-1:~# echo 1048576 > /proc/sys/fs/aio-max-nr
root@storage-1:~# cat /proc/sys/fs/aio-max-nr
1048576

After this I was able to successfully initialize new OSD.

I'm wondering if this is something the charm could calculate and adjust?
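
For what it's worth, a rough back-of-the-envelope sketch (not charm code): on this node, 31 running OSDs account for 63490 of the 65536 allowed aio contexts, i.e. roughly 2048 per BlueStore OSD, so the required fs.aio-max-nr could in principle be estimated from the number of configured OSDs. The per-OSD figure is an observation from this host, not a documented constant:

# shell sketch; OSDS and the per-OSD estimate are assumptions for this host
OSDS=36
PER_OSD=2048        # inferred from 63490 aio contexts across 31 OSDs
HEADROOM=2          # safety factor
echo $(( OSDS * PER_OSD * HEADROOM ))
# -> 147456, comfortably below the 1048576 shipped in 30-ceph-osd.conf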

Revision history for this message
Przemyslaw Hausman (phausman) wrote :

I noticed that a sysctl config file related to ceph-osd exists at /etc/sysctl.d/30-ceph-osd.conf. But it looks like the system has not been provisioned with these values:

root@storage-1:~# cat /etc/sysctl.d/30-ceph-osd.conf
fs.aio-max-nr = 1048576
kernel.pid_max = 4194304

root@storage-1:~# sysctl fs.aio-max-nr
fs.aio-max-nr = 65536

root@storage-1:~# sysctl kernel.pid_max
kernel.pid_max = 2097152

Revision history for this message
Przemyslaw Hausman (phausman) wrote :

Only after I restarted systemd-sysctl.service were the kernel parameters actually applied.
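
For completeness, the commands involved (the expected values are those from 30-ceph-osd.conf, assuming no later sysctl.d drop-in overrides them):

root@storage-1:~# systemctl restart systemd-sysctl.service   # or: sysctl --system
root@storage-1:~# sysctl fs.aio-max-nr kernel.pid_max
# should now report the values from /etc/sysctl.d/30-ceph-osd.conf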

Revision history for this message
Corey Bryant (corey.bryant) wrote :

Przemyslaw, I think kernel.pid_max can be explained by the ceph-osd config option 'sysctl'. I'm not sure about the others.

  sysctl:
    type: string
    default: '{ kernel.pid_max : 2097152, vm.max_map_count : 524288,
                kernel.threads-max: 2097152 }'
    description: |
      YAML-formatted associative array of sysctl key/value pairs to be set
      persistently. By default we set pid_max, max_map_count and
      threads-max to a high value to avoid problems with large numbers (>20)
      of OSDs recovering. very large clusters should set those values even
      higher (e.g. max for kernel.pid_max is 4194303).

Revision history for this message
Przemyslaw Hausman (phausman) wrote :

Just did some more testing. The issue can be reproduced on bionic as well.

Revision history for this message
Przemyslaw Hausman (phausman) wrote :

Configuring the `sysctl` option on the ceph-osd application as below fixes the issue. Thanks @corey.bryant!

```
sysctl: "{ kernel.pid_max : 2097152, vm.max_map_count : 524288, kernel.threads-max: 2097152, fs.aio-max-nr: 1048576 }"
```

I'm not sure, though, why specifying "fs.aio-max-nr: 1048576" explicitly is required. I was under the impression that the parameters configured in /etc/sysctl.d/30-ceph-osd.conf should be applied on the system out of the box (e.g. by the ceph-osd charm).
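
For anyone hitting the same problem, the full workaround command would look something like this (the value is the same YAML map shown above; shell quoting may need adjusting):

juju config ceph-osd \
  sysctl='{ kernel.pid_max : 2097152, vm.max_map_count : 524288, kernel.threads-max: 2097152, fs.aio-max-nr: 1048576 }'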

Revision history for this message
Billy Olsen (billy-olsen) wrote :

Triaging to Low as the sysctl defaults may need to change. However, a config option is available specifically for scenarios like this and should be used in the meantime.

Changed in charm-ceph-osd:
status: New → Triaged
importance: Undecided → Low
Revision history for this message
Nobuto Murata (nobuto) wrote :

Adding Ubuntu Ceph packaging task here.

The 30-ceph-osd.conf file is owned by the ceph-osd package, as shown below.

$ dpkg -S /etc/sysctl.d/30-ceph-osd.conf
ceph-osd: /etc/sysctl.d/30-ceph-osd.conf

However, as far as I can see in 15.2.8-0ubuntu0.20.04.1 on focal, nothing in /var/lib/dpkg/info/ceph-osd.postinst activates the sysctl file, so a reboot (or manual intervention) is basically required to apply the values shipped in the package. I think that's why fs.aio-max-nr wasn't applied in the first place.
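
A rough sketch of the kind of postinst hook that could close this gap (an illustration only, not the actual packaging change):

case "$1" in
    configure)
        if [ -e /etc/sysctl.d/30-ceph-osd.conf ]; then
            # apply the shipped values immediately instead of waiting for a reboot;
            # ignore failures, e.g. in containers where the sysctls are read-only
            sysctl -p /etc/sysctl.d/30-ceph-osd.conf || true
        fi
        ;;
esac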

Revision history for this message
James Page (james-page) wrote :

As Nobuto highlights, this should be handled in the packaging.

Changed in ceph (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Changed in charm-ceph-osd:
status: Triaged → Won't Fix
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 17.1.0-0ubuntu3

---------------
ceph (17.1.0-0ubuntu3) jammy; urgency=medium

  * d/p/py310-py-ssize-t-compat.patch: Cherry pick fix to resolve
    compatibility issues with Python 3.10 (LP: #1964322).
  * d/ceph-osd.postinst: apply sysctl tuning for ceph-osd daemons
    on installation (LP: #1903221).
  * d/control: Drop use of google-perftools on armhf (LP: #1812179).

 -- James Page <email address hidden> Tue, 22 Mar 2022 10:22:37 +0000

Changed in ceph (Ubuntu):
status: Triaged → Fix Released