clear-holders doesn't wait for md device to stop

Bug #1682584 reported by Ryan Harper
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
curtin    Fix Released   High         Unassigned

Bug Description

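Initially, clear-holders hit an error[1] when it called mdadm_remove on an md device that had already been stopped. The mdadm shutdown handler in curtin's clear-holders runs mdadm --stop and then mdadm --remove on the array itself, and the remove fails because /dev/md1 no longer exists after the stop.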

Although many guides show it as part of tearing down an array, examining the mdadm code in Trusty (and later) shows that mdadm --remove is explicitly for ejecting one of the RAID members; it is not needed to release RAID resources.
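
To make the distinction concrete, here is a minimal sketch of the two command shapes as curtin-style argv lists. The helper names stop_array_cmd and eject_member_cmd are hypothetical; --stop and --manage ... --remove are standard mdadm options. Releasing an array needs only the first command; the second ejects one member from an array that stays assembled.

```python
from typing import List


def stop_array_cmd(md_dev: str) -> List[str]:
    """argv that deactivates an array and releases its member devices."""
    return ["mdadm", "--stop", md_dev]


def eject_member_cmd(md_dev: str, member: str) -> List[str]:
    """argv that ejects one member (typically failed or spare) from an array.

    Hypothetical helper: --remove acts on a component device of an
    assembled array, so it plays no part in tearing the array down.
    """
    return ["mdadm", "--manage", md_dev, "--remove", member]


# Releasing /dev/md1 needs only the stop command; a follow-up
# "mdadm --remove /dev/md1" would fail once the device node is gone.
teardown = [stop_array_cmd("/dev/md1")]
print(teardown)
```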

Removing the call to mdadm_remove avoided the failure when calling remove on the array; however, it exposed the fact that clear-holders did not wait until the md resources were actually released.
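
Taken together, the shutdown ordering becomes: stop the array, skip the remove, then block until the kernel has released it. A minimal sketch of that ordering, assuming injected callables so it runs without real hardware; shutdown_md, run and is_released are hypothetical names, not curtin's actual shutdown_mdadm.

```python
import time
from typing import Callable, List


def shutdown_md(devpath: str,
                kname: str,
                run: Callable[[List[str]], None],
                is_released: Callable[[str], bool],
                timeout: float = 30.0,
                interval: float = 0.5,
                sleep: Callable[[float], None] = time.sleep,
                clock: Callable[[], float] = time.monotonic) -> None:
    """Stop an md array, then wait until the kernel has released it.

    Hypothetical sketch: `run` executes an argv list and `is_released`
    reports whether the kernel no longer lists the array.
    """
    # Deactivate the array; no "mdadm --remove" follows.
    run(["mdadm", "--stop", devpath])

    # Poll until the array disappears or the deadline passes.
    deadline = clock() + timeout
    while not is_released(kname):
        if clock() >= deadline:
            raise TimeoutError("%s not released after %.1fs" % (kname, timeout))
        sleep(interval)


# Usage with fakes: the array "disappears" on the third poll.
calls = []
state = {"polls": 0}


def fake_run(argv):
    calls.append(argv)


def fake_released(kname):
    state["polls"] += 1
    return state["polls"] >= 3


shutdown_md("/dev/md1", "md1", fake_run, fake_released, sleep=lambda s: None)
print(calls)  # [['mdadm', '--stop', '/dev/md1']]
```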

While working on that piece, we found that Xenial and newer kernels leak some resources, leaving stale entries in sysfs after the array is stopped (LP:1682456). This needs a kernel fix but does not block any functionality: curtin can still create a RAID device and complete an install.

Until the kernel portion is fixed, curtin will monitor whether a raid device has been released by examining the output of /proc/mdstat.
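
Checking /proc/mdstat amounts to looking for the array's name at the start of a line such as "md1 : active raid1 ...". A minimal sketch, assuming the usual mdstat layout; md_in_mdstat and md_released_mdstat are hypothetical names, and SAMPLE is illustrative text rather than output captured for this bug.

```python
import re
from typing import Optional

# Array lines start with the kernel name followed by " : ".
_ARRAY_LINE = re.compile(r"^(md\S*)\s*:")


def md_in_mdstat(kname: str, mdstat_text: str) -> bool:
    """Return True if the mdstat text still lists the array `kname`."""
    for line in mdstat_text.splitlines():
        match = _ARRAY_LINE.match(line)
        # Exact comparison so "md1" does not match "md10".
        if match and match.group(1) == kname:
            return True
    return False


def md_released_mdstat(kname: str, mdstat_text: Optional[str] = None) -> bool:
    """Read /proc/mdstat (unless text is supplied) and report release."""
    if mdstat_text is None:
        with open("/proc/mdstat") as handle:
            mdstat_text = handle.read()
    return not md_in_mdstat(kname, mdstat_text)


SAMPLE = """Personalities : [raid1]
md0 : active raid1 vdc2[1] vdb2[0]
      2095104 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""

print(md_in_mdstat("md0", SAMPLE))        # True
print(md_released_mdstat("md1", SAMPLE))  # True: md1 is not listed
```

A predicate like md_released_mdstat is what a wait loop would poll after mdadm --stop.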

The triggering storage config[2] is being added to curtin's vmtests to catch any regressions here and, once the kernel fix is complete, to validate that we can switch to watching sysfs entries instead of /proc/mdstat.
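
Once the kernel fix lands, the same check could be made against sysfs. A minimal sketch, assuming a released array's entry disappears from /sys/class/block (the syspath curtin logs for md1); md_released_sysfs and its sysfs_root parameter are hypothetical. On kernels still affected by LP:1682456 the lingering entry would make this check report the array as still held, which is why /proc/mdstat is used for now.

```python
import os
import tempfile


def md_released_sysfs(kname: str, sysfs_root: str = "/sys/class/block") -> bool:
    """Return True once the array's sysfs entry is gone.

    Assumption: on kernels affected by LP:1682456 the entry can linger
    after the array is stopped, so this returns False there.
    """
    return not os.path.exists(os.path.join(sysfs_root, kname))


# Exercise it against a throwaway directory standing in for /sys/class/block.
with tempfile.TemporaryDirectory() as fake_sysfs:
    os.mkdir(os.path.join(fake_sysfs, "md0"))
    print(md_released_sysfs("md0", fake_sysfs))  # False: entry still present
    print(md_released_sysfs("md1", fake_sysfs))  # True: no entry
```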

1.
shutdown running on holder type: 'raid' syspath: '/sys/class/block/md1'
path_to_kname input: '/sys/devices/virtual/block/md1' output: 'md1'
kname_to_path input: 'md1' output: '/dev/md1'
using mdadm.mdadm_stop on dev: /dev/md1
mdadm stopping: /dev/md1
Running command ['mdadm', '--stop', '/dev/md1'] with allowed return codes [0] (shell=False, capture=True)
mdadm stop:

mdadm: stopped /dev/md1

mdadm removing: /dev/md1
Running command ['mdadm', '--remove', '/dev/md1'] with allowed return codes [0] (shell=False, capture=True)
finish: cmd-install/stage-partitioning/builtin/cmd-block-meta: FAIL: failed: removing previous storage devices
finish: cmd-install/stage-partitioning/builtin/cmd-block-meta: FAIL: failed: curtin command block-meta
Traceback (most recent call last):
  File "/curtin/curtin/commands/main.py", line 211, in main
    ret = args.func(args)
  File "curtin/commands/block_meta.py", line 62, in block_meta
    meta_custom(args)
  File "curtin/commands/block_meta.py", line 1041, in meta_custom
    clear_holders.clear_holders(disk_paths)
  File "curtin/block/clear_holders.py", line 379, in clear_holders
    shutdown_function(dev_info['device'])
  File "curtin/block/clear_holders.py", line 146, in shutdown_mdadm
    mdadm.mdadm_remove(blockdev)
  File "curtin/block/mdadm.py", line 269, in mdadm_remove
    rcs=[0], capture=True)
  File "curtin/util.py", line 174, in subp
    return _subp(*args, **kwargs)
  File "curtin/util.py", line 122, in _subp
    cmd=args)
ProcessExecutionError: Unexpected error while running command.
Command: ['mdadm', '--remove', '/dev/md1']
Exit code: 1
Reason: -
Stdout: ''
Stderr: u'mdadm: error opening /dev/md1: No such file or directory\n'
Unexpected error while running command.
Command: ['mdadm', '--remove', '/dev/md1']
Exit code: 1
Reason: -
Stdout: ''
Stderr: u'mdadm: error opening /dev/md1: No such file or directory\n'
builtin command failed
finish: cmd-install/stage-partitioning/builtin: FAIL: failed: running 'curtin block-meta custom'
builtin took 2.579 seconds
stage_partitioning took 2.579 seconds
finish: cmd-install/stage-partitioning: FAIL: failed: configuring storage
Installation failed with exception: Unexpected error while running command.
Command: ['curtin', 'block-meta', 'custom']
Exit code: 3

2.
storage:
  config:
  - grub_device: true
    id: sda
    name: sda
    ptable: msdos
    type: disk
    wipe: superblock
    path: /dev/vdb
    name: main_disk
  - id: sdb
    name: sdb
    ptable: gpt
    type: disk
    wipe: superblock
    path: /dev/vdc
    name: second_disk
  - device: sda
    flag: boot
    id: sda-part1
    name: sda-part1
    number: 1
    offset: 4194304B
    size: 511705088B
    type: partition
    uuid: fc7ab24c-b6bf-460f-8446-d3ac362c0625
    wipe: superblock
  - device: sda
    id: sda-part2
    name: sda-part2
    number: 2
    size: 2G
    type: partition
    uuid: 47c97eae-f35d-473f-8f3d-d64161d571f1
    wipe: superblock
  - device: sda
    id: sda-part3
    name: sda-part3
    number: 3
    size: 2G
    type: partition
    uuid: e3202633-841c-4936-a520-b18d1f7938ea
    wipe: superblock
  - device: sdb
    flag: boot
    id: sdb-part1
    name: sdb-part1
    number: 1
    offset: 4194304B
    size: 511705088B
    type: partition
    uuid: 86326392-3706-4124-87c6-2992acfa31cc
    wipe: superblock
  - device: sdb
    id: sdb-part2
    name: sdb-part2
    number: 2
    size: 2G
    type: partition
    uuid: a33a83dd-d1bf-4940-bf3e-6d931de85dbc
    wipe: superblock
  - devices:
    - sda-part2
    - sdb-part2
    id: md0
    name: md0
    raidlevel: 1
    spare_devices: []
    type: raid
  - device: sdb
    id: sdb-part3
    name: sdb-part3
    number: 3
    size: 2G
    type: partition
    uuid: 27e29758-fdcf-4c6a-8578-c92f907a8a9d
    wipe: superblock
  - devices:
    - sda-part3
    - sdb-part3
    id: md1
    name: md1
    raidlevel: 1
    spare_devices: []
    type: raid
  - fstype: fat32
    id: sda-part1_format
    label: efi
    type: format
    uuid: b3d50fc7-2f9e-4d1a-9e24-28985e4c560b
    volume: sda-part1
  - fstype: fat32
    id: sdb-part1_format
    label: efi
    type: format
    uuid: c604cbb1-2ee1-4575-9489-d38a60fa0cf2
    volume: sdb-part1
  - fstype: ext4
    id: md0_format
    label: ''
    type: format
    uuid: 76a315b7-2979-436c-b156-9ae64a565a59
    volume: md0
  - fstype: ext4
    id: md1_format
    label: ''
    type: format
    uuid: 48dceca6-a9f9-4c7b-bfd3-7f3a0faa4ecc
    volume: md1
  - device: md0_format
    id: md0_mount
    options: ''
    path: /
    type: mount
  - device: sda-part1_format
    id: sda-part1_mount
    options: ''
    path: /boot/efi
    type: mount
  - device: md1_format
    id: md1_mount
    options: ''
    path: /var
    type: mount
  version: 1
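
In this config md0 and md1 are each built from one partition on /dev/vdb and one on /dev/vdc, so when the same disks are reused, clearing either disk means tearing down both arrays first. A minimal sketch of that mapping, assuming the config has already been parsed into dicts (only the keys needed are transcribed here); arrays_per_disk is a hypothetical helper, not part of curtin.

```python
from collections import defaultdict
from typing import Dict, List

# Subset of the storage config above, transcribed as Python dicts.
CONFIG = [
    {"id": "sda", "type": "disk", "path": "/dev/vdb"},
    {"id": "sdb", "type": "disk", "path": "/dev/vdc"},
    {"id": "sda-part2", "type": "partition", "device": "sda"},
    {"id": "sda-part3", "type": "partition", "device": "sda"},
    {"id": "sdb-part2", "type": "partition", "device": "sdb"},
    {"id": "sdb-part3", "type": "partition", "device": "sdb"},
    {"id": "md0", "type": "raid", "devices": ["sda-part2", "sdb-part2"]},
    {"id": "md1", "type": "raid", "devices": ["sda-part3", "sdb-part3"]},
]


def arrays_per_disk(config: List[dict]) -> Dict[str, List[str]]:
    """Map each disk path to the raid ids built on its partitions."""
    by_id = {item["id"]: item for item in config}
    result: Dict[str, List[str]] = defaultdict(list)
    for item in config:
        if item["type"] != "raid":
            continue
        for member in item["devices"]:
            # member partition -> its disk entry -> the disk's path
            disk = by_id[by_id[member]["device"]]
            if item["id"] not in result[disk["path"]]:
                result[disk["path"]].append(item["id"])
    return dict(result)


print(arrays_per_disk(CONFIG))
# {'/dev/vdb': ['md0', 'md1'], '/dev/vdc': ['md0', 'md1']}
```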

Tags: 4010

Ryan Harper (raharper)
description: updated
Ryan Harper (raharper)
Changed in curtin:
importance: Undecided → High
status: New → Fix Committed
tags: added: 4010
Scott Moser (smoser) wrote : Fixed in Curtin 17.1

This bug is believed to be fixed in curtin in 17.1. If this is still a problem for you, please make a comment and set the state back to New

Thank you.

Changed in curtin:
status: Fix Committed → Fix Released