show-regs can cause some Samsung controllers to go offline

Bug #1931886 reported by dann frazier
This bug affects 1 person
Affects              Status         Importance  Assigned to   Milestone
nvme-cli (Debian)    Fix Released   Unknown
nvme-cli (Ubuntu)    Fix Released   Undecided   dann frazier
  Groovy             Won't Fix      Undecided   dann frazier
  Hirsute            Fix Released   Undecided   dann frazier
  Impish             Fix Released   Undecided   dann frazier

Bug Description

[Impact]
nvme show-regs has been found to cause certain Samsung controllers
(MZ1L21T9HCLS in particular) to go offline.

[Test Case]
Run `nvme show-regs` on an affected controller device. Messages similar to these will appear in dmesg:
[963314.311332] nvme nvme2: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
[963334.951328] nvme nvme2: Device not ready; aborting reset
[963334.963114] nvme nvme2: Removing after probe failure status: -19
[963334.999600] blk_update_request: I/O error, dev nvme2n1, sector 1050640 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[963335.023410] md: super_written gets error=10
[963335.033842] md/raid1:md0: Disk failure on nvme2n1p2, disabling device.
                md/raid1:md0: Operation continuing on 1 devices.
[ +0.009599] XFS (md127): log I/O error -5
[ +0.015136] XFS (md127): xfs_do_force_shutdown(0x2) called from line 1250 of file fs/xfs/xfs_log.c. Return address = 00000000d0ea8129
[ +0.000001] XFS (md127): Log I/O Error Detected. Shutting down filesystem
[ +0.009290] XFS (md127): Please unmount the filesystem and rectify the problem(s)
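
When checking logs for this failure, the first message ("controller is down; will reset") is the useful signal. The following is a minimal Python sketch for spotting it, assuming the kernel message format shown above. The function name `find_controller_resets` is hypothetical, and the CSTS decoding (bit 0 = RDY, bit 1 = CFS, per the NVMe specification) is an addition for illustration, not something nvme-cli or the kernel prints:

```python
import re

# Matches the "controller is down" kernel message shown in the test case.
RESET_RE = re.compile(
    r"nvme (?P<ctrl>nvme\d+): controller is down; will reset: "
    r"CSTS=(?P<csts>0x[0-9a-fA-F]+), PCI_STATUS=(?P<pci>0x[0-9a-fA-F]+)"
)


def find_controller_resets(dmesg_text):
    """Return one dict per 'controller is down' message found in dmesg_text."""
    events = []
    for match in RESET_RE.finditer(dmesg_text):
        csts = int(match.group("csts"), 16)
        events.append({
            "controller": match.group("ctrl"),
            "csts": csts,
            "ready": bool(csts & 0x1),         # CSTS.RDY (bit 0)
            "fatal_status": bool(csts & 0x2),  # CSTS.CFS (bit 1)
            "pci_status": int(match.group("pci"), 16),
        })
    return events
```

For the log above, CSTS=0x3 decodes as both the ready bit and the controller fatal status bit being set.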

[Fix]
This has been fixed upstream with the following commits:
  https://github.com/linux-nvme/nvme-cli/commit/33e60ff64a043b189d2661543b417b21b6f3667b
  https://github.com/linux-nvme/nvme-cli/commit/d43d545a68cc6cea5ac78fda4edeedf3b5198847

[What Could Go Wrong]
Because the register prmsc is now split into prmscl/prmscu as the specification requires, the registers displayed in show-regs output will differ. This might surprise any code that parses this output. Upstream also made a formatting change here that alters the whitespace in one field when running with -H (human-readable mode):

This:
Controller Base Address (CBA) : 0
Became:
Controller Base Address (CBA): 0

This is human-readable mode, which I at least interpret as "not for scripting", but it's possible that a user expects that specific format. We could carry an additional patch to restore this whitespace if the SRU team is so inclined.
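
For anyone who does parse this output, a whitespace-tolerant parser avoids depending on either format. The following is a rough Python sketch, not part of nvme-cli; the helper names `parse_field` and `combine_halves` are hypothetical. It assumes field lines look like the CBA example above, and that prmscl/prmscu hold the lower and upper 32 bits of the former single value:

```python
import re

# "Name (ABBR) : value" or "Name (ABBR): value".
# Whitespace around the colon is optional, so both formats match.
FIELD_RE = re.compile(
    r"^\s*(?P<name>.+?)\s*\((?P<abbr>[A-Za-z0-9]+)\)\s*:\s*(?P<value>.*?)\s*$"
)


def parse_field(line):
    """Return (name, abbreviation, value) for a field line, or None."""
    match = FIELD_RE.match(line)
    if match is None:
        return None
    return match.group("name"), match.group("abbr"), match.group("value")


def combine_halves(lower, upper):
    """Join two 32-bit register halves into one 64-bit integer (assumed layout)."""
    return ((upper & 0xFFFFFFFF) << 32) | (lower & 0xFFFFFFFF)
```

Both the old and new CBA lines parse to the same tuple, so a script written this way would not notice the whitespace change.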

dann frazier (dannf)
description: updated
Changed in nvme-cli (Debian):
status: Unknown → Confirmed
dann frazier (dannf)
Changed in nvme-cli (Ubuntu Impish):
status: New → In Progress
assignee: nobody → dann frazier (dannf)
Changed in nvme-cli (Debian):
status: Confirmed → Fix Released
dann frazier (dannf)
Changed in nvme-cli (Ubuntu Impish):
status: In Progress → Triaged
dann frazier (dannf) wrote :

This is fixed in the 1.14 upstream release which I've now sync'd from Debian experimental.

Changed in nvme-cli (Ubuntu Impish):
status: Triaged → Fix Committed
dann frazier (dannf)
Changed in nvme-cli (Ubuntu Impish):
status: Fix Committed → Fix Released
Changed in nvme-cli (Ubuntu Hirsute):
status: New → In Progress
Changed in nvme-cli (Ubuntu Groovy):
status: New → In Progress
assignee: nobody → dann frazier (dannf)
Changed in nvme-cli (Ubuntu Hirsute):
assignee: nobody → dann frazier (dannf)
dann frazier (dannf)
Changed in nvme-cli (Ubuntu Groovy):
status: In Progress → Won't Fix
Brian Murray (brian-murray) wrote : Please test proposed package

Hello dann, or anyone else affected,

Accepted nvme-cli into hirsute-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/nvme-cli/1.12-5ubuntu0.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-hirsute to verification-done-hirsute. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-hirsute. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in nvme-cli (Ubuntu Hirsute):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-hirsute
Pedro Principeza (pprincipeza) wrote :

Hi.

A customer provided us with feedback from testing on Hirsute. In their own words:

"The issue with the Samsung drives is no longer seen with this version, I'm able to access the drive's registers without issues."

I'm moving on with the verification tagging. Thanks!

tags: added: verification-done-hirsute
removed: verification-needed verification-needed-hirsute
Brian Murray (brian-murray) wrote :

Generally, I'd expect more detailed information for an SRU verification. However, for this specific SRU the test case is straightforward, so I'll go ahead and accept it.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nvme-cli - 1.12-5ubuntu0.1

---------------
nvme-cli (1.12-5ubuntu0.1) hirsute; urgency=medium

  * Fix issue where 'showregs' can cause certain Samsung devices
    to go offline. LP: #1931886.

 -- dann frazier <email address hidden> Wed, 07 Jul 2021 15:01:59 -0600

Changed in nvme-cli (Ubuntu Hirsute):
status: Fix Committed → Fix Released
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for nvme-cli has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
