Avoton server occasionally fails to boot when using SSD disks connected using ahci driver

Bug #1458617 reported by Rafael David Tinoco
This bug affects 1 person
Affects          Status         Importance   Assigned to           Milestone
linux (Ubuntu)   Fix Released   High         Rafael David Tinoco
  Trusty         Fix Released   Undecided    Unassigned
  Utopic         Fix Released   Undecided    Unassigned
  Vivid          Fix Released   Undecided    Unassigned

Bug Description

SRU Justification:

Impact: Intermittent OS failure due to SSD disk initialisation failure
Fix: Upstream development
Testcase: About once per 5000 reboots on one server.
          If the user has several hundred servers,
          the frequency isn't negligible.

---------------------

The following bug was brought to my attention:

Avoton server occasionally fails to boot when using SSD disks
connected using ahci driver. When this problem occurs, the disk is
not recognized by the Linux kernel, and the server fails to boot.

* Upstream acceptance information:

Intel's patch has already been accepted upstream
commit dbfe8ef5599a5370abc441fcdbb382b656563eb4
Author: Dan Williams <email address hidden>
Date: Fri May 8 15:23:55 2015 -0400
ahci: avoton port-disable reset-quirk

Refer to http://permalink.gmane.org/gmane.linux.kernel.commits.head/524068

* How reproducible:
About once per 5000 reboots on one server. However, if the customer
has several hundred servers, the frequency isn't negligible.
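
To make the fleet-scale concern concrete, here is a rough sketch of the arithmetic. It assumes each reboot fails independently with the quoted probability of 1 in 5000; the independence assumption and the fleet size of 300 are illustrative and do not come from the report.

```python
# Per-reboot failure probability quoted in the report.
FAILURE_RATE = 1 / 5000


def prob_any_failure(servers: int, rate: float = FAILURE_RATE) -> float:
    """Chance that at least one server fails when every server reboots once."""
    return 1.0 - (1.0 - rate) ** servers


def expected_failures(total_reboots: int, rate: float = FAILURE_RATE) -> float:
    """Expected number of failed boots over a given number of reboots."""
    return total_reboots * rate


if __name__ == "__main__":
    # Hypothetical fleet of 300 servers, all rebooted once.
    print(f"{prob_any_failure(300):.3f}")
```

Under these assumptions, a single reboot of a 300-server fleet has close to a 6% chance of leaving at least one machine unbootable, which is why a rate that looks negligible on one server is not negligible for the fleet.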

* Steps to Reproduce:
Reboot server.

Changed in linux (Ubuntu):
status: New → In Progress
assignee: nobody → Rafael David Tinoco (inaddy)
Rafael David Tinoco (rafaeldtinoco) wrote :

I have created the following PPA:

https://launchpad.net/~inaddy/+archive/ubuntu/lp1458617

It contains a hot-fixed kernel for Trusty so that I can collect proper feedback.

I'll work with the kernel team to include this fix in the Trusty, Utopic, Vivid and Wily kernels.

Thank you

Rafael Tinoco

penalvch (penalvch)
Changed in linux (Ubuntu):
importance: Undecided → High
Rafael David Tinoco (rafaeldtinoco) wrote :

I received positive feedback on the patch provided in the PPA:

Ubuntu-3.13.0-53.89 + patch:

commit 682de5f4135f3414ec87bb76ef400de1148393c8
Author: Dan Williams <email address hidden>
Date: Fri May 8 15:23:55 2015 -0400

    ahci: avoton port-disable reset-quirk

    Avoton AHCI occasionally sees drive probe timeouts at driver load time.
    When this happens SCR_STATUS indicates device detected, but no D2H FIS
    reception. Reset the internal link state machines by bouncing
    port-enable in the PCS register when this occurs.

    Cc: <email address hidden>
    Signed-off-by: Dan Williams <email address hidden>
    Signed-off-by: Tejun Heo <email address hidden>
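
The recovery step described in the commit message can be illustrated with a small user-space simulation. This is only a sketch, not the kernel code: it assumes the PCS register is 16 bits wide with one port-enable bit per port at the bit index equal to the port number, and that the DET field in the low four bits of SCR_STATUS reads 1 when a device is detected but the link is not established. `FakePcs`, `needs_bounce` and `bounce_port_enable` are hypothetical names; the real register offset, layout and delay are defined by the upstream commit.

```python
import time

DET_MASK = 0xF             # assumed: DET field occupies the low four bits
DET_PRESENT_NO_LINK = 0x1  # assumed: device detected, link not established


def needs_bounce(sstatus: int) -> bool:
    """True when SCR_STATUS shows a device present without a working link."""
    return (sstatus & DET_MASK) == DET_PRESENT_NO_LINK


def bounce_port_enable(read_pcs, write_pcs, port, delay_s=1.0, sleep=time.sleep):
    """Clear, then set again, the (assumed) port-enable bit for `port`."""
    val = read_pcs()
    write_pcs(val & ~(1 << port) & 0xFFFF)   # disable the port
    sleep(delay_s)                           # give the link state machine time to reset
    write_pcs((val | (1 << port)) & 0xFFFF)  # re-enable the port


class FakePcs:
    """In-memory stand-in for the PCS register that records every write."""

    def __init__(self, value: int):
        self.value = value
        self.history = []

    def read(self) -> int:
        return self.value

    def write(self, value: int) -> None:
        self.value = value
        self.history.append(value)


if __name__ == "__main__":
    pcs = FakePcs(0b1111)
    if needs_bounce(0x1):
        bounce_port_enable(pcs.read, pcs.write, port=2, sleep=lambda s: None)
    print([bin(v) for v in pcs.history])
```

Passing the register accessors and the sleep function as arguments keeps the sketch testable without hardware; in the kernel the same sequence would go through PCI configuration-space reads and writes.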

Rafael David Tinoco (rafaeldtinoco) wrote :

I'm submitting this patch to the kernel team via the kernel-team mailing list.

Thank you.

description: updated
Brad Figg (brad-figg)
Changed in linux (Ubuntu Vivid):
status: New → Fix Committed
Changed in linux (Ubuntu Utopic):
status: New → Fix Committed
Changed in linux (Ubuntu Trusty):
status: New → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.19.0-22.22

---------------
linux (3.19.0-22.22) vivid; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1465755

  [ Tai Nguyen ]

  * SAUCE: power: reset: Add syscon reboot device node for APM X-Gene
    platform
    - LP: #1463211

  [ Upstream Kernel Changes ]

  * Revert "dm crypt: fix deadlock when async crypto algorithm returns
    -EBUSY"
    - LP: #1465696
  * Bluetooth: ath3k: Add a new ID 0cf3:e006 to ath3k list
    - LP: #1459934
  * cdc-acm: prevent infinite loop when parsing CDC headers.
    - LP: #1460657
  * (upstream) libata: Blacklist queued TRIM on all Samsung 800-series
    - LP: #1338706, #1449005
  * powerpc/powernv: Check image loaded or not before calling flash
    - LP: #1461553
  * ahci: avoton port-disable reset-quirk
    - LP: #1458617
  * Bluetooth: btusb: support public address configuration for ath3012
    - LP: #1459937
  * Bluetooth: btusb: Add setup callback for chip init on USB
    - LP: #1459937
  * Bluetooth: btusb: Add support for QCA ROME chipset family
    - LP: #1459937
  * Bluetooth: btusb: Fix incorrect type in qca_device_info
    - LP: #1459937
  * Bluetooth: btusb: Fix minor whitespace issue in QCA ROME device entries
    - LP: #1459937
  * Bluetooth: btusb: Add support for 0cf3:e007
    - LP: #1459937
  * storvsc: Set the SRB flags correctly when no data transfer is needed
    - LP: #1439780
  * vfs: read file_handle only once in handle_to_path
    - LP: #1416503
    - CVE-2015-1420
  * ozwpan: Use unsigned ints to prevent heap overflow
    - LP: #1463442
    - CVE-2015-4001
  * ozwpan: divide-by-zero leading to panic
    - LP: #1463445
    - CVE-2015-4003
  * ozwpan: Use proper check to prevent heap overflow
    - LP: #1463444
    - CVE-2015-4002
  * ozwpan: unchecked signed subtraction leads to DoS
    - LP: #1463444
    - CVE-2015-4002
  * enclosure: fix WARN_ON removing an adapter in multi-path devices
    - LP: #1415178
  * ASoC: tfa9879: Fix return value check in tfa9879_i2c_probe()
    - LP: #1465696
  * ASoC: samsung: s3c24xx-i2s: Fix return value check in
    s3c24xx_iis_dev_probe()
    - LP: #1465696
  * ASoC: dapm: Enable autodisable on SOC_DAPM_SINGLE_TLV_AUTODISABLE
    - LP: #1465696
  * ASoC: rt5677: add register patch for PLL
    - LP: #1465696
  * btrfs: unlock i_mutex after attempting to delete subvolume during send
    - LP: #1465696
  * ALSA: hda - Fix mute-LED fixed mode
    - LP: #1465696
  * ALSA: hda - Add mute-LED mode control to Thinkpad
    - LP: #1465696
  * arm64: dma-mapping: always clear allocated buffers
    - LP: #1465696
  * ALSA: emu10k1: Fix card shortname string buffer overflow
    - LP: #1465696
  * ALSA: emux: Fix mutex deadlock at unloading
    - LP: #1465696
  * drm/radeon: Use drm_calloc_ab for CS relocs
    - LP: #1465696
  * drm/radeon: adjust pll when audio is not enabled
    - LP: #1465696
  * drm/radeon: add SI DPM quirk for Sapphire R9 270 Dual-X 2G GDDR5
    - LP: #1465696
  * drm/radeon: fix lockup when BOs aren't part of the VM on release
    - LP: #1465696
  * drm/radeon: reset BOs address after clearing it.
    - LP: #1465696
  * drm/radeon: check new address before removing old one
  ...


Changed in linux (Ubuntu):
status: In Progress → Fix Released
Rafael David Tinoco (rafaeldtinoco) wrote :

The fix was verified on Trusty, Utopic and Vivid and works as expected.

tags: added: verification-done
Changed in linux (Ubuntu Vivid):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.16.0-43.58

---------------
linux (3.16.0-43.58) utopic; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1466792

  [ Brad Figg ]

  * Merged back Ubuntu-3.16.0-41.57 regression fix for security release

linux (3.16.0-42.56) utopic; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1465714

  [ Chris J Arges ]

  * [config] CONFIG_IPMI_POWERNV=m on ppc64el
    - LP: #1439562

  [ Luis Henriques ]

  * [Config] Disable CONFIG_USB_OTG
    - LP: #1411295

  [ Upstream Kernel Changes ]

  * Revert "i2c: Mark adapter devices with pm_runtime_no_callbacks"
    - LP: #1465613
  * Revert "mm/hugetlb: use pmd_page() in follow_huge_pmd()"
    - LP: #1465613
  * cdc-acm: prevent infinite loop when parsing CDC headers.
    - LP: #1460657
  * drivers/char/ipmi: Add powernv IPMI driver
    - LP: #1439562
  * powerpc/powernv: Add OPAL IPMI interface
    - LP: #1439562
  * powerpc/powernv: Support OPAL requested heartbeat
    - LP: #1439562
  * powerpc/kernel: Make syscall_exit a local label
    - LP: #1439562
  * powerpc: Remove old compile time disabled syscall tracing code
    - LP: #1439562
  * powerpc/powernv: Remove "opal" prefix from pr_xxx()s
    - LP: #1439562
  * powerpc/powernv: Separate function for OPAL IRQ setup
    - LP: #1439562
  * powerpc/powernv: Add OPAL message notifier unregister function
    - LP: #1439562
  * device: Add dev_of_node() accessor
    - LP: #1439562
  * drivers/core/of: Add symlink to device-tree from devices with an OF
    node
    - LP: #1439562
  * powerpc: Add a proper syscall for switching endianness
    - LP: #1439562
  * (upstream) libata: Blacklist queued TRIM on all Samsung 800-series
    - LP: #1338706, #1449005
  * ahci: avoton port-disable reset-quirk
    - LP: #1458617
  * udf: Remove repeated loads blocksize
    - LP: #1462173
    - CVE-2015-4167
  * udf: Check length of extended attributes and allocation descriptors
    - LP: #1462173
    - CVE-2015-4167
  * (upstream)scsi_lib: remove the description string in
    scsi_io_completion()
    - LP: #1449372
  * vfs: read file_handle only once in handle_to_path
    - LP: #1416503
    - CVE-2015-1420
  * ozwpan: Use unsigned ints to prevent heap overflow
    - LP: #1463442
    - CVE-2015-4001
  * ozwpan: divide-by-zero leading to panic
    - LP: #1463445
    - CVE-2015-4003
  * ozwpan: Use proper check to prevent heap overflow
    - LP: #1463444
    - CVE-2015-4002
  * ozwpan: unchecked signed subtraction leads to DoS
    - LP: #1463444
    - CVE-2015-4002
  * net: eth: xgene: devm_ioremap() returns NULL on error
    - LP: #1458042
  * drivers: net: xgene: fix new firmware backward compatibility with older
    driver
    - LP: #1458042
  * drivers: net: xgene: constify of_device_id array
    - LP: #1458042
  * drivers: net: xgene: Add second SGMII based 1G interface
    - LP: #1458042
  * dtb: change binding name to match with newer firmware DT
    - LP: #1458042
  * dtb: xgene: Add second SGMII based 1G interface node
    - LP: #1458042
  * mlx4: Fix tx ring affinity_mask creation
    - LP: #1465613
  * net/mlx4_en: Schedule napi when RX buffers allocation fails
    - LP: #1465613
...

Changed in linux (Ubuntu Utopic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-57.95

---------------
linux (3.13.0-57.95) trusty; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1466592

  [ Brad Figg ]

  * Merged back Ubuntu-3.13.0-55.94 regression fix for security release

linux (3.13.0-56.93) trusty; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1465798

  [ Upstream Kernel Changes ]

  * net: eth: xgene: devm_ioremap() returns NULL on error
    - LP: #1458042
  * drivers: net: xgene: fix new firmware backward compatibility with older
    driver
    - LP: #1458042
  * drivers: net: xgene: constify of_device_id array
    - LP: #1458042
  * drivers: net: xgene: Add second SGMII based 1G interface
    - LP: #1458042
  * net: phy: re-design phy_modes to be self-contained
    - LP: #1458042
  * dtb: change binding name to match with newer firmware DT
    - LP: #1458042
  * dtb: xgene: Add second SGMII based 1G interface node
    - LP: #1458042
  * Btrfs: make xattr replace operations atomic
    - LP: #1438501
    - CVE-2014-9710
  * cdc-acm: prevent infinite loop when parsing CDC headers.
    - LP: #1460657
  * (upstream) libata: Blacklist queued TRIM on all Samsung 800-series
    - LP: #1338706, #1449005
  * ahci: avoton port-disable reset-quirk
    - LP: #1458617
  * xfs: avoid false quotacheck after unclean shutdown
    - LP: #1461730
  * (upstream)[SCSI] Add timeout to avoid infinite command retry
    - LP: #1449372
  * (upstream)scsi_lib: remove the description string in
    scsi_io_completion()
    - LP: #1449372
  * udf: Remove repeated loads blocksize
    - LP: #1462173
    - CVE-2015-4167
  * udf: Check length of extended attributes and allocation descriptors
    - LP: #1462173
    - CVE-2015-4167
  * vfs: read file_handle only once in handle_to_path
    - LP: #1416503
    - CVE-2015-1420
  * ozwpan: Use unsigned ints to prevent heap overflow
    - LP: #1463442
    - CVE-2015-4001
  * ozwpan: divide-by-zero leading to panic
    - LP: #1463445
    - CVE-2015-4003
  * ozwpan: Use proper check to prevent heap overflow
    - LP: #1463444
    - CVE-2015-4002
  * ozwpan: unchecked signed subtraction leads to DoS
    - LP: #1463444
    - CVE-2015-4002
  * Input: elantech - add new icbody type
    - LP: #1464490
  * Bluetooth: ath3k: Add support Atheros AR5B195 combo Mini PCIe card
    - LP: #1465796
  * power_supply: twl4030_madc: Check return value of power_supply_register
    - LP: #1465796
  * power_supply: lp8788-charger: Fix leaked power supply on probe fail
    - LP: #1465796
  * ARM: dts: dove: Fix uart[23] reg property
    - LP: #1465796
  * xtensa: xtfpga: fix hardware lockup caused by LCD driver
    - LP: #1465796
  * Drivers: hv: vmbus: Fix a bug in the error path in vmbus_open()
    - LP: #1465796
  * xtensa: provide __NR_sync_file_range2 instead of __NR_sync_file_range
    - LP: #1465796
  * KVM: s390: Zero out current VMDB of STSI before including level3 data.
    - LP: #1465796
  * usb: musb: core: fix TX/RX endpoint order
    - LP: #1465796
  * drm/radeon: fix doublescan modes (v2)
    - LP: #1465796
  * usb: phy: Find the right match in devm_usb_phy_match
    - LP: #1465796
  * tools lib traceevent kbuffer: Rem...


Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released
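
As a quick aid for checking whether a given kernel package already carries the fix, the released versions named above can be compared against a package version string. This is a simplified sketch: it only handles versions of the form `X.Y.Z-ABI.UPLOAD` and does not implement full Debian version comparison. The function name `has_avoton_fix` is hypothetical, and the thresholds are the versions this report names as released, so the check is conservative.

```python
import re

# Versions this bug report names as containing the fix, keyed by base kernel version.
FIXED_IN = {
    "3.13.0": (57, 95),  # trusty
    "3.16.0": (43, 58),  # utopic
    "3.19.0": (22, 22),  # vivid
}

VERSION_RE = re.compile(r"(\d+\.\d+\.\d+)-(\d+)\.(\d+)")


def has_avoton_fix(pkg_version: str):
    """Return True/False for known series, or None if the series is not listed."""
    match = VERSION_RE.fullmatch(pkg_version)
    if match is None:
        raise ValueError(f"unrecognised version format: {pkg_version!r}")
    base, abi, upload = match.group(1), int(match.group(2)), int(match.group(3))
    if base not in FIXED_IN:
        return None
    return (abi, upload) >= FIXED_IN[base]


if __name__ == "__main__":
    for version in ("3.13.0-53.89", "3.13.0-57.95", "3.19.0-22.22"):
        print(version, has_avoton_fix(version))
```

For example, the unpatched base used for the PPA build (3.13.0-53.89) is reported as lacking the fix, while 3.13.0-57.95 is reported as containing it.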