Failed to acknowledge elog: /sys/firmware/opal/elog/0x5018d709/acknowledge (2:No such file or directory)

Bug #1619552 reported by bugproxy
This bug affects 1 person
Affects           Status        Importance  Assigned to  Milestone
linux (Ubuntu)    Fix Released  Undecided   Unassigned
  Xenial          Fix Released  Undecided   Tim Gardner
  Yakkety         Fix Released  Undecided   Unassigned

Bug Description

== Comment: #0 - Mukesh K. Ojha <email address hidden> - 2016-09-02 02:10:14 ==
---Problem Description---
The kernel fails to free the kobject when acknowledging an error log that was notified twice.

Contact Information = <email address hidden>

---uname output---
Ubuntu 16.04.1 LTS

Machine Type = All POWER machines

---Debugger---
A debugger is not configured

---Steps to Reproduce---

Steps to reproduce:
1. Boot the system to Petitboot.
2. Issue an FSP soft reset of the service processor from the ASMI page.
3. After the FSP comes up, boot the system to the host OS.
4. In the host OS, observe the failure to acknowledge one of the elogs.

[root@p8wookie ~]# service opal_errd status
Redirecting to /bin/systemctl status opal_errd.service
opal_errd.service - opal_errd (PowerNV platform error handling) Service
   Loaded: loaded (/usr/lib/systemd/system/opal_errd.service; enabled)
   Active: active (running) since Wed 2016-08-03 07:55:03 CDT; 2min 4s ago
  Process: 3452 ExecStart=/usr/libexec/ppc64-diag/opal_errd start (code=exited, status=0/SUCCESS)
 Main PID: 3497 (opal_errd)
   CGroup: /system.slice/opal_errd.service
           └─3497 /usr/sbin/opal_errd

Aug 03 07:57:03 p8wookie.aus.stglabs.ibm.com ELOG[3497]: LID[5018d709]::SRC[B1763435]::Other Subsystems::Informational Event::No service action required
Aug 03 07:57:03 p8wookie.aus.stglabs.ibm.com ELOG[3497]: Failed to acknowledge elog: /sys/firmware/opal/elog/0x5018d709/acknowledge (2:No such file or directory)
Aug 03 07:57:04 p8wookie.aus.stglabs.ibm.com ELOG[3497]: LID[5018d709]::SRC[B1763435]::Other Subsystems::Informational Event::No service action required
Aug 03 07:57:04 p8wookie.aus.stglabs.ibm.com ELOG[3497]: Failed to acknowledge elog: /sys/firmware/opal/elog/0x5018d709/acknowledge (2:No such file or directory)
Aug 03 07:57:05 p8wookie.aus.stglabs.ibm.com ELOG[3497]: LID[5018d709]::SRC[B1763435]::Other Subsystems::Informational Event::No service action required
Aug 03 07:57:05 p8wookie.aus.stglabs.ibm.com ELOG[3497]: Failed to acknowledge elog: /sys/firmware/opal/elog/0x5018d709/acknowledge (2:No such file or directory)
Aug 03 07:57:06 p8wookie.aus.stglabs.ibm.com ELOG[3497]: LID[5018d709]::SRC[B1763435]::Other Subsystems::Informational Event::No service action required
Aug 03 07:57:06 p8wookie.aus.stglabs.ibm.com ELOG[3497]: Failed to acknowledge elog: /sys/firmware/opal/elog/0x5018d709/acknowledge (2:No such file or directory)
Aug 03 07:57:07 p8wookie.aus.stglabs.ibm.com ELOG[3497]: LID[5018d709]::SRC[B1763435]::Other Subsystems::Informational Event::No service action required
Aug 03 07:57:07 p8wookie.aus.stglabs.ibm.com ELOG[3497]: Failed to acknowledge elog: /sys/firmware/opal/elog/0x5018d709/acknowledge (2:No such file or directory)

[root@p8wookie ~]# cd /sys/firmware/opal/elog/
[root@p8wookie elog]# ls
0x5018d709
[root@p8wookie elog]#
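The listing above shows 0x5018d709 still present under /sys/firmware/opal/elog even though every attempt to open its acknowledge file fails with ENOENT. A rough way to spot entries in that state is to list elog directories that lack an acknowledge file. This Python sketch is a hypothetical helper (it is not part of ppc64-diag or opal_errd) and assumes that an entry which can still be acknowledged always contains an acknowledge file:

```python
import os


def find_unacknowledgeable_elogs(elog_dir="/sys/firmware/opal/elog"):
    """Return the sorted names of elog entries that have no 'acknowledge' file.

    Hypothetical diagnostic helper (not part of ppc64-diag): it flags entries
    that are still listed under elog_dir but can no longer be acknowledged,
    which is the state opal_errd keeps tripping over in the log above.
    Assumption: an entry that can still be acknowledged has an
    'acknowledge' file inside its directory.
    """
    stale = []
    for name in os.listdir(elog_dir):
        entry = os.path.join(elog_dir, name)
        # Each elog is exposed as a directory named after its ID, e.g. 0x5018d709.
        if not os.path.isdir(entry):
            continue
        if not os.path.exists(os.path.join(entry, "acknowledge")):
            stale.append(name)
    return sorted(stale)
```

On the system above, calling find_unacknowledgeable_elogs() with the default path would be expected to return ['0x5018d709']. An entry that opal_errd has not processed yet still has its acknowledge file, so it is not flagged.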

Stack trace output:
 no

Oops output:
 no

System Dump Info:
  The system is not configured to capture a system dump.

*Additional Instructions for Contact Information = <email address hidden>:
- Attach sysctl -a output to the bug.

== Comment: #1 - Mukesh K. Ojha <email address hidden> - 2016-09-02 02:12:45 ==
Upstream commit:

commit a9cbf0b2195b695cbeeeecaa4e2770948c212e9a
Author: Mukesh Ojha <email address hidden>
Date: Mon Aug 22 12:17:44 2016 +0530

    powerpc/powernv : Drop reference added by kset_find_obj()

    In a situation where the Linux kernel gets notified about a duplicate error
    log from OPAL, it has been observed that the kernel fails to remove the sysfs
    entries (/sys/firmware/opal/elog/0xXXXXXXXX) of such error logs. This is
    because we currently search for the error log/dump kobject in the kset list
    via the 'kset_find_obj()' routine, which increments the reference count by
    one once it finds the kobject.

The above patch is the fix for this Bugzilla.

Kindly pull this patch into both Ubuntu 16.04 LTS and Ubuntu 16.10.

-Mukesh
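The commit message describes a reference-count imbalance: kset_find_obj() returns the kobject with an extra reference, so the duplicate-notification path has to drop it. The following Python model is only an illustration, not kernel code; KSet, KObject, notify_elog and acknowledge are hypothetical stand-ins for the kset, the kobject, the OPAL elog notification handler and the sysfs acknowledge attribute. It also assumes, consistent with the ENOENT errors above, that the first acknowledge removes the acknowledge file while the entry itself survives until its last reference is dropped.

```python
class KSet:
    """Toy stand-in for a kernel kset: a name -> object registry."""

    def __init__(self):
        self.objects = {}

    def find_obj(self, name):
        # Models kset_find_obj(): on success the caller receives an extra
        # reference that it is responsible for dropping.
        obj = self.objects.get(name)
        if obj is not None:
            obj.refcount += 1
        return obj


class KObject:
    """Toy stand-in for a kobject backing /sys/firmware/opal/elog/<id>."""

    def __init__(self, name, kset):
        self.name = name
        self.kset = kset
        self.refcount = 1            # reference held on behalf of the creator
        self.has_ack_file = True     # assumed: attribute exists until first ack
        kset.objects[name] = self    # registering makes the directory visible

    def put(self):
        # Models kobject_put(): drop one reference and release at zero.
        self.refcount -= 1
        if self.refcount == 0:
            del self.kset.objects[self.name]  # the sysfs directory goes away


def notify_elog(kset, name, drop_lookup_ref):
    """Hypothetical notification handler; duplicates are detected via find_obj."""
    existing = kset.find_obj(name)
    if existing is not None:
        if drop_lookup_ref:
            existing.put()           # the fix: balance the find_obj reference
        return existing              # duplicate: do not create a second entry
    return KObject(name, kset)


def acknowledge(kset, name):
    """Models a write to .../<id>/acknowledge by opal_errd."""
    obj = kset.objects[name]
    if not obj.has_ack_file:
        # Mirrors opal_errd's "(2:No such file or directory)".
        raise FileNotFoundError(f"{name}/acknowledge")
    obj.has_ack_file = False         # assumed: attribute removed on first ack
    obj.put()                        # drop the creator's reference
```

With drop_lookup_ref=False, a twice-notified entry survives its acknowledge with a reference count of 1 and any further acknowledge fails, which mirrors the opal_errd log. With drop_lookup_ref=True, the entry is removed after the first acknowledge. As its title says, the upstream commit makes the equivalent change by dropping the reference added by kset_find_obj().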


bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-145819 severity-high targetmilestone-inin16041
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → kernel-package (Ubuntu)
Steve Langasek (vorlon)
affects: kernel-package (Ubuntu) → linux (Ubuntu)
Tim Gardner (timg-tpi) wrote :

Merged in 4.8-rc5.

Changed in linux (Ubuntu Yakkety):
assignee: Taco Screen team (taco-screen-team) → nobody
status: New → Fix Released
Changed in linux (Ubuntu Xenial):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Tim Gardner (timg-tpi) wrote :
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-09-27 14:40 EDT-------
Hi,

I will verify and update this Bugzilla as soon as I get access to the system.

-Mukesh

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-09-28 14:42 EDT-------
Before the patch:
===========

root@p8wookie:~# ps -ef | grep opal
root 782 2 0 13:39 ? 00:00:00 [kopald]
root 783 2 0 13:39 ? 00:00:00 [irq/29-opal-elo]
root 784 2 0 13:39 ? 00:00:00 [irq/30-opal-dum]
root 4089 1 0 13:40 ? 00:00:00 /usr/sbin/opal_errd
root 4733 4648 0 13:42 pts/0 00:00:00 grep --color=auto opal

root@p8wookie:~# ls /sys/firmware/opal/elog/
0x50007844

root@p8wookie:~# vi /var/log/syslog
Sep 28 13:42:56 p8wookie ELOG[4089]: LID[50007844]::SRC[B1763435]::Other Subsystems::Informational Event::No service action required
Sep 28 13:42:56 p8wookie ELOG[4089]: Failed to acknowledge elog: /sys/firmware/opal/elog/0x50007844/acknowledge (2:No such file or directory)
Sep 28 13:42:57 p8wookie ELOG[4089]: LID[50007844]::SRC[B1763435]::Other Subsystems::Informational Event::No service action required
Sep 28 13:42:57 p8wookie ELOG[4089]: Failed to acknowledge elog: /sys/firmware/opal/elog/0x50007844/acknowledge (2:No such file or directory)
Sep 28 13:42:58 p8wookie ELOG[4089]: LID[50007844]::SRC[B1763435]::Other Subsystems::Informational Event::No service action required
Sep 28 13:42:58 p8wookie ELOG[4089]: Failed to acknowledge elog: /sys/firmware/opal/elog/0x50007844/acknowledge (2:No such file or directory)
Sep 28 13:42:59 p8wookie ELOG[4089]: LID[50007844]::SRC[B1763435]::Other Subsystems::Informational Event::No service action required
Sep 28 13:42:59 p8wookie ELOG[4089]: Failed to acknowledge elog: /sys/firmware/opal/elog/0x50007844/acknowledge (2:No such file or directory)
Sep 28 13:43:00 p8wookie ELOG[4089]: LID[50007844]::SRC[B1763435]::Other Subsystems::Informational Event::No service action required
Sep 28 13:43:00 p8wookie ELOG[4089]: Failed to acknowledge elog: /sys/firmware/opal/elog/0x50007844/acknowledge (2:No such file or directory)
Sep 28 13:43:01 p8wookie ELOG[4089]: LID[50007844]::SRC[B1763435]::Other Subsystems::Informational Event::No service action required
Sep 28 13:43:01 p8wookie ELOG[4089]: Failed to acknowledge elog: /sys/firmware/opal/elog/0x50007844/acknowledge (2:No such file or directory)
Sep 28 13:43:02 p8wookie ELOG[4089]: LID[50007844]::SRC[B1763435]::Other Subsystems::Informational Event::No service action required
Sep 28 13:43:02 p8wookie ELOG[4089]: Failed to acknowledge elog: /sys/firmware/opal/elog/0x50007844/acknowledge (2:No such file or directory)
Sep 28 13:43:03 p8wookie ELOG[4089]: LID[50007844]::SRC[B1763435]::Other Subsystems::Informational Event::No service action required
Sep 28 13:43:03 p8wookie ELOG[4089]: Failed to acknowledge elog: /sys/firmware/opal/elog/0x50007844/acknowledge (2:No such file or directory)
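Before the patch, opal_errd reports the same LID and fails to acknowledge it roughly once per second for as long as the log was captured. To gauge how often this happens in a saved syslog, one could count the failures per elog ID. The Python sketch below is hypothetical (count_ack_failures is not part of ppc64-diag) and assumes the exact message format shown above:

```python
import re
from collections import Counter

# Matches opal_errd's acknowledge-failure message as it appears in the syslog above.
ACK_FAILURE = re.compile(
    r"Failed to acknowledge elog: /sys/firmware/opal/elog/(0x[0-9a-fA-F]+)/acknowledge"
)


def count_ack_failures(lines):
    """Count 'Failed to acknowledge elog' messages per elog ID (e.g. '0x50007844')."""
    counts = Counter()
    for line in lines:
        match = ACK_FAILURE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For example, passing an open file object for /var/log/syslog to count_ack_failures() returns a Counter keyed by elog ID; a count that keeps growing for a single ID is the symptom described in this bug.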

After the patch:
==========

root@p8wookie:~# uname -a
Linux p8wookie 4.8.0-rc6mukesh+ #2 SMP Wed Sep 28 14:09:57 EDT 2016 ppc64le ppc64le ppc64le GNU/Linux
root@p8wookie:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.10 (Yakkety Yak)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.10"
VERSION_ID="16.10"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/...


tags: added: verification-done-xenial
removed: verification-needed-xenial
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-09-29 01:44 EDT-------
Hi,

I verified it on Yakkety; it works as expected.

However, Xenial does not have the above patch yet:

https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/xenial

-Mukesh

tags: added: verification-done-yakkety
removed: verification-done-xenial
Tim Gardner (timg-tpi)
tags: added: verification-done-xenial
removed: verification-done-yakkety
tags: added: verification-needed-xenial
removed: verification-done-xenial
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-10-04 04:54 EDT-------
Hi Tim,

Can you please remove the 'verification-needed-xenial' tag for now, as the fix has not yet been released for Xenial?

Please re-add the tag once it is released for Xenial.

-Mukesh

Tim Gardner (timg-tpi) wrote :

commit 8f1d3b68ff08466f55c55e03f9aa4d989c515095 (powerpc/powernv : Drop reference added by kset_find_obj()) was released in UBUNTU: Ubuntu-4.4.0-39.59, which never made it out of -proposed. It has since been superseded by Ubuntu-4.4.0-41.61 due to regression respins. That is the version you should be verifying.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-10-04 16:51 EDT-------
The system does not boot after selecting the host OS from Petitboot with the Ubuntu-4.4.0-41.61 kernel.
This is not related to this patch: even without the patch, the system falls into the case below.

mukesh@mukesh-ThinkPad-T450:~/Downloads/skiboot$ ipmitool -I lanplus -H 9.40.192.56 -P PASSW0RD sol deactivate
mukesh@mukesh-ThinkPad-T450:~/Downloads/skiboot$ ipmitool -I lanplus -H 9.40.192.56 -P PASSW0RD sol activate
[SOL Session operational. Use ~? for help]

Gave up waiting for root device. Common problems:
- Boot args (cat /proc/cmdline)
- Check rootdelay= (did the system wait long enough?)
- Check root= (did the system wait for the right device?)
- Missing modules (cat /proc/modules; ls /dev)
ALERT! UUID=82a543eb-e7c0-4cc5-93f9-3a7394a948c3 does not exist. Dropping to a shell!

BusyBox v1.22.1 (Ubuntu 1:1.22.0-19ubuntu2) built-in shell (ash)
Enter 'help' for a list of built-in commands.

(initramfs)

-Mukesh

Tim Gardner (timg-tpi) wrote :

Mukesh - you need to get your developers to look at this failure to boot.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-10-06 08:52 EDT-------
Before the patch:

root@p8wookie:~# dmesg | less
root@p8wookie:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.1 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.1 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
UBUNTU_CODENAME=xenial
root@p8wookie:~# uname -a
Linux p8wookie 4.4.19+ #1 SMP Thu Oct 6 05:51:06 CDT 2016 ppc64le ppc64le ppc64le GNU/Linux
root@p8wookie:~#

root@p8wookie:~# ps -ef | grep opal
root 654 2 0 07:10 ? 00:00:00 [kopald]
root 655 2 0 07:10 ? 00:00:00 [irq/29-opal-elo]
root 656 2 0 07:10 ? 00:00:00 [irq/30-opal-dum]
root 3185 1 0 07:17 ? 00:00:00 /usr/sbin/opal_errd
root 3580 2276 0 07:18 pts/0 00:00:00 grep --color=auto opal

root@p8wookie:~# ls /sys/firmware/opal/elog/
0x5005d751

After the patch:

root@p8wookie:/home/ubuntu/xenial# ps -ef | grep opal
root 654 2 0 07:47 ? 00:00:00 [kopald]
root 655 2 0 07:47 ? 00:00:00 [irq/29-opal-elo]
root 656 2 0 07:47 ? 00:00:00 [irq/30-opal-dum]
root 3915 1 0 07:47 ? 00:00:00 /usr/sbin/opal_errd
root 4583 4552 0 07:50 pts/0 00:00:00 grep --color=auto opal
root@p8wookie:/home/ubuntu/xenial#
root@p8wookie:/home/ubuntu/xenial# ls /sys/firmware/opal/elog/
root@p8wookie:/home/ubuntu/xenial# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.1 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.1 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
UBUNTU_CODENAME=xenial
root@p8wookie:/home/ubuntu/xenial# uname -a
Linux p8wookie 4.4.21+ #3 SMP Thu Oct 6 07:28:43 CDT 2016 ppc64le ppc64le ppc64le GNU/Linux
root@p8wookie:/home/ubuntu/xenial#

root@p8wookie:/home/ubuntu/xenial# dmesg | grep Duplicate
[ 1.161752] ELOG:Duplicate log =500627d0
root@p8wookie:/home/ubuntu/xenial#

It works as expected.

-Mukesh

tags: added: verification-done-xenial
removed: verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-42.62

---------------
linux (4.4.0-42.62) xenial; urgency=low

  * Fix GRO recursion overflow for tunneling protocols (LP: #1631287)
    - tunnels: Don't apply GRO to multiple layers of encapsulation.
    - gro: Allow tunnel stacking in the case of FOU/GUE

  * CVE-2016-7039
    - SAUCE: net: add recursion limit to GRO

linux (4.4.0-41.61) xenial; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1628204

  * nvme drive probe failure (LP: #1626894)
    - (fix) NVMe: Don't unmap controller registers on reset

linux (4.4.0-40.60) xenial; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1627074

  * Permission denied in CIFS with kernel 4.4.0-38 (LP: #1626112)
    - Fix memory leaks in cifs_do_mount()
    - Compare prepaths when comparing superblocks
    - SAUCE: Fix regression which breaks DFS mounting

  * Backlight does not change when adjust it higher than 50% after S3
    (LP: #1625932)
    - SAUCE: i915_bpo: drm/i915/backlight: setup and cache pwm alternate
      increment value
    - SAUCE: i915_bpo: drm/i915/backlight: setup backlight pwm alternate
      increment on backlight enable

linux (4.4.0-39.59) xenial; urgency=low

  [ Joseph Salisbury ]

  * Release Tracking Bug
    - LP: #1625303

  * thunder: chip errata w/ multiple CQEs for a TSO packet (LP: #1624569)
    - net: thunderx: Fix for issues with multiple CQEs posted for a TSO packet

  * thunder: faulty TSO padding (LP: #1623627)
    - net: thunderx: Fix for HW issue while padding TSO packet

  * CVE-2016-6828
    - tcp: fix use after free in tcp_xmit_retransmit_queue()

  * Sennheiser Officerunner - cannot get freq at ep 0x83 (LP: #1622763)
    - SAUCE: (no-up) ALSA: usb-audio: Add quirk for sennheiser officerunner

  * Backport E3 Skylake Support in ie31200_edac to Xenial (LP: #1619766)
    - EDAC, ie31200_edac: Add Skylake support

  * Ubuntu 16.04 - Full EEH Recovery Support for NVMe devices (LP: #1602724)
    - SAUCE: nvme: Don't suspend admin queue that wasn't created

  * ISST-LTE:pNV: system ben is hung during ST (nvme) (LP: #1620317)
    - blk-mq: Allow timeouts to run while queue is freezing
    - blk-mq: improve warning for running a queue on the wrong CPU
    - blk-mq: don't overwrite rq->mq_ctx

  * lsattr 32bit does not work on 64bit kernel (Inappropriate ioctl error)
    (LP: #1619918)
    - btrfs: bugfix: handle FS_IOC32_{GETFLAGS, SETFLAGS, GETVERSION} in
      btrfs_ioctl

  * radeon: monitor connected to onboard VGA doesn't work with Xenial
    (LP: #1600092)
    - drm/radeon/dp: add back special handling for NUTMEG

  * initramfs includes qle driver, but not firmware (LP: #1623187)
    - qed: add MODULE_FIRMWARE()

  * [Hyper-V] Rebase Hyper-V to 4.7.2 (stable) (LP: #1616677)
    - hv_netvsc: Implement support for VF drivers on Hyper-V
    - hv_netvsc: Fix the list processing for network change event
    - Drivers: hv: vmbus: Introduce functions for estimating room in the ring
      buffer
    - Drivers: hv: vmbus: Use READ_ONCE() to read variables that are volatile
    - Drivers: hv: vmbus: Export the vmbus_set_event() API
    - lcoking/barriers, arch: Use smp barriers...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-10-17 02:52 EDT-------
(In reply to comment #16)
> This bug was fixed in the package linux - 4.4.0-42.62

Thanks!

-Vasant
