Ryzen 3rd gen (3900X) ECC support missing from kernel

Bug #1869235 reported by araemo
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
linux (Ubuntu)   Confirmed      Undecided    Unassigned
Eoan             Fix Released   Undecided    Unassigned

Bug Description

Ubuntu server:
Description: Ubuntu 19.10
Release: 19.10
Installed kernel:
linux-generic:
  Installed: 5.3.0.42.36
(I could not figure out what package to select in the 'in what package did you find this bug' chooser)

Expected behavior: ECC RAM is detected and utilized
Result instead:
Error in dmesg at boot:
EDAC amd64: Error: F0 not found, device 0x1460 (broken BIOS?) (more in the attached KernelErrors.log)

Kernel 5.3 is missing ECC support for some families of AMD Ryzen 3rd gen CPUs.

Support is added via these two commits in the mainline tree:
https://github.com/torvalds/linux/commit/e53a3b267fb0a79db9ca1f1e08b97889b22013e6
https://github.com/torvalds/linux/commit/3e443eb353eda6f4b4796e07f2599683fa752f1d

The second commit actually adds the missing support, but it relies on the first commit: a partial refactor of the AMD ECC code.
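
A minimal sketch of the cherry-pick order described above, assuming both mainline commits have already been fetched into the local Ubuntu kernel tree (for example from a mainline remote). The variable and function names are illustrative and are not part of any Ubuntu tooling.

```shell
# Commit IDs from this report, in dependency order: the refactor first,
# then the commit that adds the missing support.
ECC_COMMITS="e53a3b267fb0a79db9ca1f1e08b97889b22013e6 3e443eb353eda6f4b4796e07f2599683fa752f1d"

# Hypothetical helper: cherry-pick each commit in order and stop at the
# first failure so that conflicts can be resolved by hand.
apply_ecc_commits() {
    for commit in $ECC_COMMITS; do
        git cherry-pick -x "$commit" || return 1
    done
}
```

Calling apply_ecc_commits from the root of the kernel source tree would apply both commits before the build; the -x option records the original commit ID in each new commit message.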

To test this, I followed the instructions at https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel to build my own kernel with the two commits above cherry-picked, and ECC support now works on the self-built kernel:
Linux smaug 5.3.0-43-generic #36 SMP Sat Mar 21 02:33:30 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

EDAC MC: Ver: 3.0.0
EDAC amd64: Node 0: DRAM ECC enabled.
EDAC amd64: F17h_M70h detected (node 0).
EDAC MC: UMC0 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 16384MB 3: 16384MB
EDAC MC: UMC1 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 16384MB 3: 16384MB
EDAC amd64: using x16 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC0: Giving out device to module amd64_edac controller F17h_M70h: DEV 0000:00:18.3 (INTERRUPT)
EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 0000:00:18.0 (POLLED)
AMD64 EDAC driver v3.5.0

If this could be included in a future hardware support kernel release, that would be very helpful. I am unsure what the policy is for including this kind of backport in non-LTS kernels, though I know the LTS releases have hardware support updates that include this type of fix.
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu8.6
Architecture: amd64
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CurrentDesktop: XFCE
DistroRelease: Ubuntu 19.10
InstallationDate: Installed on 2020-03-20 (6 days ago)
InstallationMedia: Ubuntu-Server 19.10 "Eoan Ermine" - Release amd64 (20191017)
MachineType: To Be Filled By O.E.M. To Be Filled By O.E.M.
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
Package: linux (not installed)
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=C.UTF-8
 SHELL=/bin/bash
ProcFB: 0 astdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.3.0-42-generic root=UUID=89cfcc61-dbca-43d3-9a2e-5961df9ae5b4 ro
ProcVersionSignature: Ubuntu 5.3.0-42.34-generic 5.3.18
RelatedPackageVersions:
 linux-restricted-modules-5.3.0-42-generic N/A
 linux-backports-modules-5.3.0-42-generic N/A
 linux-firmware 1.183.4
RfKill:

Tags: eoan uec-images
Uname: Linux 5.3.0-42-generic x86_64
UnreportableReason: This report is about a package that is not installed.
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip libvirt lxd plugdev sudo
_MarkForUpload: False
dmi.bios.date: 10/03/2019
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: P3.30
dmi.board.name: X470D4U2-2T
dmi.board.vendor: ASRockRack
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrP3.30:bd10/03/2019:svnToBeFilledByO.E.M.:pnToBeFilledByO.E.M.:pvrToBeFilledByO.E.M.:rvnASRockRack:rnX470D4U2-2T:rvr:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
dmi.product.family: To Be Filled By O.E.M.
dmi.product.name: To Be Filled By O.E.M.
dmi.product.sku: To Be Filled By O.E.M.
dmi.product.version: To Be Filled By O.E.M.
dmi.sys.vendor: To Be Filled By O.E.M.

Revision history for this message
araemo (araemo) wrote :

Attachment was missed during initial report

Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs IRC channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1869235/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
Paul White (paulw2u)
affects: ubuntu → linux (Ubuntu)
tags: added: eoan
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1869235

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
araemo (araemo) wrote : AlsaInfo.txt

apport information

tags: added: apport-collected uec-images
description: updated
Revision history for this message
araemo (araemo) wrote : CRDA.txt

apport information

Revision history for this message
araemo (araemo) wrote : CurrentDmesg.txt

apport information

Revision history for this message
araemo (araemo) wrote : IwConfig.txt

apport information

Revision history for this message
araemo (araemo) wrote : Lspci.txt

apport information

Revision history for this message
araemo (araemo) wrote : Lsusb.txt

apport information

Revision history for this message
araemo (araemo) wrote : ProcCpuinfoMinimal.txt

apport information

Revision history for this message
araemo (araemo) wrote : ProcInterrupts.txt

apport information

Revision history for this message
araemo (araemo) wrote : ProcModules.txt

apport information

Revision history for this message
araemo (araemo) wrote : PulseList.txt

apport information

Revision history for this message
araemo (araemo) wrote : UdevDb.txt

apport information

Revision history for this message
araemo (araemo) wrote : WifiSyslog.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Alex Hung (alexhung) wrote :

@araemo,

Thanks for reporting.

I will prepare an SRU for the eoan kernel (5.3). Can you try the kernel below (Ubuntu-5.3.0-42.34 plus cherry-picked e53a3b267fb0 and 3e443eb353ed)? If your problem goes away, I will send the patches for SRU review.

https://people.canonical.com/~alexhung/LP1869235/

We may later need your help testing the proposed kernel before the next kernel update, as I do not have the hardware to verify it.

Revision history for this message
araemo (araemo) wrote :

Yes, that version worked:
$ dmesg | grep EDAC
[ 0.801863] EDAC MC: Ver: 3.0.0
[ 21.536674] EDAC amd64: Node 0: DRAM ECC enabled.
[ 21.536674] EDAC amd64: F17h_M70h detected (node 0).
[ 21.536704] EDAC MC: UMC0 chip selects:
[ 21.536704] EDAC amd64: MC: 0: 0MB 1: 0MB
[ 21.536705] EDAC amd64: MC: 2: 16384MB 3: 16384MB
[ 21.536707] EDAC MC: UMC1 chip selects:
[ 21.536708] EDAC amd64: MC: 0: 0MB 1: 0MB
[ 21.536708] EDAC amd64: MC: 2: 16384MB 3: 16384MB
[ 21.536708] EDAC amd64: using x16 syndromes.
[ 21.536709] EDAC amd64: MCT channel count: 2
[ 21.536759] EDAC MC0: Giving out device to module amd64_edac controller F17h_M70h: DEV 0000:00:18.3 (INTERRUPT)
[ 21.536766] EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 0000:00:18.0 (POLLED)
[ 21.536767] AMD64 EDAC driver v3.5.0

$ sudo edac-util
edac-util: No errors to report.
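
The manual check above can be sketched as a small shell function. This is only an illustration, assuming the amd64 EDAC driver prints the "DRAM ECC enabled" line in the format shown in this report; the function name is hypothetical.

```shell
# Hypothetical helper: reads kernel log text on stdin and succeeds only if
# the amd64 EDAC driver reported DRAM ECC as enabled on at least one node.
ecc_enabled() {
    grep -q 'EDAC amd64: Node [0-9][0-9]*: DRAM ECC enabled'
}
```

For example, `dmesg | ecc_enabled && echo "ECC active"` prints the message only when the line is present. On an unpatched kernel that logs only the "F0 not found" error, the function returns a non-zero status.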

Changed in linux (Ubuntu Eoan):
status: New → Fix Committed
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-eoan' to 'verification-done-eoan'. If the problem still exists, change the tag 'verification-needed-eoan' to 'verification-failed-eoan'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-eoan
Revision history for this message
araemo (araemo) wrote :

Unfortunately, my server had a hardware error that is preventing POST. I will not have the warrantied replacement motherboard within 5 days, so I won't be able to validate this on the live system. I did see that the necessary changes got committed though.

Alex Hung (alexhung)
tags: added: verification-done-eoan
removed: verification-needed-eoan
Revision history for this message
araemo (araemo) wrote :

Further verification now that my replacement motherboard arrived:
$ dmesg | grep EDAC
[ 0.800614] EDAC MC: Ver: 3.0.0
[ 18.710159] EDAC amd64: Node 0: DRAM ECC enabled.
[ 18.710160] EDAC amd64: F17h_M70h detected (node 0).
[ 18.710194] EDAC MC: UMC0 chip selects:
[ 18.710195] EDAC amd64: MC: 0: 0MB 1: 0MB
[ 18.710195] EDAC amd64: MC: 2: 16384MB 3: 16384MB
[ 18.710198] EDAC MC: UMC1 chip selects:
[ 18.710198] EDAC amd64: MC: 0: 0MB 1: 0MB
[ 18.710199] EDAC amd64: MC: 2: 16384MB 3: 16384MB
[ 18.710199] EDAC amd64: using x16 syndromes.
[ 18.710200] EDAC amd64: MCT channel count: 2
[ 18.710273] EDAC MC0: Giving out device to module amd64_edac controller F17h_M70h: DEV 0000:00:18.3 (INTERRUPT)
[ 18.710296] EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 0000:00:18.0 (POLLED)
[ 18.710296] AMD64 EDAC driver v3.5.0
$ uname -a
Linux smaug 5.3.0-48-generic #41-Ubuntu SMP Fri Apr 10 06:59:18 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ sudo edac-util
edac-util: No errors to report.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.3.0-51.44

---------------
linux (5.3.0-51.44) eoan; urgency=medium

  * CVE-2020-11884
    - SAUCE: s390/mm: fix page table upgrade vs 2ndary address mode accesses

 -- Thadeu Lima de Souza Cascardo <email address hidden> Wed, 22 Apr 2020 17:35:41 -0300

Changed in linux (Ubuntu Eoan):
status: Fix Committed → Fix Released