[SRU] Include powerpc port upstream fixes to librpmem 1.10 on pmdk package

Bug #1931063 reported by bugproxy
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
The Ubuntu-power-systems project | Fix Released | Medium | Ubuntu on IBM Power Systems Bug Triage |
pmdk (Ubuntu) | Fix Released | Undecided | Unassigned |
pmdk (Ubuntu Hirsute) | Fix Released | Undecided | Paride Legovini |
pmdk (Ubuntu Impish) | Fix Released | Undecided | Unassigned |

Bug Description

[Impact]

On ppc64el the librpmem checks for RPMEM_RAW_BUFF_SIZE and LANE_ALIGN_SIZE are broken because they're not using the reference values for the target architecture.

[Test Plan]

(Adapted from comment #13.)

The upstream source tree contains a test suite that can be used for the verification.

1. Unpack the source package.
2. Configure src/test/testconfig.sh. See example below.
 - To do that copy the testconfig.sh.example to testconfig.sh.
   Fill it with the configurations for your system.
 - Be sure to set the variable PMDK_LIB_PATH_NONDEBUG with the path to
   librpmem.so as installed by the package to test.
 - As you are not on a DAX device use PMEM_FS_DIR_FORCE_PMEM=1.
 - As this will be a librpmem test we need to configure the node
   variables. Also ssh keys must be configured to access the nodes
   without password.
3. Inside the src/test/ directory run "./RUNTESTS -b nondebug rpmem_fip".
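
Steps 2 and 3 could be scripted roughly as follows. This is a minimal sketch and not part of the package: the function name prepare_testconfig is hypothetical, the default library path is the ppc64el one from the example config in this report, and it assumes testconfig.sh is read as a shell script so that a later assignment overrides an earlier one. It only prints the RUNTESTS command instead of running it; the node variables and ssh keys still have to be set up by hand.

```shell
#!/bin/sh
# Hypothetical helper: create src/test/testconfig.sh from the shipped example
# and append the two settings called out in the test plan.
prepare_testconfig() {
    test_dir=$1/src/test
    # Assumption: ppc64el multiarch library directory, as in the example config.
    lib_path=${2:-/usr/lib/powerpc64le-linux-gnu/}

    [ -f "$test_dir/testconfig.sh.example" ] || {
        echo "missing $test_dir/testconfig.sh.example" >&2
        return 1
    }

    # Do not overwrite a config that was already filled in.
    [ -f "$test_dir/testconfig.sh" ] ||
        cp "$test_dir/testconfig.sh.example" "$test_dir/testconfig.sh"

    # Append the settings; each call appends them again.
    cat >> "$test_dir/testconfig.sh" <<EOF
PMDK_LIB_PATH_NONDEBUG=$lib_path
PMEM_FS_DIR_FORCE_PMEM=1
EOF

    # Print the command for step 3 rather than running it.
    echo "cd $test_dir && ./RUNTESTS -b nondebug rpmem_fip"
}
```

For example, "prepare_testconfig ./pmdk-1.10" would prepare the config inside a source tree unpacked at ./pmdk-1.10 (the directory name here is illustrative).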

Here is an example of testconfig.sh:

PMEM_FS_DIR=/tmp/pmem-fs.d
NON_PMEM_FS_DIR=/tmp/non-pmem-fs.d
PMEM_FS_DIR_FORCE_PMEM=1
RDMAV_FORK_SAFE=1
PMDK_LIB_PATH_NONDEBUG=/usr/lib/powerpc64le-linux-gnu/
TEST_BUILD="nondebug"
TEST_TIMEOUT=6m
TM=1
KEEP_GOING=y
CLEAN_FAILED=y
UNITTEST_LOG_LEVEL=0
UNITTEST_LOG_LEVEL=1
NODE[0]=127.0.0.1
NODE[1]=127.0.0.1
NODE[2]=127.0.0.1
NODE[3]=127.0.0.1
NODE_ADDR[0]=127.0.0.1
NODE_ADDR[1]=127.0.0.1
NODE_ADDR[2]=127.0.0.1
NODE_ADDR[3]=127.0.0.1
NODE_WORKING_DIR[0]=/tmp/node0
NODE_WORKING_DIR[1]=/tmp/node1
NODE_WORKING_DIR[2]=/tmp/node2
NODE_WORKING_DIR[3]=/tmp/node3
NODE_ENV[0]="PMEM_IS_PMEM_FORCE=1 RDMAV_FORK_SAFE=1"
NODE_ENV[1]="PMEM_IS_PMEM_FORCE=1 RDMAV_FORK_SAFE=1"
NODE_ENV[2]="PMEM_IS_PMEM_FORCE=1 RDMAV_FORK_SAFE=1"
NODE_ENV[3]="PMEM_IS_PMEM_FORCE=1 RDMAV_FORK_SAFE=1"
TEST_PROVIDERS=sockets
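
The per-node entries above differ only in their index, so they could be generated instead of typed out. A minimal sketch, assuming all nodes share one address and the /tmp/nodeN working directories used in the example; gen_node_config is a hypothetical name, not something shipped with pmdk.

```shell
#!/bin/sh
# Hypothetical generator for the per-node block of testconfig.sh.
# Prints NODE, NODE_ADDR, NODE_WORKING_DIR and NODE_ENV entries for
# COUNT nodes that all point at ADDR (loopback by default).
gen_node_config() {
    count=$1
    addr=${2:-127.0.0.1}

    for var in NODE NODE_ADDR; do
        i=0
        while [ "$i" -lt "$count" ]; do
            printf '%s[%d]=%s\n' "$var" "$i" "$addr"
            i=$((i + 1))
        done
    done

    i=0
    while [ "$i" -lt "$count" ]; do
        printf 'NODE_WORKING_DIR[%d]=/tmp/node%d\n' "$i" "$i"
        i=$((i + 1))
    done

    i=0
    while [ "$i" -lt "$count" ]; do
        # Same per-node environment as in the example config.
        printf 'NODE_ENV[%d]="PMEM_IS_PMEM_FORCE=1 RDMAV_FORK_SAFE=1"\n' "$i"
        i=$((i + 1))
    done
}
```

Running "gen_node_config 4 >> testconfig.sh" would append the same sixteen node lines as in the example.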

[Where problems could occur]

On architectures where this was "not a bug" nothing changes: there is just one more level of indirection in some #defines, but they resolve to the same values as before. On the affected architecture (ppc64el) I don't see a plausible case of users relying on the bug. The scope of the fix is limited.
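
A quick way to tell whether a test host is of the affected architecture is to look at the machine name, which is ppc64le on ppc64el systems (as in the "arch" output quoted later in this report). A minimal sketch; is_affected_arch is a hypothetical helper name, and it checks the architecture only, not the installed pmdk version.

```shell
#!/bin/sh
# Hypothetical check: succeed only on the architecture affected by this bug.
# The machine name can be passed explicitly; otherwise uname -m is used.
is_affected_arch() {
    machine=${1:-$(uname -m)}
    [ "$machine" = "ppc64le" ]
}

if is_affected_arch; then
    echo "ppc64el host: the librpmem fix is relevant here"
else
    echo "not a ppc64el host: values are unchanged by the fix"
fi
```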

[Development Fix]

Already fixed in 1.11.0-1 (new upstream release, sync from Debian experimental).

[Original Description]

== Comment: #0 - Lucas Alexandre Mello Magalhaes <email address hidden> - 2021-06-04 14:06:47 ==
On PPC64LE there is an issue on librpmem check for RPMEM_RAW_BUFF_SIZE. This is
fixed upstream already. Please include the following commits to pick up the fix.

652659830 rpmem: Fix RPMEM_RAW_BUFF_SIZE and LANE_ALIGN_SIZE for powerpc64le
e672c09d9 common: Move page_size.h from common to core

The following patches are fixes to the unit tests. Please include them if you need them for testing.

bc048c7e4 test: Fix obj_rpmem_heap_state for ppc64le
736e42b1d test: Fix pmempool_sync_remote for ppc64
2e1a6a1da test: fix rpmem_basic for ppc64le
74e3ca419 test: Add create_recovery_file_absolute
ffeb20d6c test: Fix rpmemd_obc POOL_DESC_SIZE redefinition
aa7aae2b4 test: Fix tools/fip includes


bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-193088 severity-medium targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → kernel-package (Ubuntu)
Revision history for this message
Frank Heimes (fheimes) wrote :

I think this affects pmdk (rather than the Linux kernel).

affects: kernel-package (Ubuntu) → pmdk (Ubuntu)
Changed in ubuntu-power-systems:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
importance: Undecided → Medium
Changed in pmdk (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Canonical Server Team (canonical-server)
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2021-06-07 10:25 EDT-------
> I think this affects pmdk (rather than the Linux kernel).

Correct.

Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: New → Triaged
Revision history for this message
Frank Heimes (fheimes) wrote : Re: Include powerpc port upstream fixes to librpmem 1.10 on pmdk package

Since the requested packages should be included in pmdk 1.10
and since we have the following versions in the ubuntu releases:
 pmdk | 1.4.1-0ubuntu1~18.04.1 | bionic-updates/universe | source
 pmdk | 1.8-1ubuntu1 | focal | source
 pmdk | 1.9-2 | groovy | source
 pmdk | 1.10-1ubuntu1 | hirsute | source
 pmdk | 1.10-1ubuntu1 | impish | source
I think this affects hirsute and impish only.
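
The same conclusion can be drawn mechanically from a listing in the format above. A minimal sketch, assuming pipe-separated lines of the form "package | version | series | type" and versions without an epoch; affected_series is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical filter: read "package | version | series | type" lines on
# stdin and print the series whose upstream version equals the argument.
affected_series() {
    awk -F'|' -v ver="$1" '
        {
            version = $2
            series = $3
            gsub(/^ +| +$/, "", version)
            gsub(/^ +| +$/, "", series)
            # The upstream version must be followed by the "-" that starts
            # the Debian revision, so "1.10" does not match "1.10.1-1".
            if (index(version, ver "-") == 1)
                print series
        }'
}
```

Feeding it the five lines above with "1.10" as the argument prints hirsute and impish.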

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2021-06-17 09:51 EDT-------
(In reply to comment #8)
> Since the requested packages should be included in pmdk 1.10
> and since we have the following versions in the ubuntu releases:
> pmdk | 1.4.1-0ubuntu1~18.04.1 | bionic-updates/universe | source
> pmdk | 1.8-1ubuntu1 | focal | source
> pmdk | 1.9-2 | groovy | source
> pmdk | 1.10-1ubuntu1 | hirsute | source
> pmdk | 1.10-1ubuntu1 | impish | source
> I think this affects hirsute and impish only.

That's correct.

tags: added: server-next
Paride Legovini (paride)
Changed in pmdk (Ubuntu Hirsute):
assignee: nobody → Paride Legovini (paride)
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2021-07-22 08:52 EDT-------
Are there any updates on this issue?

Revision history for this message
Paride Legovini (paride) wrote : Re: Include powerpc port upstream fixes to librpmem 1.10 on pmdk package

According to the pmdk upstream git repository the two commits identified as fixing this bug have been released in version 1.11.0:

https://github.com/pmem/pmdk/commit/652659830
https://github.com/pmem/pmdk/commit/e672c09d9

which is now in Impish, so I'm going to mark the Impish task as Fix Released.

Changed in pmdk (Ubuntu Impish):
status: New → Fix Released
assignee: Canonical Server Team (canonical-server) → nobody
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Triaged → In Progress
Paride Legovini (paride)
Changed in pmdk (Ubuntu Hirsute):
status: New → In Progress
summary: - Include powerpc port upstream fixes to librpmem 1.10 on pmdk package
+ [SRU] Include powerpc port upstream fixes to librpmem 1.10 on pmdk
+ package
Revision history for this message
Paride Legovini (paride) wrote :

I am setting up a PPA for preliminary testing before the actual upload. Unfortunately the package build queue for ppc64el is very long at the moment; I'll share a link to the PPA as soon as the package is ready.

description: updated
Revision history for this message
Paride Legovini (paride) wrote :

Hi, so here is the PPA with amd64 and s390x packages including the fix:

https://launchpad.net/~paride/+archive/ubuntu/pmdk-lp1931063

I'll work towards uploading it to -proposed for the formal SRU/verification process; however, early feedback on the packages will help speed things up. @lamm or others: if you can proceed with some testing of those packages I'd greatly appreciate it. Thanks!

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2021-07-27 13:11 EDT-------
(In reply to comment #13)
> Hi, so here is the PPA with amd64 and s390x packages including the fix:
>
> https://launchpad.net/~paride/+archive/ubuntu/pmdk-lp1931063
>
> I'll work towards uploading it to -proposed for the formal SRU/verification
> process, however early feedback on the packages will help and speed things
> up. @lamm or others: if you can proceed with some testing of those packages
> I'd greatly appreciate it. Thanks!

Thanks @paride! Sure, I will test it here.

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2021-07-29 10:36 EDT-------
@paride I'm getting the issue below with the librpmem tests. I've tried testing
the upstream code and the issue persists on Ubuntu. However, I don't get this on other distros, so I suspect it's a libfabric issue.

rpmem_fip/TEST0: SETUP (check/none/nondebug/sockets/GPSPM)
A process has executed an operation involving a call
to the fork() system call to create a child process.

As a result, the libfabric EFA provider is operating in
a condition that could result in memory corruption or
other system errors.

For the libfabric EFA provider to work safely when fork()
is called, you will need to set the following environment
variable:
RDMAV_FORK_SAFE

However, setting this environment variable can result in
signficant performance impact to your application due to
increased cost of memory registration.

You may want to check with your application vendor to see
if an application-level alternative (of not using fork)
exists.

Your job will now abort.

Revision history for this message
Paride Legovini (paride) wrote :

Thanks @lamm for testing. Just to be sure: is the log excerpt you just pasted a new issue that is not present in the version currently in Hirsute? In other words, does it look like a regression introduced by the patches included in the PPA package?

Revision history for this message
Paride Legovini (paride) wrote :

I tried reproducing the issue by recompiling the source package from my PPA and from the current tip of pmdk git on an up-to-date Hirsute container running on:

$ arch
ppc64le

$ uname -a
Linux paride-h 4.15.0-147-generic #151-Ubuntu SMP Fri Jun 18 19:17:52 UTC 2021 ppc64le ppc64le ppc64le GNU/Linux

A simple "manual" build+test run done with:

 - make
 - make test

seems to fully succeed. Possible caveats of my approach:

 - It's a container
 - Tests not running as root (but doesn't seem to be needed)
 - No real persistent memory device present

Do you have any suggestion on how I can reproduce the "rpmem_fip/TEST0" issue you are hitting? Thanks!

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2021-07-30 10:31 EDT-------
> Do you have any suggestion on how I can reproduce the "rpmem_fip/TEST0"
> issue you are hitting? Thanks!

Hi @paride. I spent some time trying to identify the issue. Indeed it occurs with
the provided package. Here are some steps to reproduce it:

1. Install the provided librpmem and libfabric
2. Clone PMDK repository https://github.com/pmem/pmdk.git
3. Checkout to branch stable-1.10
4. Compile
5. Configure src/test/testconfig.sh. I will put an example at the end of the message.
- To do that, copy testconfig.sh.example to testconfig.sh and fill it with the configuration for your system.
- Be sure to set the variable PMDK_LIB_PATH_NONDEBUG to the path to librpmem.so.
- As you are not on a DAX device, use PMEM_FS_DIR_FORCE_PMEM=1.
- As this is a librpmem test, the node variables need to be configured. SSH keys must also be set up so that the nodes can be accessed without a password. For example, with the testconfig.sh below you will need to run "ssh-copy-id 127.0.0.1".
6. Inside the src/test/ directory run "./RUNTESTS -b nondebug rpmem_fip -s TEST0". This runs only rpmem_fip/TEST0 for nondebug, which is enough to reproduce the error.

Here is an example of testconfig.sh:

PMEM_FS_DIR=/tmp/pmem-fs.d
NON_PMEM_FS_DIR=/tmp/non-pmem-fs.d
PMEM_FS_DIR_FORCE_PMEM=1
PMDK_LIB_PATH_NONDEBUG=/usr/lib/powerpc64le-linux-gnu/
TEST_BUILD="nondebug"
TEST_TIMEOUT=6m
TM=1
KEEP_GOING=y
CLEAN_FAILED=y
UNITTEST_LOG_LEVEL=0
UNITTEST_LOG_LEVEL=1
NODE[0]=127.0.0.1
NODE[1]=127.0.0.1
NODE[2]=127.0.0.1
NODE[3]=127.0.0.1
NODE_ADDR[0]=127.0.0.1
NODE_ADDR[1]=127.0.0.1
NODE_ADDR[2]=127.0.0.1
NODE_ADDR[3]=127.0.0.1
NODE_WORKING_DIR[0]=/tmp/node0
NODE_WORKING_DIR[1]=/tmp/node1
NODE_WORKING_DIR[2]=/tmp/node2
NODE_WORKING_DIR[3]=/tmp/node3
NODE_ENV[0]="PMEM_IS_PMEM_FORCE=1"
NODE_ENV[1]="PMEM_IS_PMEM_FORCE=1"
NODE_ENV[2]="PMEM_IS_PMEM_FORCE=1"
NODE_ENV[3]="PMEM_IS_PMEM_FORCE=1"
TEST_PROVIDERS=sockets

Revision history for this message
Paride Legovini (paride) wrote :

Thanks, I'll follow the steps you provided and see if I can reproduce the issue.

Revision history for this message
Paride Legovini (paride) wrote :

Hi @lamm, I managed to reproduce the "fork() system call" failure you described. For some reason my /tmp/node* directories do not get automatically created/populated, despite testconfig.sh.example saying that "they will be created". However, by manually populating them with the required test files I got the same error message as you.
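
For anyone hitting the same missing-directory problem, the node working directories can at least be created up front. A minimal sketch with a hypothetical helper name (make_node_dirs); it creates empty directories only and does not copy the test files that also had to be put in place manually, since those are not listed here.

```shell
#!/bin/sh
# Hypothetical helper: create BASE/node0 .. BASE/node(COUNT-1), matching the
# NODE_WORKING_DIR layout used in the example testconfig.sh.
make_node_dirs() {
    base=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        mkdir -p "$base/node$i" || return 1
        i=$((i + 1))
    done
}
```

With the example config this would be "make_node_dirs /tmp 4".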

Here is my take on it:

1. You are right saying that libfabric is involved. That error message has been introduced in this libfabric commit:

  https://github.com/ofiwg/libfabric/commit/b40ce3531dcfc79f3356e2c01701058a8e2ef4f4

AIUI that commit disabled the default usage of a fork-safety mode, as it affected performance and was imperfect in any case. The suggestion is to force a (different/better?) fork-safe mode by setting RDMAV_FORK_SAFE=1.

2. Apparently rpmem_fip/TEST0 has fork() calls, and thus triggers the warning and abort() implemented in that b40ce35 commit. Setting RDMAV_FORK_SAFE=1, i.e. setting the following in testconfig.sh:

NODE_ENV[0]="PMEM_IS_PMEM_FORCE=1 RDMAV_FORK_SAFE=1"

(and similar for the other NODEs) makes rpmem_fip/TEST0 pass for me. So this is probably not a bug in libfabric and strictly speaking not a bug in pmdk, but the pmdk tests may need to be updated to work by default with the newer versions of libfabric. It may be worth filing an upstream pmdk bug.

3. Commit b40ce35 was first released in libfabric 1.11.0, which was first released in Hirsute. This is consistent with the fact that you were not seeing that failure in the pre-Hirsute Ubuntu releases. It is worth checking which version of libfabric is in the other distros you tested. If it's < 1.11.0 then it's all consistent.

4. If we agree we don't have a regression in pmdk here, let's go back to the issue with RPMEM_RAW_BUFF_SIZE and LANE_ALIGN_SIZE. That issue should be fixed in the test packages in my PPA (https://launchpad.net/~paride/+archive/ubuntu/pmdk-lp1931063). By setting RDMAV_FORK_SAFE=1 you should be able to verify, so we can proceed with the SRU for Hirsute.
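
The NODE_ENV change from point 2 can be applied to an existing testconfig.sh in one pass. A minimal sketch, assuming each NODE_ENV entry sits on its own line and ends with a closing double quote, as in the example config in this report; add_fork_safe is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: append RDMAV_FORK_SAFE=1 to every NODE_ENV[n]="..."
# line of the given config file, skipping lines that already set it.
add_fork_safe() {
    cfg=$1
    tmp=$(mktemp) || return 1
    # Within NODE_ENV lines only, rewrite the closing quote unless the
    # variable is already present, so the edit is safe to repeat.
    sed -e '/^NODE_ENV\[[0-9]*]="/{' \
        -e '/RDMAV_FORK_SAFE=/!s/"$/ RDMAV_FORK_SAFE=1"/' \
        -e '}' "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Calling "add_fork_safe src/test/testconfig.sh" twice leaves the file the same as calling it once.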

Let me know WDYT, and thanks again for the feedback and testing.

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2021-08-30 15:31 EDT-------
Hi @paride,

@lamm is out of the office, so let me try to reply to your comment.

Nice catch!
I confirmed the tests do pass with RDMAV_FORK_SAFE=1.

I also verified the fix has been integrated and the issue has been fixed.

With that said, I'm closing the issue on my side.

Thanks!

tags: added: targetmilestone-inin2104
removed: targetmilestone-inin---
Paride Legovini (paride)
description: updated
Paride Legovini (paride)
description: updated
description: updated
Revision history for this message
Brian Murray (brian-murray) wrote : Please test proposed package

Hello bugproxy, or anyone else affected,

Accepted pmdk into hirsute-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/pmdk/1.10-1ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-hirsute to verification-done-hirsute. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-hirsute. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in pmdk (Ubuntu Hirsute):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-hirsute
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
Revision history for this message
Paride Legovini (paride) wrote :

Hi @tulioqm and @lamm,

The package in hirsute-proposed requiring verification is the same published in the test PPA above, modulo a change in the version string. Even if they should be otherwise identical please make sure to test the package from hirsute-proposed (version: 1.10-1ubuntu1.1) and not the PPA one.

I gave a shot at the verification locally and it LGTM, but given that the issue this bug is about is quite specific I'll wait for your verification results before marking the hirsute-proposed package as verification-done. Thanks!

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2021-09-08 14:19 EDT-------
> The package in hirsute-proposed requiring verification is the same published
> in the test PPA above, modulo a change in the version string. Even if they
> should be otherwise identical please make sure to test the package from
> hirsute-proposed (version: 1.10-1ubuntu1.1) and not the PPA one.
>
> I gave a shot at the verification locally and it LGTM, but given that the
> issue this bug is about is quite specific I'll wait for your verification
> results before marking the hirsute-proposed package as verification-done.

I've also tested version 1.10-1ubuntu1.1 and it LGTM too.

Thanks!

Revision history for this message
Frank Heimes (fheimes) wrote :

@tulioqm thank you for the verification (I'm adjusting the tag accordingly ...)

tags: added: verification-done verification-done-hirsute
removed: verification-needed verification-needed-hirsute
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pmdk - 1.10-1ubuntu1.1

---------------
pmdk (1.10-1ubuntu1.1) hirsute; urgency=medium

  * d/p/lp1931063-rpmem-Fix-RPMEM_RAW_BUFF_SIZE-and-LANE_ALIGN_SIZE.patch:
    Fix RPMEM_RAW_BUFF_SIZE and LANE_ALIGN_SIZE for ppc64el (LP: #1931063)

 -- Paride Legovini <email address hidden> Mon, 26 Jul 2021 14:17:53 +0200

Changed in pmdk (Ubuntu Hirsute):
status: Fix Committed → Fix Released
Revision history for this message
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for pmdk has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released
Revision history for this message
Paride Legovini (paride) wrote :

Upstream mentions the RPMEM_RAW_BUFF_SIZE and LANE_ALIGN_SIZE fix as part of the 1.11.1 release in the release notes [1], however the relevant commits that I mentioned in comment 6 are already in 1.11.0 (I had a look at the actual source package for 1.11.0-2), so I'm still convinced this is Fix Released in Impish.

[1] https://github.com/pmem/pmdk/releases/tag/1.11.1
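
Whether a commit is already part of a tagged release can also be checked directly in a clone of the upstream repository, independently of the release notes. A minimal sketch using git merge-base --is-ancestor; commit_in_release is a hypothetical helper name, and it assumes a local clone in which the release tag exists under the given name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if COMMIT is reachable from TAG in the git
# repository at REPO, i.e. the commit is contained in that release.
commit_in_release() {
    repo=$1
    commit=$2
    tag=$3
    git -C "$repo" merge-base --is-ancestor "$commit" "$tag"
}
```

For example, "commit_in_release ./pmdk 652659830 1.11.0 && echo contained" would print "contained" if the rpmem fix is part of the 1.11.0 tag in a clone at ./pmdk.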

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2021-10-04 09:10 EDT-------
(In reply to comment #27)
> Upstream mentions the RPMEM_RAW_BUFF_SIZE and LANE_ALIGN_SIZE fix as part of
> the 1.11.1 release in the release notes [1], however the relevant commits
> that I mentioned in comment 6 are already in 1.11.0 (I had a look at the
> actual source package for 1.11.0-2), so I'm still convinced this is Fix
> Released in Impish.

Indeed, the commits are on upstream 1.11.0. The release notes are mistaken. I agree this is Fix Released in Impish.
