AWS: out of entropy on Graviton 2 instance types (m6g.*)

Bug #1927692 reported by Andrea Righi
This bug affects 1 person

Affects               Status        Importance  Assigned to   Milestone
linux-aws (Ubuntu)    Fix Released  Undecided   Unassigned
  Focal               Fix Released  Undecided   Andrea Righi

Bug Description

[Impact]

AWS Graviton 2 instances do not have enough entropy available at boot, so any task that requires entropy (even reading a few bytes from /dev/random) blocks forever.
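To see how much entropy the kernel currently estimates it has, the counter exported under /proc can be read directly. The following is a minimal sketch; the helper name `entropy_avail` is hypothetical, and how the number should be interpreted depends on the kernel version (on the 5.4 kernels discussed here it is the input pool's estimate in bits, documented in random(4) as ranging from 0 to 4096).

```shell
#!/bin/sh
# Minimal sketch: print the kernel's current entropy estimate, in bits.
# entropy_avail is a hypothetical helper name, not part of any package.
ENTROPY_FILE="/proc/sys/kernel/random/entropy_avail"

entropy_avail() {
    if [ ! -r "$ENTROPY_FILE" ]; then
        echo "cannot read $ENTROPY_FILE" >&2
        return 1
    fi
    # The file holds a single decimal integer followed by a newline
    cat "$ENTROPY_FILE"
}
```

Running `entropy_avail` prints a single number; on an affected instance, a value that stays low right after boot is consistent with the starvation described above.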

[Fix]

The proper fix for this problem is to correctly refill the entropy pool with real random data taken from a hardware source of randomness.

In the meantime, a reasonable workaround is to apply the following upstream commits:

 30c08efec888 random: make /dev/random be almost like /dev/urandom
 48446f198f9a random: ignore GRND_RANDOM in getentropy(2)
 75551dbf112c random: add GRND_INSECURE to return best-effort non-cryptographic bytes
 c6f1deb15878 random: Add a urandom_read_nowait() for random APIs that don't warn
 4c8d062186d9 random: Don't wake crng_init_wait when crng_init == 1

With these commits applied, the system will not run out of entropy and will always be able to provide best-effort randomness, which prevents the out-of-entropy issue on AWS Graviton 2 instances.

[Test plan]

Execute the following command on any m6g instance:

  dd bs=32 count=1 if=/dev/random of=/dev/null

This should return quickly. If it does not, the system does not have enough entropy available: when the problem happens, this command hangs forever.
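Because the test-plan command hangs forever on an affected system, it is awkward to run unattended. A minimal sketch that bounds it with timeout(1) from GNU coreutils, which exits with status 124 when it has to stop the command; `RANDOM_DEV`, `TIMEOUT_SECS` and `check_random_read` are hypothetical names, and the 10-second default is an arbitrary choice.

```shell
#!/bin/sh
# Minimal sketch: the test-plan read of /dev/random, bounded by timeout(1)
# so that a starved system reports a failure instead of hanging forever.
# RANDOM_DEV and TIMEOUT_SECS are hypothetical knobs; 10 s is arbitrary.
RANDOM_DEV="${RANDOM_DEV:-/dev/random}"
TIMEOUT_SECS="${TIMEOUT_SECS:-10}"

check_random_read() {
    timeout "$TIMEOUT_SECS" dd bs=32 count=1 if="$RANDOM_DEV" of=/dev/null 2>/dev/null
    dd_rc=$?
    # timeout(1) exits with status 124 when it had to kill the command
    if [ "$dd_rc" -eq 0 ]; then
        echo "PASS: read from $RANDOM_DEV returned"
        return 0
    elif [ "$dd_rc" -eq 124 ]; then
        echo "FAIL: read from $RANDOM_DEV still blocked after ${TIMEOUT_SECS}s"
        return 1
    else
        echo "ERROR: dd exited with status $dd_rc"
        return 2
    fi
}
```

Run `check_random_read` on an m6g instance; a FAIL line corresponds to the hang described in the test plan.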

[Where problems could occur]

These changes make the read semantics of /dev/random the same as /dev/urandom, except that reads block until the CRNG is ready. This should not materially break any API: any code that worked without these changes should work at least as well as before. However, applications with strict randomness requirements might be affected by the best-effort randomness now provided, so more commits or changes may be needed to add proper hardware entropy support on Graviton 2 instances and provide better-quality randomness. In the meantime, these upstream changes constitute a reasonable workaround that prevents applications from hanging forever on m6g.* instances.
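Since a read from /dev/random now blocks only until the CRNG is ready, a caller that must not stall can request non-blocking I/O and treat a failure as "not ready yet". The sketch below relies on GNU dd's `iflag=nonblock`, which opens the input with O_NONBLOCK; in that mode we expect the kernel to return EAGAIN instead of blocking, so dd fails immediately. The helper name `random_ready_now` is hypothetical, and dd failing for any other reason is reported the same way.

```shell
#!/bin/sh
# Minimal sketch: ask whether /dev/random can be read *right now* without
# blocking. Relies on GNU dd's iflag=nonblock (O_NONBLOCK); if the kernel
# would block, the read fails (EAGAIN) and dd exits non-zero.
# random_ready_now is a hypothetical helper name.
random_ready_now() {
    if dd bs=32 count=1 iflag=nonblock if=/dev/random of=/dev/null 2>/dev/null; then
        echo "ready: /dev/random returned data without blocking"
        return 0
    fi
    # Any dd failure lands here, not only EAGAIN
    echo "not ready (or dd failed): a blocking read of /dev/random might wait"
    return 1
}
```

`random_ready_now` returns 0 when data was read and 1 otherwise, so it can be used directly in an `if` or a polling loop.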

[Other Info]
SF: #00310204

Andrea Righi (arighi)
summary: - out of entropy on Graviton 2 instances types (mg6.*)
+ AWS: out of entropy on Graviton 2 instances types (mg6.*)
Andrea Righi (arighi)
description: updated
Changed in linux-aws (Ubuntu):
status: New → Fix Released
Changed in linux-aws (Ubuntu Focal):
status: New → In Progress
Changed in linux-aws (Ubuntu Focal):
assignee: nobody → Andrea Righi (arighi)
Tim Gardner (timg-tpi)
description: updated
Tim Gardner (timg-tpi)
Changed in linux-aws (Ubuntu Focal):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Andrea Righi (arighi)
tags: added: verification-done-focal
removed: verification-needed-focal
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-aws - 5.4.0-1049.51

---------------
linux-aws (5.4.0-1049.51) focal; urgency=medium

  * focal/linux-aws: 5.4.0-1049.51 -proposed tracker (LP: #1927595)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * AWS: out of entropy on Graviton 2 instances types (mg6.*) (LP: #1927692)
    - random: add GRND_INSECURE to return best-effort non-cryptographic bytes
    - random: Don't wake crng_init_wait when crng_init == 1
    - random: Add a urandom_read_nowait() for random APIs that don't warn
    - random: ignore GRND_RANDOM in getentropy(2)
    - random: make /dev/random be almost like /dev/urandom

  [ Ubuntu: 5.4.0-74.83 ]

  * focal/linux: 5.4.0-74.83 -proposed tracker (LP: #1927619)
  * Introduce the 465 driver series, fabric-manager, and libnvidia-nscq
    (LP: #1925522)
    - debian/dkms-versions -- add NVIDIA 465 and migrate 450 to 460
  * linux-image-5.0.0-35-generic breaks checkpointing of container
    (LP: #1857257)
    - SAUCE: overlayfs: fix incorrect mnt_id of files opened from map_files
  * Enable CIFS GCM256 (LP: #1921916)
    - smb3: add defines for new crypto algorithms
    - smb3.1.1: add new module load parm require_gcm_256
    - smb3.1.1: add new module load parm enable_gcm_256
    - smb3.1.1: print warning if server does not support requested encryption type
    - smb3.1.1: rename nonces used for GCM and CCM encryption
    - smb3.1.1: set gcm256 when requested
    - cifs: Adjust key sizes and key generation routines for AES256 encryption
  * locking/qrwlock: Fix ordering in queued_write_lock_slowpath() (LP: #1926184)
    - locking/qrwlock: Fix ordering in queued_write_lock_slowpath()
  * [Ubuntu 21.04] net/mlx5: Fix HW spec violation configuring uplink
    (LP: #1925452)
    - net/mlx5: Fix HW spec violation configuring uplink
  * Focal update: v5.4.114 upstream stable release (LP: #1926493)
    - Revert "scsi: qla2xxx: Retry PLOGI on FC-NVMe PRLI failure"
    - Revert "scsi: qla2xxx: Fix stuck login session using prli_pend_timer"
    - scsi: qla2xxx: Dual FCP-NVMe target port support
    - scsi: qla2xxx: Fix device connect issues in P2P configuration
    - scsi: qla2xxx: Retry PLOGI on FC-NVMe PRLI failure
    - scsi: qla2xxx: Add a shadow variable to hold disc_state history of fcport
    - scsi: qla2xxx: Fix stuck login session using prli_pend_timer
    - scsi: qla2xxx: Fix fabric scan hang
    - net/sctp: fix race condition in sctp_destroy_sock
    - Input: nspire-keypad - enable interrupts only when opened
    - gpio: sysfs: Obey valid_mask
    - dmaengine: dw: Make it dependent to HAS_IOMEM
    - ARM: dts: Drop duplicate sha2md5_fck to fix clk_disable race
    - ARM: dts: Fix moving mmc devices with aliases for omap4 & 5
    - lockdep: Add a missing initialization hint to the "INFO: Trying to register
      non-static key" message
    - arc: kernel: Return -EFAULT if copy_to_user() fails
    - ASoC: max98373: Added 30ms turn on/off time delay
    - neighbour: Disregard DEAD dst in neigh_update
    - ARM: keystone: fix integer overflow warning
    - ARM: omap1: fix building with clang IAS
    - drm/msm: Fix a5xx/a6xx timestamps
    - ASoC: fsl_esai: Fix TDM s...

Changed in linux-aws (Ubuntu Focal):
status: Fix Committed → Fix Released
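The Launchpad Janitor comment reports this fix in linux-aws 5.4.0-1049.51. On Ubuntu, a kernel from that package normally reports `uname -r` as `5.4.0-1049-aws` (ABI number 1049, flavour aws). The sketch below assumes that naming scheme and handles only the 5.4.0 linux-aws series; the helper name `has_entropy_fix` is hypothetical, and comparing installed package versions (for example with `dpkg --compare-versions`) is more precise.

```shell
#!/bin/sh
# Minimal sketch: decide from a `uname -r` string whether a focal linux-aws
# 5.4 kernel is at or after ABI 1049 (package 5.4.0-1049.51, reported as
# containing the fix for LP: #1927692).
# Assumes the usual Ubuntu format "5.4.0-<ABI>-aws"; anything else is "unknown".
FIXED_ABI=1049

has_entropy_fix() {
    release="${1:-$(uname -r)}"
    case "$release" in
        5.4.0-*-aws) ;;
        *) echo "unknown: $release is not a 5.4.0 linux-aws kernel"; return 2 ;;
    esac
    abi="${release#5.4.0-}"   # strip leading "5.4.0-"
    abi="${abi%-aws}"         # strip trailing "-aws"
    case "$abi" in
        ''|*[!0-9]*) echo "unknown: cannot parse ABI from $release"; return 2 ;;
    esac
    if [ "$abi" -ge "$FIXED_ABI" ]; then
        echo "fixed: $release"
        return 0
    fi
    echo "affected: $release"
    return 1
}
```

Called without arguments it checks the running kernel; passing a release string (for example `has_entropy_fix 5.4.0-1048-aws`) checks that string instead.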