gdb 12.1 generates SIGILL on armhf

Bug #2041396 reported by Zixing Liu
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| gdb | Fix Released | Medium | | |
| gdb (Ubuntu) | Fix Released | Undecided | Unassigned | |
| Jammy | Incomplete | Undecided | Unassigned | |

Bug Description

[ Impact ]

 * GDB 12.1 has a regression (present upstream since GDB 11) that breaks execution of the debugged program when it mixes ARM code and Thumb code.
 * Upstream stated they tested the changes on Ubuntu 20.04 and it went okay.

[ Test Plan ]

Consider the following C program:

```
__attribute__((target("arm"), noinline))
int thumb_func() {
  return 42;
}

__attribute__((target("thumb")))
int main() { return thumb_func(); }
```

If you build it using `gcc repro.c -ggdb3 -Og -o repro` and run GDB with the following commands ...

```
b 3
r
c
```

(you can save the contents above to a file and run GDB using `gdb -x script ./repro`)

... you will see GDB report that the program received SIGILL.
If you run the program without GDB, the program exits normally.
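
The pass/fail check in this test plan can be automated by scanning GDB's batch output. The following is a minimal sketch in C, assuming the output has already been captured into a string. The enum and the function name `classify_gdb_log` are hypothetical helpers; the matched strings are the messages GDB prints in the transcripts in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for the check described in the test plan. */
enum gdb_result { GDB_PASS, GDB_FAIL_SIGILL, GDB_UNKNOWN };

/* Classify captured GDB output for the repro program.
   The matched strings are taken from the transcripts in this report. */
enum gdb_result classify_gdb_log(const char *log)
{
    assert(log != NULL);
    if (strstr(log, "Program received signal SIGILL") != NULL)
        return GDB_FAIL_SIGILL;
    /* The repro returns 42 from main; GDB shows this as "code 052". */
    if (strstr(log, "exited with code 052") != NULL)
        return GDB_PASS;
    return GDB_UNKNOWN;
}
```

Anything that matches neither message is reported as `GDB_UNKNOWN` rather than treated as a pass, so an unexpected failure mode (for example a SIGSEGV) is not silently accepted.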

[ Where problems could occur ]

 * GDB is complex software. As the patch suggests, it may break other use cases (like single-stepping) entirely.
 * Since this is an ARM-only patch, it's unlikely to affect other CPU architectures. However, it is possible that this fix may break ARM64 execution.

[ Other Info ]

 * This bug has been fixed in GDB 13, but the fix was never backported to GDB 12. You can find the upstream bug in the remote bug watch.

In the upstream bug tracker, Ximin Luo (infinity0) wrote :

Created attachment 14158
rustc debuginfo test sample #1

1. Compile the attached test file `rustc -g associated-types.rs`.
2. Run `gdb -x dbg.script ./associated-types`, dbg.script as follows:

~~~~
set charset UTF-8
show version
add-auto-load-safe-path /home/infinity0/rustc/./src/etc
set print pretty off
directory /home/infinity0/rustc/./src/etc
file /home/infinity0/rustc/build/armv7-unknown-linux-gnueabihf/test/debuginfo/associated-types.gdb/a
set language rust
break 'associated-types.rs':111
break 'associated-types.rs':118
break 'associated-types.rs':122
break 'associated-types.rs':130
break 'associated-types.rs':137
break 'associated-types.rs':140
run
print arg
continue
print inferred
print explicitly
continue
print arg
continue
print arg
continue
print a
print b
continue
print a
print b
continue
quit
~~~~

This works for all rustc versions (I was able to test 1.13 - 1.59) on gdb 10 but fails with SIGILL on gdb 11.2 armhf Debian.

Other rustc debuginfo tests fail with other signals, SIGSEGV, SIGABRT, etc. More specific details here: https://github.com/rust-lang/rust/issues/96983

In the upstream bug tracker, Ximin Luo (infinity0) wrote :

Whoops, I copied a dbg.script with local paths. Just delete those lines and make sure the source file associated-types.rs is in the current directory; the reproduction still works. You also need to set `RUSTC_BOOTSTRAP=` when compiling with `rustc -g`, as the file uses some unstable features meant only for testing the rustc compiler.

~~~~
set charset UTF-8
show version
set print pretty off
set language rust
break 'associated-types.rs':111
break 'associated-types.rs':118
break 'associated-types.rs':122
break 'associated-types.rs':130
break 'associated-types.rs':137
break 'associated-types.rs':140
run
print arg
continue
print inferred
print explicitly
continue
print arg
continue
print arg
continue
print a
print b
continue
print a
print b
continue
quit
~~~~

In the upstream bug tracker, Ximin Luo (infinity0) wrote :

> `RUSTC_BOOTSTRAP=`

Whoops, this should be `RUSTC_BOOTSTRAP=1`.

Same issue still exists with gdb 12.1 on Debian armhf.

In the upstream bug tracker, Ximin Luo (infinity0) wrote :

Switching off ASLR with `setarch -R` and forcing single-threaded mode with `taskset -c 0` has no effect on the bug.

In the upstream bug tracker, Luis Machado (luis-gdb-machado) wrote :

Sorry for the delayed reply.

I managed to reproduce this on Ubuntu 22.04 with rustc 1.58.1, but I get a SIGSEGV. On Ubuntu 20.04, with rustc 1.57, I get a SIGILL. Both runs use the same GDB, built from top-of-trunk.

This is an issue with displaced stepping in the Arm port of GDB. If you disable it (set displaced-stepping off), the test runs fine.

I'll investigate this.

In the upstream bug tracker, Luis Machado (luis-gdb-machado) wrote :

I have a WIP fix. Should hopefully be able to put it on the ML soon.

In the upstream bug tracker, Luis Machado (luis-gdb-machado) wrote :

In the upstream bug tracker, Cvs-commit (cvs-commit) wrote :

The master branch has been updated by Luis Machado <email address hidden>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=1e5ccb9c5ff4fd8ade4a8694676f99f4abf2d679

commit 1e5ccb9c5ff4fd8ade4a8694676f99f4abf2d679
Author: Luis Machado <email address hidden>
Date: Tue Oct 25 11:01:32 2022 +0100

    Make sure a copy_insn_closure is available when we have a match in copy_insn_closure_by_addr

    PR gdb/29272

    Investigating PR29272, it was mentioned a particular test used to work on
    GDB 10, but it started failing with GDB 11 onwards. I tracked it down to
    some displaced stepping improvements on commit
    187b041e2514827b9d86190ed2471c4c7a352874.

    In particular, one of the corner cases using copy_insn_closure_by_addr got
    silently broken. It is hard to spot because it doesn't have any good tests
    for it, and the situation is quite specific to the Arm target.

    Essentially, the change from the displaced stepping improvements made it so
    we could still invoke copy_insn_closure_by_addr correctly to return the
    pointer to a copy_insn_closure, but it always returned nullptr due to
    the order of the statements in displaced_step_buffer::prepare.

    The way it is now, we first write the address of the displaced step buffer
    to PC and then save the copy_insn_closure pointer.

    The problem is that writing to PC for the Arm target requires figuring
    out if the new PC is thumb mode or not.

    With no copy_insn_closure data, the logic to determine the thumb mode
    during displaced stepping doesn't work, and gives random results that
    are difficult to track (SIGILL, SIGSEGV etc).

    Fix this by reordering the PC write in displaced_step_buffer::prepare
    and, for safety, add an assertion to
    displaced_step_buffer::copy_insn_closure_by_addr so GDB stops right
    when it sees this invalid situation. If this gets broken again in the
    future, it will be easier to spot.

    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29272

    Approved-By: Simon Marchi <email address hidden>
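
The ordering problem described in the commit message can be illustrated with a toy model. This is a sketch under stated assumptions, not GDB's code: all `toy_` names are hypothetical stand-ins, and the fallback that guesses Thumb mode when no closure is found is an arbitrary choice standing in for the unreliable path the commit describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for copy_insn_closure: records the mode of the copied insn. */
struct toy_closure { bool is_thumb; };

/* Toy stand-in for a displaced step buffer. */
struct toy_buffer {
    uint32_t addr;                     /* scratch buffer address */
    const struct toy_closure *closure; /* NULL until the closure is saved */
};

/* Toy counterpart of copy_insn_closure_by_addr. */
const struct toy_closure *toy_closure_by_addr(const struct toy_buffer *buf,
                                              uint32_t pc)
{
    return (pc == buf->addr) ? buf->closure : NULL;
}

/* Toy "PC write": decides which mode the new PC executes in. Without a
   closure it falls back to a guess (Thumb here, an arbitrary assumption). */
bool toy_write_pc_is_thumb(const struct toy_buffer *buf, uint32_t pc)
{
    const struct toy_closure *c = toy_closure_by_addr(buf, pc);
    return (c != NULL) ? c->is_thumb : true;
}

/* Returns the mode chosen for the scratch buffer, for either ordering. */
bool toy_prepare(struct toy_buffer *buf, const struct toy_closure *c,
                 bool save_closure_first)
{
    bool is_thumb;
    assert(buf != NULL && c != NULL);
    buf->closure = NULL;
    if (save_closure_first) {
        buf->closure = c;                                  /* fixed order */
        is_thumb = toy_write_pc_is_thumb(buf, buf->addr);
    } else {
        is_thumb = toy_write_pc_is_thumb(buf, buf->addr);  /* broken order */
        buf->closure = c;
    }
    return is_thumb;
}
```

With a closure that records an ARM instruction, `toy_prepare(&buf, &insn, false)` returns the fallback guess because the lookup still sees no closure, while `toy_prepare(&buf, &insn, true)` returns the mode recorded in the closure.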

In the upstream bug tracker, Luis Machado (luis-gdb-machado) wrote :

Fixed. Please reopen if you see any issues.

affects: gdb (Debian) → gdb
Zixing Liu (liushuyu-011) wrote :
description: updated
Changed in gdb:
importance: Unknown → Medium
status: Unknown → Fix Released
Changed in gdb (Ubuntu Jammy):
milestone: none → jammy-updates
Changed in gdb (Ubuntu):
milestone: jammy-updates → none
Sergio Durigan Junior (sergiodj) wrote :

After several hours trying to obtain access to an ARM64 machine where I could test the fix, vorlon kindly provided me with credentials to a machine that's capable of launching an armhf container.

I could reproduce the bug:

# gdb -q ./a.out -ex 'b 3' -ex r -ex c
Reading symbols from ./a.out...
Breakpoint 1, thumb_func () at 1.c:3
3 return 42;
Continuing.

Program received signal SIGILL, Illegal instruction.
0x00401004 in ?? ()
...

And also verify that Liu's package fixes the problem:

# gdb -q ./a.out -ex 'b 3' -ex r -ex c
Reading symbols from ./a.out...
Breakpoint 1 at 0x4d8: file 1.c, line 3.
Starting program: /root/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".

Breakpoint 1, thumb_func () at 1.c:3
3 return 42;
Continuing.
[Inferior 1 (process 2666) exited with code 052]

Therefore, I sponsored the upload for him.

Zixing Liu (liushuyu-011) wrote :

Currently waiting for an SRU team member to process the update through the upload queue.

Timo Aaltonen (tjaalton) wrote : Please test proposed package

Hello Zixing, or anyone else affected,

Accepted gdb into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/gdb/12.1-0ubuntu1~22.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in gdb (Ubuntu):
status: New → Fix Released
Changed in gdb (Ubuntu Jammy):
status: New → Fix Committed
tags: added: verification-needed verification-needed-jammy
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (gdb/12.1-0ubuntu1~22.04.1)

All autopkgtests for the newly accepted gdb (12.1-0ubuntu1~22.04.1) for jammy have finished running.
The following regressions have been reported in tests triggered by the package:

linux-aws-5.19/5.19.0-1029.30~22.04.1 (arm64)
linux-aws-6.2/6.2.0-1015.15~22.04.1 (arm64)
linux-azure-5.19/5.19.0-1027.30~22.04.2 (arm64)
linux-azure-6.2/6.2.0-1016.16~22.04.1 (arm64)
linux-azure-6.5/6.5.0-1007.7~22.04.1 (arm64)
linux-gcp-5.19/5.19.0-1030.32~22.04.1 (arm64)
linux-gcp-6.2/6.2.0-1018.20~22.04.1 (arm64)
linux-gke/5.15.0-1046.51 (arm64)
linux-hwe-5.19/5.19.0-50.50 (arm64)
linux-hwe-6.2/6.2.0-39.40~22.04.1 (arm64)
linux-lowlatency/5.15.0-91.101 (arm64)
linux-lowlatency-hwe-5.19/5.19.0-1030.30 (arm64)
linux-lowlatency-hwe-6.2/6.2.0-1018.18~22.04.1 (arm64)
linux-nvidia-tegra/5.15.0-1019.19 (arm64)
linux-oracle-5.19/5.19.0-1027.30 (arm64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/jammy/update_excuses.html#gdb

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Zixing Liu (liushuyu-011) wrote :

Verification Report
===================

The tests were conducted on an RK3399 device (with 4x ARM Cortex-A53 cores + 2x ARM Cortex-A72 cores).

Test (1) original Rust program test (associated-types.rs)
---------------------------------------------------------

GDB 12.1-0ubuntu1~22.04 (unpatched)
Rust 1.70.0+dfsg0ubuntu1~bpo2-0ubuntu0.22.04.2

Test program source: https://github.com/rust-lang/rust/blob/1.68.2/tests/debuginfo/associated-types.rs

GDB script content:
```
b associated-types.rs:111
r
c
```

Result:

```
Breakpoint 1 at 0x4838: file associated-types.rs, line 111.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".

Breakpoint 1, associated_types::assoc_struct<i32> (arg=...) at associated-types.rs:111
111 zzz(); // #break

Program received signal SIGILL, Illegal instruction.
0x00404ec4 in core::slice::cmp::{impl#5}::equal<u8, u8> (self=..., other=...) at library/core/src/slice/cmp.rs:91
91 library/core/src/slice/cmp.rs: No such file or directory.
```

>>> SRU'ed package:

GDB 12.1-0ubuntu1~22.04.1 (patched)
GCC 11.4.0-1ubuntu1~22.04

Result:
```
Breakpoint 1 at 0x4838: file associated-types.rs, line 111.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".

Breakpoint 1, associated_types::assoc_struct<i32> (arg=...) at associated-types.rs:111
111 zzz(); // #break
[Inferior 1 (process 3621) exited normally]
```

Test (2) simplified C program test (test.c)
-------------------------------------------

GDB 12.1-0ubuntu1~22.04 (unpatched)
GCC 11.4.0-1ubuntu1~22.04

Test program source:
```
__attribute__((target("arm"), noinline))
int thumb_func() {
  return 42;
}

__attribute__((target("thumb")))
int main() { return thumb_func(); }
```

Commands:
```
gcc -Og -ggdb3 test.c -o test
printf "b 3\nr\nc\n" > repro
gdb --batch -x ./repro ./test
```

Result:
```
Breakpoint 1 at 0x4d8: file test.c, line 3.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".

Breakpoint 1, thumb_func () at test.c:3
3 return 42;

Program received signal SIGILL, Illegal instruction.
0x00401004 in ?? ()
```

Disassembly of the breakpoint site in thumb_func (ARM code):

```
=> 0x004004d8 <+0>: mov r0, #42 ; 0x2a
   0x004004dc <+4>: bx lr
```

Disassembly of the call site (Thumb code, +2 bytes):
```
   0x004004e0 <+0>: push {r3, lr}
   0x004004e2 <+2>: blx 0x4004d8 <thumb_func>
=> 0x004004e6 <+6>: pop {r3, pc}
```
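
The addresses in the two listings are consistent with ARM/Thumb interworking: the 32-bit Thumb `blx` at 0x004004e2 is followed by the next instruction at 0x004004e6, and the link register carries bit 0 set so that `bx lr` in `thumb_func` returns to Thumb state. A small sketch of that arithmetic follows; the helper names are hypothetical and only model the address calculation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* LR value after a 32-bit Thumb BL/BLX at insn_addr: the address of the
   next instruction, with bit 0 set to record Thumb state. */
uint32_t lr_after_thumb_blx(uint32_t insn_addr)
{
    assert((insn_addr & 1u) == 0); /* Thumb instructions are halfword aligned */
    return (insn_addr + 4u) | 1u;
}

/* BX selects the instruction set from bit 0 of the target value. */
bool bx_target_is_thumb(uint32_t target)
{
    return (target & 1u) != 0;
}

/* Address at which execution resumes after BX (bit 0 is not part of the PC). */
uint32_t bx_resume_address(uint32_t target)
{
    return target & ~1u;
}
```

For the call site above, `lr_after_thumb_blx(0x004004e2)` gives 0x004004e7, and `bx_resume_address` of that value is 0x004004e6, the `pop {r3, pc}` marked in the listing.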

>>> SRU'ed package:

GDB 12.1-0ubuntu1~22.04.1 (patched)
GCC 11.4.0-1ubuntu1~22.04

Result:
```
Breakpoint 1 at 0x4d8: file test.c, line 3.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".

Breakpoint 1, thumb_func () at test.c:3
3 return 42;
[Inferior 1 (process 3611) exited with code 052]
```
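
GDB prints the inferior's exit status in octal, so `exited with code 052` corresponds to the `return 42` in the test program. A minimal sketch of the conversion, where `gdb_exit_code_to_decimal` is a hypothetical helper name and not part of GDB:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Convert the octal exit code text printed by GDB (e.g. "052") to an int. */
int gdb_exit_code_to_decimal(const char *octal_text)
{
    assert(octal_text != NULL);
    return (int)strtol(octal_text, NULL, 8); /* base 8: 5*8 + 2 = 42 */
}
```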

Conclusion
----------
The GDB 12.1-0ubuntu1~22.04.1 package fixes the issue described in the bug report.

tags: added: verification-done verification-done-jammy
removed: verification-needed verification-needed-jammy
Brian Murray (brian-murray) wrote :

The "Where problems could occur" section says "it is possible that this fix may break ARM64 execution". How was it verified that this is not broken?

Changed in gdb (Ubuntu Jammy):
status: Fix Committed → Incomplete
Zixing Liu (liushuyu-011) wrote :

> The "Where problems could occur" section says "it is possible that this fix may break ARM64 execution". How was it verified that this is not broken?

I believe this could be verified by running the GDB tests inside the Rust testing suite.
Should I queue an autopkgtest against rustc for GDB in this case?

Brian Murray (brian-murray) wrote :

Given the current size of the queues that might take some time. Could you run the test locally on the same device which you verified the fix on?
