Comment 2 for bug 1729850

Chris Coulson (chrisccoulson) wrote :

I stepped through two builds side by side in gdb: a good build compiled with gcc 7.1 and a bad build compiled with gcc 7.2. I managed to narrow it down to a bug in sha256_block_data_order.

One of the first differences I spotted was that the good build branches almost immediately to a NEON code path (sha256_block_data_order_neon), whereas the broken build continues on the non-NEON code path.

If we look at the first few instructions of sha256_block_data_order in a good build:

   0xf7699c60 <+0>: sub r3, pc, #8
   0xf7699c64 <+4>: ldr r12, [pc, #-40] ; 0xf7699c44
   0xf7699c68 <+8>: ldr r12, [r3, r12]

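The first instruction loads the address of the start of the function into r3. In ARM state, reading pc yields the address of the current instruction plus 8, so pc - 8 at offset +0 is the function's own entry address. We can see this if we step past it: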

(gdb) info registers
r0 0x413558 4273496
r1 0x413580 4273536
r2 0x1 1
r3 0xf7699c60 4150893664
r4 0x413558 4273496
r5 0xfffef35c 4294898524
r6 0x0 0
r7 0x413580 4273536
r8 0x0 0
r9 0xf77efab8 4152294072
r10 0xf77b9dec 4152073708
r11 0x0 0
r12 0x0 0
sp 0xfffef2c8 0xfffef2c8
lr 0xf7697e5c -144081316
pc 0xf7699c64 0xf7699c64 <sha256_block_data_order+4>
cpsr 0x80080010 -2146959344
(gdb) p sha256_block_data_order
$1 = {<text variable, no debug info>} 0xf7699c60 <sha256_block_data_order>
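
The arithmetic behind that first instruction can be checked by hand. The following is a minimal sketch, assuming the code runs in ARM (A32) state, where pc reads as the instruction address plus 8. The helper name sub_pc_imm is made up for illustration and is not part of gdb or OpenSSL.

```python
# Assumption: ARM (A32) state, where reading pc as an operand
# yields the address of the current instruction plus 8.
ARM_PC_READ_OFFSET = 8
MASK32 = 0xFFFFFFFF


def sub_pc_imm(insn_addr, imm):
    """Value written by `sub rd, pc, #imm` executed at insn_addr (hypothetical helper)."""
    return (insn_addr + ARM_PC_READ_OFFSET - imm) & MASK32


# `sub r3, pc, #8` at sha256_block_data_order+0
r3 = sub_pc_imm(0xF7699C60, 8)
print(hex(r3))  # 0xf7699c60, matching r3 in the register dump
```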

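The second instruction loads a value from pc - 40 into r12. Since pc reads as the instruction address plus 8, that is 0xf7699c44, 32 bytes before the instruction itself. Looking in sha256-armv4.S, this value is "OPENSSL_armcap_P - sha256_block_data_order", or the offset of OPENSSL_armcap_P from the start of sha256_block_data_order.
The third instruction adds that offset to r3 (the start of the function) and loads from the resulting address, which puts the value of OPENSSL_armcap_P into r12.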
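
The two loads can be sketched the same way. This is only an illustration, assuming ARM (A32) state; the helper names are made up, and the address used for OPENSSL_armcap_P below is a hypothetical placeholder, since the real address and offset are not shown in this comment.

```python
MASK32 = 0xFFFFFFFF


def pc_read(insn_addr):
    """Value of pc as seen by an instruction at insn_addr (ARM state assumed)."""
    return (insn_addr + 8) & MASK32


def ldr_literal_address(insn_addr, imm):
    """Address read by `ldr rt, [pc, #imm]` (hypothetical helper)."""
    return (pc_read(insn_addr) + imm) & MASK32


def ldr_register_offset_address(base, offset):
    """Address read by `ldr rt, [rn, rm]`: base plus offset, wrapped to 32 bits."""
    return (base + offset) & MASK32


# `ldr r12, [pc, #-40]` at sha256_block_data_order+4
literal_addr = ldr_literal_address(0xF7699C64, -40)
print(hex(literal_addr))  # 0xf7699c44, as annotated in the disassembly

# `ldr r12, [r3, r12]`: r3 holds the function start, r12 the stored offset.
# HYPOTHETICAL address for OPENSSL_armcap_P, chosen only to show the round trip.
r3 = 0xF7699C60
hypothetical_armcap_addr = 0xF77F0000
stored_offset = (hypothetical_armcap_addr - r3) & MASK32
effective = ldr_register_offset_address(r3, stored_offset)
print(hex(effective))  # same as hypothetical_armcap_addr
```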

Stepping through these instructions gives this state:

(gdb) info registers
r0 0x413558 4273496
r1 0x413580 4273536
r2 0x1 1
r3 0xf7699c60 4150893664
r4 0x413558 4273496
r5 0xfffef35c 4294898524
r6 0x0 0
r7 0x413580 4273536
r8 0x0 0
r9 0xf77efab8 4152294072
r10 0xf77b9dec 4152073708
r11 0x0 0
r12 0x3 3
sp 0xfffef2c8 0xfffef2c8
lr 0xf7697e5c -144081316
pc 0xf7699c6c 0xf7699c6c <sha256_block_data_order+12>
cpsr 0x80080010 -2146959344

So the value of OPENSSL_armcap_P is 3 (bit 4 clear, bit 0 set), which causes the following instructions to skip the ARMv8 branch and take the NEON path:

   0xf7699c6c <+12>: tst r12, #16
   0xf7699c70 <+16>: bne 0xf769b660 <sha256_block_data_order_armv8>
   0xf7699c74 <+20>: tst r12, #1
   0xf7699c78 <+24>: bne 0xf769aa60 <sha256_block_data_order_neon>
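
The dispatch performed by these four instructions can be sketched as below. The masks 16 and 1 are taken directly from the tst instructions above; the constant names are assumed to correspond to the capability flags in OpenSSL's arm_arch.h, and the label for the fall-through case is my own.

```python
# Masks from the disassembly; the names are an assumption about arm_arch.h.
ARMV8_SHA256 = 16  # tested by `tst r12, #16`
ARMV7_NEON = 1     # tested by `tst r12, #1`


def select_path(armcap):
    """Mirror the tst/bne sequence at sha256_block_data_order+12..+24."""
    if armcap & ARMV8_SHA256:
        return "sha256_block_data_order_armv8"
    if armcap & ARMV7_NEON:
        return "sha256_block_data_order_neon"
    return "non-NEON path"  # falls through into the rest of the function


print(select_path(3))  # sha256_block_data_order_neon (the good build)
print(select_path(0))  # non-NEON path
```

With r12 = 3 the first test fails (3 & 16 == 0) and the second succeeds (3 & 1 == 1), which matches the good build branching to sha256_block_data_order_neon. Any value with bit 0 clear, such as 0, would stay on the non-NEON path.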