ICE on code that uses vld4q_s16() NEON intrinsic

Bug #803232 reported by Michael K. Edwards
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Linaro GCC
Fix Released
Medium
Richard Sandiford
4.5
Fix Released
Medium
Richard Sandiford
4.6
Fix Released
Medium
Richard Sandiford
Linaro GCC Tracking
Fix Released
Undecided
Unassigned

Bug Description

ICE found in a compiler built from source very close to Linaro GCC 4.6 2011.06, when compiling a reimplementation of libjpeg-turbo trunk's iDCT using GCC NEON intrinsics (instead of hand-coded assembly). Attachments: original preprocessed source code, the result of stripping it down with delta, and the latter with enough missing bits restored to remove compiler warnings. The latter exhibits a somewhat different ICE. I was able to compile the original with an otherwise similarly patched variant of Linaro GCC 4.5 2011.05.

Original ICE:

build@ctbu-bld5:~/CTBU/MonoLake-foreign/opensource-core/build-trees/libjpeg-turbo/simd$ PATH=/opt/x-tools/arm-cortex_a9-
linux-gnueabi/bin:$PATH CTBU_SYSROOT=/home/build/CTBU/MonoLake/build/BLSD/release/dev-fsroot arm-cortex_a9-linux-gcc -Wall -g -O3 -mfpu=neon -c jsimd_arm.i -fPIC -DPIC -o jsimd_arm.o
jsimd_arm.i:12537:1: warning: 'parse_proc_cpuinfo' defined but not used [-Wunused-function]
jsimd_arm.i: In function 'jsimd_idct_ifast':
jsimd_arm.i:13048:1: internal compiler error: in smallest_mode_for_size, at stor-layout.c:451

ICE on the delta-stripped version:

build@ctbu-bld5:~/CTBU/MonoLake-foreign/opensource-core/build-trees/libjpeg-turbo/simd$ ./compile.sh jsimd_arm-min13.i
jsimd_arm-min13.i: In function 'vdupq_n_s16':
jsimd_arm-min13.i:16:1: warning: no return statement in function returning non-void [-Wreturn-type]
jsimd_arm-min13.i: At top level:
jsimd_arm-min13.i:37:1: warning: return type defaults to 'int' [-Wreturn-type]
jsimd_arm-min13.i: In function 'jsimd_idct_ifast':
jsimd_arm-min13.i:48:5: warning: implicit declaration of function 'vst1_u8' [-Wimplicit-function-declaration]
jsimd_arm-min13.i:48:5: warning: implicit declaration of function 'vqshrun_n_s16' [-Wimplicit-function-declaration]
jsimd_arm-min13.i:48:5: warning: implicit declaration of function 'vqaddq_s16' [-Wimplicit-function-declaration]
jsimd_arm-min13.i:52:1: warning: control reaches end of non-void function [-Wreturn-type]
jsimd_arm-min13.i:11:3: warning: 'coef.half[0].val[1]' is used uninitialized in this function [-Wuninitialized]
jsimd_arm-min13.i:41:15: note: 'coef.half[0].val[1]' was declared here
jsimd_arm-min13.i:11:3: warning: 'coef.half[1].val[3]' is used uninitialized in this function [-Wuninitialized]
jsimd_arm-min13.i:41:15: note: 'coef.half[1].val[3]' was declared here
jsimd_arm-min13.i:48:12: warning: 'coef.half[0].val[0]' is used uninitialized in this function [-Wuninitialized]
jsimd_arm-min13.i:52:1: internal compiler error: in smallest_mode_for_size, at stor-layout.c:451

ICE on the repaired version:

build@ctbu-bld5:~/CTBU/MonoLake-foreign/opensource-core/build-trees/libjpeg-turbo/simd$ ./compile.sh jsimd_arm-min12.i
jsimd_arm-min12.i: In function 'jsimd_idct_ifast':
jsimd_arm-min12.i:72:1: error: insn does not satisfy its constraints:
(insn 107 21 22 2 (set (mem/s/c:XI (post_inc:SI (reg/f:SI 4 r4 [182])) [0 MEM[(struct int8x8x4_t *)&coef].__o+0 S64 A128])
        (reg:XI 63 s0)) jsimd_arm-min12.i:40 757 {*neon_movxi}
     (expr_list:REG_INC (reg/f:SI 4 r4 [182])
        (nil)))
jsimd_arm-min12.i:72:1: internal compiler error: in reload_cse_simplify_operands, at postreload.c:403

Related branches

Revision history for this message
Michael K. Edwards (m-k-edwards) wrote :
Revision history for this message
Michael K. Edwards (m-k-edwards) wrote :

Result of repeated delta runs: 52-line minimal test case (with uninitialized variables and other compiler warnings).

Revision history for this message
Michael K. Edwards (m-k-edwards) wrote :

After repairing delta's over-aggressive trimming, I got this test case, which shows a different ICE, possibly related to the oddball treatment of XI_MODE.

Revision history for this message
Michael K. Edwards (m-k-edwards) wrote :

For some reason, GCC failed to warn on the first uninitialized use of dequant. Attaching a new version of the "repaired" source with that fixed. No change in the ICE.

Revision history for this message
Ramana Radhakrishnan (ramana) wrote :

The ICE as per comment #3 appears to be a DUP of an earlier bug #744754 that seemed to have been lost because of some extra test regressions. Pushing out another branch for testing this now to make sure there is nothing latent and for that to be merged.

I still need to investigate the ICE in comment #1 which is what I'd like this bug to remain focused on.

cheers
Ramana

Changed in gcc-linaro:
status: New → Triaged
Revision history for this message
Michael K. Edwards (m-k-edwards) wrote :

The patch at bug #744754 fixes the ICE in comment #3. The ICEs in the original and in the 52-line reduced testcase are unchanged.

Revision history for this message
Michael K. Edwards (m-k-edwards) wrote :
Revision history for this message
Richard Sandiford (rsandifo) wrote :

For avoidance of doubt, the patch in #7 was something Michael and I talked
about as a workaround, in order to unblock Michael's work. It isn't something
we'd submit or apply.

Revision history for this message
Michael K. Edwards (m-k-edwards) wrote :

Yes, a workaround, as stated in the patch description / merge request. I submitted the merge request in order to trigger a run of the gcc testsuite (which showed no changes to existing tests). I'll add tests to the patch that cover the new behavior, and resubmit; but it will still probably not be suitable for merge. (Managing spill/reload of groups of NEON registers is clearly still something of a work in progress.)

Revision history for this message
Richard Sandiford (rsandifo) wrote :
Download full text (4.5 KiB)

I'm testing the attached patch overnight and tomorrow. If all goes well,
I'll submit it on Thursday.

There seems to be a bit of history here. The current choose_reload_regs
code reads:

    if (byte == 0)
      need_mode = mode;
    else
      need_mode
        = smallest_mode_for_size
          (GET_MODE_BITSIZE (mode) + byte * BITS_PER_UNIT,
    GET_MODE_CLASS (mode) == MODE_PARTIAL_INT
    ? MODE_INT : GET_MODE_CLASS (mode));

    if ((GET_MODE_SIZE (GET_MODE (last_reg))
         >= GET_MODE_SIZE (need_mode))

Note that this is the only use of need_mode. I don't believe the mode
that is being calculated here is fundamental in any way, or that it's
used later in the reload process. We have already checked that the mode
change is allowed:

#ifdef CANNOT_CHANGE_MODE_CLASS
    /* Verify that the register it's in can be used in
       mode MODE. */
    && !REG_CANNOT_CHANGE_MODE_P (REGNO (reg_last_reload_reg[regno]),
      GET_MODE (reg_last_reload_reg[regno]),
      mode)
#endif

and have already calculated which hard register we would need to
use after the mode change:

    i = REGNO (last_reg);
    i += subreg_regno_offset (i, GET_MODE (last_reg), byte, mode);

So once we have verified that the register is suitable, we can (and do)
simply use register I in mode MODE.

I think the current mode is a historical left-over. Back in 2000 this code
was a simple check that the old register entirely encompassed the new one:

    i = REGNO (last_reg) + word;
    last_class = REGNO_REG_CLASS (i);
    if ((GET_MODE_SIZE (GET_MODE (last_reg))
         >= GET_MODE_SIZE (mode) + word * UNITS_PER_WORD)

The register we were interested in was (reg:MODE I), and this check made
sure that the old reload register defined every bit of (reg:MODE I).
When CLASS_CANNOT_CHANGE_SIZE was introduced, the code became:

    i = REGNO (last_reg) + word;
    last_class = REGNO_REG_CLASS (i);
    if (
#ifdef CLASS_CANNOT_CHANGE_SIZE
        (TEST_HARD_REG_BIT
         (reg_class_contents[CLASS_CANNOT_CHANGE_SIZE], i)
         ? (GET_MODE_SIZE (GET_MODE (last_reg))
     == GET_MODE_SIZE (mode) + word * UNITS_PER_WORD)
         : (GET_MODE_SIZE (GET_MODE (last_reg))
     >= GET_MODE_SIZE (mode) + word * UNITS_PER_WORD))
#else
        (GET_MODE_SIZE (GET_MODE (last_reg))
         >= GET_MODE_SIZE (mode) + word * UNITS_PER_WORD)
#endif

But I think this was bogus. The new size of the register was:

   GET_MODE_SIZE (mode)

rather than:

   GET_MODE_SIZE (mode) + word * UNITS_PER_WORD

Maybe something like:

   word == 0 && GET_MODE_SIZE (mode) == GET_MODE_SIZE (GET_MODE (last_reg))

would have been more accurate. Anyway, CLASS_CANNOT_CHANGE_SIZE proved
to be too limited, so it was replaced with CLASS_CANNOT_CHANGE_MODE.
The code above then became:

    need_mode = smallest_mode_for_size ((word+1) * UNITS_PER_WORD,
            GET_MODE_CLASS (mode));

    if (
#ifdef CLASS_CANNOT_CHANGE_MODE
        (TEST_HARD_REG_BIT
         (reg_class_contents[(int) CLASS_CANNOT_CHANGE_MODE], i)
         ? ! CLASS_CANNOT_CHANGE_MODE_P (GET_MODE (last_reg),
             need_mode)
         : (GET_MODE_SIZE (GET_MODE (last_reg))
     >= GET_MODE_SIZE (need_mode)))
#else
        (GET_MO...

Read more...

Revision history for this message
Michael K. Edwards (m-k-edwards) wrote :

The patch appears to work for me, in place of the earlier workaround. Thanks!

Michael Hope (michaelh1)
Changed in gcc-linaro:
status: Triaged → In Progress
importance: Undecided → Medium
assignee: nobody → Richard Sandiford (rsandifo)
Michael Hope (michaelh1)
Changed in gcc-linaro:
status: In Progress → Fix Committed
milestone: none → 4.6-2011.07
Revision history for this message
Andrew Stubbs (ams-codesourcery) wrote :
Changed in gcc-linaro-tracking:
milestone: none → 4.7.0
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.