Comment 9 for bug 383240

Dave Martin (dave-martin-arm) wrote :

I discussed this with Måns, and on his advice I pulled across the following patches from the ffmpeg trunk (revision numbers shown are relative to svn://svn.ffmpeg.org/ffmpeg/trunk).

r18332 (ARM: NEON optimised add_pixels_clamped)
r18333 (ARM: NEON optimized put_signed_pixels_clamped)
r18535 (Add guaranteed alignment for loading dest pixels in avg_pixels16_neon)
r18601 (ARM asm for AV_RN*())
r18712 (ARM: NEON put_pixels_clamped)
r18713 (ARM: Use fewer register in NEON put_pixels _y2 and _xy2)
r18916 (ARM: NEON VP3 Loop Filter)
r18972 (ARM: add some PLD in NEON IDCT)
r19216 (ARM: slightly faster NEON H264 horizontal loop filter)

They mostly apply cleanly against ffmpeg 3:0.svn20090303-1ubuntu6; only a couple of minor manual merges were needed. I uncommented FLAVORS += neon in debian/confflags and got a successful build, but I still need to examine and test the result.

NOTE: Because the NEON code uses floating-point functionality, I think debian/confflags needs the following:
 neon_build_confflags += --shlibdir=/usr/lib/neon/vfp (or /usr/lib/vfp/neon)
instead of the current --shlibdir=/usr/lib/neon.

We may also want to add --extra-cflags="-mfpu=neon -mfloat-abi=softfp" for this configuration. According to tests we've done here, -ftree-vectorize also helps by letting the compiler vectorize some computations (GCC does not enable it by default at -O2). I haven't tested these flags myself yet, though.
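
Putting the suggestions above together, the neon part of debian/confflags might look roughly like the sketch below. This is untested and makes some assumptions: that neon_build_confflags is passed to ffmpeg's configure through a shell (so the quoting survives), that /usr/lib/neon/vfp is the directory we settle on rather than /usr/lib/vfp/neon, and that -ftree-vectorize can simply be appended to the same --extra-cflags string.

```make
# Sketch only: untested, and the shlibdir choice is still an open question.

# Enable the NEON flavour (this line is currently commented out).
FLAVORS += neon

# Install the NEON libraries under a vfp subdirectory, replacing the
# existing --shlibdir=/usr/lib/neon setting.
neon_build_confflags += --shlibdir=/usr/lib/neon/vfp

# Build with NEON and the softfp ABI; -ftree-vectorize is added explicitly
# because it is not enabled by default at -O2.
neon_build_confflags += --extra-cflags="-mfpu=neon -mfloat-abi=softfp -ftree-vectorize"
```

If the other path is preferred, only the --shlibdir line changes; the compiler flags do not depend on it.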