Integrate and enable ARMv5TE/v6/VFP and NEON optimisations from ffmpeg trunk for armel

Bug #383240 reported by Dave Martin
32
This bug affects 2 people
Affects Status Importance Assigned to Milestone
ffmpeg (Ubuntu)
Fix Released
Medium
Loïc Minier
Karmic
Fix Released
Medium
Loïc Minier

Bug Description

Binary package hint: ffmpeg

Since quite a lot of ARM architecture optimisations are now integrated in the ffmpeg trunk, ideally Ubuntu should enable them for Karmic onwards.

The related changes have been committed to the ffmpeg trunk at svn://svn.ffmpeg.org/ffmpeg/trunk from 2008-08-25 onwards; however, most of the support is newer than this. Some support was present in the source used for Jaunty, but it was not enabled (no run-time detection was implemented, and the NEON code was statically disabled with configure --disable-neon).

Måns Rullgård (the ffmpeg trunk developer who did the optimisations) is concerned that the hwcaps information provided by the kernel to userspace is not really sufficient for runtime processor feature detection, and is looking at ways of providing more detailed capability information to userspace.

However, this support is speculative at present, and is unlikely to be merged in all the right places in time for Karmic, so we may need a simpler approach, whether it's based on run-time hwcaps checks in libavcodec, or using multiple library build passes and the hwcaps-based shared library selection in libc, or something else.

ffmpeg also contains some VFP, armv5te and armv6 optimisations. Based on the v6+VFP baseline for Karmic, I suggest that these can be turned on statically, when the karmic toolchain has stabilised sufficiently.

Tags: armel
Loïc Minier (lool)
Changed in ffmpeg (Ubuntu Karmic):
assignee: nobody → Canonical Mobile Team (canonical-mobile)
Revision history for this message
Reinhard Tartler (siretart) wrote : Re: [Bug 383240] [NEW] Integrate and enable ARMv5TE/v6/VFP and NEON optimisations from ffmpeg trunk for armel

Dave Martin <email address hidden> writes:

> Public bug reported:
>
> Binary package hint: ffmpeg
>
> Since quite a lot of ARM architecture optimisations are now integrated
> in the ffmpeg trunk, ideally Ubuntu should enable them for Karmic
> onwards.
>
> The related changes have been committed to the ffmpeg trunk at
> svn://svn.ffmpeg.org/ffmpeg/trunk from 2008-08-25 onwards; however, most
> of the support is newer than this. Some support was present in the
> source used for Jaunty, but it was not enabled (no run-time detection
> was implemented, and the NEON code was statically disabled with
> configure --disable-neon).

feel free to backport these changes to the 0.5 branch of ffmpeg!

> Måns Rullgård (the ffmpeg trunk developer who did the optimisations) is
> concerned that the hwcaps information provided by the kernel to
> userspace is not really sufficient for runtime processor feature
> detection, and is looking at ways of providing more detailed capability
> information to userspace.

checking hwcaps is only necessary at load time, not at link time, since
libavcodec is compiled several times on arm for different targets. This
means that the way the package is built, runtime-cpu detection support
is not necessary.

> However, this support is speculative at present, and is unlikely to be
> merged in all the right places in time for Karmic, so we may need a
> simpler approach, whether it's based on run-time hwcaps checks in
> libavcodec, or using multiple library build passes and the hwcaps-based
> shared library selection in libc, or something else.

The latter approach is already implemented since intrepid. ARM/Neon
support has been introduced by Loïc Minier just in time for jaunty.
Since I don't have experience nor hardware I need to rely on
patches by people who have that expertise.

> ffmpeg also contains some VFP, armv5te and armv6 optimisations. Based
> on the v6+VFP baseline for Karmic, I suggest that these can be turned
> on statically, when the karmic toolchain has stabilised sufficiently.

Feel free to propose further compilation runs and compiler switches, we
can integrate them rather easily. confirm with [1] and feel free to
propose patches to that file.

Fussnoten:
[1] http://git.debian.org/?p=pkg-multimedia/ffmpeg-debian.git;a=blob;f=debian/confflags
--
Gruesse/greetings,
Reinhard Tartler, KeyID 945348A4

Paul Larson (pwlars)
tags: added: arm
Paul Larson (pwlars)
tags: added: armel
removed: arm
Revision history for this message
Dave Martin (dave-martin-arm) wrote :

OK, so far so good.

The confflags definitions for NEON look OK, but it's still disabled.

It should be safe to uncomment the FLAVORS += neon line now, because we deliberately disable NEON in the kernel for Jaunty to avoid applications and libraries accidentally believing it works. (This also means neon is not present in the reported hwcaps).

Revision history for this message
Reinhard Tartler (siretart) wrote : Re: [Bug 383240] Re: Integrate and enable ARMv5TE/v6/VFP and NEON optimisations from ffmpeg trunk for armel

Dave Martin <email address hidden> writes:

> It should be safe to uncomment the FLAVORS += neon line now, because we
> deliberately disable NEON in the kernel for Jaunty to avoid applications
> and libraries accidentally believing it works. (This also means neon is
> not present in the reported hwcaps).

So you're suggesting we should deliberatly and knowingly waste buildd
time for a flavor nobody is technically able to use?

Well, we could of course do that, but I'd rather prefer we wouldn't.

--
Gruesse/greetings,
Reinhard Tartler, KeyID 945348A4

Revision history for this message
Loïc Minier (lool) wrote :

I think there are various different things here:
- ffmpeg has a mechanism to do multiple passes (multiple builds) and produce multiple shared libs which can be loaded by glibc's hwcaps-based selection mechanism automatically
- ffmpeg has grown some runtime detection of the feature (instead of build time selection / autodetection on the buildd)
- NEON was broken in hardware on imx51 TO1 which is all hardware which was around for jaunty

What we want:
i) karmic should use the latest ffmpeg
ii) the latest ffmpeg should use NEON automatically at runtime
iii) we want to backport key stuff to some PPA to show the NEON stuff based on jaunty (as it's impractical to play with karmic as it's still quite unstable)
iv) keep the Debian and Ubuntu packaging as identical as possible

I guess i) will happen as ffmpeg updates over the karmic cycle.

ii) and iv) are a bit tricky: are all SIMD features properly runtime detected in the latest ffmpeg? or is there still value in glibc-hwcaps-selected libs? which ARM configure flags can we safely turn on in the main armel build so that it's compatible with the Debian armel architecture expectations?
I think we want to enable everything which is fully runtime detected in the main armel build, then if anything else NEON remains we can consider building the special NEON pass to have an even more optimized library.

iii) should probably be discussed somewhere else I guess, but will happen after the karmic version works anyway :-)

Revision history for this message
Reinhard Tartler (siretart) wrote :

Loïc Minier <email address hidden> writes:

> I think there are various different things here:
> - ffmpeg has a mechanism to do multiple passes (multiple builds) and produce multiple shared libs which can be loaded by glibc's hwcaps-based selection mechanism automatically
> - ffmpeg has grown some runtime detection of the feature (instead of build time selection / autodetection on the buildd)
> - NEON was broken in hardware on imx51 TO1 which is all hardware which was around for jaunty

perhaps the kernel shouldn't advertise NEON hwcaps in this case then?

> What we want:
> i) karmic should use the latest ffmpeg
> ii) the latest ffmpeg should use NEON automatically at runtime
> iii) we want to backport key stuff to some PPA to show the NEON stuff based on jaunty (as it's impractical to play with karmic as it's still quite unstable)
> iv) keep the Debian and Ubuntu packaging as identical as possible
>
> I guess i) will happen as ffmpeg updates over the karmic cycle.

Debian and ubuntu track the FFmpeg 0.5 release. From what I get of this
thread of discussion, I believe the arm related changes have been
implemented in the trunk branch of ffmpeg after 0.5 was branched of. I'd
be against switching back to trunk because of instabilities it would
cause in gstreamer0.10-ffmpeg, mplayer and vlc. You could literally feel
the rate of bugreports dropping down after these package finally agreed
on a common ffmpeg version to use.

I'd rather suggest to backport the changes to the 0.5 release branch
instead of tracking again ffmpeg trunk.

--
Gruesse/greetings,
Reinhard Tartler, KeyID 945348A4

Revision history for this message
Dave Martin (dave-martin-arm) wrote :

To clarify from my POV:

  * For Karmic, we expect there to be additional ARM hardware supported by the Ubuntu release, fully supporting NEON (though this is yet to be confirmed). It will be nice if we can backport the new packages for Jaunty, but that's a secondary concern.
  * I don't expect that ffmpeg will have full runtime detection for ARM features in the Karmic timeframe, so we may still need multiple flavours and the glibc hwcaps mechanism for now. This could be worth investigating again; but the last time I looked, the detection was done via #ifdef.

The basic problem with (ii) is that the information provided by hwcaps is rather coarse. Mans doesn't favour hwcaps and was looking at defining a better mechanism, but it would require kernel changes and so could be time-consuming to stabilise/merge.

Revision history for this message
Loïc Minier (lool) wrote :

@Reinhard: right, and the jaunty kernel is indeed built with CONFIG_NEON off which implies it wont expose the neon hwcap. However people with the affected hardware should run jaunty and karmic can assume sane hardware and neon in the kernel in a near future.

Dave poked Måns Rullgård over email who explained among other things that ffmpeg wouldn't do feature runtime detection on ARM until we have a better kernel interface; so there's nothing desirable in trunk for runtime detection. There might be more optimizations in trunk.

Måns Rullgård recommended the ld.so hwcaps for now which should be simple to uncomment in the rules :-)

Revision history for this message
Reinhard Tartler (siretart) wrote :

excellent, thanks for clarification Loïc!
--
Gruesse/greetings,
Reinhard Tartler, KeyID 945348A4

Paul Larson (pwlars)
Changed in ffmpeg (Ubuntu Karmic):
assignee: Canonical Mobile Team (canonical-mobile) → Ubuntu armel porters (ubuntu-armel)
Revision history for this message
Dave Martin (dave-martin-arm) wrote :

I discussed this with Måns, and on his advice I pulled across the following patches from the ffmpeg trunk (revision numbers shown are relative to svn://svn.ffmpeg.org/ffmpeg/trunk).

r18332 (ARM: NEON optimised add_pixels_clamped)
r18333 (ARM: NEON optimized put_signed_pixels_clamped)
r18535 (Add guaranteed alignment for loading dest pixels in avg_pixels16_neon)
r18601 (ARM asm for AV_RN*())
r18712 (ARM: NEON put_pixels_clamped)
r18713 (ARM: Use fewer register in NEON put_pixels _y2 and _xy2)
r18916 (ARM: NEON VP3 Loop Filter)
r18972 (ARM: add some PLD in NEON IDCT)
r19216 (ARM: slightly faster NEON H264 horizontal loop filter)

They mostly apply cleanly against ffmpeg 3:0.svn 20090303-1ubuntu6 with just a couple of minor manual merges needed. I uncommented FLAVORS += neon in debian/confflags and got a successful build, but I still need to examine and test the result.

NOTE: Because the NEON code makes use of floating-point functionality, I think that we need the following in debian/confflags:
 neon_build_confflags += --shlibdir=/usr/lib/neon/vfp (or /usr/lib/vfp/neon)
(instead of --shlibdir=/usr/lib/neon)

We may also want to add --extra-cflags="-mfpu=neon -mfloat-abi=softfp" for this configuration. According to tests we've done here, -ftree-vectorize is also beneficial by parallelising some computations (this is not enabled by default for -O2). I haven't tested these yet myself though.

Revision history for this message
Dave Martin (dave-martin-arm) wrote :

Having played a bit, --extra-cflags="-mfpu=neon -mfloat-abi=softfp" seems to be _required_ for the neon flavour in order for the NEON optimisations to get built.

-ftree-vectorize probably isn't needed explicitly, because the build seems to default to -O3.

Revision history for this message
Reinhard Tartler (siretart) wrote :

Dave Martin <email address hidden> writes:

> Having played a bit, --extra-cflags="-mfpu=neon -mfloat-abi=softfp"
> seems to be _required_ for the neon flavour in order for the NEON
> optimisations to get built.

Is Mans aware of that? If yes, then the upstream configure file should
be patched to automatically add these settings to CFLAGS.

--
Gruesse/greetings,
Reinhard Tartler, KeyID 945348A4

Revision history for this message
Dave Martin (dave-martin-arm) wrote :

The upstream configure script doesn't currently seem to work this way for any of the ARM extensions. Instead, you seem need to build with the right --extra-cflags as well as enabling the extension(s) you want, such as --enable-armvfp or --enable-neon.

The configure script has the facilities to add cflags in this way though; it's used for the ultraspac vis extension (whatever that is); but I don't know what the upstream community prefers.

In addition to the above, it looks like -march=armv7-a should also be added to extra-cflags when building the NEON flavour; otherwise the ffmpeg configuration assumes that the PLD (cache preload) instruction can't be used because the baseline build architecture is armv5. But it doesn't really make sense to assume that any ARMv7 features may be absent if NEON is present.

This would give --extra-cflags='-march=armv7-a -mfpu=neon -mfloat-abi=softfp' for the neon flavour.

I'll point Måns at this thread and ask for his comments.

Revision history for this message
Mans Rullgard (mansr) wrote :

Let me explain how the ARM feature selection in FFmpeg works. First, there is the question of runtime feature detection.
This is not done on ARM for two reasons: 1) building the entire code for the correct target makes a big difference in
performance, and 2) there is currently no good way to detect processor features from userspace.

The CPU selection flags passed to the compiler are determined by the --cpu configure option. Specific optimisations
are activated by compiling a test file using a representative instruction. The assembler will reject this if the chosen target
does not support that instruction. For this approach to work, the compiler must be either statically configured for the
correct target, or the appropriate flags must be passed. The FFmpeg configure script has no means of determining
correct values for -mfpu and similar flags. If any such flags are required, these must be set using --extra-cflags.

The correct way to configure for Cortex-A8 with NEON looks like this:

  configure --arch=arm --cpu=cortex-a8 --extra-cflags='-mfpu=neon -mfloat-abi=softfp' --cross-prefix=arm-none-linux-gnueabi-

Add other flags such as --sysroot, --enable/disable-stuff as necessary. There is rarely any need to explicitly --disable
anything, since optional features are only enabled if the required support libraries etc are available.

Hope this clarifies the situation.

As for the ultrasparc flags, this is weird and probably broken. I'd fix it, but I don't have the hardware.

Revision history for this message
Mans Rullgard (mansr) wrote :

Ugh, sorry about the line breaks.

Revision history for this message
Reinhard Tartler (siretart) wrote :

Ah, thanks a lot for that explanation, Mans. Given that, now I agree that Dave's approach outlined in comment #9 seems correct to me here.

Dave, please attach your patches to this bug, I can then take care to integrate them into the ffmpeg package.

Revision history for this message
Dave Martin (dave-martin-arm) wrote :

Will do— I think I now have a good patch set, but I want to check it works. I'm currently doing a fresh build.

One other question Mans:

Because the baseline architecture used by the toolchain is -march=armv5t, many of the NEON sources fail to build even with -mfpu=neon, because of the presence of PLD instructions (not supported in v5).

My current fix for this is to add -march=armv7-a in extra-cflags for the whole NEON-enabled build pass: my rationale for this is that the presence of NEON implies v7, which means that v7 features may as well be enabled for the whole pass. (This seems a general problem with -mfpu= and -march/cpu= which allow non-existent architectures, such as v5+NEON or NEON-without-v7 to be selected when used together.)

Is this the right thing to do, or do you have a different suggestion?

Revision history for this message
Mans Rullgard (mansr) wrote :

You should run configure with --cpu=cortex-a8 or --cpu=armv7-a. Do not use --extra-cflags for the -mcpu or -march flags, as this will not give optimal settings. Specifically, it will not allow FFmpeg to use unaligned memory accesses even though armv6 and later supports them.

Revision history for this message
Mans Rullgard (mansr) wrote :

BTW, you'll need to pull the checkin (r19308) I did a few minutes ago too.

Revision history for this message
Loïc Minier (lool) wrote :

Will look into this next week or so when I've setup a beagleboard I expect any time soon :-P

Changed in ffmpeg (Ubuntu Karmic):
assignee: Ubuntu armel porters (ubuntu-armel) → Loïc Minier (lool)
Revision history for this message
Dave Martin (dave-martin-arm) wrote :

Hi again,

I think I now have a stable build, by pulling the following revisions from svn://svn.ffmpeg.org/ffmpeg/trunk:

r18332
r18333
r18535
r18600
r18601
r18712
r18713
r18916
r18917
r18972
r19216
r19308
r19345

The attached patch enables the NEON build flavour and applicable build options in debian/confflags. The patch is against the karmic source, but the upstream Debian source is not much different.

To within that difference, I've succesfully built the upstream source package and the karmic package, and they seem to work OK, with a decent improvement in video decode speed for H.264 (I haven't tested exhaustively).

Paul Larson (pwlars)
Changed in ffmpeg (Ubuntu Karmic):
status: New → Triaged
Revision history for this message
Mans Rullgard (mansr) wrote :

I just checked in a Vorbis optimisation giving 12% speedup. Something to consider including, perhaps.

Revision history for this message
Dave Martin (dave-martin-arm) wrote :

Is any progress expected on merging these changes for Karmic?

Revision history for this message
Mans Rullgard (mansr) wrote :

More good stuff committed:

r19806 ARM: NEON optimised FFT and MDCT
r19817 ARM: faster NEON IMDCT
r19818 Prepare for optimised forward MDCT implementations
r19819 ARM: NEON optimised MDCT

The first two triple the speed of several audio decoders. The last two speed up some audio encoders by an unknown amount.

Revision history for this message
Loïc Minier (lool) wrote :

I've started work on this, but the ffmpeg source package has been building all day on my beagle board so I expect it will take me a couple of days to complete; sorry for the delay.

Changed in ffmpeg (Ubuntu Karmic):
status: Triaged → In Progress
Revision history for this message
Dave Martin (dave-martin-arm) wrote :

Thanks for having a go!

Did you apply the extra changes suggested by Mans too? I've not tried applying those myself yet, but I'd expect them to apply cleanly. FFT/[I]MDCT optimisations are likely to benefit a lot of codecs.

Revision history for this message
Mans Rullgard (mansr) wrote :

You'll want r19846 too, as it fixes a silly mistake I made. I don't think anything was affected by it in practice, but I don't like bugs.

I have another small speedup of [I]MDCT pending too. Hopefully I'll get it in this week.

Revision history for this message
Mans Rullgard (mansr) wrote :

The MDCT improvement is in. However, it required some minor changes scattered all over the place, so considering the speedup is only a few %, it may not be worth the trouble to patch this.

Next up: massive speedup of AAC decoder. I hope to finish it during the week.

Changed in ffmpeg (Ubuntu Karmic):
importance: Undecided → Medium
Revision history for this message
Loïc Minier (lool) wrote :
Download full text (10.9 KiB)

So I'm taking a fresh look on this bug from the start. Sorry I didn't make more time for NEON earlier in the cycle. This is kind of a brain dump of where I stand.

First, back to your initial comment:
"ffmpeg also contains some VFP, armv5te and armv6 optimisations. Based on the v6+VFP baseline for Karmic, I suggest that these can be turned on statically, when the karmic toolchain has stabilised sufficiently."

So what you're saying is that we should configure ffmpeg for armv6 + VFP by default in Ubuntu. I agree and changed:
nooptflags += --disable-armv5te --disable-armv6 --disable-armv6t2
nooptflags += --disable-armvfp --disable-neon

to:
nooptflags += --enable-armv6 --disable-armv6t2
nooptflags += --enable-armvfp --disable-neon

And disabled the now useless VFP flavour.

I didn't add conditions for Debian vs Ubuntu; need to check with Reinhard how he'd like these to look like.

Since NEON implies v7+ and since you said ffmpeg's NEON implementation requires VFP, I've updated the neon flavour (which can be used on Debian and Ubuntu) to use:
        --shlibdir=/usr/lib/neon/vfp --enable-armvfp -mcpu=armv7-a \
        --extra-cflags="-mfpu=neon -mfloat-abi=softfp"

NB: I used -mcpu=armv7-a instead of -mcpu=cortex-a8 because cortex implies the fast_unaligned ffmpeg flag. Could you confirm that implementing NEON implies fast_unaligned?

I looked at all upstream commits with NEON in the commit message (git log --grep=NEON -i or gitk --grep=NEON -i).

Then the first missing NEON commit I found was indeed r18332.

I found these interesting revs:

bd97c6665a522e7f64ee1456ed9c39f7cde7234f 18332
        ARM: NEON optimised add_pixels_clamped
2a9b6b26f8aefcbe16ad3dd62c09e7224f55afb4 18333
        ARM: NEON optimized put_signed_pixels_clamped
e496c588b4dda7ef2372dbd562b3e288e3c7c81c 18535
        Add guaranteed alignment for loading dest pixels in avg_pixels16_neon
d1779570309a76649c83f4969583bf8733b942cb 18712
        ARM: NEON put_pixels_clamped
1ac68a162bd6dcc80da9e950368b1fe13d4dadf7 18713
        ARM: Use fewer register in NEON put_pixels _y2 and _xy2
3104a36996462c14e3b84c852e76280676088add 18916
        ARM: NEON VP3 Loop Filter
2a82c2a332b439a3c929e936909c83dfcc147343 18944
        NEON-OBJS should also be cleared for each subdir.
14727897af399dc8480ca09f39f8f93acb8d3029 18972
        ARM: add some PLD in NEON IDCT
8a81301ef5051b8ee571cb3bd0bf4cfbefaf68a3 19216
        ARM: slightly faster NEON H264 horizontal loop filter
03586fda4fb70f488e61294e5200f43650117535 19345
        ARM: NEON VP3 IDCT
eb75e8538b294d7f097a1f53a546c353011d9471 19438
        Require aligned memory for everything that needs it
1dc725bbd6a1cc6a572aff0702753dc3559d657b 19745
        ARM: handle VFP register arguments in ff_vector_fmul_window_neon()
822b4ce43bcb76899277a893e01eda322264402d 19494
        Only compile in NEON optimizations for H.264 when the H.264 decoder is enabled.
71b3800785cd7aaf6f0aed00be58b7ed007f31bb 19637
        ARM: NEON optimised vorbis_inverse_coupling
6768555cd5b51c60b061f3a41fa4ea5536e9b2e2 19806
        ARM: NEON optimised FFT and MDCT
a4f63...

Revision history for this message
Loïc Minier (lool) wrote :
Revision history for this message
Loïc Minier (lool) wrote :

I started pulling r19955 and r19956, but r19956 has a bunch of changes to the AAC decoder which don't apply probably because I need more aac.c changes. aac.c saw a bunch more commits in the recent history, it's a very active file. I don't think it's reasonable to try to backport its changes.

Then there's the problem that the changes touch dsputil.h which API is used all over the place. I'm not too happy to change it as it might impact other decoders or even other applications, but almost all interesting changes touch it.

So I dropped revs after 19957 except r20151 and am trying another build.

Revision history for this message
Loïc Minier (lool) wrote :
Revision history for this message
Loïc Minier (lool) wrote :

Replying to self:
> NB: I used -mcpu=armv7-a instead of -mcpu=cortex-a8 because cortex implies the fast_unaligned ffmpeg flag. Could you confirm that implementing NEON implies fast_unaligned?

I pulled r19308 (ARM: enable fast_unaligned when --cpu=armv[67] is specified):
--- a/configure
+++ b/configure
@@ -1792,6 +1792,10 @@ if test $cpu != "generic"; then
             add_cflags -mcpu=$cpu
             enable fast_unaligned
         ;;
+ armv[67]*)
+ add_cflags -march=$cpu
+ enable fast_unaligned
+ ;;
         armv*)
             add_cflags -march=$cpu
         ;;

so it's effectively identical to --cpu=cortex*.

Revision history for this message
Loïc Minier (lool) wrote :

I'm considering dropping r19818 and r19819 as they change the size of an ABI struct (AFAIK) in dsputil.h; r19806 only adds specific implementations to dsputil.h so that's ok.

Revision history for this message
Loïc Minier (lool) wrote :

Shall we build the neon pass with --enable-armv6 and --enable-armv6t2 as well?

Revision history for this message
Mans Rullgard (mansr) wrote :

dsputil.h is not part of the public API, so changes there should not affect anything outside libavcodec.

There is no point in explicitly enabling ARM features (e.g. --enable-armv6) as these are automatically selected depending on the --cpu option. A NEON build should have everything enabled, since anything with NEON is also ARMv6[T2] compatible.

As for which changes to include, backporting the more extensive changes probably doesn't make much sense, at least not those depending on other changes to e.g. aac.c. Hopefully we'll make a new release (0.6?) at some point, which will include all these changes.

Revision history for this message
Loïc Minier (lool) wrote :

So latest tree I have fails with:
...
gcc -DHAVE_AV_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -I. -I"/build/buildd/ffmpeg-0.5+svn20090706" -D_ISOC99_SOURCE -D_POSIX_C_SOURCE=200112 -I/build/buildd/ffmpeg-0.5+svn20090706/debian/include -mfpu=neon -mfloat-abi=softfp -std=c99 -fomit-frame-pointer -pthread -I/usr/include/schroedinger-1.0 -I/usr/include/liboil-0.3 -g -Wdeclaration-after-statement -Wall -Wno-switch -Wdisabled-optimization -Wpointer-arith -Wredundant-decls -Wno-pointer-sign -Wcast-qual -Wwrite-strings -Wtype-limits -Wundef -O3 -fno-math-errno -fno-signed-zeros -c -o libavcodec/arm/dsputil_neon.o /build/buildd/ffmpeg-0.5+svn20090706/libavcodec/arm/dsputil_neon.c
gcc -DHAVE_AV_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -I. -I"/build/buildd/ffmpeg-0.5+svn20090706" -D_ISOC99_SOURCE -D_POSIX_C_SOURCE=200112 -I/build/buildd/ffmpeg-0.5+svn20090706/debian/include -mfpu=neon -mfloat-abi=softfp -std=c99 -fomit-frame-pointer -pthread -I/usr/include/schroedinger-1.0 -I/usr/include/liboil-0.3 -g -Wdeclaration-after-statement -Wall -Wno-switch -Wdisabled-optimization -Wpointer-arith -Wredundant-decls -Wno-pointer-sign -Wcast-qual -Wwrite-strings -Wtype-limits -Wundef -O3 -fno-math-errno -fno-signed-zeros -c -o libavcodec/arm/dsputil_neon_s.o /build/buildd/ffmpeg-0.5+svn20090706/libavcodec/arm/dsputil_neon_s.S
/build/buildd/ffmpeg-0.5+svn20090706/libavcodec/arm/dsputil_neon_s.S: Assembler messages:
/build/buildd/ffmpeg-0.5+svn20090706/libavcodec/arm/dsputil_neon_s.S:748: Error: bad instruction `vfp vdup.32 q8,d0[0]'
/build/buildd/ffmpeg-0.5+svn20090706/libavcodec/arm/dsputil_neon_s.S:749: Error: bad instruction `novfp vld1.32{d16[],d17[]},[sp,:32]'
/build/buildd/ffmpeg-0.5+svn20090706/libavcodec/arm/dsputil_neon_s.S:751: Error: bad instruction `vfp ldr lr,[sp,#12]'
/build/buildd/ffmpeg-0.5+svn20090706/libavcodec/arm/dsputil_neon_s.S:752: Error: bad instruction `novfp ldr lr,[sp,#16]'
make[1]: *** [libavcodec/arm/dsputil_neon_s.o] Error 1

That's in the neon pass which is currently configured with:
--disable-encoder=h263 --disable-encoder=h263p --disable-encoder=mpeg2video --disable-encoder=mpeg4 --disable-encoder=msmpeg4v1 --disable-encoder=msmpeg4v2 --disable-encoder=msmpeg4v3 --extra-version='4:0.5+svn20090706-2ubuntu1~arm2' --prefix=/usr --enable-avfilter --enable-avfilter-lavf --enable-vdpau --enable-bzlib --enable-libgsm --enable-libschroedinger --enable-libspeex --enable-libtheora --enable-libvorbis --enable-pthreads --enable-zlib --disable-stripping --disable-vhook --enable-gpl --enable-postproc --enable-swscale --enable-x11grab --enable-libdc1394 --extra-cflags="-I/build/buildd/ffmpeg-0.5+svn20090706/debian/include" --shlibdir=/usr/lib/neon/vfp --enable-neon --enable-armvfp --enable-shared --extra-cflags="-mfpu=neon -mfloat-abi=softfp" --disable-static --disable-ffmpeg --disable-ffplay

Perhaps I shouldn't have enabled armvfp?

Revision history for this message
Loïc Minier (lool) wrote :

Ah I forgot the --cpu=armv7-a I thought I had added.

Revision history for this message
Loïc Minier (lool) wrote :

Ok, I see your point with --cpu deciding how the other options will turn on/off. I think the Debian packaging is a bit paranoid about getting this or that feature right, but I guess it's ok for armel.

Revision history for this message
Loïc Minier (lool) wrote :

I simply used --cpu=armv6 for the shared pass and --cpu=armv7-a --extra-cflags="-mfpu=neon -mfloat-abi=softfp" for the neon one (float-abi is probably not required but I find it safer); the static one is probably broken, I mentioned in to Reinhard over email.

Revision history for this message
Mans Rullgard (mansr) wrote :

You're probably missing r19474.

Is the armv6 build intended to include VFP code? If so, you might need to add --extra-cflags="-mfpu=vfp -mfloat-abi=softfp" to the configure arguments.

Revision history for this message
Loïc Minier (lool) wrote :

gcc -DHAVE_AV_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -I. -I"/build/buildd/ffmpeg-0.5+svn20090706" -D_ISOC99_SOURCE -D_POSIX_C_SOURCE=200112 -I/build/buildd/ffmpeg-0.5+svn20090706/debian/include -mfpu=neon -mfloat-abi=softfp -std=c99 -fomit-frame-pointer -march=armv7-a -pthread -I/usr/include/schroedinger-1.0 -I/usr/include/liboil-0.3 -g -Wdeclaration-after-statement -Wall -Wno-switch -Wdisabled-optimization -Wpointer-arith -Wredundant-decls -Wno-pointer-sign -Wcast-qual -Wwrite-strings -Wtype-limits -Wundef -O3 -fno-math-errno -fno-signed-zeros -c -o libavutil/utils.o /build/buildd/ffmpeg-0.5+svn20090706/libavutil/utils.c
gcc -shared -Wl,-soname,libavutil.so.49 -Wl,-Bsymbolic-functions -Wl,--warn-common -Wl,--as-needed -Wl,-rpath-link,"/build/buildd/ffmpeg-0.5+svn20090706/debian-neon"/libpostproc -Wl,-rpath-link,"/build/buildd/ffmpeg-0.5+svn20090706/debian-neon"/libswscale -Wl,-rpath-link,"/build/buildd/ffmpeg-0.5+svn20090706/debian-neon"/libavfilter -Wl,-rpath-link,"/build/buildd/ffmpeg-0.5+svn20090706/debian-neon"/libavdevice -Wl,-rpath-link,"/build/buildd/ffmpeg-0.5+svn20090706/debian-neon"/libavformat -Wl,-rpath-link,"/build/buildd/ffmpeg-0.5+svn20090706/debian-neon"/libavcodec -Wl,-rpath-link,"/build/buildd/ffmpeg-0.5+svn20090706/debian-neon"/libavutil -Wl,-Bsymbolic -o libavutil/libavutil.so.49 libavutil/adler32.o libavutil/aes.o libavutil/avstring.o libavutil/base64.o libavutil/crc.o libavutil/des.o libavutil/fifo.o libavutil/intfloat_readwrite.o libavutil/lfg.o libavutil/lls.o libavutil/log.o libavutil/lzo.o libavutil/mathematics.o libavutil/md5.o libavutil/mem.o libavutil/random.o libavutil/rational.o libavutil/rc4.o libavutil/sha1.o libavutil/tree.o libavutil/utils.o -lz -lbz2 -pthread -lm -lgsm -lschroedinger-1.0 -lpthread -loil-0.3 -lm -lrt -lspeex -ltheora -logg -lvorbisenc -lvorbis -logg -ldc1394 -lraw1394 -lasound -ldl -lasound -lX11 -lXext -lasound
/usr/bin/ld: libavutil/aes.o: relocation R_ARM_MOVW_ABS_NC against `__stack_chk_guard' can not be used when making a shared object; recompile with -fPIC
libavutil/aes.o: could not read symbols: Bad value

It looks like aes.o is misbuilt, or can't build with the -fPIE stuff we're using by default; but I don't understand why that's NEON specific.

Revision history for this message
Loïc Minier (lool) wrote :

@Mans: yes, the shared build should have VFP enabled; our toolchain defaults to -mfpu=vfp -mfloat-abi=softfp. The shared build passed configure with:
ARCH arm (armv6)
version string suffix 4:0.5+svn20090706-2ubuntu1~arm3
big-endian no
ARMv5TE enabled yes
ARMv6 enabled yes
ARMv6T2 enabled no
ARM VFP enabled yes
IWMMXT enabled no
NEON enabled no
gprof enabled no
debug symbols yes
strip symbols no
optimizations yes
static no
shared yes

But I'll include r19474 as you suggest and pass -mfloat-abi+-mfpu in the shared build too for clarity.

Revision history for this message
Loïc Minier (lool) wrote :

Back to aes.o, this is the shared pass:
gcc -DHAVE_AV_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -I. -I"/build/buildd/ffmpeg-0.5+svn20090706" -D_ISOC99_SOURCE -D_POSIX_C_SOURCE=200112 -I/build/buildd/ffmpeg-0.5+svn20090706/debian/include -std=c99 -fomit-frame-pointer -march=armv6 -pthread -I/usr/include/schroedinger-1.0 -I/usr/include/liboil-0.3 -g -Wdeclaration-after-statement -Wall -Wno-switch -Wdisabled-optimization -Wpointer-arith -Wredundant-decls -Wno-pointer-sign -Wcast-qual -Wwrite-strings -Wtype-limits -Wundef -O3 -fno-math-errno -fno-signed-zeros -c -o libavutil/aes.o /build/buildd/ffmpeg-0.5+svn20090706/libavutil/aes.c

This is the neon pass:
gcc -DHAVE_AV_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -I. -I"/build/buildd/ffmpeg-0.5+svn20090706" -D_ISOC99_SOURCE -D_POSIX_C_SOURCE=200112 -I/build/buildd/ffmpeg-0.5+svn20090706/debian/include -mfpu=neon -mfloat-abi=softfp -std=c99 -fomit-frame-pointer -march=armv7-a -pthread -I/usr/include/schroedinger-1.0 -I/usr/include/liboil-0.3 -g -Wdeclaration-after-statement -Wall -Wno-switch -Wdisabled-optimization -Wpointer-arith -Wredundant-decls -Wno-pointer-sign -Wcast-qual -Wwrite-strings -Wtype-limits -Wundef -O3 -fno-math-errno -fno-signed-zeros -c -o libavutil/aes.o /build/buildd/ffmpeg-0.5+svn20090706/libavutil/aes.c

So it would seem -mfpu=neon does not generate PIC code in some cases; perhaps -fPIC is missing?

Revision history for this message
Loïc Minier (lool) wrote :

I can reproduce manually with

Revision history for this message
Loïc Minier (lool) wrote :

I can reproduce with a cross-compiler:
./configure --arch=arm --cpu=cortex-a8 --extra-cflags='-mfpu=vfp -mfloat-abi=softfp' --cross-prefix=arm-none-linux-gnueabi-
arm-none-linux-gnueabi-gcc -DHAVE_AV_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -I. -I"/build/buildd/ffmpeg-0.5+svn20090706" -D_ISOC99_SOURCE -D_POSIX_C_SOURCE=200112 -I/build/buildd/ffmpeg-0.5+svn20090706/debian/include -std=c99 -fomit-frame-pointer -march=armv6 -pthread -I/usr/include/schroedinger-1.0 -I/usr/include/liboil-0.3 -g -Wdeclaration-after-statement -Wall -Wno-switch -Wdisabled-optimization -Wpointer-arith -Wredundant-decls -Wno-pointer-sign -Wcast-qual -Wwrite-strings -Wtype-limits -Wundef -O3 -fno-math-errno -fno-signed-zeros -c -o libavutil/aes.o libavutil/aes.c

versus:
./configure --arch=arm --cpu=cortex-a8 --extra-cflags='-mfpu=neon -mfloat-abi=softfp' --cross-prefix=arm-none-linux-gnueabi-
arm-none-linux-gnueabi-gcc -DHAVE_AV_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -I. -I"/build/buildd/ffmpeg-0.5+svn20090706" -D_ISOC99_SOURCE -D_POSIX_C_SOURCE=200112 -I/build/buildd/ffmpeg-0.5+svn20090706/debian/include -mfpu=neon -mfloat-abi=softfp -std=c99 -fomit-frame-pointer -march=armv7-a -pthread -I/usr/include/schroedinger-1.0 -I/usr/include/liboil-0.3 -g -Wdeclaration-after-statement -Wall -Wno-switch -Wdisabled-optimization -Wpointer-arith -Wredundant-decls -Wno-pointer-sign -Wcast-qual -Wwrite-strings -Wtype-limits -Wundef -O3 -fno-math-errno -fno-signed-zeros -c -o libavutil/aes.o libavutil/aes.c

And arm-none-linux-gnueabi-readelf shows the problematic sections only in the neon case.

Revision history for this message
Loïc Minier (lool) wrote :

libavutil/aes.o.neon: file format elf32-littlearm

Disassembly of section .text:

00000000 <av_aes_init>:
       0: e92d4ff0 push {r4, r5, r6, r7, r8, r9, sl, fp, lr}
       4: e24ddb01 sub sp, sp, #1024 ; 0x400
       8: e24dd00c sub sp, sp, #12 ; 0xc
       c: e3005000 movw r5, #0 ; 0x0
                        c: R_ARM_MOVW_ABS_NC .LANCHOR0
      10: e3405000 movt r5, #0 ; 0x0
                        10: R_ARM_MOVT_ABS .LANCHOR0
      14: e58d1068 str r1, [sp, #104]
      18: e5951ffc ldr r1, [r5, #4092]
      1c: e58d5014 str r5, [sp, #20]
      20: e3510000 cmp r1, #0 ; 0x0
      24: e58d006c str r0, [sp, #108]
      28: e58d2064 str r2, [sp, #100]
      2c: e58d3060 str r3, [sp, #96]
      30: 1a00007e bne 230 <av_aes_init+0x230>
      34: e28dc098 add ip, sp, #152 ; 0x98
      38: e3a03001 mov r3, #1 ; 0x1
...
Disassembly of section .bss:

...
00001ff8 <.LANCHOR1>:
        ...
...

Revision history for this message
Loïc Minier (lool) wrote :

So the regular aes.o for neon has this in the output of arm-none-linux-gnueabi-readelf -a libavutil/aes.o.neon:
...
Relocation section '.rel.text' at offset 0x3a64 contains 30 entries:
 Offset Info Type Sym.Value Sym. Name
0000000c 0000092b R_ARM_MOVW_ABS_NC 00000000 .LANCHOR0
00000010 0000092c R_ARM_MOVT_ABS 00000000 .LANCHOR0
000000a0 00000a2b R_ARM_MOVW_ABS_NC 00001ff8 .LANCHOR1
000000a8 00000a2c R_ARM_MOVT_ABS 00001ff8 .LANCHOR1
00000100 00000b2b R_ARM_MOVW_ABS_NC 00000000 .LANCHOR2
00000104 00000b2c R_ARM_MOVT_ABS 00000000 .LANCHOR2
0000019c 0000092b R_ARM_MOVW_ABS_NC 00000000 .LANCHOR0
...

I prepended -fPIC to the gcc flags and now the reltext table is much smaller and says:
Relocation section '.rel.text' at offset 0x3cbc contains 13 entries:
 Offset Info Type Sym.Value Sym. Name
000002c8 0000271b R_ARM_PLT32 00000000 memcpy
00000320 0000271b R_ARM_PLT32 00000000 memcpy
0000072c 0000271b R_ARM_PLT32 00000000 memcpy
00000940 0000271b R_ARM_PLT32 00000000 memcpy
00000968 00002819 R_ARM_BASE_PREL 00000000 _GLOBAL_OFFSET_TABLE_
0000096c 00001618 R_ARM_GOTOFF32 00000000 .LANCHOR0
00000970 00001718 R_ARM_GOTOFF32 00001ff8 .LANCHOR1
00000974 00001118 R_ARM_GOTOFF32 00000000 .LANCHOR2
00000d70 0000271b R_ARM_PLT32 00000000 memcpy
00001050 0000271b R_ARM_PLT32 00000000 memcpy
000010b0 00002819 R_ARM_BASE_PREL 00000000 _GLOBAL_OFFSET_TABLE_
000010b4 00001718 R_ARM_GOTOFF32 00001ff8 .LANCHOR1
000010b8 00001618 R_ARM_GOTOFF32 00000000 .LANCHOR0
(that's the full table)

Given that I don't see any _ABS thingies, I suspect it is good.

Revision history for this message
Reinhard Tartler (siretart) wrote :

Mans Rullgard <email address hidden> writes:

> dsputil.h is not part of the public API, so changes there should not
> affect anything outside libavcodec.

Unfortunately mplayer does #inlcude dsputil.h, and we carefully need to
make sure that changes to that file do not destabilize mplayer. I've
also checked gstreamer0.10-ffmpeg, but that package does not seem to use
dsputil.h.

If in doubt, we should probably reupload mplayer with the changed
dsputil.h together with the new mplayer.

--
Gruesse/greetings,
Reinhard Tartler, KeyID 945348A4

Revision history for this message
Mans Rullgard (mansr) wrote :

The R_ARM_MOVW_ABS_NC errors are caused by the linkers (static and dynamic) not handling that type of relocation. Only gcc 4.4 and codesourcery 2009q1 generate the movw/movt pairs that result in this relocation entry, so it used to be possible to build non-pic shared libraries.

If you pull in r19672, you can use --enable-pic to force PIC. If you also include r20203, PIC will be enabled automatically for ARM shared libs.

Revision history for this message
Loïc Minier (lool) wrote :

@Mans, thanks for fixing this upstream; I just passed -fPIC -DPIC in extra flags as that was the case for other optimised passes already.

armel and amd64 packages are now available in my PPA, ppa:lool/ppa.

https://launchpad.net/~lool/+archive/ppa

Git repo/branch: git.debian.org:/git/users/lool/pkg-multimedia/ffmpeg.git karmic-armel-opts

The cherry-picked / ported patches are in the karmic-armel-opts-neon branch.

Revision history for this message
Loïc Minier (lool) wrote :

So folks who want to test the PPA packages sadlly have to manually select the proper version for the packages built by the ffmpeg source; 4:0.5+svn20090706-2ubuntu1~arm4 should be used; you can check version on the installed packages with:
grep-status -F Status 'install ok installed' -a -F Source ffmpeg -s Package -s Version

(dctrl-tools package for grep-status)

Revision history for this message
Loïc Minier (lool) wrote :

dsputil.h: not sure it's a good enough test but I played on amd64 an ogv video fine with both the new ffmpeg and with mplayer installed allong the new ffmpeg libs. I need to test the NEON version and test with more codecs when I get home; any recommendations of the most relevant tests here?

Thanks!

Revision history for this message
Dave Martin (dave-martin-arm) wrote :

I did a brief test of your build, Loïc— seems to work OK on a NEON-enabled platform.
I didn't test exhaustively, but at least MPEG2 and H.264 (mp4) appear to work well. I couldn't test audio because there's some other problem with my test platform.

I haven't successfully played any naturally-occurring Theora video: I always get the following error:
[ogg @ <address>]Too old or unsupported Theora (0)
could not open codecs

Is it just that I'm trying to play an obsolete format, or is there some other problem? (So far as I know, there's nothing ARM-specific going on here.)

If anyone can chuck me a link to some playable files for codecs you're interested in, I can try and play them.

Revision history for this message
Dave Martin (dave-martin-arm) wrote :

Extra note: I do get a good speedup when playing video using ffplay, when using the NEON libraries versus the standard (VFP) libraries.

I don't have quantative benchmarks right now.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ffmpeg - 4:0.5+svn20090706-2ubuntu2

---------------
ffmpeg (4:0.5+svn20090706-2ubuntu2) karmic; urgency=low

  [ Reinhard Tartler ]
  * Make arguments of av_set_pts_info() unsigned.
  * update debian/changelog
  * use patch for issue1245 from git.ffmpeg.org
  * Support constant-quant encoding for libtheora, LP: #356322
  * increase swscale compile time width (VOF/VOFW), LP: #443264

  [ Loïc Minier ]
  * Update config for karmic's armel toolchain.
  * Enable neon flavour; LP: #383240.
  * Update NEON confflags to assume v7 and VFP.
  * Add backported NEON patches from ffmpeg trunk; see debian/patches/neon/.
  * Pass proper --cpu and --extra-flags on armel.
  * Pass -fPIC -DPIC to neon pass.

 -- Loic Minier <email address hidden> Tue, 13 Oct 2009 23:56:04 +0200

Changed in ffmpeg (Ubuntu Karmic):
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.