[FFmpeg-devel] DSP function ARM NEON patches for hevc

Tomperi Seppo Seppo.Tomperi at vtt.fi
Wed Feb 25 12:12:23 CET 2015


17/02/15 12:44, "Michael Niedermayer" <michael at niedermayer.cc>:

>On Tue, Feb 17, 2015 at 07:33:04AM +0000, Tomperi Seppo wrote:
>> 
>> > On 16 Feb 2015, at 19:54, Michael Niedermayer <michael at niedermayer.cc> wrote:
>> > 
>> > On Mon, Feb 16, 2015 at 12:47:36PM +0000, Tomperi Seppo wrote:
>> >> More NEON optimizations for testing. fate-hevc passes on Tegra K1,
>> >> but these haven't been tested for NEON clobbering.
>> >> 
>> >> -Seppo
>> >> 
>> >> ________________________________________
>> >> From: Tomperi Seppo
>> >> Sent: Monday, February 16, 2015 1:30 PM
>> >> To: Michael Niedermayer
>> >> Cc: Michael Niedermayer; FFmpeg development discussions and patches; Mickaël Raulet
>> >> Subject: RE: [FFmpeg-devel]  DSP function ARM NEON patches for hevc
>> >> 
>> >> Hi Michael,
>> >> 
>> >> Here is a shot-in-the-dark attempt at fixing the NEON register
>> >> clobbering in deblocking. Could you test it with qemu and check
>> >> whether it works?
>> >> 
>> >> 
>> >> -Seppo
>> >> 
>> >> ________________________________________
>> >> From: Michael Niedermayer [michael at niedermayer.cc]
>> >> Sent: Monday, February 16, 2015 3:28 AM
>> >> To: Tomperi Seppo
>> >> Cc: Michael Niedermayer; FFmpeg development discussions and patches; Mickaël Raulet
>> >> Subject: Re: [FFmpeg-devel]  DSP function ARM NEON patches for hevc
>> >> 
>> >> Hi
>> >> 
>> >> On Sun, Feb 15, 2015 at 08:31:32PM +0000, Tomperi Seppo wrote:
>> >>> Hi!
>> >>> 
>> >>> The reason is the chroma deblocking, which uses q4 without pushing
>> >>> it to the stack. :/
>> >>> Unfortunately I am in Geneva this week and don't have an ARM Linux
>> >>> board with me, so it is not easy to test.
>> >>> 
>> >>> Mickael Raulet: maybe the guys at INSA could run tests this week if I
>> >>> make a fix? Could you ask?
>> >> 
>> >> If they can't, then I can probably test it too, if it's a patch which
>> >> applies cleanly to ffmpeg and testing fate-hevc with
>> >> --enable-neon-clobber-test under qemu is what is needed.
>> >> I could test on an ARM board too if needed.
>> >> 
>> >> 
>> >>> 
>> >>> I also have SAO, qpel and epel NEON patches for the latest FFmpeg. They
>> >>> pass fate-hevc on the Jetson TK1, but should still be checked on iOS
>> >>> and for clobbering.
>> >>> 
>> >>> -Seppo
>> >>> 
>> >>> 
>> >>> ________________________________________
>> >>> From: Michael Niedermayer [michaelni at gmx.at]
>> >>> Sent: Friday, February 13, 2015 5:38 PM
>> >>> To: FFmpeg development discussions and patches
>> >>> Cc: Tomperi Seppo; Mickaël Raulet
>> >>> Subject: Re: [FFmpeg-devel]  DSP function ARM NEON patches for hevc
>> >>> 
>> >>> On Thu, Feb 05, 2015 at 02:22:28PM +0100, Mickaël Raulet wrote:
>> >>>> Michael,
>> >>>> 
>> >>>> Please find some commits that can be cherry picked from
>> >>>> https://github.com/OpenHEVC/FFmpeg/commits/ffmpeg_patch
>> >>>> 
>> >>> 
>> >>>> Optimized deblocking filter (8bits only)
>> >>>> 1b9ee47d2f43b0a029a9468233626102eb1473b8
>> >>> 
>> >>> This breaks the NEON clobber test, see:
>> >>> 
>> >>> fate.ffmpeg.org/report.cgi?time=20150211030204&slot=armv7l-panda-gcc4.6-cortexa8-clobber
>> >>> 
>> >>> [...]
>> >>> --
>> >>> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>> >>> 
>> >>> The worst form of inequality is to try to make unequal things equal.
>> >>> -- Aristotle
>> >>> 
>> >> 
>> >> --
>> >> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>> >> 
>> >> Opposition brings concord. Out of discord comes the fairest harmony.
>> >> -- Heraclitus
>> > 
>> >> Makefile            |    3
>> >> hevcdsp_init_neon.c |  159 ++++++++
>> >> hevcdsp_qpel_neon.S |  999 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >> 3 files changed, 1160 insertions(+), 1 deletion(-)
>> >> 9fb0b3c33edf085845b7a0fba3ca77d1ba55dd6c 0001-hevcdsp-ARM-NEON-optimized-qpel-functions.patch
>> >> From ce06cb2bea4b051995608b11651b185e7a825a4c Mon Sep 17 00:00:00 2001
>> >> From: Seppo Tomperi <seppo.tomperi at vtt.fi>
>> >> Date: Wed, 11 Feb 2015 10:20:26 +0000
>> >> Subject: [PATCH] hevcdsp: ARM NEON optimized qpel functions
>> >> 
>> >> ---
>> >> libavcodec/arm/Makefile            |   3 +-
>> >> libavcodec/arm/hevcdsp_init_neon.c | 159 ++++++
>> >> libavcodec/arm/hevcdsp_qpel_neon.S | 999 +++++++++++++++++++++++++++++++++++++
>> >> 3 files changed, 1160 insertions(+), 1 deletion(-)
>> >> create mode 100644 libavcodec/arm/hevcdsp_qpel_neon.S
>> > 
>> > 
>> > seems to fail building:
>> > 
>> >        libavformat/utils.o
>> > CC      libavcodec/arm/hevcdsp_init_neon.o
>> > AS      libavcodec/arm/hevcdsp_qpel_neon.o
>> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S: Assembler messages:
>> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: expected } -- `vld1.32 {d0[0]d0[1]d1[0]d1[1]},[r2],r3'
>> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vld1.32 {},[r2],r3'
>> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vld1.32 {},[r2],r3'
>> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vld1.32 {},[r2],r3'
>> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: expected } -- `vst1.32 {d0[0]d0[1]d1[0]d1[1]},[r0],r1'
>> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vst1.32 {},[r0],r1'
>> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vst1.32 {},[r0],r1'
>> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vst1.32 {},[r0],r1'
>> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:994: Error: expected } -- `vld1.32 {d1[0]d2},[r2]'
>> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:994: Error: Neon double or quad precision register expected -- `vld1.32 {},[r2]'
>> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:994: Error: expected } -- `vst1.32 {d1[0]d2},[r0]'
>> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:994: Error: Neon double or quad precision register expected -- `vst1.32 {},[r0]'
>> > make: *** [libavcodec/arm/hevcdsp_qpel_neon.o] Error 1
>> > make: *** Waiting for unfinished jobs....
>> > 
>> > 
>> 
>> These macros compiled for me with the Jetson TK1 toolchain and with the
>> latest GAS preprocessor, so I thought they were finally OK.
>> But it looks like passing register lists to macros is not handled well
>> by all preprocessors.
>
>plain "arm-linux-gnueabi-gcc-4.5 (Ubuntu/Linaro 4.5.3-12ubuntu2) 4.5.3"
>here, with no preprocessor
>
>
>> 
>> These are quite simple functions that copy blocks of pixels of varying
>> width using NEON. I could either write out the macros (lots of almost
>> identical functions) or leave the optimisation out entirely for now. Or
>> do you have any other ideas?
>
>The following seems to fix it, but I do not know why these 2
>lines failed while the others do not.
>Adding commas to all of them works as well.
>
>diff --git a/libavcodec/arm/hevcdsp_qpel_neon.S b/libavcodec/arm/hevcdsp_qpel_neon.S
>index 14116a6..7b0df2e 100644
>--- a/libavcodec/arm/hevcdsp_qpel_neon.S
>+++ b/libavcodec/arm/hevcdsp_qpel_neon.S
>@@ -989,9 +989,9 @@ function ff_hevc_put_qpel_uw_pixels_w\width\()_neon_8, export=1
> endfunc
> .endm
>
>-put_qpel_uw_pixels    4 d0[0] d0[1] d1[0] d1[1]
>+put_qpel_uw_pixels    4 d0[0], d0[1], d1[0], d1[1]
> put_qpel_uw_pixels    8 d0 d1 d2 d3
>-put_qpel_uw_pixels_m 12 d0 d1[0] d2 d3[0]
>+put_qpel_uw_pixels_m 12 d0, d1[0], d2, d3[0]
> put_qpel_uw_pixels   16 q0 q1 q2 q3
> put_qpel_uw_pixels   24 d0-d2 d3-d5 d16-d18 d19-d21
> put_qpel_uw_pixels   32 q0-q1 q2-q3 q8-q9 q10-q11
>
>[...]

Same patch, but with comma separators for these macros.

-Seppo Tomperi
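
The thread does not settle why only two of the macro invocations failed. One plausible explanation, offered as an assumption and not confirmed by anyone above, is that the assembler collapses whitespace before it splits macro arguments, and keeps a space only when it sits between two symbol characters. A space after `]` would then disappear, gluing `d0[0] d0[1]` into a single argument, while `d0 d1` and `d0-d2 d3-d5` stay separate. The Python sketch below is a simplified model of that behaviour, not GAS's actual code; `SYMBOL_CHARS`, `scrub_whitespace` and `split_macro_args` are hypothetical names chosen for illustration.

```python
import re

# Assumption: characters that may appear inside an assembler symbol.
SYMBOL_CHARS = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789_.$"
)


def scrub_whitespace(line):
    """Drop each whitespace run unless it separates two symbol characters."""
    out = []
    i = 0
    n = len(line)
    while i < n:
        ch = line[i]
        if not ch.isspace():
            out.append(ch)
            i += 1
            continue
        # Find the end of this whitespace run.
        j = i
        while j < n and line[j].isspace():
            j += 1
        prev_is_symbol = bool(out) and out[-1] in SYMBOL_CHARS
        next_is_symbol = j < n and line[j] in SYMBOL_CHARS
        if prev_is_symbol and next_is_symbol:
            out.append(" ")  # the space is a real separator, keep one
        i = j
    return "".join(out)


def split_macro_args(operands):
    """Split a macro operand string into arguments after scrubbing."""
    scrubbed = scrub_whitespace(operands)
    return [arg for arg in re.split(r"[ ,]", scrubbed) if arg]


if __name__ == "__main__":
    for operands in (
        "4 d0[0] d0[1] d1[0] d1[1]",     # failing line 992
        "4 d0[0], d0[1], d1[0], d1[1]",  # with comma separators
        "12 d0 d1[0] d2 d3[0]",          # failing line 994
        "24 d0-d2 d3-d5 d16-d18 d19-d21",
    ):
        print(operands, "->", split_macro_args(operands))
```

Under this model the first invocation yields only two arguments, `4` and `d0[0]d0[1]d1[0]d1[1]`, which matches the merged register list and the empty `{}` lists in the error messages; the 12-wide invocation yields `d1[0]d2` as one argument, again matching the log. With commas every argument survives, whichever way the whitespace is treated.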

>
>-- 
>Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
>Answering whether a program halts or runs forever is
>On a Turing machine, in general impossible (Turing's halting problem).
>On any real computer, always possible as a real computer has a finite
>number of states N, and will either halt in less than N cycles or never halt.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-hevcdsp-ARM-NEON-optimized-qpel-functions.patch
Type: application/octet-stream
Size: 40851 bytes
Desc: 0001-hevcdsp-ARM-NEON-optimized-qpel-functions.patch
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20150225/49a30fb5/attachment.obj>

