[FFmpeg-devel] DSP function ARM NEON patches for hevc
Michael Niedermayer
michaelni at gmx.at
Wed Feb 25 18:38:50 CET 2015
On Wed, Feb 25, 2015 at 11:12:23AM +0000, Tomperi Seppo wrote:
> 17/02/15 12:44, "Michael Niedermayer" <michael at niedermayer.cc>:
>
> >On Tue, Feb 17, 2015 at 07:33:04AM +0000, Tomperi Seppo wrote:
> >>
> >> > On 16 Feb 2015, at 19:54, Michael Niedermayer <michael at niedermayer.cc> wrote:
> >> >
> >> > On Mon, Feb 16, 2015 at 12:47:36PM +0000, Tomperi Seppo wrote:
> >> >> More NEON optimizations for testing. fate-hevc passes on Tegra K1,
> >> >> but these haven't been tested for NEON clobbering.
> >> >>
> >> >> -Seppo
> >> >>
> >> >> ________________________________________
> >> >> From: Tomperi Seppo
> >> >> Sent: Monday, February 16, 2015 1:30 PM
> >> >> To: Michael Niedermayer
> >> >> Cc: Michael Niedermayer; FFmpeg development discussions and patches; Mickaël Raulet
> >> >> Subject: RE: [FFmpeg-devel] DSP function ARM NEON patches for hevc
> >> >>
> >> >> Hi Michael,
> >> >>
> >> >> Here is a total shot-in-the-dark fix attempt for the NEON register
> >> >> clobbering in deblocking. Could you test it with qemu and check whether it
> >> >> works?
> >> >>
> >> >>
> >> >> -Seppo
> >> >>
> >> >> ________________________________________
> >> >> From: Michael Niedermayer [michael at niedermayer.cc]
> >> >> Sent: Monday, February 16, 2015 3:28 AM
> >> >> To: Tomperi Seppo
> >> >> Cc: Michael Niedermayer; FFmpeg development discussions and patches; Mickaël Raulet
> >> >> Subject: Re: [FFmpeg-devel] DSP function ARM NEON patches for hevc
> >> >>
> >> >> Hi
> >> >>
> >> >> On Sun, Feb 15, 2015 at 08:31:32PM +0000, Tomperi Seppo wrote:
> >> >>> Hi!
> >> >>>
> >> >>> The reason is chroma deblocking, which uses q4 without pushing
> >> >>> it to the stack. :/
> >> >>> Unfortunately I am in Geneva this week and don't have an ARM Linux
> >> >>> board with me, so it is not easy to test.
> >> >>>
> >> >>> Mickael Raulet: maybe the guys at INSA could run tests this week if I
> >> >>> make a fix? Could you ask?
> >> >>
> >> >> If they can't, I can probably test it too, provided the patch applies
> >> >> cleanly to ffmpeg and what is needed is running fate-hevc with
> >> >> --enable-neon-clobber-test under qemu.
> >> >> I could test on an ARM board too if needed.
> >> >>
> >> >>
> >> >>>
> >> >>> I also have SAO, qpel and epel NEON patches for the latest FFmpeg. They
> >> >>> pass fate-hevc on Jetson TK1, but still need to be checked on iOS and for
> >> >>> register clobbering.
> >> >>>
> >> >>> -Seppo
> >> >>>
> >> >>>
> >> >>> ________________________________________
> >> >>> From: Michael Niedermayer [michaelni at gmx.at]
> >> >>> Sent: Friday, February 13, 2015 5:38 PM
> >> >>> To: FFmpeg development discussions and patches
> >> >>> Cc: Tomperi Seppo; Mickaël Raulet
> >> >>> Subject: Re: [FFmpeg-devel] DSP function ARM NEON patches for hevc
> >> >>>
> >> >>> On Thu, Feb 05, 2015 at 02:22:28PM +0100, Mickaël Raulet wrote:
> >> >>>> Michael,
> >> >>>>
> >> >>>> Please find some commits that can be cherry picked from
> >> >>>> https://github.com/OpenHEVC/FFmpeg/commits/ffmpeg_patch
> >> >>>>
> >> >>>
> >> >>>> Optimized deblocking filter (8bits only)
> >> >>>> 1b9ee47d2f43b0a029a9468233626102eb1473b8
> >> >>>
> >> >>> this breaks the neon clobber test see:
> >> >>>
> >> >>> fate.ffmpeg.org/report.cgi?time=20150211030204&slot=armv7l-panda-gcc4.6-cortexa8-clobber
> >> >>>
> >> >>> [...]
> >> >>> --
> >> >>> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> >> >>>
> >> >>> The worst form of inequality is to try to make unequal things equal.
> >> >>> -- Aristotle
> >> >>>
> >> >>
> >> >> --
> >> >> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> >> >>
> >> >> Opposition brings concord. Out of discord comes the fairest harmony.
> >> >> -- Heraclitus
> >> >
> >> >> Makefile | 3
> >> >> hevcdsp_init_neon.c | 159 ++++++++
> >> >> hevcdsp_qpel_neon.S | 999 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> >> 3 files changed, 1160 insertions(+), 1 deletion(-)
> >> >> 9fb0b3c33edf085845b7a0fba3ca77d1ba55dd6c 0001-hevcdsp-ARM-NEON-optimized-qpel-functions.patch
> >> >> From ce06cb2bea4b051995608b11651b185e7a825a4c Mon Sep 17 00:00:00 2001
> >> >> From: Seppo Tomperi <seppo.tomperi at vtt.fi>
> >> >> Date: Wed, 11 Feb 2015 10:20:26 +0000
> >> >> Subject: [PATCH] hevcdsp: ARM NEON optimized qpel functions
> >> >>
> >> >> ---
> >> >> libavcodec/arm/Makefile | 3 +-
> >> >> libavcodec/arm/hevcdsp_init_neon.c | 159 ++++++
> >> >> libavcodec/arm/hevcdsp_qpel_neon.S | 999 +++++++++++++++++++++++++++++++++++++
> >> >> 3 files changed, 1160 insertions(+), 1 deletion(-)
> >> >> create mode 100644 libavcodec/arm/hevcdsp_qpel_neon.S
> >> >
> >> >
> >> > seems to fail building:
> >> >
> >> > libavformat/utils.o
> >> > CC libavcodec/arm/hevcdsp_init_neon.o
> >> > AS libavcodec/arm/hevcdsp_qpel_neon.o
> >> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S: Assembler messages:
> >> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: expected } -- `vld1.32 {d0[0]d0[1]d1[0]d1[1]},[r2],r3'
> >> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vld1.32 {},[r2],r3'
> >> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vld1.32 {},[r2],r3'
> >> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vld1.32 {},[r2],r3'
> >> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: expected } -- `vst1.32 {d0[0]d0[1]d1[0]d1[1]},[r0],r1'
> >> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vst1.32 {},[r0],r1'
> >> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vst1.32 {},[r0],r1'
> >> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vst1.32 {},[r0],r1'
> >> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:994: Error: expected } -- `vld1.32 {d1[0]d2},[r2]'
> >> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:994: Error: Neon double or quad precision register expected -- `vld1.32 {},[r2]'
> >> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:994: Error: expected } -- `vst1.32 {d1[0]d2},[r0]'
> >> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:994: Error: Neon double or quad precision register expected -- `vst1.32 {},[r0]'
> >> > make: *** [libavcodec/arm/hevcdsp_qpel_neon.o] Error 1
> >> > make: *** Waiting for unfinished jobs....
> >> >
> >> >
> >>
> >> These macros compiled for me with the Jetson TK1 toolchain and with the latest
> >> GAS preprocessor, so I thought they were finally OK.
> >> But it looks like passing register lists to macros is not handled well
> >> by all preprocessors.
> >
> >plain "arm-linux-gnueabi-gcc-4.5 (Ubuntu/Linaro 4.5.3-12ubuntu2) 4.5.3"
> >here, with no preprocessor
> >
> >
> >>
> >> These are quite simple functions that copy blocks of pixels of varying width
> >> using NEON. I could either write out the macros (lots of almost
> >> identical functions) or leave the optimisation out entirely for now. Or
> >> do you have any other ideas?
> >
> >The following seems to fix it, but I do not know why these 2 lines
> >failed while the others do not seem to fail.
> >Adding "," to all of them works as well.
> >
> >diff --git a/libavcodec/arm/hevcdsp_qpel_neon.S b/libavcodec/arm/hevcdsp_qpel_neon.S
> >index 14116a6..7b0df2e 100644
> >--- a/libavcodec/arm/hevcdsp_qpel_neon.S
> >+++ b/libavcodec/arm/hevcdsp_qpel_neon.S
> >@@ -989,9 +989,9 @@ function ff_hevc_put_qpel_uw_pixels_w\width\()_neon_8, export=1
> > endfunc
> > .endm
> >
> >-put_qpel_uw_pixels 4 d0[0] d0[1] d1[0] d1[1]
> >+put_qpel_uw_pixels 4 d0[0], d0[1], d1[0], d1[1]
> > put_qpel_uw_pixels 8 d0 d1 d2 d3
> >-put_qpel_uw_pixels_m 12 d0 d1[0] d2 d3[0]
> >+put_qpel_uw_pixels_m 12 d0, d1[0], d2, d3[0]
> > put_qpel_uw_pixels 16 q0 q1 q2 q3
> > put_qpel_uw_pixels 24 d0-d2 d3-d5 d16-d18 d19-d21
> > put_qpel_uw_pixels 32 q0-q1 q2-q3 q8-q9 q10-q11
> >
> >[...]
>
> Same patch, but with comma separators for these macros.
applied
thanks
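
For reference, the comma-separated form can be sketched as below. This is
a minimal illustration only: the macro name copy_rows4, the label
copy_w4_example and the macro body are assumptions made up for this
example, not the code from hevcdsp_qpel_neon.S. The load/store pattern
follows the instructions visible in the assembler errors quoted above.
With commas, each lane reference such as d0[0] reaches the macro as a
separate argument instead of being merged into one.

```asm
        .syntax unified
        .arch   armv7-a
        .fpu    neon
        .text

@ Hypothetical macro (illustrative, not from the patch): copy four rows,
@ one register list per row.
@ r2/r3 = source pointer/stride, r0/r1 = destination pointer/stride.
.macro  copy_rows4 regs1, regs2, regs3, regs4
        vld1.32 {\regs1}, [r2], r3      @ load row 0, advance by stride
        vld1.32 {\regs2}, [r2], r3      @ load row 1
        vld1.32 {\regs3}, [r2], r3      @ load row 2
        vld1.32 {\regs4}, [r2], r3      @ load row 3
        vst1.32 {\regs1}, [r0], r1      @ store row 0, advance by stride
        vst1.32 {\regs2}, [r0], r1      @ store row 1
        vst1.32 {\regs3}, [r0], r1      @ store row 2
        vst1.32 {\regs4}, [r0], r1      @ store row 3
.endm

        .global copy_w4_example
copy_w4_example:
        @ Commas keep the four lane arguments separate.
        copy_rows4 d0[0], d0[1], d1[0], d1[1]
        bx      lr
```

The thread does not establish why only the lane-indexed lines failed with
whitespace separators, so the sketch sticks to the comma form that is
reported above as working.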
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
The worst form of inequality is to try to make unequal things equal.
-- Aristotle