[FFmpeg-devel] [PATCH] swscale/arm: add yuv2planeX_8_neon
Matthieu Bouron
matthieu.bouron at gmail.com
Mon Apr 11 16:18:04 CEST 2016
On Mon, Apr 11, 2016 at 9:58 AM, Benoit Fouet <benoit.fouet at free.fr> wrote:
> Hi,
>
> (again, thanks to both of you for documenting all this assembly/NEON code)
>
> On 09/04/2016 10:22, Matthieu Bouron wrote:
>
>> From: Matthieu Bouron <matthieu.bouron at stupeflix.com>
>>
>> ---
>>
>> Hello,
>>
>> The following patch adds the yuv2planeX_8_neon function for the ARM platform.
>> It is currently restricted to 8-bit per component sources until I fix the FATE
>> failures with 10-bit sources (the dnxhd-*-10bit tests fail, but I haven't
>> figured out yet where the failure comes from).
>>
>> Matthieu
>>
>> ---
>> libswscale/arm/Makefile | 1 +
>> libswscale/arm/output.S | 78 ++++++++++++++++++++++++++++++++++++++++++++++++
>> libswscale/arm/swscale.c | 7 +++++
>> libswscale/utils.c | 3 +-
>> 4 files changed, 88 insertions(+), 1 deletion(-)
>> create mode 100644 libswscale/arm/output.S
>>
>> [...]
>>
>> diff --git a/libswscale/arm/output.S b/libswscale/arm/output.S
>> new file mode 100644
>> index 0000000..4437447
>> --- /dev/null
>> +++ b/libswscale/arm/output.S
>> @@ -0,0 +1,78 @@
>>
>
> [...]
>
>
>> +function ff_yuv2planeX_8_neon, export=1
>> +    push {r4-r12, lr}
>> +    vpush {q4-q7}
>> +    ldr r4, [sp, #104]  @ dstW
>> +    ldr r5, [sp, #108]  @ dither
>> +    ldr r6, [sp, #112]  @ offset
>> +    vld1.8 {d0}, [r5]  @ load 8x8-bit dither values
>> +    tst r6, #0  @ check offsetting which can be 0 or 3 only
>> +    beq 1f
>> +    vext.u8 d0, d0, d0, #3  @ honor offseting which can be 3 only
>> +1:  vmovl.u8 q0, d0  @ extend dither to 16-bit
>> +    vshll.u16 q1, d0, #12  @ extend dither to 32-bit with left shift by 12 (part 1)
>> +    vshll.u16 q2, d1, #12  @ extend dither to 32-bit with left shift by 12 (part 2)
>> +    mov r7, #0  @ i = 0
>> +2:  vmov.u8 q3, q1  @ initialize accumulator with dithering values (part 1)
>> +    vmov.u8 q4, q2  @ initialize accumulator with dithering values (part 2)
>> +    mov r8, r1  @ tmpFilterSize = filterSize
>> +    mov r9, r2  @ srcp
>> +    mov r10, r0  @ filterp
>> +3:  ldr r11, [r9], #4  @ get pointer @ src[j]
>> +    ldr r12, [r9], #4  @ get pointer @ src[j+1]
>> +    add r11, r11, r7, lsl #1  @ &src[j][i]
>> +    add r12, r12, r7, lsl #1  @ &src[j+1][i]
>> +    vld1.16 {q5}, [r11]  @ read 8x16-bit @ src[j ][i + {0..7}]: A,B,C,D,E,F,G,H
>> +    vld1.16 {q6}, [r12]  @ read 8x16-bit @ src[j+1][i + {0..7}]: I,J,K,L,M,N,O,P
>> +    ldr r11, [r10], #4  @ read 2x16-bit coeffs (X, Y) at (filter[j], filter[j+1])
>> +    vmov.16 q7, q5  @ copy 8x16-bit @ src[j ][i + {0..7}] for following inplace zip instruction
>> +    vmov.16 q8, q6  @ copy 8x16-bit @ src[j+1][i + {0..7}] for following inplace zip instruction
>> +    vzip.16 q7, q8  @ A,I,B,J,C,K,D,L,E,M,F,N,G,O,H,L
>>
>
> nit: O,H,P
Fixed.
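
For reference, the lane order discussed in the nit above can be checked with a
small scalar model of the zip. This is only an illustrative sketch in C and is
not part of the patch; zip16_model is a hypothetical helper name, and the
letters stand in for the 16-bit samples named in the quoted comments.

```c
/* Scalar model of NEON "vzip.16 qA, qB" on two 8-lane vectors.
 * The 16 interleaved lanes are a[0],b[0],a[1],b[1],...,a[7],b[7];
 * on hardware the low 8 lanes land in qA and the high 8 lanes in qB.
 * zip16_model is a hypothetical name used for illustration only. */
static void zip16_model(const char a[8], const char b[8], char out[16])
{
    for (int i = 0; i < 8; i++) {
        out[2 * i]     = a[i];  /* even lanes come from the first operand  */
        out[2 * i + 1] = b[i];  /* odd lanes come from the second operand */
    }
}
```

Feeding it A..H and I..P yields A,I,B,J,C,K,D,L,E,M,F,N,G,O,H,P, so the last
pair is H,P as pointed out in the review.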
Patch updated fixing fate issues with 10-bit sources (the code was not
honoring offsetting: tst r6, #0 has been replaced with cmp r6, #0).
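
To spell out why the tst form never honored the offset: tst sets the Z flag
from a bitwise AND, and an AND with #0 is always zero, so the following beq was
always taken and the vext rotation was always skipped. The sketch below models
only the Z flag in C; the function names are hypothetical and it does not model
any other condition flags.

```c
#include <stdint.h>

/* Z flag after "tst rn, #imm": set when (rn AND imm) == 0. */
static int z_after_tst(uint32_t rn, uint32_t imm) { return (rn & imm) == 0; }

/* Z flag after "cmp rn, #imm": set when rn - imm == 0, i.e. rn == imm. */
static int z_after_cmp(uint32_t rn, uint32_t imm) { return rn == imm; }

/* "beq 1f" branches over the vext when Z is set, so the dither
 * rotation is applied only when Z is clear. */
static int rotation_applied_with_tst(uint32_t offset)
{
    return !z_after_tst(offset, 0);
}

static int rotation_applied_with_cmp(uint32_t offset)
{
    return !z_after_cmp(offset, 0);
}
```

With offset 3, the tst variant reports that the rotation is skipped while the
cmp variant reports that it is applied; with offset 0 both skip it.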
If there are no objections, I will push the patch within the next few hours.
Thanks for the review,
Matthieu
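
As a reading aid for the quoted assembly, here is a scalar sketch in C of what
the routine appears to compute per output sample. The dither indexing, the left
shift by 12 and the accumulation over filter taps follow the quoted comments.
The final right shift by 19, the clip to 8 bits and the dest parameter are not
visible in the quoted excerpt and are assumptions based on how the C path in
libswscale behaves; yuv2planeX_8_sketch and clip_uint8 are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical helper: saturate an int to the 0..255 range. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Scalar sketch of the vertical filter; not the patch itself. */
static void yuv2planeX_8_sketch(const int16_t *filter, int filterSize,
                                const int16_t **src, uint8_t *dest, int dstW,
                                const uint8_t *dither, int offset)
{
    for (int i = 0; i < dstW; i++) {
        /* Accumulator starts from the dither value shifted left by 12.
         * offset is 0 or 3; "& 7" matches rotating the 8 dither bytes. */
        int val = dither[(i + offset) & 7] << 12;

        /* Accumulate src[j][i] * filter[j] over all filter taps. */
        for (int j = 0; j < filterSize; j++)
            val += src[j][i] * filter[j];

        /* Assumed final step: drop 19 fractional bits, clip to 8 bits.
         * Relies on arithmetic right shift for negative values. */
        dest[i] = clip_uint8(val >> 19);
    }
}
```

A scalar model like this can serve as a reference when comparing the NEON
output against expected values for both offset 0 and offset 3.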
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-swscale-arm-add-yuv2planeX_8_neon.patch
Type: text/x-patch
Size: 8728 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20160411/ad83ff87/attachment.bin>