[FFmpeg-devel] [aarch64] improve performance of ff_hscale_8_to_15_neon

Ronald S. Bultje rsbultje at gmail.com
Thu Nov 28 09:31:01 EET 2019


Hi,

On Thu, Nov 28, 2019 at 2:08 AM Ronald S. Bultje <rsbultje at gmail.com> wrote:

> Hi,
>
> On Wed, Nov 27, 2019 at 3:28 PM Sebastian Pop <sebpop at gmail.com> wrote:
>
>> On Wed, Nov 27, 2019 at 2:13 PM Clément Bœsch <u at pkh.me> wrote:
>> > Yeah I will by the end of the week. I wrote that a few years ago so I need
>> > to take some time to get back in the context.
>>
>> Thanks Clément for your help.
>>
>> >
>> > BTW, that's quite a huge speed improvement you're bringing in, are you
>> > sure you are always allowed to read up to filter[3]?
>>
>> I will check.
>> Otherwise we can version the code and keep the existing code along for
>> vector factor 2.
>
>
> utils.c allocates h{Chr,Lum}Filter and they appear to be padded.
>

Figure I should be more specific here, since there are multiple allocation
paths. I mean this one:

    // NOTE: the +3 is for the MMX(+1) / SSE(+3) scaler which reads over the end
    FF_ALLOC_ARRAY_OR_GOTO(NULL, *filterPos, (dstW + 3), sizeof(**filterPos), fail);
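
To illustrate why that padding matters, here is a minimal standalone sketch
(not the swscale code). The names VEC, alloc_padded_pos and sum_in_groups are
hypothetical, and the sketch assumes a loop that consumes 4 entries per
iteration. With dstW + 3 entries allocated, the last iteration can read up to
3 entries past dstW without leaving the buffer:

```c
#include <stdint.h>
#include <stdlib.h>

#define VEC 4 /* assumed vector factor: entries read per iteration */

/* Hypothetical helper: dst_w real entries plus VEC - 1 zeroed padding
 * entries, mirroring the (dstW + 3) allocation quoted above. */
static int32_t *alloc_padded_pos(int dst_w)
{
    return calloc((size_t)dst_w + VEC - 1, sizeof(int32_t));
}

/* Reads the array in groups of VEC. When dst_w is not a multiple of VEC,
 * the final group reads up to VEC - 1 padding entries; the highest index
 * touched is at most dst_w + VEC - 2, which is still inside the buffer. */
static int64_t sum_in_groups(const int32_t *pos, int dst_w)
{
    int64_t sum = 0;
    for (int i = 0; i < dst_w; i += VEC)
        for (int j = 0; j < VEC; j++)
            sum += pos[i + j];
    return sum;
}
```

Without the extra entries, the same loop with dst_w = 5 would read indices 5
to 7 of a 5-entry buffer.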

Ronald

