[FFmpeg-devel] [PATCH] Add support for "omp simd" pragma.

Tue Jan 12 22:46:45 EET 2021

Jan 12, 2021, 19:28 by Reimar.Doeffinger at gmx.de:

>>
>> On 10 Jan 2021, at 19:55, Lynne <dev at lynne.ee> wrote:
>>
>> Jan 10, 2021, 17:43 by Reimar.Doeffinger at gmx.de:
>>
>>> From: Reimar Döffinger <Reimar.Doeffinger at gmx.de>
>>>
>>> real    0m15.040s
>>> user    0m18.874s (80.7% of original)
>>> sys     0m0.168s
>>>
>>
>> I think I have to disagree.
>> The performance gains are marginal,
>>
>
> It’s almost 20%. At least for this combination of
> codec and stream a large amount of time is spent in
> non-DSP functions, so even hand-written assembler
> won’t give you huge gains.
>
It's a non-guaranteed 20% on a single system. It could change, and it could very
well break the way gcc's autovectorization does, which we still explicitly disable
with -fno-tree-vectorize because FATE fails. (I was the one who sent an RFC to
try to undo that somewhat recently. Even though it was only an RFC, the reaction
from devs was quite cold.)


>> it's definitely something the compiler should
>> be able to decide on its own,
>>
>
> So you object to unlikely() macros as well?
> It’s really just giving the compiler a hint it should try,
> though I admit the configure part makes it look otherwise.
>
I'm more against the macro and changes to the code itself. If you can make it
work without adding a macro to individual loops or the likes of av_cold/av_hot or
any other changes to the code, I'll be more welcoming.
I really _hate_ compiler hints. Take a look at the upipe source code to see what
a Cthulhian monstrosity made of hint flags looks like. Every single branch had
a cold/hot macro and it was the project's coding style. It's completely irredeemable.
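
For readers unfamiliar with the kind of hints being argued about, here is a minimal sketch of an unlikely()-style branch hint and a cold-function attribute. The names sketch_unlikely, sketch_cold, report_error and parse_len are hypothetical and are not taken from FFmpeg or upipe; the sketch assumes GCC or Clang, where __builtin_expect and __attribute__((cold)) exist, and falls back to no-ops elsewhere.

```c
#include <stdio.h>

/* Hypothetical wrapper names for illustration only. */
#if defined(__GNUC__)
#define sketch_unlikely(x) __builtin_expect(!!(x), 0) /* branch is rarely taken */
#define sketch_cold        __attribute__((cold))      /* function is rarely called */
#else
#define sketch_unlikely(x) (x)
#define sketch_cold
#endif

/* Error path, marked cold so the compiler can move it out of the hot code. */
static sketch_cold int report_error(int code)
{
    fprintf(stderr, "error %d\n", code);
    return -1;
}

/* Hot path with one annotated branch. */
static int parse_len(int len)
{
    if (sketch_unlikely(len < 0))
        return report_error(len);
    return len * 2;
}
```

The hints do not change what the code computes; they only bias code layout and optimization decisions. Annotating every branch this way is the style objected to above.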


>> Most of the loops this is added to are trivially SIMDable.
>>
>
> How many hours of effort do you consider “trivial”?
> Especially if it’s someone not experienced?
> It might be fairly trivial with intrinsics, however
> many of your counter-arguments also apply
> to intrinsics (and to a degree inline assembly).
> That’s btw not just a rhetorical question because
> I’m pretty sure I am not going to all the trouble
> to port more of the arm 32-bit assembler functions
> since it’s a huge PITA, and I was wondering if there
> was a point to even have a try with intrinsics...
>
Intrinsics and inline assembly are a whole different thing from magic
macros that tell the compiler, and try to force on it, what a well-written
compiler should already know very well.
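
To make the "magic macro" concrete, a per-loop hint might look roughly like the sketch below. It is not copied from the patch: SKETCH_HAVE_OMP_SIMD, SKETCH_OMP_SIMD and add_clip_sketch are hypothetical names, and the build switch is assumed to be set by configure when the compiler accepts the pragma (for example with -fopenmp-simd on GCC or Clang). Only "#pragma omp simd" itself is standard OpenMP.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical build switch; assumed to be defined to 1 by the build
 * system when OpenMP SIMD pragmas are enabled. Defaults to off. */
#ifndef SKETCH_HAVE_OMP_SIMD
#define SKETCH_HAVE_OMP_SIMD 0
#endif

#if SKETCH_HAVE_OMP_SIMD
#define SKETCH_OMP_SIMD _Pragma("omp simd") /* ask the compiler to vectorize the next loop */
#else
#define SKETCH_OMP_SIMD                     /* expands to nothing: plain scalar loop */
#endif

/* Add a 16-bit residual to 8-bit pixels and clip to 0..255.
 * dst and res are assumed not to overlap. */
static void add_clip_sketch(uint8_t *dst, const int16_t *res, size_t n)
{
    SKETCH_OMP_SIMD
    for (size_t i = 0; i < n; i++) {
        int v = dst[i] + res[i];
        dst[i] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
    }
}
```

The result is the same with or without the pragma; whether the loop is actually vectorized, and whether that is faster, still depends on the compiler and target.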


>> Just because no one has
>> had the motivation to do SIMD for a pretty unpopular codec doesn't mean we should
>> compromise.
>>
>
> If you think of AArch64 specifically, I can
> kind of agree.
> However I wouldn’t say the word “compromise”
> is appropriate when there’s a good chance nothing
> better will ever come to exist.
> But the real point is not AArch64, that is just
> a very convenient test platform.
> The point is to raise the minimum bar.
> A new architecture, RISC-V for example or something
> else should not be stuck at scalar performance
> until someone actually gets around to implementing
> assembler optimizations.
> And just to be clear: I don’t actually care about
> HEVC, it just seemed a nice target to do some
> experiments.
>
I already said all that can be said here: this will halt efforts on actually
optimizing the code in exchange for naive trust in compilers.
New platforms will be stuck at scalar performance anyway until
the compilers for the arch are smart enough to deal with vectorization.