[FFmpeg-devel] [PATCH 2/2] lavc/vc1dsp: R-V V mspel_pixels

flow gg hlefthleft at gmail.com
Sun Apr 7 08:38:54 EEST 2024


ping

On Fri, Mar 8, 2024 at 17:46, flow gg <hlefthleft at gmail.com> wrote:

> Alright, I am using m8 now, but for the moment I have not added code to
> address the dependencies in the loops, since they have a minor impact.
> Updated in the reply.
>
> On Fri, Mar 8, 2024 at 17:08, Rémi Denis-Courmont <remi at remlab.net> wrote:
>
>>
>>
>> On 8 March 2024 02:45:46 GMT+02:00, flow gg <hlefthleft at gmail.com>
>> wrote:
>> >> Isn't it also faster to max LMUL for the adds here?
>> >
>> >It requires the use of one more vset, making the time slightly longer:
>> >147.7 (m1), 148.7 (m8 + vset).
>>
>> A variation of about 0.7% on a single set of kernels will end up below
>> measurement noise in real overall codec usage. And then reducing the
>> I-cache contention can improve performance in other ways. Larger LMUL
>> should also improve performance on bigger cores with more ALUs. So it's not
>> all black and white.
>>
>> My personal preference is to keep the code small if it makes almost no
>> difference but I'm not BDFL.
>>
>> >> Also, this might not be very noticeable on the C908, but avoiding
>> >> sequential dependencies on the address registers may help. I mean, avoid
>> >> using as an address operand a value that was calculated by the
>> >> immediately preceding instruction.
>> >
>> >Okay, but the test results haven't changed.
>> >It would add more than ten lines of code; perhaps shorter code would be
>> >better?
>>
>> I don't know. There are definitely in-order vector cores coming, and data
>> dependencies will hurt them. But I don't know if anyone will care about
>> FFmpeg on those.
>> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel at ffmpeg.org
>> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>
>> To unsubscribe, visit link above, or email
>> ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".
>>
>
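
For readers following the LMUL discussion above: the RISC-V Vector spec
defines the maximum vector length per instruction as VLMAX = LMUL * VLEN / SEW.
The sketch below only works through that arithmetic. The helper names
(vlmax, passes) are hypothetical, it handles integer LMUL only, and the
values VLEN=128 and SEW=8 in the usage note are illustrative assumptions,
not figures taken from the patch.

```c
#include <stdio.h>

/* VLMAX as defined by the RVV spec, restricted to integer LMUL (1, 2, 4, 8).
 * vlen_bits and sew_bits are assumptions supplied by the caller. */
static unsigned vlmax(unsigned vlen_bits, unsigned sew_bits, unsigned lmul)
{
    return vlen_bits * lmul / sew_bits;
}

/* Number of strip-mining passes needed to cover n elements when each
 * pass handles at most vl_max elements (ceiling division). */
static unsigned passes(unsigned n, unsigned vl_max)
{
    return (n + vl_max - 1) / vl_max;
}
```

With the assumed VLEN=128 and SEW=8, vlmax() gives 16 elements per
instruction at m1 and 128 at m8, so a run of 64 byte-sized adds would take
passes(64, 16) = 4 instructions at m1 and passes(64, 128) = 1 at m8. Whether
that outweighs the extra vset is the trade-off being measured in the thread.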

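The address-dependency point quoted above can be sketched in plain C. This is
not code from the patch: both function names are hypothetical, and a C
compiler is free to reschedule these statements, so the sketch only shows the
shape of the idea that would be applied by hand in the assembly. In the first
loop every load uses an offset that was updated immediately before it; in the
second, two offsets are kept so that each update is separated from its next
use by an unrelated load.

```c
#include <stddef.h>
#include <stdint.h>

/* Dependent form: the offset used by each load was produced by the
 * update at the end of the previous iteration, right before the load. */
static unsigned sum_rows_dependent(const uint8_t *src, ptrdiff_t stride,
                                   int rows)
{
    ptrdiff_t off = 0;
    unsigned sum = 0;

    for (int y = 0; y < rows; y++) {
        sum += src[off];
        off += stride;      /* the next load depends on this add */
    }
    return sum;
}

/* Interleaved form: two independent offsets, each advanced by two rows.
 * Every offset update is followed by a load that does not use it. */
static unsigned sum_rows_interleaved(const uint8_t *src, ptrdiff_t stride,
                                     int rows)
{
    ptrdiff_t off0 = 0;
    ptrdiff_t off1 = stride;
    const ptrdiff_t step = 2 * stride;
    unsigned sum = 0;
    int y = 0;

    for (; y + 1 < rows; y += 2) {
        unsigned a = src[off0];
        off0 += step;       /* not needed until the next iteration */
        unsigned b = src[off1];
        off1 += step;
        sum += a + b;
    }
    if (y < rows)           /* odd row count: one row left over */
        sum += src[off0];
    return sum;
}
```

Both functions return the same value for the same input; the second is
longer, which is the code-size cost being weighed in the thread against a
possible benefit on in-order vector cores.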
