[FFmpeg-devel] [PATCH] avcodec/magicyuv: add SIMD for median of 10bits

James Almer jamrial at gmail.com
Sun Dec 25 20:14:39 EET 2016


On 12/25/2016 1:11 PM, Ronald S. Bultje wrote:
> Hi,
> 
> On Sat, Dec 24, 2016 at 9:29 AM, Paul B Mahol <onemda at gmail.com> wrote:
> 
>> On 12/24/16, Ronald S. Bultje <rsbultje at gmail.com> wrote:
>>> Hi,
>>>
>>> On Sat, Dec 24, 2016 at 6:09 AM, Paul B Mahol <onemda at gmail.com> wrote:
>>>
>>>> On 12/24/16, Ronald S. Bultje <rsbultje at gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> On Fri, Dec 23, 2016 at 6:18 PM, James Almer <jamrial at gmail.com>
>> wrote:
>>>>>
>>>>>> On 12/23/2016 8:00 PM, Ronald S. Bultje wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Fri, Dec 23, 2016 at 12:32 PM, Paul B Mahol <onemda at gmail.com>
>>>> wrote:
>>>>>>>
>>>>>>>> diff --git a/libavcodec/lossless_videodsp.h
>> b/libavcodec/lossless_
>>>>>>>> videodsp.h
>>>>>>>>
>>>>>>> [..]
>>>>>>>
>>>>>>>> @@ -32,6 +32,7 @@ typedef struct LLVidDSPContext {
>>>>>>>>
>>>>>>> [..]
>>>>>>>
>>>>>>>> +    void (*add_magy_median_pred_int16)(uint16_t *dst, const
>>>> uint16_t
>>>>>>>> *top, const uint16_t *diff, unsigned mask, int w, int *left, int
>>>>>> *left_top);
>>>>>>>>
>>>>>>>
>>>>>>> That seems wrong. Why would you add a magicyuv-specific function to
>>>>>>> losslessdsp-context which is intended for functions shared between
>>>> many
>>>>>>> (not just one) lossless codecs? You probably want a new dsp for
>>>> magicyuv
>>>>>>> specifically.
>>>>>>>
>>>>>>> I know this is tedious, but we're very specifically trying to
>> prevent
>>>>>>> dsputil from ever happening again.
>>>>>>>
>>>>>>> Ronald
>>>>>>
>>>>>> Some functions in this dsp are used only by huffyuv. Only one is used
>>>>>> by
>>>>>> both huffyuv and magicyuv.
>>>>>> To properly apply what you mention, it would need to be split in two,
>>>>>> huffyuvdsp and lldsp, then this new function added to a new dsp
>> called
>>>>>> magicyuvdsp.
>>>>>
>>>>>
>>>>> That would be even better, yes.
>>>>
>>>> What about yasm code?
>>>>
>>>> I wanted that to be commented.
>>>
>>>
>>> It's like dithering, it uses the immediately adjacent pixel in the next
>>> loop iteration, can you really simd this effectively?
>>
>> Apparently, and someone is making money from it.
> 
> 
> The parallelizable portion of it is the top-topleft, and you seem to do
> that already. Other than that, I don't see much to be done. You can
> probably use some mmxext instructions like pshufw to make life easier, but
> I think you'll always be limited by the inherent limitation.
> 
> Ronald

He can turn the movq + psrlq + psllq + por at the end of the loop into two
movq + palignr for an ssse3 version of the function (still using mmx regs),
but not much more than that, I guess.
And even that will probably not make a noticeable difference, assuming it's
actually faster.
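
For reference, a rough scalar C sketch of the two points above. It is not
the code from the patch. The name add_median_pred_u16 and the helpers
median3 and palignr64 are made up for illustration; only the argument
list follows the prototype quoted earlier in the thread. The first
function shows why the loop is serial: top - topleft depends only on the
row above, but each output needs the previous output as "left". The
second is a portable model of what palignr does on a pair of 64-bit
registers, which is the shift/shift/or merge mentioned above.

```c
#include <stdint.h>

/* Median of three ints (hypothetical helper, not FFmpeg's mid_pred). */
static int median3(int a, int b, int c)
{
    int lo = a < b ? a : b;
    int hi = a < b ? b : a;
    if (c < lo) return lo;
    if (c > hi) return hi;
    return c;
}

/* Scalar median prediction on 16-bit samples, masked to the bit depth.
 * Illustrative only; assumes mask is (1 << depth) - 1. */
static void add_median_pred_u16(uint16_t *dst, const uint16_t *top,
                                const uint16_t *diff, unsigned mask,
                                int w, int *left, int *left_top)
{
    int l  = *left;
    int lt = *left_top;

    for (int i = 0; i < w; i++) {
        /* Depends only on the row above: this part can be vectorized. */
        int grad = top[i] - lt;
        /* Depends on l, the output of the previous iteration: serial. */
        int pred = median3(l, top[i], (int)((unsigned)(l + grad) & mask));
        l  = (int)((unsigned)(pred + diff[i]) & mask);
        lt = top[i];
        dst[i] = (uint16_t)l;
    }

    *left     = l;
    *left_top = lt;
}

/* Portable model of palignr on 64-bit (mmx-sized) operands: concatenate
 * hi:lo, shift right by n bytes, keep the low 64 bits. Valid for n in
 * 1..7 only, so that neither shift count reaches 64. */
static uint64_t palignr64(uint64_t hi, uint64_t lo, unsigned n)
{
    return (hi << (64 - 8 * n)) | (lo >> (8 * n));
}
```

With a 10-bit mask (0x3FF) every output stays in range, and the values of
left and left_top written back on return let the caller continue the row
in a second call.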


