[FFmpeg-devel] [PATCH] yadif port to libavfitler

Alexander Strange astrange
Mon May 24 03:20:29 CEST 2010


On May 23, 2010, at 9:03 PM, Baptiste Coudurier wrote:

> On 5/23/10 4:22 AM, Michael Niedermayer wrote:
>> On Sun, May 23, 2010 at 04:10:57AM -0700, Baptiste Coudurier wrote:
>>> Hi Michael,
>>> 
>>> On 5/23/10 3:53 AM, Michael Niedermayer wrote:
>>>> On Thu, May 20, 2010 at 07:52:31PM -0700, Baptiste Coudurier wrote:
>>>>> On 05/20/2010 01:37 AM, Baptiste Coudurier wrote:
>>>>>> Hi guys,
>>>>>> 
>>>>>> Here is my first attempt to port yadif to libavfilter.
>>>>>> 
>>>>>> I chose the request_frame paradigm since the filter has delay.
>>>>>> What happens when no prev frame is available ? Original code seems to
>>>>>> read from
>>>> 
>>>> read from?
>>>> (it obviously should use spatial/directional interpolation for the very
>>>>   first frame)
>>>> 
>>>> 
>>>>>> 
>>>>>> I'd like extensive review if possible and especially instructions on how
>>>>>> to retrieve cpu flags at runtime for MMX.
>>>> 
>>>> hmm, i guess we need to move libavcodec/x86/cpuid.c to libavutil
>>>> 
>>>> 
>>>>>> 
>>>>>> Also Michael, can you please check the calls to filter() ?
>>>> 
>>>> what do you want me to check exactly?
>>> 
>>> The parameters in the filter() call. I'm not sure I fully understand the
>>> logic width fields and parity. Should it be 1 ^ tff if we output only
>>> frames for now ?
>> 
>> parity selects if we interpolate even or odd lines iirc so that should stay
>> (that is from memory havnt rechecked the code)
>> 
>> 
>>> 
>>>>>> I'm also not sure how the field output mode should be supported.
>>>> 
>>>> for every 2nd request_frame() we would just call our sources
>>>> request_frame()
>>>> (the variant with calling 2 start/slice/end in one request_frame() might
>>>> work
>>>>   too but i thinks its less ideal)
>>>> 
>>>> 
>>>>> 
>>>>> Patch updated, offseting the buffers as needed (please double check), and
>>>>> memleak free :>
>>>>> Questions still stands :)
>>>>> The filter works without copying any input picture which is nice.
>>>> 
>>>> does md5 match to mplayers ?
>>> 
>>> Is there a md5 in ffmpeg available already ? Otherwise I can cook something
>>> up.
>> 
>> hmm, seems not, i am just spotting crcenc.c which uses adler32 and our
>> md5 utility code
> 
> Thanks to Reimar we have one now, framemd5 should be applied asap, it's very useful when porting filters.
> 
>>> 
>>>> [...]
>>>>> +static void filter_line_mmx2(AVFilterContext *ctx, uint8_t *dst,
>>>>> +                             uint8_t *prev, uint8_t *cur, uint8_t *next,
>>>>> +                             int w, int refs, int parity)
>>>>> +{
>>>>> +    YADIFContext *yadif = ctx->priv;
>>>>> +    static const uint64_t pw_1 = 0x0001000100010001ULL;
>>>>> +    static const uint64_t pb_1 = 0x0101010101010101ULL;
>>>>> +    const int mode = yadif->mode;
>>>>> +    uint64_t tmp0, tmp1, tmp2, tmp3;
>>>>> +    int x;
>>>>> +
>>>>> +#define FILTER\
>>>>> +    for(x=0; x<w; x+=4){\
>>>>> +        __asm__ volatile(\
>>>>> +            "pxor      %%mm7, %%mm7 \n\t"\
>>>>> +            LOAD4("(%[cur],%[mrefs])", %%mm0) /* c = cur[x-refs] */\
>>>>> +            LOAD4("(%[cur],%[prefs])", %%mm1) /* e = cur[x+refs] */\
>>>>> +            LOAD4("(%["prev2"])", %%mm2) /* prev2[x] */\
>>>>> +            LOAD4("(%["next2"])", %%mm3) /* next2[x] */\
>>>>> +            "movq      %%mm3, %%mm4 \n\t"\
>>>>> +            "paddw     %%mm2, %%mm3 \n\t"\
>>>>> +            "psraw     $1,    %%mm3 \n\t" /* d = (prev2[x] +
>>>>> next2[x])>>1 */\
>>>>> +            "movq      %%mm0, %[tmp0] \n\t" /* c */\
>>>>> +            "movq      %%mm3, %[tmp1] \n\t" /* d */\
>>>>> +            "movq      %%mm1, %[tmp2] \n\t" /* e */\
>>>>> +            "psubw     %%mm4, %%mm2 \n\t"\
>>>>> +            PABS(      %%mm4, %%mm2) /* temporal_diff0 */\
>>>>> +            LOAD4("(%[prev],%[mrefs])", %%mm3) /* prev[x-refs] */\
>>>>> +            LOAD4("(%[prev],%[prefs])", %%mm4) /* prev[x+refs] */\
>>>>> +            "psubw     %%mm0, %%mm3 \n\t"\
>>>>> +            "psubw     %%mm1, %%mm4 \n\t"\
>>>>> +            PABS(      %%mm5, %%mm3)\
>>>>> +            PABS(      %%mm5, %%mm4)\
>>>>> +            "paddw     %%mm4, %%mm3 \n\t" /* temporal_diff1 */\
>>>>> +            "psrlw     $1,    %%mm2 \n\t"\
>>>>> +            "psrlw     $1,    %%mm3 \n\t"\
>>>>> +            "pmaxsw    %%mm3, %%mm2 \n\t"\
>>>>> +            LOAD4("(%[next],%[mrefs])", %%mm3) /* next[x-refs] */\
>>>>> +            LOAD4("(%[next],%[prefs])", %%mm4) /* next[x+refs] */\
>>>>> +            "psubw     %%mm0, %%mm3 \n\t"\
>>>>> +            "psubw     %%mm1, %%mm4 \n\t"\
>>>>> +            PABS(      %%mm5, %%mm3)\
>>>>> +            PABS(      %%mm5, %%mm4)\
>>>>> +            "paddw     %%mm4, %%mm3 \n\t" /* temporal_diff2 */\
>>>>> +            "psrlw     $1,    %%mm3 \n\t"\
>>>>> +            "pmaxsw    %%mm3, %%mm2 \n\t"\
>>>>> +            "movq      %%mm2, %[tmp3] \n\t" /* diff */\
>>>>                                   ^^^^^^
>>>> code like this does not wor with all gcc versions, (it needs a equivalent
>>>> of
>>>> the NAMED_ASM_ARGS from mplayer
>>> 
>>> How should this be changed ?
>>> Or can I just enable it for a specific gcc version to get it into svn
>>> quickly ?
>> 
>> depends on if you are afraid of mans
>> i wouldnt mind but mans will want a proper configure check i think
>> 
>> 
>>> 
>>>> [...]
>>>>> +
>>>>> +static void filter(AVFilterContext *ctx, AVFilterPicRef *dstpic,
>>>>> +                   int parity, int tff)
>>>>> +{
>>>>> +    YADIFContext *yadif = ctx->priv;
>>>>> +    int y, i;
>>>>> +
>>>>> +    for (i = 0; i<   3; i++) {
>>>>> +        int is_chroma = !!i;
>>>>> +        int w = dstpic->w>>   is_chroma;
>>>>> +        int h = dstpic->h>>   is_chroma;
>>>>> +        int refs = yadif->cur->linesize[i];
>>>>> +
>>>>> +        for (y = 0; y<   h; y++) {
>>>>> +            if ((y ^ parity)&   1) {
>>>>> +                uint8_t *prev =&yadif->prev->data[i][y*refs];
>>>>> +                uint8_t *cur  =&yadif->cur ->data[i][y*refs];
>>>>> +                uint8_t *next =&yadif->next->data[i][y*refs];
>>>>> +                uint8_t *dst  =&dstpic->data[i][y*dstpic->linesize[i]];
>>>>> +                yadif->filter_line(ctx, dst, prev, cur, next, w, refs,
>>>>> parity ^ tff);
>>>>> +            } else {
>>>>> +                memcpy(&dstpic->data[i][y*dstpic->linesize[i]],
>>>>> +&yadif->cur->data[i][y*refs], w);
>>>> 
>>>> switching from fast_memcpy() (that works with SIMD) to memcpy() possibly
>>>> looses speed
>>> 
>>> Do we have a replacement ?
>> 
>> i dont think we do.
>> 
>> 
>>> 
>>>> [...]
>>>>> +static int query_formats(AVFilterContext *ctx)
>>>>> +{
>>>>> +    static const enum PixelFormat pix_fmts[] = {
>>>>> +        PIX_FMT_YUV420P,
>>>>> +        PIX_FMT_GRAY8,
>>>>> +        PIX_FMT_NONE
>>>>> +    };
>>>> 
>>>> it should be trivial to support other yuv formats
>>>> 
>>>> other improvments that could be tried once its all in svn are to
>>>> try to pass draw_slice() calls through from the source and maybe to
>>>> pass get_video_buffer() through to avoid the every 2nd line memcpy.
>>> 
>>> Well, with the delay it might be tricky.
>>> In short what's required before svn inclusion ?
>> 
>> cpuid + matching md5 to mplayer
>> (id also like to see double frame output to work if possible but if thats
>>  a problem we could look into it later)
> 
> I'd go for later :)
> 
>> and if fast_memcpy isnt ported we need a todo/fixme somewhere so we dont
>> forget it
> 
> md5 matches, cpuid is a bit complicated to port since MM_FLAGS are defined in avcodec.h
> 
> Should memcpy be ported to libavutil ?

I'm not confident fastmemcpy is always faster than system memcpy.
Different levels of nontemporal hinting should be useful for improving cache behavior in specific cases, but Darwin memcpy has about the same tuning as mplayer's and is more up to date.

Also IIRC mplayer svn's fastmemcpybench doesn't try to use the copied data in the fastmemcpy bench, so whatever is the most nontemporal always wins, and you might not always want that.



More information about the ffmpeg-devel mailing list