[FFmpeg-devel] [PATCH] yadif port to libavfitler
Michael Niedermayer
michaelni
Sun May 23 13:22:11 CEST 2010
On Sun, May 23, 2010 at 04:10:57AM -0700, Baptiste Coudurier wrote:
> Hi Michael,
>
> On 5/23/10 3:53 AM, Michael Niedermayer wrote:
>> On Thu, May 20, 2010 at 07:52:31PM -0700, Baptiste Coudurier wrote:
>>> On 05/20/2010 01:37 AM, Baptiste Coudurier wrote:
>>>> Hi guys,
>>>>
>>>> Here is my first attempt to port yadif to libavfilter.
>>>>
>>>> I chose the request_frame paradigm since the filter has delay.
>>>> What happens when no prev frame is available ? Original code seems to
>>>> read from
>>
>> read from?
>> (it obviously should use spatial/directional interpolation for the very
>> first frame)
>>
>>
>>>>
>>>> I'd like extensive review if possible and especially instructions on how
>>>> to retrieve cpu flags at runtime for MMX.
>>
>> hmm, i guess we need to move libavcodec/x86/cpuid.c to libavutil
>>
>>
>>>>
>>>> Also Michael, can you please check the calls to filter() ?
>>
>> what do you want me to check exactly?
>
> The parameters in the filter() call. I'm not sure I fully understand the
> logic width fields and parity. Should it be 1 ^ tff if we output only
> frames for now ?
parity selects if we interpolate even or odd lines iirc so that should stay
(that is from memory havnt rechecked the code)
>
>>>> I'm also not sure how the field output mode should be supported.
>>
>> for every 2nd request_frame() we would just call our sources
>> request_frame()
>> (the variant with calling 2 start/slice/end in one request_frame() might
>> work
>> too but i thinks its less ideal)
>>
>>
>>>
>>> Patch updated, offseting the buffers as needed (please double check), and
>>> memleak free :>
>>> Questions still stands :)
>>> The filter works without copying any input picture which is nice.
>>
>> does md5 match to mplayers ?
>
> Is there a md5 in ffmpeg available already ? Otherwise I can cook something
> up.
hmm, seems not, i am just spotting crcenc.c which uses adler32 and our
md5 utility code
>
>> [...]
>>> +static void filter_line_mmx2(AVFilterContext *ctx, uint8_t *dst,
>>> + uint8_t *prev, uint8_t *cur, uint8_t *next,
>>> + int w, int refs, int parity)
>>> +{
>>> + YADIFContext *yadif = ctx->priv;
>>> + static const uint64_t pw_1 = 0x0001000100010001ULL;
>>> + static const uint64_t pb_1 = 0x0101010101010101ULL;
>>> + const int mode = yadif->mode;
>>> + uint64_t tmp0, tmp1, tmp2, tmp3;
>>> + int x;
>>> +
>>> +#define FILTER\
>>> + for(x=0; x<w; x+=4){\
>>> + __asm__ volatile(\
>>> + "pxor %%mm7, %%mm7 \n\t"\
>>> + LOAD4("(%[cur],%[mrefs])", %%mm0) /* c = cur[x-refs] */\
>>> + LOAD4("(%[cur],%[prefs])", %%mm1) /* e = cur[x+refs] */\
>>> + LOAD4("(%["prev2"])", %%mm2) /* prev2[x] */\
>>> + LOAD4("(%["next2"])", %%mm3) /* next2[x] */\
>>> + "movq %%mm3, %%mm4 \n\t"\
>>> + "paddw %%mm2, %%mm3 \n\t"\
>>> + "psraw $1, %%mm3 \n\t" /* d = (prev2[x] +
>>> next2[x])>>1 */\
>>> + "movq %%mm0, %[tmp0] \n\t" /* c */\
>>> + "movq %%mm3, %[tmp1] \n\t" /* d */\
>>> + "movq %%mm1, %[tmp2] \n\t" /* e */\
>>> + "psubw %%mm4, %%mm2 \n\t"\
>>> + PABS( %%mm4, %%mm2) /* temporal_diff0 */\
>>> + LOAD4("(%[prev],%[mrefs])", %%mm3) /* prev[x-refs] */\
>>> + LOAD4("(%[prev],%[prefs])", %%mm4) /* prev[x+refs] */\
>>> + "psubw %%mm0, %%mm3 \n\t"\
>>> + "psubw %%mm1, %%mm4 \n\t"\
>>> + PABS( %%mm5, %%mm3)\
>>> + PABS( %%mm5, %%mm4)\
>>> + "paddw %%mm4, %%mm3 \n\t" /* temporal_diff1 */\
>>> + "psrlw $1, %%mm2 \n\t"\
>>> + "psrlw $1, %%mm3 \n\t"\
>>> + "pmaxsw %%mm3, %%mm2 \n\t"\
>>> + LOAD4("(%[next],%[mrefs])", %%mm3) /* next[x-refs] */\
>>> + LOAD4("(%[next],%[prefs])", %%mm4) /* next[x+refs] */\
>>> + "psubw %%mm0, %%mm3 \n\t"\
>>> + "psubw %%mm1, %%mm4 \n\t"\
>>> + PABS( %%mm5, %%mm3)\
>>> + PABS( %%mm5, %%mm4)\
>>> + "paddw %%mm4, %%mm3 \n\t" /* temporal_diff2 */\
>>> + "psrlw $1, %%mm3 \n\t"\
>>> + "pmaxsw %%mm3, %%mm2 \n\t"\
>>> + "movq %%mm2, %[tmp3] \n\t" /* diff */\
>> ^^^^^^
>> code like this does not wor with all gcc versions, (it needs a equivalent
>> of
>> the NAMED_ASM_ARGS from mplayer
>
> How should this be changed ?
> Or can I just enable it for a specific gcc version to get it into svn
> quickly ?
depends on if you are afraid of mans
i wouldnt mind but mans will want a proper configure check i think
>
>> [...]
>>> +
>>> +static void filter(AVFilterContext *ctx, AVFilterPicRef *dstpic,
>>> + int parity, int tff)
>>> +{
>>> + YADIFContext *yadif = ctx->priv;
>>> + int y, i;
>>> +
>>> + for (i = 0; i< 3; i++) {
>>> + int is_chroma = !!i;
>>> + int w = dstpic->w>> is_chroma;
>>> + int h = dstpic->h>> is_chroma;
>>> + int refs = yadif->cur->linesize[i];
>>> +
>>> + for (y = 0; y< h; y++) {
>>> + if ((y ^ parity)& 1) {
>>> + uint8_t *prev =&yadif->prev->data[i][y*refs];
>>> + uint8_t *cur =&yadif->cur ->data[i][y*refs];
>>> + uint8_t *next =&yadif->next->data[i][y*refs];
>>> + uint8_t *dst =&dstpic->data[i][y*dstpic->linesize[i]];
>>> + yadif->filter_line(ctx, dst, prev, cur, next, w, refs,
>>> parity ^ tff);
>>> + } else {
>>> + memcpy(&dstpic->data[i][y*dstpic->linesize[i]],
>>> +&yadif->cur->data[i][y*refs], w);
>>
>> switching from fast_memcpy() (that works with SIMD) to memcpy() possibly
>> looses speed
>
> Do we have a replacement ?
i dont think we do.
>
>> [...]
>>> +static int query_formats(AVFilterContext *ctx)
>>> +{
>>> + static const enum PixelFormat pix_fmts[] = {
>>> + PIX_FMT_YUV420P,
>>> + PIX_FMT_GRAY8,
>>> + PIX_FMT_NONE
>>> + };
>>
>> it should be trivial to support other yuv formats
>>
>> other improvments that could be tried once its all in svn are to
>> try to pass draw_slice() calls through from the source and maybe to
>> pass get_video_buffer() through to avoid the every 2nd line memcpy.
>
> Well, with the delay it might be tricky.
> In short what's required before svn inclusion ?
cpuid + matching md5 to mplayer
(id also like to see double frame output to work if possible but if thats
a problem we could look into it later)
and if fast_memcpy isnt ported we need a todo/fixme somewhere so we dont
forget it
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Old school: Use the lowest level language in which you can solve the problem
conveniently.
New school: Use the highest level language in which the latest supercomputer
can solve the problem without the user falling asleep waiting.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100523/d850f83f/attachment.pgp>
More information about the ffmpeg-devel
mailing list