[FFmpeg-devel] [PATCH 2/2] Add hflip filter.

Stefano Sabatini stefano.sabatini-lala
Thu Aug 12 20:35:39 CEST 2010


On date Thursday 2010-08-12 12:49:25 -0400, Ronald S. Bultje encoded:
> Hi,
> 
> On Thu, Aug 12, 2010 at 12:39 PM, Stefano Sabatini
> <stefano.sabatini-lala at poste.it> wrote:
> > On date Wednesday 2010-08-04 14:23:49 +0200, Michael Niedermayer encoded:
> >> On Sat, Jul 31, 2010 at 02:07:29AM +0200, Stefano Sabatini wrote:
> > [...]
> >> > +static void draw_slice(AVFilterLink *inlink, int y, int h, int slice_dir)
> >> > +{
> >> > +    FlipContext *flip = inlink->dst->priv;
> >> > +    AVFilterPicRef *inpic  = inlink->cur_pic;
> >> > +    AVFilterPicRef *outpic = inlink->dst->outputs[0]->outpic;
> >> > +    uint8_t *inrow, *outrow;
> >> > +    int i, j, plane, step, hsub, vsub;
> >> > +
> >> > +    for (plane = 0; plane < 4 && inpic->data[plane]; plane++) {
> >> > +        step = flip->max_step[plane];
> >> > +        hsub = (plane == 1 || plane == 2) ? flip->hsub : 0;
> >> > +        vsub = (plane == 1 || plane == 2) ? flip->vsub : 0;
> >> > +
> >> > +        outrow = outpic->data[plane] + (y>>vsub) * outpic->linesize[plane];
> >> > +        inrow  = inpic ->data[plane] + (y>>vsub) * inpic ->linesize[plane] + ((inlink->w >> hsub) - 1) * step;
> >> > +        for (i = 0; i < h>>vsub; i++) {
> >> > +            for (j = 0; j < (inlink->w >> hsub); j++)
> >> > +                memcpy(outrow + j*step, inrow - j*step, step);
> >>
> >> variable length memcpy on a per-pixel basis is slow
> >
> > Updated.
> >
> > I didn't manage to understand how bswap/dsputils could be used, and I
> > don't know whether that would improve it.
> 
> You could create a VideoFilterDSPContext (or a
> HFlipVideoFilterDSPContext), add a function hflip to it, and then any
> one of us could optimize it. E.g. for RGBA32, where step is probably
> 4, we would read it as 8/16-bytes-at-once, flip them using e.g. pshufw
> or something (do the same for the opposite pixels at the end of the
> row), and then write them out again -> you just did 2x 2/4 pixels at
> once. By using multiple registers and making sure there's enough
> padding (which I think is always the case), this'd get even faster,
> also because for at least the left read/write, we can use aligned r/w
> which is faster.
> 
> Not sure if that's what Michael meant, but I guess it's sort of in the
> right direction.
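
To make the "several pixels at once" idea concrete, here is a rough scalar
sketch for a 4-byte step. It is only a portable stand-in for the SIMD
version described above: it reads two pixels as one 64-bit word, swaps the
two 32-bit halves, and writes them out mirrored. The function name
hflip_row_rgba32 is made up for illustration and is not part of the patch,
and the sketch assumes the input and output rows do not overlap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the patch): mirror one row of w pixels,
 * 4 bytes per pixel, from src into dst. src points at the START of the
 * input row; the rows must not overlap. */
static void hflip_row_rgba32(uint8_t *dst, const uint8_t *src, int w)
{
    int j = 0;

    assert(w >= 0);

    /* Two pixels per iteration: load the pair that ends up at output
     * positions j and j+1, swap the two 4-byte halves, store it. */
    for (; j + 2 <= w; j += 2) {
        uint64_t v;
        memcpy(&v, src + (w - 2 - j) * 4, 8);
        /* Rotating by 32 bits swaps bytes 0..3 with bytes 4..7 in memory,
         * whatever the byte order of the machine is. */
        v = (v << 32) | (v >> 32);
        memcpy(dst + j * 4, &v, 8);
    }

    /* Odd width: one pixel is left over and copied on its own. */
    if (j < w)
        memcpy(dst + j * 4, src + (w - 1 - j) * 4, 4);
}
```

The memcpy calls here have constant sizes, so a compiler can usually turn
them into plain loads and stores. Whether this is actually faster than the
per-pixel loop would have to be measured.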

OK, I see, thanks. I suggest committing this simple variant anyway and
then working on the optimizations.
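
As a starting point for that later work, a minimal sketch of what the
suggested context could look like. The name HFlipDSPContext comes from the
suggestion quoted above; its contents, hflip_dsp_init() and the
hflip_row_N functions are assumptions made up for illustration, not
existing FFmpeg API. The row functions use constant-size copies, which
avoids the variable-length memcpy per pixel, and follow the convention of
the patch: the source pointer addresses the last pixel of the input row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirror w pixels: src_last points at the LAST pixel of the input row,
 * dst at the first pixel of the output row. */
typedef void (*hflip_row_fn)(uint8_t *dst, const uint8_t *src_last, int w);

/* One row function per pixel step, each with a compile-time copy size. */
#define DEF_HFLIP_ROW(N)                                                  \
static void hflip_row_##N(uint8_t *dst, const uint8_t *src_last, int w)   \
{                                                                         \
    for (int j = 0; j < w; j++)                                           \
        memcpy(dst + j * (N), src_last - j * (N), (N));                   \
}

DEF_HFLIP_ROW(1)
DEF_HFLIP_ROW(2)
DEF_HFLIP_ROW(3)
DEF_HFLIP_ROW(4)

/* Hypothetical context: optimized versions could later replace the
 * function pointer without touching the filter code. */
typedef struct HFlipDSPContext {
    hflip_row_fn hflip_row;
} HFlipDSPContext;

/* Pick the row function for a given step; returns 0 on success, -1 if
 * the step has no specialized version (the caller would then keep the
 * generic loop). */
static int hflip_dsp_init(HFlipDSPContext *c, int step)
{
    assert(c != NULL);
    switch (step) {
    case 1: c->hflip_row = hflip_row_1; return 0;
    case 2: c->hflip_row = hflip_row_2; return 0;
    case 3: c->hflip_row = hflip_row_3; return 0;
    case 4: c->hflip_row = hflip_row_4; return 0;
    default: c->hflip_row = NULL; return -1;
    }
}
```

In draw_slice() the inner loop over j would then become one call per row,
something like c->hflip_row(outrow, inrow, inlink->w >> hsub), with the
context initialized once per plane.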

Regards.
-- 
FFmpeg = Foolish and Foolish Magic Portable Enigmatic God


