[FFmpeg-devel] [PATCH 2/2] Add hflip filter.

Ronald S. Bultje rsbultje
Mon Aug 16 15:07:42 CEST 2010


Hi,

On Thu, Aug 12, 2010 at 2:35 PM, Stefano Sabatini
<stefano.sabatini-lala at poste.it> wrote:
> On date Thursday 2010-08-12 12:49:25 -0400, Ronald S. Bultje encoded:
>> On Thu, Aug 12, 2010 at 12:39 PM, Stefano Sabatini
>> <stefano.sabatini-lala at poste.it> wrote:
>> > On date Wednesday 2010-08-04 14:23:49 +0200, Michael Niedermayer encoded:
>> >> On Sat, Jul 31, 2010 at 02:07:29AM +0200, Stefano Sabatini wrote:
>> > [...]
>> >> > +static void draw_slice(AVFilterLink *inlink, int y, int h, int slice_dir)
>> >> > +{
>> >> > +    FlipContext *flip = inlink->dst->priv;
>> >> > +    AVFilterPicRef *inpic  = inlink->cur_pic;
>> >> > +    AVFilterPicRef *outpic = inlink->dst->outputs[0]->outpic;
>> >> > +    uint8_t *inrow, *outrow;
>> >> > +    int i, j, plane, step, hsub, vsub;
>> >> > +
>> >> > +    for (plane = 0; plane < 4 && inpic->data[plane]; plane++) {
>> >> > +        step = flip->max_step[plane];
>> >> > +        hsub = (plane == 1 || plane == 2) ? flip->hsub : 0;
>> >> > +        vsub = (plane == 1 || plane == 2) ? flip->vsub : 0;
>> >> > +
>> >> > +        outrow = outpic->data[plane] + (y>>vsub) * outpic->linesize[plane];
>> >> > +        inrow  = inpic ->data[plane] + (y>>vsub) * inpic ->linesize[plane] + ((inlink->w >> hsub) - 1) * step;
>> >> > +        for (i = 0; i < h>>vsub; i++) {
>> >> > +            for (j = 0; j < (inlink->w >> hsub); j++)
>> >> > +                memcpy(outrow + j*step, inrow - j*step, step);
>> >>
>> >> variable length memcpy on a per pixel base is slow
>> >
>> > Updated.
>> >
>> > I didn't manage to understand how bswap/dsputils may be used, I don't
>> > know if that would improve it.
>>
>> You could create a VideoFilterDSPContext (or a
>> HFlipVideoFilterDSPContext), add a function hflip to it, and then any
>> one of us could optimize it. E.g. for RGBA32, where step is probably
>> 4, we would read it as 8/16-bytes-at-once, flip them using e.g. pshufw
>> or something (do the same for the opposite pixels at the end of the
>> row), and then write them out again -> you just did 2x 2/4 pixels at
>> once. By using multiple registers and making sure there's enough
>> padding (which I think is always the case), this'd get even faster,
>> also because for at least the left read/write, we can use aligned r/w
>> which is faster.
>>
>> Not sure if that's what Michael meant, but I guess it's sort of in the
>> right direction.
>
> OK I see thanks, I suggest anyway to commit this simple variant, and
> then work on the optimizations.
[..]
> +            case 3:
> +            {
> +                uint8_t *in  =  inrow;
> +                uint8_t *out = outrow;
> +                for (j = 0; j < (inlink->w >> hsub); j++, out += 3, in -= 3) {
> +                    out[0] = in[0];
> +                    out[1] = in[1];
> +                    out[2] = in[2];
> +                }
> +            }
> +            break;

You can use a uint16_t + uint8_t write here instead of 3 uint8_t writes.
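
As a rough sketch of that idea (not code from the patch): the helper name
hflip_row_3 and its signature are made up for illustration. It assumes `in`
points at the last pixel of the source row and `out` at the first pixel of
the destination row, as in the quoted loop. The fixed-size memcpy() calls
stand in for a 16-bit load and store without running into alignment or
aliasing problems; inside the FFmpeg tree the AV_RN16/AV_WN16 macros from
libavutil/intreadwrite.h would be the usual way to spell this.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: mirror one row of w packed 3-byte pixels.
 * in  -> last pixel of the source row (walked backwards)
 * out -> first pixel of the destination row (walked forwards) */
static void hflip_row_3(uint8_t *out, const uint8_t *in, int w)
{
    int j;

    for (j = 0; j < w; j++, out += 3, in -= 3) {
        uint16_t lo;

        memcpy(&lo, in, 2);   /* one 16-bit read of bytes 0..1  */
        memcpy(out, &lo, 2);  /* one 16-bit write of bytes 0..1 */
        out[2] = in[2];       /* remaining byte                 */
    }
}
```

Whether this is actually faster than three byte writes would need to be
measured on the target CPU.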

[..]
> +            default:
> +                for (j = 0; j < (inlink->w >> hsub); j++)
> +                    memcpy(outrow + j*step, inrow - j*step, step);
> +            }

Do we have pixelformats like this?

Can you write an implementation that uses a draw_slice[5] function
pointer array, addressed like func[FFMIN(step, 5)-1](..)? That is
easier to turn into a HFlipVFilterBlaDSPContext later, so that
optimized functions can be written more easily.
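
A minimal sketch of what such a table could look like, under some
assumptions: the names hflip_row_fn, hflip_row_tab and hflip_row are
hypothetical, the functions work on one row rather than a whole slice, and
FFMIN is redefined locally only to keep the example standalone (FFmpeg has
it in libavutil/common.h). Slots 0..3 handle steps 1..4 with a
compile-time-constant copy size; slot 4 is the generic fallback for any
larger step. step is assumed to be at least 1.

```c
#include <stdint.h>
#include <string.h>

/* Local stand-in for FFmpeg's FFMIN, so the sketch compiles on its own. */
#define FFMIN(a, b) ((a) > (b) ? (b) : (a))

/* in -> last pixel of the source row, out -> first pixel of the output row */
typedef void (*hflip_row_fn)(uint8_t *out, const uint8_t *in, int w, int step);

/* Fixed-step variants: the copy size is a constant, so the compiler can
 * turn each memcpy() into plain loads and stores. */
#define HFLIP_ROW_FIXED(n)                                                   \
static void hflip_row_##n(uint8_t *out, const uint8_t *in, int w, int step)  \
{                                                                            \
    int j;                                                                   \
    (void)step;                                                              \
    for (j = 0; j < w; j++)                                                  \
        memcpy(out + j * (n), in - j * (n), (n));                            \
}

HFLIP_ROW_FIXED(1)
HFLIP_ROW_FIXED(2)
HFLIP_ROW_FIXED(3)
HFLIP_ROW_FIXED(4)

/* Fallback for any step >= 5: variable-length copy per pixel. */
static void hflip_row_generic(uint8_t *out, const uint8_t *in, int w, int step)
{
    int j;

    for (j = 0; j < w; j++)
        memcpy(out + j * step, in - j * step, step);
}

static const hflip_row_fn hflip_row_tab[5] = {
    hflip_row_1, hflip_row_2, hflip_row_3, hflip_row_4, hflip_row_generic,
};

/* Dispatch on the pixel step; anything above 4 lands in the generic slot. */
static void hflip_row(uint8_t *out, const uint8_t *in, int w, int step)
{
    hflip_row_tab[FFMIN(step, 5) - 1](out, in, w, step);
}
```

With this layout, an optimized version only has to replace one entry of
the table, and the per-row loop in draw_slice() stays the same.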

Ronald


