[FFmpeg-devel] libavcodec/proresdec : add qmat dsp with SSE2, AVX2 simd

Martin Vignali martin.vignali at gmail.com
Thu Oct 12 12:36:12 EEST 2017


2017-10-10 3:16 GMT+02:00 Ivan Kalvachev <ikalvachev at gmail.com>:

> On 10/9/17, Martin Vignali <martin.vignali at gmail.com> wrote:
> > 2017-10-07 18:16 GMT+02:00 Ronald S. Bultje <rsbultje at gmail.com>:
> >
> >> Hi Martin,
> >>
> >> On Sat, Oct 7, 2017 at 11:49 AM, Martin Vignali <
> martin.vignali at gmail.com>
> >> wrote:
> >>
> >> > 2017-10-07 17:30 GMT+02:00 Ronald S. Bultje <rsbultje at gmail.com>:
> >> > > On Sat, Oct 7, 2017 at 10:22 AM, Martin Vignali <
> >> > martin.vignali at gmail.com>
> >> > > wrote:
> >> > > > The attached patch adds a new dsp
> >> > > > for manipulating the qmat.
> >> > > >
> >> > > > For now, I moved this code into it:
> >> > > >
> >> > > > for (i = 0; i < 64; i++) {
> >> > > >         qmat_luma_scaled  [i] = ctx->qmat_luma  [i] * qscale;
> >> > > >         qmat_chroma_scaled[i] = ctx->qmat_chroma[i] * qscale;
> >> > > > }
> >> > > >
> >> > > > I added a special case for qscale == 1
> >> > > > and SSE2, AVX2 optimizations.
> >> > >
> >> > > This loop only executes once per slice. We typically do not
> >> SIMD-optimize
> >> > > at that level, because it won't give significant speed gains...
> >> >
> >> > Ok didn't know that.
> >> > I mostly followed what has already been done, like in
> >> blockdsp.clear_block
> >> >
> >>
> >> Right, so consider that blockdsp is done per block (16x16 pixels), not
> per
> >> slice.
> >>
> > OK in principle (only optimize a function that is called quite often).
>
> It's more of:  We can't refuse code that makes a measurable improvement.
>
> Also keep in mind that compilers are getting smarter and this code is
> a good target for auto-vectorization. Of course FFmpeg disables it,
> because of a long history of compiler bugs related to it.
>
> >> You could remove this entirely from the slice processing code by simply
> >> pre-calculating the values in the init function once for the whole
> stream,
> >> there's only 224 qscale values so it's 224*64*2 multiplications, which
> is
> >> (in the context of prores) virtually negligible.
> >>
> >
> > Not sure we can do that for the prores decoder:
> > the qmat seems to be set in the decode frame header function
> > (based on the header of the frame).
>
> You can at least check if the qscale has changed and avoid recalculation.
> I think that the lgpl decoder does that.
>
Yes, you're right, the lgpl decoder only recalculates it if qscale (or qmat)
has changed.
I will take a look at this.
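
A minimal sketch of that check, to make the idea concrete. This is not the
code from either decoder: the QmatCache struct, the qmat_cache_update name
and the int32_t element type are all assumptions made for illustration. The
scaled tables are rebuilt only when qscale or one of the source matrices
differs from what was used last time.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical cache of the scaled matrices; names are illustrative only. */
typedef struct QmatCache {
    int     valid;              /* 0 until the tables have been filled once */
    int     qscale;             /* qscale the scaled tables were built with */
    uint8_t qmat_luma[64];      /* copy of the source luma matrix           */
    uint8_t qmat_chroma[64];    /* copy of the source chroma matrix         */
    int32_t luma_scaled[64];    /* qmat_luma[i]   * qscale                  */
    int32_t chroma_scaled[64];  /* qmat_chroma[i] * qscale                  */
} QmatCache;

/* Returns 1 if the scaled tables were recomputed, 0 if they were reused. */
static int qmat_cache_update(QmatCache *c,
                             const uint8_t *qmat_luma,
                             const uint8_t *qmat_chroma,
                             int qscale)
{
    int i;

    /* Nothing changed since the previous slice: keep the existing tables. */
    if (c->valid && c->qscale == qscale &&
        !memcmp(c->qmat_luma,   qmat_luma,   64) &&
        !memcmp(c->qmat_chroma, qmat_chroma, 64))
        return 0;

    /* Same loop as in the slice code, run only when an input changed. */
    for (i = 0; i < 64; i++) {
        c->luma_scaled[i]   = qmat_luma[i]   * qscale;
        c->chroma_scaled[i] = qmat_chroma[i] * qscale;
    }

    /* Remember the inputs so the next call can compare against them. */
    memcpy(c->qmat_luma,   qmat_luma,   64);
    memcpy(c->qmat_chroma, qmat_chroma, 64);
    c->qscale = qscale;
    c->valid  = 1;
    return 1;
}
```

With slice threading each thread would presumably need its own cache, and
a "qmat changed" flag set while parsing the frame header could replace the
two memcmp calls.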

Thanks

Martin

