[FFmpeg-devel] [PATCH 3/6] diracdec: add 10-bit Deslauriers-Dubuc 9, 7 (9_7) vertical high-pass function
Rostislav Pehlivanov
atomnuker at gmail.com
Thu Jul 19 19:09:56 EEST 2018
On 19 July 2018 at 16:52, James Darnley <jdarnley at obe.tv> wrote:
> On 2018-07-19 17:26, Rostislav Pehlivanov wrote:
> > On 19 July 2018 at 15:52, James Darnley <jdarnley at obe.tv> wrote:
> >
> >> int32_t *b1, int32_t *b2, int
> >> b1[i] = COMPOSE_DIRAC53iH0(b0[i], b1[i], b2[i]);
> >> }
> >>
> >> +static void dd97_vertical_hi_sse2(int32_t *b0, int32_t *b1, int32_t *b2,
> >> + int32_t *b3, int32_t *b4, int width)
> >> +{
> >> + int i = width & ~3;
> >> + ff_dd97_vertical_hi_sse2(b0, b1, b2, b3, b4, i);
> >> + for(; i<width; i++)
> >> + b2[i] = COMPOSE_DD97iH0(b0[i], b1[i], b2[i], b3[i], b4[i]);
> >> +
> >> +}
> >>
> >
> >
> > This, along with the rest of the patchset: what's up with the hybrid
> > implementations? Couldn't you put the second part in the asm code as well?
> > Now there are 2 function calls instead of 1.
>
> The 8-bit code does this and I just followed its lead. I believe this is
> done because we cannot write junk data beyond what we think is the end
> of the line because this might be one of the higher depths and the
> coeffs for the next level sit beyond the end of the line.
>
> But now it has just occurred to me that maybe you meant "why didn't you
> do the scalar operations in SIMD?", is that what you meant? Answer is
> because it didn't occur to me at the time. Aside from that I always
> write do-while loops in assembly because I can usually guarantee 1 run
> of the block.
>
> I can certainly look at making that change.
>
Yep, I think you ought to put the scalar code in the asm.
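
For reference, the split under discussion can be sketched in plain C. This is only an illustration under stated assumptions: DD97_HI is a hypothetical stand-in for COMPOSE_DD97iH0 (the real macro lives in FFmpeg's Dirac DWT headers and may differ in detail), and dd97_hi_blocks_of_4 is a hypothetical scalar stand-in for the SSE2 routine, which is assumed to handle four coefficients per iteration.

```c
#include <stdint.h>

/* Hypothetical stand-in for COMPOSE_DD97iH0: one reading of the
 * Deslauriers-Dubuc (9,7) high-pass lifting step, not copied from FFmpeg.
 * Right-shifting a negative value is implementation-defined in C;
 * an arithmetic shift is assumed here. */
#define DD97_HI(b0, b1, b2, b3, b4) \
    ((b2) + ((-(b0) + 9 * (b1) + 9 * (b3) - (b4) + 8) >> 4))

/* Hypothetical stand-in for the assembly routine. It works in blocks of
 * four coefficients, so the caller must pass a width that is a multiple
 * of 4. */
static void dd97_hi_blocks_of_4(const int32_t *b0, const int32_t *b1,
                                int32_t *b2, const int32_t *b3,
                                const int32_t *b4, int width)
{
    for (int i = 0; i < width; i += 4)
        for (int j = 0; j < 4; j++)
            b2[i + j] = DD97_HI(b0[i + j], b1[i + j], b2[i + j],
                                b3[i + j], b4[i + j]);
}

/* The hybrid wrapper from the patch, in outline: the vector part covers
 * the largest multiple of 4 that fits, and a scalar tail finishes the
 * remaining 0..3 coefficients so nothing is written past width. */
static void dd97_hi_hybrid(const int32_t *b0, const int32_t *b1,
                           int32_t *b2, const int32_t *b3,
                           const int32_t *b4, int width)
{
    int i = width & ~3;                 /* round width down to a multiple of 4 */
    dd97_hi_blocks_of_4(b0, b1, b2, b3, b4, i);
    for (; i < width; i++)              /* scalar tail */
        b2[i] = DD97_HI(b0[i], b1[i], b2[i], b3[i], b4[i]);
}
```

With width = 6, the block routine handles indices 0..3, the tail handles 4 and 5, and b2[6] onwards is left alone. Moving the tail into the asm would have to preserve that last property.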