[FFmpeg-devel] [PATCH 3/6] diracdec: add 10-bit Deslauriers-Dubuc 9, 7 (9_7) vertical high-pass function

Thu Jul 19 19:09:56 EEST 2018

On 19 July 2018 at 16:52, James Darnley <jdarnley at obe.tv> wrote:

> On 2018-07-19 17:26, Rostislav Pehlivanov wrote:
> > On 19 July 2018 at 15:52, James Darnley <jdarnley at obe.tv> wrote:
> >
> >> int32_t *b1, int32_t *b2, int
> >>          b1[i] = COMPOSE_DIRAC53iH0(b0[i], b1[i], b2[i]);
> >>  }
> >>
> >> +static void dd97_vertical_hi_sse2(int32_t *b0, int32_t *b1, int32_t *b2,
> >> +                                  int32_t *b3, int32_t *b4, int width)
> >> +{
> >> +    int i = width & ~3;
> >> +    ff_dd97_vertical_hi_sse2(b0, b1, b2, b3, b4, i);
> >> +    for(; i<width; i++)
> >> +        b2[i] = COMPOSE_DD97iH0(b0[i], b1[i], b2[i], b3[i], b4[i]);
> >> +
> >> +}
> >>
> >
> >
> > This, along with the rest of the patchset: what's up with the hybrid
> > implementations? Couldn't you put the second part in the asm code as well?
> > Now there are 2 function calls instead of 1.
>
> The 8-bit code does this and I just followed its lead.  I believe this is
> done because we cannot write junk data beyond what we think is the end
> of the line because this might be one of the higher depths and the
> coeffs for the next level sit beyond the end of the line.
>
> But now it has just occurred to me that maybe you meant "why didn't you
> do the scalar operations in SIMD?", is that what you meant?  The answer is
> that it didn't occur to me at the time.  Aside from that I always
> write do-while loops in assembly because I can usually guarantee 1 run
> of the block.
>
> I can certainly look at making that change.

Yep, I think you ought to put the scalar code in the asm.
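
For reference, the split under discussion can be sketched in plain C roughly as follows. This is only an illustration, not the patch's code: dd97_h0() is an assumed form of the lifting step standing in for COMPOSE_DD97iH0, and dd97_vertical_hi_blocks() is a hypothetical scalar stand-in for the SSE2 kernel that works on whole groups of four coefficients. The point it shows is that the wrapper never stores past `width`, so whatever sits after the end of the line stays untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed form of the Deslauriers-Dubuc (9,7) high-pass lifting step.
 * Stand-in for COMPOSE_DD97iH0; relies on arithmetic right shift of
 * negative values, as common compilers provide. */
static int32_t dd97_h0(int32_t b0, int32_t b1, int32_t b2,
                       int32_t b3, int32_t b4)
{
    return b2 + ((9 * b1 + 9 * b3 - b0 - b4 + 8) >> 4);
}

/* Hypothetical stand-in for the SIMD kernel: handles groups of four
 * coefficients, so `width` must be a multiple of 4. */
static void dd97_vertical_hi_blocks(int32_t *b0, int32_t *b1, int32_t *b2,
                                    int32_t *b3, int32_t *b4, int width)
{
    for (int i = 0; i < width; i += 4)
        for (int j = 0; j < 4; j++)
            b2[i + j] = dd97_h0(b0[i + j], b1[i + j], b2[i + j],
                                b3[i + j], b4[i + j]);
}

/* Hybrid wrapper: block kernel for the largest multiple of 4 that fits,
 * then a scalar tail for the remaining 0..3 coefficients. Nothing at or
 * beyond index `width` is written. */
static void dd97_vertical_hi(int32_t *b0, int32_t *b1, int32_t *b2,
                             int32_t *b3, int32_t *b4, int width)
{
    assert(width >= 0);
    int i = width & ~3;               /* round down to a multiple of 4 */
    dd97_vertical_hi_blocks(b0, b1, b2, b3, b4, i);
    for (; i < width; i++)            /* scalar tail */
        b2[i] = dd97_h0(b0[i], b1[i], b2[i], b3[i], b4[i]);
}
```

With a width of 6, the block part covers indices 0..3 and the tail covers 4 and 5. Moving the tail into the asm would keep this behaviour but save the second call.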