[FFmpeg-devel] [PATCH 2/9] lavfi/nlmeans: add SIMD-friendly assumptions for compute_safe_ssd_integral_image
Clément Bœsch
u at pkh.me
Mon May 7 19:00:36 EEST 2018
On Mon, May 07, 2018 at 12:14:37AM +0200, Michael Niedermayer wrote:
> On Sun, May 06, 2018 at 01:40:53PM +0200, Clément Bœsch wrote:
> > SIMD code will not have to deal with padding itself. Overwriting in that
> > function may have been possible but would involve large overreading of
> > the sources. Instead, we simply make sure the width to process is always
> > a multiple of 16. Additionally, there must be some actual area to process
> > so the SIMD code can do its boundary checks after processing the first
> > pixels.
> > ---
> > libavfilter/vf_nlmeans.c | 25 ++++++++++++++++++-------
> > 1 file changed, 18 insertions(+), 7 deletions(-)
> >
> > diff --git a/libavfilter/vf_nlmeans.c b/libavfilter/vf_nlmeans.c
> > index d222d3913e..21f981a605 100644
> > --- a/libavfilter/vf_nlmeans.c
> > +++ b/libavfilter/vf_nlmeans.c
> > @@ -157,6 +157,9 @@ static void compute_safe_ssd_integral_image_c(uint32_t *dst, int dst_linesize_32
> > {
> > int x, y;
> >
> > + /* SIMD-friendly assumptions allowed here */
> > + av_assert2(!(w & 0xf) && w >= 16 && h >= 1);
> > +
> > for (y = 0; y < h; y++) {
> > uint32_t acc = dst[-1] - dst[-dst_linesize_32 - 1];
> >
> > @@ -257,9 +260,16 @@ static void compute_ssd_integral_image(uint32_t *ii, int ii_linesize_32,
> > // to compare the 2 sources pixels
> > const int startx_safe = FFMAX(s1x, s2x);
> > const int starty_safe = FFMAX(s1y, s2y);
> > - const int endx_safe = FFMIN(s1x + w, s2x + w);
> > + const int u_endx_safe = FFMIN(s1x + w, s2x + w); // unaligned
> > const int endy_safe = FFMIN(s1y + h, s2y + h);
> >
> > + // deduce the safe area width and height
> > + const int safe_pw = (u_endx_safe - startx_safe) & ~0xf;
> > + const int safe_ph = endy_safe - starty_safe;
> > +
> > + // adjusted end x position of the safe area after width of the safe area gets aligned
> > + const int endx_safe = startx_safe + safe_pw;
> > +
> > // top part where only one of s1 and s2 is still readable, or none at all
> > compute_unsafe_ssd_integral_image(ii, ii_linesize_32,
> > 0, 0,
> > @@ -273,24 +283,25 @@ static void compute_ssd_integral_image(uint32_t *ii, int ii_linesize_32,
> > 0, starty_safe,
> > src, linesize,
> > offx, offy, e, w, h,
> > - startx_safe, endy_safe - starty_safe);
> > + startx_safe, safe_ph);
> >
> > // main and safe part of the integral
> > av_assert1(startx_safe - s1x >= 0); av_assert1(startx_safe - s1x < w);
> > av_assert1(starty_safe - s1y >= 0); av_assert1(starty_safe - s1y < h);
> > av_assert1(startx_safe - s2x >= 0); av_assert1(startx_safe - s2x < w);
> > av_assert1(starty_safe - s2y >= 0); av_assert1(starty_safe - s2y < h);
> > - compute_safe_ssd_integral_image_c(ii + starty_safe*ii_linesize_32 + startx_safe, ii_linesize_32,
> > - src + (starty_safe - s1y) * linesize + (startx_safe - s1x), linesize,
> > - src + (starty_safe - s2y) * linesize + (startx_safe - s2x), linesize,
> > - endx_safe - startx_safe, endy_safe - starty_safe);
> > + if (safe_pw && safe_ph)
> > + dsp->compute_safe_ssd_integral_image(ii + starty_safe*ii_linesize_32 + startx_safe, ii_linesize_32,
> > + src + (starty_safe - s1y) * linesize + (startx_safe - s1x), linesize,
> > + src + (starty_safe - s2y) * linesize + (startx_safe - s2x), linesize,
> > + safe_pw, safe_ph);
>
>
> I think this is incomplete, or I am missing some change
>
> libavfilter/vf_nlmeans.c: In function ‘compute_ssd_integral_image’:
> libavfilter/vf_nlmeans.c:294:9: error: ‘dsp’ undeclared (first use in this function)
> dsp->compute_safe_ssd_integral_image(ii + starty_safe*ii_linesize_32 + startx_safe, ii_linesize_32,
> ^
> libavfilter/vf_nlmeans.c:294:9: note: each undeclared identifier is reported only once for each function it appears in
> libavfilter/vf_nlmeans.c: At top level:
> libavfilter/vf_nlmeans.c:153:13: warning: ‘compute_safe_ssd_integral_image_c’ defined but not used [-Wunused-function]
> static void compute_safe_ssd_integral_image_c(uint32_t *dst, int dst_linesize_32,
> ^
> make: *** [libavfilter/vf_nlmeans.o] Error 1
> make: *** Waiting for unfinished jobs....
Yeah, I made a mistake while splitting the commits; this is fixed locally.
At this step it is still supposed to call
compute_safe_ssd_integral_image_c() directly (but with its last 2
arguments changed to safe_pw and safe_ph).
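
For reference, here is a minimal standalone sketch of the width alignment
and the preconditions discussed above. It is not the actual FFmpeg code:
struct safe_span, compute_safe_span() and safe_ssd_integral_image() are
hypothetical names used only for illustration, and the kernel is a plain
scalar loop written under the same assumptions the patch asserts (width a
non-zero multiple of 16, at least one row, and one readable column/row of
integral data to the left of and above dst).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: horizontal extent of the "safe" area, i.e. the
 * columns where both sources are readable. */
struct safe_span {
    int start;  /* first safe column (startx_safe in the patch) */
    int width;  /* safe width rounded down to a multiple of 16, may be 0 */
    int end;    /* start + width (endx_safe in the patch) */
};

static int max_int(int a, int b) { return a > b ? a : b; }
static int min_int(int a, int b) { return a < b ? a : b; }

/* Mirrors the arithmetic of the patch: take the overlap of the two source
 * windows, then mask off the low 4 bits of its width. */
static struct safe_span compute_safe_span(int s1x, int s2x, int w)
{
    struct safe_span s;
    const int unaligned_end = min_int(s1x + w, s2x + w);

    s.start = max_int(s1x, s2x);
    /* Assumption: the two windows overlap, as the av_assert1() calls in
     * the patch require; otherwise the masked width would be negative. */
    assert(unaligned_end > s.start);
    s.width = (unaligned_end - s.start) & ~0xf;
    s.end   = s.start + s.width;
    return s;
}

/* Simplified scalar kernel. Strides are in elements, not bytes. The caller
 * must skip the call when w or h is 0, like the "if (safe_pw && safe_ph)"
 * guard in the patch. */
static void safe_ssd_integral_image(uint32_t *dst, ptrdiff_t dst_stride,
                                    const uint8_t *s1, ptrdiff_t stride1,
                                    const uint8_t *s2, ptrdiff_t stride2,
                                    int w, int h)
{
    /* SIMD-friendly assumptions, same condition as the patch */
    assert(!(w & 0xf) && w >= 16 && h >= 1);

    for (int y = 0; y < h; y++) {
        /* sum of the current row to the left of the safe area */
        uint32_t acc = dst[-1] - dst[-dst_stride - 1];

        for (int x = 0; x < w; x++) {
            const int d = s1[x] - s2[x];
            acc   += d * d;
            dst[x] = dst[x - dst_stride] + acc;
        }
        s1  += stride1;
        s2  += stride2;
        dst += dst_stride;
    }
}
```

With s1x = 0, s2x = 3 and w = 100, the unaligned safe width is 97, so the
kernel gets 96 columns and the remaining one falls into the unsafe path.
With s1x = 0, s2x = 5 and w = 20 the aligned width is 0 and the kernel
must not be called at all.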
--
Clément B.