[MPlayer-dev-eng] possible bugs in vf_decimate filter
Michael Niedermayer
michaelni at gmx.at
Mon Oct 16 14:08:21 CEST 2006
Hi
On Sun, Oct 15, 2006 at 02:51:00PM -0600, Loren Merritt wrote:
> On Sun, 15 Oct 2006, Rich Felker wrote:
>
> >I'm the author so I'll comment.
> >
> >On Sat, Oct 14, 2006 at 07:09:32AM -0700, Trent Piepho wrote:
> >>The decimate filter calculates 8x8 SADs over the image. The loop that
> >>calls the SAD function increments x and y by 4 each time, rather than 8.
> >>This means all the pixels, except for the outer four, are included in four
> >>SAD calculations instead of one.
> >
> >This is intentional. Ideally it would increment by 1 each time, but
> >that would be much slower and not much more accurate. The idea is to
> >look for maximal change over _any_ 8x8 block, not just
> >aligned-to-8-pixels 8x8 blocks.
>
> OK, but it would be faster to calculate non-overlapping 4x4 blocks, and
> then add 4 adjacent block sums.
Yes, and for 8x8 blocks at every shift, a simple vf_boxblur.c-like algorithm
(running sums) could be used.
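To make the running-sum idea concrete, here is a rough C sketch, not the filter's actual code. The function name, its signature and the 8-row ring buffer are my own assumptions; it only borrows the sliding-window principle from vf_boxblur.c. It keeps one horizontal 8-pixel running sum per row and one vertical 8-row running sum per column position, so every 8x8 SAD at step 1 costs a few adds instead of 64 differences.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: largest 8x8 SAD between two w x h planes over every
 * block position (step 1). Returns -1 if memory allocation fails. */
static long max_sad8x8_all_shifts(const uint8_t *a, const uint8_t *b,
                                  int w, int h, ptrdiff_t stride)
{
    enum { N = 8 };
    assert(w >= N && h >= N);
    size_t cols = (size_t)(w - N + 1);        /* horizontal block positions */
    /* colsum[x]: sum of the last N row sums at column position x.
     * ring: the last N rows of horizontal sums, zero until first written. */
    long *colsum = calloc(cols, sizeof *colsum);
    long *ring = calloc(cols * N, sizeof *ring);
    long best = -1;

    if (colsum && ring) {
        best = 0;
        for (int y = 0; y < h; y++) {
            const uint8_t *pa = a + y * stride;
            const uint8_t *pb = b + y * stride;
            long *old = ring + (size_t)(y % N) * cols; /* row y-N, or zeros */
            long s = 0;

            /* Horizontal window at x = 0. */
            for (int x = 0; x < N; x++)
                s += abs(pa[x] - pb[x]);

            for (size_t x = 0; x < cols; x++) {
                if (x > 0)  /* slide right: add entering pixel, drop leaving one */
                    s += abs(pa[x + N - 1] - pb[x + N - 1])
                       - abs(pa[x - 1] - pb[x - 1]);
                colsum[x] += s - old[x];  /* slide down by one row */
                old[x] = s;
                if (y >= N - 1 && colsum[x] > best)
                    best = colsum[x];
            }
        }
    }
    free(colsum);
    free(ring);
    return best;
}
```

Calling max_sad8x8_all_shifts(cur, prev, w, h, stride) on two luma planes would give the maximal change over any 8x8 block, which is what the step-4 loop approximates.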
The current asm can also be improved somewhat:
"1: \n\t"
"movq (%%"REG_S"), %%mm0 \n\t"
"movq (%%"REG_S"), %%mm2 \n\t"
"add %%"REG_a", %%"REG_S" \n\t"
"movq (%%"REG_D"), %%mm1 \n\t"
"add %%"REG_b", %%"REG_D" \n\t"
"psubusb %%mm1, %%mm2 \n\t"
"psubusb %%mm0, %%mm1 \n\t"
"movq %%mm2, %%mm0 \n\t"
"movq %%mm1, %%mm3 \n\t"
"punpcklbw %%mm7, %%mm0 \n\t"
"punpcklbw %%mm7, %%mm1 \n\t"
"punpckhbw %%mm7, %%mm2 \n\t"
"punpckhbw %%mm7, %%mm3 \n\t"
"paddw %%mm0, %%mm4 \n\t"
"paddw %%mm1, %%mm4 \n\t"
"paddw %%mm2, %%mm4 \n\t"
"paddw %%mm3, %%mm4 \n\t"
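psubusb saturates at zero, so in each byte at most one of mm1 and mm2 is
nonzero. Everything after the two psubusb can therefore be done faster by: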
"por %%mm2, %%mm1\n\t"
"movq %%mm1, %%mm3 \n\t"
"punpcklbw %%mm7, %%mm1 \n\t"
"punpckhbw %%mm7, %%mm3 \n\t"
"paddw %%mm1, %%mm4 \n\t"
"paddw %%mm3, %%mm5 \n\t"
The last two paddw also add the left 4 and right 4 columns into 2 different
registers, so that 4x4 blocks are calculated.
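A scalar C sketch of both points, under my own assumptions: all function names are made up, w is taken to be a multiple of 8 and h a multiple of 4, and the asm's per-lane word sums are collapsed into plain integers. It shows that OR-ing the two saturated differences gives the absolute difference, and how left/right 4x4 sums can be combined into the step-4 8x8 SADs by adding 4 adjacent blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One byte lane of psubusb: unsigned subtract that saturates at 0. */
static uint8_t subus8(uint8_t x, uint8_t y)
{
    return x > y ? (uint8_t)(x - y) : 0;
}

/* |x - y|: one of the two saturated differences is always 0,
 * so OR (por) merges them without any unpacking. */
static uint8_t absdiff8(uint8_t x, uint8_t y)
{
    return (uint8_t)(subus8(x, y) | subus8(y, x));
}

/* SADs of all non-overlapping 4x4 blocks; sums has (w/4)*(h/4) entries.
 * Each step covers 8 pixels and keeps the left and right halves apart,
 * playing the roles of the two accumulator registers. */
static void sad_4x4_grid(const uint8_t *a, const uint8_t *b, int w, int h,
                         ptrdiff_t stride, unsigned *sums)
{
    assert(w % 8 == 0 && h % 4 == 0);
    int bw = w / 4;
    for (int y = 0; y < h; y += 4) {
        for (int x = 0; x < w; x += 8) {
            unsigned left = 0, right = 0;
            for (int j = 0; j < 4; j++) {
                const uint8_t *pa = a + (y + j) * stride + x;
                const uint8_t *pb = b + (y + j) * stride + x;
                for (int i = 0; i < 4; i++) {
                    left  += absdiff8(pa[i], pb[i]);
                    right += absdiff8(pa[i + 4], pb[i + 4]);
                }
            }
            sums[(y / 4) * bw + x / 4]     = left;
            sums[(y / 4) * bw + x / 4 + 1] = right;
        }
    }
}

/* Largest 8x8 SAD over block positions at multiples of 4:
 * each one is the sum of 2x2 adjacent 4x4 block sums. */
static unsigned max_sad8x8_step4(const unsigned *sums, int bw, int bh)
{
    unsigned best = 0;
    for (int by = 0; by + 1 < bh; by++) {
        for (int bx = 0; bx + 1 < bw; bx++) {
            unsigned s = sums[by * bw + bx] + sums[by * bw + bx + 1]
                       + sums[(by + 1) * bw + bx]
                       + sums[(by + 1) * bw + bx + 1];
            if (s > best)
                best = s;
        }
    }
    return best;
}
```

With this layout every pixel difference is computed once, and each overlapping 8x8 result costs three extra adds.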
"decl %%ecx \n\t"
I'd use a "cmp %%"REG_S", ... here; some CPUs have a dislike for inc/dec, as
inc/dec change just part of the flags, which creates a dependency on the
previous flag value.
It's also possible to count toward zero and use (base, index) style addressing
to read stuff; that would be 1 instruction less.
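In C terms the count-toward-zero idea looks roughly like the sketch below. It is only an illustration: the function name is made up, and it assumes both planes share one nonzero stride, which the asm above does not require (it uses separate REG_a and REG_b). The pointers are set to the end of the block and a single negative offset runs up to 0, so the add that advances the offset is also the loop test. Whether a compiler emits exactly that is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: SAD of an 8-pixel-wide block of `rows` rows.
 * a_end and b_end point just past the last row; off is the shared index
 * and counts from -(rows * stride) up to 0. */
static unsigned sad8_count_to_zero(const uint8_t *a, const uint8_t *b,
                                   ptrdiff_t stride, int rows)
{
    assert(stride != 0 && rows >= 0);
    const uint8_t *a_end = a + rows * stride;
    const uint8_t *b_end = b + rows * stride;
    unsigned sum = 0;

    for (ptrdiff_t off = -(rows * stride); off != 0; off += stride) {
        for (int i = 0; i < 8; i++) {
            int d = a_end[off + i] - b_end[off + i];  /* (base, index) read */
            sum += (unsigned)(d < 0 ? -d : d);
        }
    }
    return sum;
}
```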
"jnz 1b \n\t"
"movq %%mm4, (%%"REG_d") \n\t"
"emms \n\t"
emms should be farther outside, not executed after every block.
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
In the past you could go to a library and read, borrow or copy any book
Today you'd get arrested for mere telling someone where the library is