[FFmpeg-devel] [PATCH] vf_overlay: add support to RGBA packed input and output

Mon Oct 31 11:14:07 CET 2011

On date Monday 2011-10-31 00:10:32 +0100, Michael Niedermayer encoded:
> On Mon, Oct 31, 2011 at 12:06:35AM +0100, Stefano Sabatini wrote:
> > On date Sunday 2011-10-30 22:47:31 +0100, Michael Niedermayer encoded:
> > > On Sun, Oct 30, 2011 at 10:34:41PM +0100, Stefano Sabatini wrote:
> > > > On date Sunday 2011-10-30 14:42:38 +0100, Michael Niedermayer encoded:
> > > > > On Sat, Oct 29, 2011 at 04:47:41PM +0200, Stefano Sabatini wrote:
> > > > [...]
> > > > > > +                switch (alpha) {
> > > > > > +                case 0:
> > > > > > +                    break;
> > > > > > +                case 255:
> > > > > > +                    d[dr] = s[sr];
> > > > > > +                    d[dg] = s[sg];
> > > > > > +                    d[db] = s[sb];
> > > > > > +                    break;
> > > > > > +                default:
> > > > > > +                    // main_value = main_value * (1 - alpha) + overlay_value * alpha
> > > > > 
> > > > > > +                    // apply a fast approximation: X/255 ~ (X+128)/256
> > > > > 
> > > > > please use +128*257>>16 (which is exact)
> > > > 
> > > > Uhm I suppose you meant:
> > > > ((X * 257) + 257)>> 16
> > > 
> > > i think we want round to nearest which is
> > > (x+127)/255
> > > or
> > > ((x+127)*257 + 257)>>16
> > > 
> > > this can be simplified to
> > > ((x+128)*257)>>16
> > > 
> > > (above all untested!)
> > > 
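Since the formulas above were marked as untested, here is a small exhaustive check. It is only a sketch: it assumes X is the sum of two 8-bit products weighted by alpha and (255 - alpha), so 0 <= X <= 255*255, and the function names are made up for illustration (they are not from the patch).

```c
/* Reference: X/255 rounded to nearest (255 is odd, so there are no ties). */
static unsigned div255_ref(unsigned x)
{
    return (x + 127) / 255;
}

/* Multiply-and-shift variant discussed above. */
static unsigned div255_fast(unsigned x)
{
    return ((x + 128) * 257) >> 16;
}

/* Count the inputs in [0, 255*255] for which the two variants differ. */
static unsigned count_div255_mismatches(void)
{
    unsigned x, mismatches = 0;

    for (x = 0; x <= 255 * 255; x++)
        if (div255_ref(x) != div255_fast(x))
            mismatches++;
    return mismatches;
}
```

Over that range the two variants should agree for every input, so count_div255_mismatches() is expected to return 0. The largest intermediate value is (65025 + 128) * 257, which comfortably fits in 32 bits.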
> > > 
> > > > 
> > > > For the interested reader:
> > > > research.swtch.com/2008/01/division-via-multiplication.html
> > > > (or read TAOCP if you want the long version ;-)).
> > > > 
> > > > Then I tested with the plain version:
> > > > 22001580 dezicycles in first, 2 runs, 0 skips
> > > > 22377187 dezicycles in first, 4 runs, 0 skips
> > > > 22358670 dezicycles in first, 8 runs, 0 skips
> > > > 22430178 dezicycles in first, 16 runs, 0 skips
> > > > 27048690 dezicycles in first, 32 runs, 0 skips
> > > > 24722512 dezicycles in first, 64 runs, 0 skips
> > > > 23467227 dezicycles in first, 128 runs, 0 skips
> > > > 22707239 dezicycles in first, 256 runs, 0 skips
> > > > 22325824 dezicycles in first, 512 runs, 0 skips
> > > > 22106139 dezicycles in first, 1024 runs, 0 skips
> > > > 22007162 dezicycles in first, 2048 runs, 0 skips
> > > > 21959926 dezicycles in first, 4096 runs, 0 skips
> > > > 21978105 dezicycles in first, 8192 runs, 0 skips
> > > > 21927611 dezicycles in first, 16384 runs, 0 skips
> > > > 21889967 dezicycles in first, 32768 runs, 0 skips
> > > > 
> > > > With the optimized variant:
> > > > 20987625 dezicycles in first, 2 runs, 0 skips
> > > > 20781405 dezicycles in first, 4 runs, 0 skips
> > > > 20581886 dezicycles in first, 8 runs, 0 skips
> > > > 20787228 dezicycles in first, 16 runs, 0 skips
> > > > 21084062 dezicycles in first, 32 runs, 0 skips
> > > > 21028600 dezicycles in first, 64 runs, 0 skips
> > > > 20786884 dezicycles in first, 128 runs, 0 skips
> > > > 20671322 dezicycles in first, 256 runs, 0 skips
> > > > 20563223 dezicycles in first, 512 runs, 0 skips
> > > > 20527375 dezicycles in first, 1024 runs, 0 skips
> > > > 20481658 dezicycles in first, 2048 runs, 0 skips
> > > > 20452863 dezicycles in first, 4096 runs, 0 skips
> > > > 20535609 dezicycles in first, 8192 runs, 0 skips
> > > > 20503526 dezicycles in first, 16384 runs, 0 skips
> > > > 20465800 dezicycles in first, 32768 runs, 0 skips
> > > > 
> > > 
> > > > But I confess that I always build ffmpeg with optimizations disabled
> > > 
> > > you really should not when doing optimizations
> > > 
> > > 
> > > > (for easing debugging) and I suppose that most decent compilers
> > > > will know all about these numerical tricks, so I'm not sure if
> > > > these hand-crafted optimizations are worth the code obfuscation.
> > > 
> > > gcc has to prove that x*257 won't overflow, that x is within the range
> > > where it's valid, and that x is not negative.
> > > so i wouldn't bet that it can reliably do this on its own.
> > > A human will often just know something isn't negative while a compiler
> > > might just not be able to prove it. In this case it might work out,
> > > i haven't checked what gcc creates out of the divide
> > 
> > Makes sense, thanks for sharing your insight.
> > 
> > Patches updated, I used a macro for the fast 255 division which should
> > ease readability.
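For readers without the patch at hand, the idea is roughly the following. This is a sketch and not the code from the patch: the macro name DIV255_ROUND and the helper blend_channel() are hypothetical, and it assumes 8-bit channels with an 8-bit alpha.

```c
#include <stdint.h>

/* Hypothetical name: X/255 rounded to nearest, valid for 0 <= X <= 255*255. */
#define DIV255_ROUND(x) ((((x) + 128) * 257) >> 16)

/* main_value = main_value * (1 - alpha) + overlay_value * alpha,
 * with alpha scaled to the range 0..255. */
static uint8_t blend_channel(uint8_t dst, uint8_t src, uint8_t alpha)
{
    /* The weights sum to 255, so the argument never exceeds 255*255. */
    return DIV255_ROUND(dst * (255 - alpha) + src * alpha);
}
```

With this helper, alpha 0 leaves the main value untouched and alpha 255 copies the overlay value, which matches the two special cases in the switch quoted at the top of the thread.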
> 
> the patches look very nice, thanks

Applied.
-- 
FFmpeg = Friendly & Fundamentalist Meaningless Powered Efficient Gadget