[FFmpeg-devel] [PATCH] vf_overlay: add support to RGBA packed input and output

Stefano Sabatini stefasab at gmail.com
Sun Oct 30 22:34:41 CET 2011


On date Sunday 2011-10-30 14:42:38 +0100, Michael Niedermayer encoded:
> On Sat, Oct 29, 2011 at 04:47:41PM +0200, Stefano Sabatini wrote:
[...]
> > +                switch (alpha) {
> > +                case 0:
> > +                    break;
> > +                case 255:
> > +                    d[dr] = s[sr];
> > +                    d[dg] = s[sg];
> > +                    d[db] = s[sb];
> > +                    break;
> > +                default:
> > +                    // main_value = main_value * (1 - alpha) + overlay_value * alpha
> 
> > +                    // apply a fast approximation: X/255 ~ (X+128)/256
> 
> please use +128*257>>16 (which is exact)

Uhm I suppose you meant:
((X * 257) + 257) >> 16

For the interested reader:
research.swtch.com/2008/01/division-via-multiplication.html
(or read TAOCP if you want the long version ;-)).
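
For illustration, here is a minimal sketch of the variants discussed above,
written as standalone C rather than as code from the patch. The function
names (div255_floor, div255_round, div255_approx, count_mismatches,
blend_channel) are hypothetical and chosen only for this example.
div255_round is one possible reading of the "+128*257>>16" suggestion,
assuming it means ((X + 128) * 257) >> 16. The input range 0..255*255 is an
assumption too: it is the largest value the blend formula in the quoted
comment can produce for 8-bit channels.

```c
#include <stdint.h>

/* Truncating X/255 via multiply and shift: ((X * 257) + 257) >> 16.
 * Assumed input range: 0 <= x <= 255*255. */
static inline unsigned div255_floor(unsigned x)
{
    return (x * 257 + 257) >> 16;
}

/* Rounding variant, one reading of the reviewer's "+128*257>>16":
 * adds half of the divisor (rounded up) before the multiply. */
static inline unsigned div255_round(unsigned x)
{
    return ((x + 128) * 257) >> 16;
}

/* The approximation from the patch comment: X/255 ~ (X+128)/256. */
static inline unsigned div255_approx(unsigned x)
{
    return (x + 128) >> 8;
}

/* Count the inputs in 0..255*255 for which f(x) differs from the
 * plain truncating integer division x / 255. */
static unsigned count_mismatches(unsigned (*f)(unsigned))
{
    unsigned x, bad = 0;

    for (x = 0; x <= 255 * 255; x++)
        if (f(x) != x / 255)
            bad++;
    return bad;
}

/* main_value = main_value * (1 - alpha) + overlay_value * alpha,
 * with alpha scaled to 0..255 and the final division done by
 * div255_floor. The sum never exceeds 255*255. */
static inline uint8_t blend_channel(uint8_t main_value,
                                    uint8_t overlay_value,
                                    uint8_t alpha)
{
    return div255_floor(main_value * (255 - alpha) + overlay_value * alpha);
}
```

Under these assumptions count_mismatches(div255_floor) should return 0,
that is, the multiply-and-shift form agrees with x / 255 over the whole
range, while div255_approx(255 * 255) gives 254 instead of 255.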

Then I tested with the plain version:
22001580 dezicycles in first, 2 runs, 0 skips
22377187 dezicycles in first, 4 runs, 0 skips
22358670 dezicycles in first, 8 runs, 0 skips
22430178 dezicycles in first, 16 runs, 0 skips
27048690 dezicycles in first, 32 runs, 0 skips
24722512 dezicycles in first, 64 runs, 0 skips
23467227 dezicycles in first, 128 runs, 0 skips
22707239 dezicycles in first, 256 runs, 0 skips
22325824 dezicycles in first, 512 runs, 0 skips
22106139 dezicycles in first, 1024 runs, 0 skips
22007162 dezicycles in first, 2048 runs, 0 skips
21959926 dezicycles in first, 4096 runs, 0 skips
21978105 dezicycles in first, 8192 runs, 0 skips
21927611 dezicycles in first, 16384 runs, 0 skips
21889967 dezicycles in first, 32768 runs, 0 skips

With the optimized variant:
20987625 dezicycles in first, 2 runs, 0 skips
20781405 dezicycles in first, 4 runs, 0 skips
20581886 dezicycles in first, 8 runs, 0 skips
20787228 dezicycles in first, 16 runs, 0 skips
21084062 dezicycles in first, 32 runs, 0 skips
21028600 dezicycles in first, 64 runs, 0 skips
20786884 dezicycles in first, 128 runs, 0 skips
20671322 dezicycles in first, 256 runs, 0 skips
20563223 dezicycles in first, 512 runs, 0 skips
20527375 dezicycles in first, 1024 runs, 0 skips
20481658 dezicycles in first, 2048 runs, 0 skips
20452863 dezicycles in first, 4096 runs, 0 skips
20535609 dezicycles in first, 8192 runs, 0 skips
20503526 dezicycles in first, 16384 runs, 0 skips
20465800 dezicycles in first, 32768 runs, 0 skips

But I confess that I always build ffmpeg with optimizations disabled
(to ease debugging), and I suppose that most decent compilers
already know about these numerical tricks, so I'm not sure
these hand-crafted optimizations are worth the code obfuscation.

I'll push soon if I read no more comments.
-- 
FFmpeg = Fabulous Fantastic Mysterious Pitiless Evil Genius
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-vf_overlay-enable-RGB-path.patch
Type: text/x-diff
Size: 9946 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20111030/ac64f041/attachment.bin>