[FFmpeg-devel] [PATCH] vf_overlay: add support to RGBA packed input and output
Stefano Sabatini
stefasab at gmail.com
Sat Oct 29 00:56:15 CEST 2011
On date Thursday 2011-10-27 01:01:40 +0200, Michael Niedermayer encoded:
> On Thu, Oct 27, 2011 at 12:25:43AM +0200, Stefano Sabatini wrote:
> > From 72b3c79a550961b3e215e5f1e6d42da3c362751e Mon Sep 17 00:00:00 2001
> > From: Stefano Sabatini <stefasab at gmail.com>
> > Date: Mon, 24 Oct 2011 20:00:21 +0200
> > Subject: [PATCH] vf_overlay: add support to RGB packed input and output
> >
> > Also add support to alpha pre-multiplication in the RGBA path.
> >
> > Based on the work of Mark Himsley <mark at mdsh.com>.
> >
> > See thread:
> > Subject: [FFmpeg-devel] libavfilter: extending overlay filter
> > Date: Sun, 13 Mar 2011 14:18:42 +0000
> > ---
> > doc/filters.texi | 15 +++++-
> > libavfilter/vf_overlay.c | 134 +++++++++++++++++++++++++++++++++++++++------
> > 2 files changed, 129 insertions(+), 20 deletions(-)
[...]
> > for (i = 0; i < height; i++) {
> > uint8_t *d = dp, *s = sp;
> > for (j = 0; j < width; j++) {
>
> > - d[r] = (d[r] * (0xff - s[3]) + s[0] * s[3] + 128) >> 8;
> > - d[1] = (d[1] * (0xff - s[3]) + s[1] * s[3] + 128) >> 8;
> > - d[b] = (d[b] * (0xff - s[3]) + s[2] * s[3] + 128) >> 8;
> > - d += 3;
> > - s += 4;
> > + // compute the blend multiplication of overlay over the main
> > + alpha = s[over->overlay_rgba_map[A]];
> > + // if the main channel has an alpha channel, alpha has to be calculated
> > + // to create an un-premultiplied (straight) alpha value
> > + if (over->main_has_alpha) {
> > + // apply the general equation:
> > + // alpha = alpha_overlay / ((alpha_main + alpha_overlay) - alpha_main * alpha_overlay)
> > + //
> > + // if alpha_main = 0 => alpha = 0
> > + // if alpha_main = 1 => alpha = alpha_overlay
> > + switch (alpha) {
> > + case 0:
> > + case 0xff:
> > + break;
> > + default:
> > + // the un-premultiplied calculation is:
> > + // (255 * 255 * overlay_alpha) / ( 255 * (overlay_alpha + main_alpha) - (overlay_alpha * main_alpha) )
> > + alpha =
> > + // the next line is a faster version of: 255 * 255 * alpha
> > + ( (alpha << 16) - (alpha << 9) + alpha )
> > + / (
> > + // the next line is a faster version of: 255 * (blend + d[over->inout_rgba_map[A]])
> > + ((alpha + d[over->main_rgba_map[A]]) << 8 ) - (alpha + d[over->main_rgba_map[A]])
> > + - d[over->main_rgba_map[A]] * alpha
> > + );
> > + }
> > + }
> > + switch (alpha) {
> > + case 0:
> > + break;
> > + case 0xff:
> > + d[over->main_rgba_map[R]] = s[over->overlay_rgba_map[R]];
> > + d[over->main_rgba_map[G]] = s[over->overlay_rgba_map[G]];
> > + d[over->main_rgba_map[B]] = s[over->overlay_rgba_map[B]];
> > + break;
> > + default:
> > + d[over->main_rgba_map[R]] = (d[over->main_rgba_map[R]] * (255 - alpha) + s[over->overlay_rgba_map[R]] * alpha) / 255;
> > + d[over->main_rgba_map[G]] = (d[over->main_rgba_map[G]] * (255 - alpha) + s[over->overlay_rgba_map[G]] * alpha) / 255;
> > + d[over->main_rgba_map[B]] = (d[over->main_rgba_map[B]] * (255 - alpha) + s[over->overlay_rgba_map[B]] * alpha) / 255;
> > + }
> > + if (over->main_has_alpha) {
> > + switch (alpha) {
> > + case 0:
> > + break;
> > + case 0xff:
> > + d[over->main_rgba_map[A]] = s[over->overlay_rgba_map[A]];
> > + break;
> > + default:
> > + d[over->main_rgba_map[A]] = (
> > + (d[over->main_rgba_map[A]] << 8) + (0x100 - d[over->main_rgba_map[A]]) * s[over->overlay_rgba_map[A]]
> > + ) >> 8;
> > + }
> > + }
>
> please benchmark this with START/STOP_TIMER against the previous code
The RGB path was disabled before this patch, so I split the present
patch and ran some tests.
* Test with no alpha in the main input
before alpha premultiplication
1287135 dezicycles in first, 2 runs, 0 skips
1335442 dezicycles in first, 4 runs, 0 skips
1245555 dezicycles in first, 8 runs, 0 skips
1162359 dezicycles in first, 16 runs, 0 skips
1144390 dezicycles in first, 32 runs, 0 skips
1134602 dezicycles in first, 64 runs, 0 skips
1133281 dezicycles in first, 128 runs, 0 skips
1114852 dezicycles in first, 256 runs, 0 skips
1108999 dezicycles in first, 512 runs, 0 skips
1101536 dezicycles in first, 1024 runs, 0 skips
1096821 dezicycles in first, 2048 runs, 0 skips
1090508 dezicycles in first, 4096 runs, 0 skips
1085896 dezicycles in first, 8192 runs, 0 skips
1084802 dezicycles in first, 16384 runs, 0 skips
1083604 dezicycles in first, 32768 runs, 0 skips
after alpha premultiplication
1224390 dezicycles in second, 2 runs, 0 skips
1202235 dezicycles in second, 4 runs, 0 skips
1191453 dezicycles in second, 8 runs, 0 skips
1183031 dezicycles in second, 16 runs, 0 skips
1230087 dezicycles in second, 32 runs, 0 skips
1227492 dezicycles in second, 64 runs, 0 skips
1230488 dezicycles in second, 128 runs, 0 skips
1215128 dezicycles in second, 256 runs, 0 skips
1207364 dezicycles in second, 512 runs, 0 skips
1199813 dezicycles in second, 1024 runs, 0 skips
1195857 dezicycles in second, 2048 runs, 0 skips
1193954 dezicycles in second, 4096 runs, 0 skips
1194128 dezicycles in second, 8192 runs, 0 skips
1187481 dezicycles in second, 16384 runs, 0 skips
1181874 dezicycles in second, 32768 runs, 0 skips
* Test with alpha in the main input:
28684935 dezicycles in first, 2 runs, 0 skips
28553902 dezicycles in first, 4 runs, 0 skips
28776015 dezicycles in first, 8 runs, 0 skips
29073680 dezicycles in first, 16 runs, 0 skips
28816918 dezicycles in first, 32 runs, 0 skips
28908704 dezicycles in first, 64 runs, 0 skips
28745401 dezicycles in first, 128 runs, 0 skips
28614980 dezicycles in first, 256 runs, 0 skips
28609710 dezicycles in first, 512 runs, 0 skips
28537037 dezicycles in first, 1024 runs, 0 skips
28517850 dezicycles in first, 2048 runs, 0 skips
28466515 dezicycles in first, 4096 runs, 0 skips
28438388 dezicycles in first, 8192 runs, 0 skips
28440383 dezicycles in first, 16384 runs, 0 skips
28426314 dezicycles in first, 32768 runs, 0 skips
33347880 dezicycles in second, 2 runs, 0 skips
33131272 dezicycles in second, 4 runs, 0 skips
38018970 dezicycles in second, 8 runs, 0 skips
48715928 dezicycles in second, 16 runs, 0 skips
44290285 dezicycles in second, 32 runs, 0 skips
43696766 dezicycles in second, 64 runs, 0 skips
38599173 dezicycles in second, 128 runs, 0 skips
36112571 dezicycles in second, 256 runs, 0 skips
34737837 dezicycles in second, 512 runs, 0 skips
34066213 dezicycles in second, 1024 runs, 0 skips
33640178 dezicycles in second, 2048 runs, 0 skips
33368757 dezicycles in second, 4096 runs, 0 skips
33233522 dezicycles in second, 8192 runs, 0 skips
33132908 dezicycles in second, 16384 runs, 0 skips
33062949 dezicycles in second, 32768 runs, 0 skips
Results are as expected: alpha pre-multiplication is measurably slower
(about 9% without alpha in the main input and about 16% with it, taking
the 32768-run figures), but it may also be what the user wants, so I
could make it optional (and preserve the original alpha? enabled by
default?).
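
To make the arithmetic of the quoted hunk easier to check in isolation,
here is a minimal standalone sketch of the per-pixel computation. The
function names (unpremultiplied_alpha, blend_channel,
check_shift_identities) are hypothetical and do not appear in the patch;
only the expressions are taken from it.

```c
#include <assert.h>

/* Hypothetical helper, not part of the patch: effective blend factor for
 * an overlay pixel with alpha `oa` over a main pixel with alpha `ma`
 * (both in 0..255), using the same expressions as the quoted hunk. */
static unsigned unpremultiplied_alpha(unsigned oa, unsigned ma)
{
    unsigned num, den;

    if (oa == 0 || oa == 0xff)
        return oa;                      /* same shortcut as the switch */

    /* (oa << 16) - (oa << 9) + oa == 255 * 255 * oa */
    num = (oa << 16) - (oa << 9) + oa;
    /* ((oa + ma) << 8) - (oa + ma) == 255 * (oa + ma);
     * den == 255 * oa + ma * (255 - oa), so it is > 0 for oa in 1..254 */
    den = ((oa + ma) << 8) - (oa + ma) - ma * oa;
    return num / den;
}

/* Hypothetical helper: blend one colour channel with the /255 form used
 * in the default case of the second switch. */
static unsigned blend_channel(unsigned d, unsigned s, unsigned alpha)
{
    return (d * (255 - alpha) + s * alpha) / 255;
}

/* Check the two "faster version" shift identities over the full range. */
static void check_shift_identities(void)
{
    unsigned a, m;

    for (a = 0; a < 256; a++) {
        assert((a << 16) - (a << 9) + a == 255 * 255 * a);
        for (m = 0; m < 256; m++)
            assert(((a + m) << 8) - (a + m) == 255 * (a + m));
    }
}
```

With a fully opaque main pixel (ma = 255) the helper returns the overlay
alpha unchanged, which matches the "alpha_main = 1" comment in the hunk.
With a fully transparent main pixel (ma = 0) and 0 < oa < 255 the
expression evaluates to 255, so the overlay colour replaces the main one;
that seems to differ from the "alpha_main = 0 => alpha = 0" comment, which
may deserve a second look.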
--
FFmpeg = Fabulous Fancy Magnificent Practical Ecstatic Gadget
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-vf_overlay-use-opt.h-API-for-setting-options.patch
Type: text/x-diff
Size: 3031 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20111029/405ba59a/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-vf_overlay-enable-RGB-path.patch
Type: text/x-diff
Size: 9686 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20111029/405ba59a/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-vf_overlay-add-support-to-alpha-pre-multiplication-i.patch
Type: text/x-diff
Size: 3682 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20111029/405ba59a/attachment-0002.bin>