[FFmpeg-soc] [soc]: r704 - dirac/libavcodec/dirac.c
Michael Niedermayer
michaelni at gmx.at
Sat Aug 11 23:28:24 CEST 2007
On Sat, Aug 11, 2007 at 11:20:23PM +0200, marco wrote:
> Author: marco
> Date: Sat Aug 11 23:20:23 2007
> New Revision: 704
>
> Log:
> optimize loops for the 9/7 IDWT
>
> Modified:
> dirac/libavcodec/dirac.c
>
> Modified: dirac/libavcodec/dirac.c
> ==============================================================================
> --- dirac/libavcodec/dirac.c (original)
> +++ dirac/libavcodec/dirac.c Sat Aug 11 23:20:23 2007
> @@ -1757,7 +1757,7 @@ STOP_TIMER("idwt53")
> static int dirac_subband_idwt_97(AVCodecContext *avctx,
> int *data, int level) {
> DiracContext *s = avctx->priv_data;
> - int *synth;
> + int *synth, *synthline;
> int x, y;
> int width = subband_width(avctx, level);
> int height = subband_height(avctx, level);
> @@ -1799,90 +1799,101 @@ START_TIMER
> */
>
> /* Vertical synthesis: Lifting stage 1. */
> + synthline = synth;
> for (x = 0; x < synth_width; x++)
> + synthline[x] -= ( synthline[synth_width]
> + + synthline[synth_width]
> + 2) >> 2;
> + synthline = synth + (synth_width << 1);
> for (y = 1; y < height - 1; y++) {
> for (x = 0; x < synth_width; x++) {
> + synthline[x] -= ( synthline[x - synth_width]
> + + synthline[x + synth_width]
> + 2) >> 2;
> }
> + synthline += synth_width << 1;
> }
> + synthline = synth + (synth_height - 2) * synth_width;
> for (x = 0; x < synth_width; x++)
> + synthline[x] -= ( synthline[x - synth_width]
> + + synthline[x + synth_width]
> + 2) >> 2;
>
> /* Vertical synthesis: Lifting stage 2. */
> + synthline = synth + synth_width;
> for (x = 0; x < synth_width; x++)
> + synthline[x] += ( -synthline[x - synth_width]
> + + 9 * synthline[x - synth_width]
> + + 9 * synthline[x + synth_width]
> + - synthline[x + 3 * synth_width]
> + 8) >> 4;
> + synthline = synth + (synth_width << 1);
> for (y = 1; y < height - 2; y++) {
> for (x = 0; x < synth_width; x++) {
Performing lifting pass X over the whole image and then pass X+1 over the
whole image is not very cache friendly.
It would be better to perform lifting pass X for one line, then pass X+1 for
whichever line(s) can be processed with the data that has just become
available, then pass X for the next line, and so on.
Also look at snow.c::lift(); maybe something like that could be used to
simplify the code.
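
To make the suggested ordering concrete, here is a rough sketch. It is not
the dirac.c code and not snow.c::lift(): it uses a simplified two-stage
vertical lifting with two-tap filters (the 9/7 stage 2 above uses four taps),
assumes an even height and mirrored edges, and all function names are made up
for illustration. It only shows that running stage 2 on a line as soon as its
neighbours are ready gives the same result as two whole-image passes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: pointer to row y, mirroring rows outside the image
 * (row -1 maps to row 1, row height maps to row height - 2). */
static int *row_ptr(int *img, int width, int height, int y)
{
    if (y < 0)
        y = -y;
    if (y >= height)
        y = 2 * (height - 1) - y;
    return img + (size_t)y * width;
}

/* Stage 1 on one even line. Like the quoted code, this relies on >> of a
 * negative int being an arithmetic shift. */
static void lift_stage1(int *line, const int *above, const int *below, int width)
{
    for (int x = 0; x < width; x++)
        line[x] -= (above[x] + below[x] + 2) >> 2;
}

/* Stage 2 on one odd line (simplified two-tap version). */
static void lift_stage2(int *line, const int *above, const int *below, int width)
{
    for (int x = 0; x < width; x++)
        line[x] += (above[x] + below[x] + 1) >> 1;
}

/* Reference order: stage 1 over the whole image, then stage 2 over it. */
static void idwt_two_pass(int *img, int width, int height)
{
    for (int y = 0; y < height; y += 2)
        lift_stage1(row_ptr(img, width, height, y),
                    row_ptr(img, width, height, y - 1),
                    row_ptr(img, width, height, y + 1), width);
    for (int y = 1; y < height; y += 2)
        lift_stage2(row_ptr(img, width, height, y),
                    row_ptr(img, width, height, y - 1),
                    row_ptr(img, width, height, y + 1), width);
}

/* Interleaved order: after stage 1 on even line y, the odd line y - 1 has
 * both of its even neighbours finished, so stage 2 runs on it right away
 * while those lines are still likely to be in cache. */
static void idwt_interleaved(int *img, int width, int height)
{
    for (int y = 0; y < height; y += 2) {
        lift_stage1(row_ptr(img, width, height, y),
                    row_ptr(img, width, height, y - 1),
                    row_ptr(img, width, height, y + 1), width);
        if (y >= 2)
            lift_stage2(row_ptr(img, width, height, y - 1),
                        row_ptr(img, width, height, y - 2),
                        row_ptr(img, width, height, y), width);
    }
    /* The last odd line has no even line below it; its lower neighbour is
     * the mirrored row height - 2. */
    lift_stage2(row_ptr(img, width, height, height - 1),
                row_ptr(img, width, height, height - 2),
                row_ptr(img, width, height, height), width);
}

/* Returns 1 if both orderings produce identical output on a small image. */
int lifting_orders_agree(void)
{
    enum { W = 8, H = 6 };
    int a[W * H], b[W * H];
    for (int i = 0; i < W * H; i++)
        a[i] = b[i] = (i * 37) % 101 - 50;
    idwt_two_pass(a, W, H);
    idwt_interleaved(b, W, H);
    return memcmp(a, b, sizeof(a)) == 0;
}
```

The interleaving is safe here because stage 1 only reads odd lines that
stage 2 has not touched yet. With a four-tap stage 2 the same idea should
apply, but stage 2 would have to lag further behind stage 1.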
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Dictatorship naturally arises out of democracy, and the most aggravated
form of tyranny and slavery out of the most extreme liberty. -- Plato