[FFmpeg-devel] [PATCH 2/3] Indeo 5 decoder: common DSP functions

Michael Niedermayer michaelni
Sun Jan 10 23:43:50 CET 2010


On Sun, Jan 10, 2010 at 09:58:37PM +0100, Maxim wrote:
> Michael Niedermayer schrieb:
> > On Sun, Jan 10, 2010 at 01:22:17PM +0200, Kostya wrote:
> >   
> >> On Sat, Jan 09, 2010 at 05:43:40PM +0200, Kostya wrote:
> >>     
> >>> On Sat, Jan 09, 2010 at 03:47:39PM +0100, Michael Niedermayer wrote:
> >>>       
> >>>> On Sat, Jan 09, 2010 at 04:40:30PM +0200, Kostya wrote:
> >>>>         
> >>>>> On Fri, Jan 08, 2010 at 11:41:23PM +0100, Michael Niedermayer wrote:
> >>>>>           
> >>>>>> On Sun, Jan 03, 2010 at 12:56:36PM +0200, Kostya wrote:
> >>>>>> [...]
> >>>>>>             
> >>>>>>> void ff_ivi_recompose53(const IVIPlaneDesc *plane, uint8_t *dst,
> >>>>>>>               
> >>>>> [function body skipped]
> >>>>>           
> >>>>>> is this mess faster than some more readable variant?
> >>>>>>             
> >>>>> Here's more readable variant by me, checked to be bitexact but it's
> >>>>> significantly slower (> 10%), I'd rather leave old one.
> >>>>>           
> >>>> I also prefer speed, what about an implementation using lifting?
> >>>>         
> >>> I'll try to implement it.
> >>>       
> >> Hmm, after some experiments I'd rather leave original version.
> >> Even grouping variables together in array gives significant performance
> >> drop. And pure lifting transform is not applicable here either because
> >> band data is grouped and it will take at least two passes (hor/vert)
> >> with conditions for missing bands and requires an additional temp
> >> buffer.
> >>     
> >
> > So you can improve snow 5/3 performance by using this code?
> >
> > My point is that I don't really care which code, but I am slightly allergic to
> > code duplication, and I don't see why this should be faster here while slower
> > in snow than lifting ...
> > So please elaborate if you think snow and this have a different optimal
> > implementation.
> >
> >   
> 
> At the time of development of this code I did some performance research
> regarding this filter. I observed two important points where the
> performance can be improved:
> 
> - Doing the vertical and horizontal filtering separately requires an
> additional temp buffer, which doesn't use the cache memory effectively,
> especially in the case of large images. Therefore one-pass filtering
> was preferable. Moreover, all previously calculated values must be
> reused whenever possible...
>
> - Due to data decimation in the encoder, an upsampling step (inserting a
> zero value between each pair of the filter coefficients) is needed in
> the decoder. This leads to a high amount of redundant calculations,
> because half of them operate on zeros. This can be optimized by
> using two separate filters for odd/even pixels. So calculating four
> pixels at once (two vertical + two horizontal ones) using separate
> filters works as fast as the lifting technique because those can be
> simplified a lot.
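
To make the odd/even argument in the quoted text concrete, here is a minimal 1-D sketch. It is not taken from ff_ivi_recompose53: the function names and the 3-tap kernel {1, 2, 1} are assumptions chosen for illustration (the kernel is the 5/3 low-pass synthesis filter 1/2, 1, 1/2 scaled by 2, and samples outside the band are treated as zero). The naive version filters a zero-stuffed signal; the split version produces the same output with one filter per output parity.

```c
#include <stddef.h>

/* Illustrative kernel only (assumption, not from the patch). */
static const int kernel[3] = { 1, 2, 1 };

/* Naive synthesis: conceptually insert a zero after every input sample,
 * then apply the full 3-tap filter at each of the 2*n output positions.
 * Half of the multiplications hit an inserted zero. */
static void synth_naive(const int *x, size_t n, int *y)
{
    for (size_t k = 0; k < 2 * n; k++) {
        int acc = 0;
        for (int j = -1; j <= 1; j++) {
            long pos = (long)k + j;
            if (pos < 0 || pos >= (long)(2 * n))
                continue;                       /* outside: treated as zero */
            int u = (pos & 1) ? 0 : x[pos / 2]; /* odd positions are the stuffed zeros */
            acc += kernel[j + 1] * u;
        }
        y[k] = acc;
    }
}

/* Same result with two separate filters: even outputs only see the
 * centre tap, odd outputs only see the two outer taps. */
static void synth_split(const int *x, size_t n, int *y)
{
    for (size_t i = 0; i < n; i++) {
        int next = (i + 1 < n) ? x[i + 1] : 0;
        y[2 * i]     = 2 * x[i];     /* even-phase filter */
        y[2 * i + 1] = x[i] + next;  /* odd-phase filter  */
    }
}
```

Both functions write 2*n samples to y. The split form does no work on the inserted zeros, which is the saving the paragraph describes; the 2-D case in the patch applies the same idea in both directions at once.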

could you show me the lifting code that you used to compare?
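
For readers following the thread, "lifting" here refers to computing the wavelet in small predict/update steps instead of full filter taps. The sketch below is a generic 1-D reversible 5/3 lifting pair with mirrored edges; it is neither the code Maxim benchmarked nor snow's implementation, and all names (floor_div, lift53_forward, lift53_inverse) are hypothetical.

```c
#include <stddef.h>

/* Floor division for a positive divisor (C's / truncates toward zero). */
static int floor_div(int a, int b)
{
    int q = a / b;
    return (a % b < 0) ? q - 1 : q;
}

/* Forward 5/3 lifting on 2*half samples: s = low band, d = high band. */
static void lift53_forward(const int *x, size_t half, int *s, int *d)
{
    /* Predict: each odd sample minus the average of its even neighbours. */
    for (size_t i = 0; i < half; i++) {
        int right = (i + 1 < half) ? x[2 * i + 2] : x[2 * i]; /* mirror at the end */
        d[i] = x[2 * i + 1] - floor_div(x[2 * i] + right, 2);
    }
    /* Update: each even sample plus a quarter of the neighbouring details. */
    for (size_t i = 0; i < half; i++) {
        int left = (i > 0) ? d[i - 1] : d[0];                 /* mirror at the start */
        s[i] = x[2 * i] + floor_div(left + d[i] + 2, 4);
    }
}

/* Inverse: undo the update step, then undo the predict step. */
static void lift53_inverse(const int *s, const int *d, size_t half, int *x)
{
    for (size_t i = 0; i < half; i++) {
        int left = (i > 0) ? d[i - 1] : d[0];
        x[2 * i] = s[i] - floor_div(left + d[i] + 2, 4);
    }
    for (size_t i = 0; i < half; i++) {
        int right = (i + 1 < half) ? x[2 * i + 2] : x[2 * i];
        x[2 * i + 1] = d[i] + floor_div(x[2 * i] + right, 2);
    }
}
```

The inverse runs the same two steps in reverse order with the signs flipped, so the round trip is exact for integer input. Note that it needs the low and high bands as separate, fully available arrays and makes two passes over the data, which is the constraint Kostya mentions for the grouped band layout.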


[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Frequently ignored answer #1: FFmpeg bugs should be sent to our bugtracker. User
questions about the command line tools should be sent to the ffmpeg-user ML.
And questions about how to use libav* should be sent to the libav-user ML.