[FFmpeg-devel] [FFMpeg-Devel] Ideas for changes to libpostproc

Wed Mar 18 03:20:26 CET 2015

On Wed, Mar 18, 2015 at 02:30:29AM +0100, Michael Niedermayer wrote:
> On Tue, Mar 17, 2015 at 08:39:02PM -0400, Tucker DiNapoli wrote:
> > This isn't really a patch, but it's easiest to express my ideas in the form of
> > code. As a patch it creates a single file which is mostly composed of a rewrite
> > of the main postprocessing loop. I've tried to express most of my ideas in
> > the form of changes to the code, but in cases where that would be too much
> > work, or wouldn't make sense in this file I've written my ideas in comments.
> > 
> > I'm mostly looking for opinions/criticisms on my ideas, not necessarily the
> > code itself. I'm fully willing to change code, but I'm more interested in
> > whether my ideas make sense.
> > 
> 
> > Updating libpostproc is something I plan to do for the Google Summer of Code,
> > so I can't make all the changes I'd like now. I need to have some sort of
> > qualification task complete within the next week; I've submitted some patches to
> 
> It would be good to have some patch next week, sure, but there is
> more time: the 27th is the deadline for submitting an application to
> Google; there is more time for the qualification task.
> 
> 
> [...]
> 
> > +                else if(mode & V_DEBLOCK){
> > +                    //Not sure how to convert this to simd, I was thinking vertClassify
> > +                    //would return a mask classifying multiple blocks, but even if it
> > +                    //does I'm not sure how to run the filters
> > +
> > +                    //I guess I could test the mask, and if it's not uniform
> > +                    //run both filters and choose which one to use for each block
> > +                    //based on the mask
> 
> Yes, you have correctly analyzed the situation.
> It would be possible to fall back to calling the MMX code multiple times
> when the types differ and make AVX/SSE impossible, or both filters
> could be run in AVX/SSE and then some mask & combine could be used.
> 
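The "run both filters, then mask & combine" option can be sketched in plain C. This is only an illustration under stated assumptions: filter_a() and filter_b() are hypothetical stand-ins, not libpostproc functions, and the per-block flags in use_a[] stand in for whatever mask a vectorized classify step would produce. A SIMD version would do the same select with a byte-wise blend instead of the inner loop.

```c
#include <stdint.h>
#include <stddef.h>

#define BLOCK_W 8   /* pixels per block */
#define NBLOCKS 4   /* blocks processed together */

/* Hypothetical stand-ins for the two deblocking filters (not libpostproc code). */
static void filter_a(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(src[i] >> 1);
}

static void filter_b(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(255 - src[i]);
}

/* Run both filters over all blocks, then pick one result per block.
 * use_a[blk] != 0 selects filter_a for that block, otherwise filter_b. */
static void filter_row_combined(uint8_t *row, const int use_a[NBLOCKS])
{
    uint8_t a[BLOCK_W * NBLOCKS], b[BLOCK_W * NBLOCKS];

    filter_a(a, row, sizeof(a));
    filter_b(b, row, sizeof(b));

    for (int blk = 0; blk < NBLOCKS; blk++) {
        /* Expand the per-block decision into an all-ones or all-zeros byte mask. */
        uint8_t m = use_a[blk] ? 0xFF : 0x00;
        for (int x = 0; x < BLOCK_W; x++) {
            int i = blk * BLOCK_W + x;
            row[i] = (uint8_t)((a[i] & m) | (b[i] & (uint8_t)~m));
        }
    }
}
```

The cost of this approach is that both filters always run, so it only pays off when the wide versions are cheap compared to several narrow calls.
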
> One possibility to move towards this in manageable steps could be to
> first change the existing code so instead of doing
> for each "8x8" block do
>     h filter (categorize and apply filter based on that)
>     transpose
>     v filter (categorize and apply filter based on that)
>     transpose
>     dering
>     ...
> 
> -for each "8x8" block do
> +for each 4 8x8 blocks do
> +    for i in 4 do
> 
> then the next step:
> +    H categorize 4 blocks
>      for i in 4 do
> -       H categorize
>         H Filter depending on categorize
> 
> then here one could add
>      H categorize 4 blocks
> +    if all have the same categorization
> +       H Filter in AVX2
> +    else if 2 match
> +       H Filter in SSE2
> +    else
>      for i in 4 do
>         H Filter depending on categorize
> 
> or the same could be done with the next step in the filtering
> pipeline
> 
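The dispatch step in the last pseudo-code fragment could look roughly like the sketch below. The enum, the function name and the integer category labels are all hypothetical. It also assumes that "2 match" means each adjacent pair of blocks is uniform, so that two half-width calls cover the group; the pseudo-code leaves that open, and a real implementation might handle a single matching pair as well.

```c
/* Which code path to use for a group of 4 horizontally adjacent blocks. */
enum hfilter_path {
    PATH_WIDE4,   /* all 4 blocks share one category: one full-width call   */
    PATH_PAIRS,   /* blocks 0-1 match and blocks 2-3 match: two half calls  */
    PATH_SINGLE   /* anything else: filter each block on its own            */
};

/* cat[i] is the category assigned to block i by the classify step
 * (hypothetical integer labels, for illustration only). */
static enum hfilter_path choose_path(const int cat[4])
{
    if (cat[0] == cat[1] && cat[1] == cat[2] && cat[2] == cat[3])
        return PATH_WIDE4;
    if (cat[0] == cat[1] && cat[2] == cat[3])
        return PATH_PAIRS;
    return PATH_SINGLE;
}
```

Keeping the decision separate from the filters means the per-block loop stays available as the fallback while the wider paths are added one at a time.
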
> 
> Also, I am not sure it's worth it to have the main loop block size
> variable; it might be easier to always go by steps of 4 8-pixel blocks
> horizontally and 1 8-pixel block vertically.
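
A fixed-step main loop of that shape might be sketched as follows. Everything here is an assumption made for illustration: the names, the callback interface, and in particular the handling of a partial group at the right edge, which says nothing about how libpostproc treats picture borders.

```c
#include <stddef.h>

#define BLOCK_SIZE      8   /* block width and height in pixels */
#define BLOCKS_PER_STEP 4   /* blocks handled per horizontal step */

/* Hypothetical callback: process nblocks blocks starting at pixel (x, y). */
typedef void (*group_fn)(int x, int y, int nblocks, void *opaque);

/* Walk the picture in steps of 4 blocks horizontally and 1 block
 * vertically. Returns the number of groups visited. */
static int for_each_block_group(int width, int height, group_fn fn, void *opaque)
{
    const int step = BLOCK_SIZE * BLOCKS_PER_STEP;
    int groups = 0;

    for (int y = 0; y < height; y += BLOCK_SIZE) {
        for (int x = 0; x < width; x += step) {
            /* The last group in a row may hold fewer than 4 blocks. */
            int remaining = (width - x + BLOCK_SIZE - 1) / BLOCK_SIZE;
            int n = remaining < BLOCKS_PER_STEP ? remaining : BLOCKS_PER_STEP;

            if (fn)
                fn(x, y, n, opaque);
            groups++;
        }
    }
    return groups;
}
```

With a fixed step, only the right-edge group needs special care; everything else can assume exactly 4 blocks.
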

Also, about the postprocessing functions: there are several which
are not very important. *X1Filter was just for experimentation; don't
waste time on that.

The most important are probably do_a_deblock() and dering().
do_a_deblock() also might be easier to optimize than do*LowPass /
do*DefFilter / *Classify, as it already works with masks instead of a
single category field.

do_a_deblock() is supposed to be an accurate implementation, while
do*LowPass / do*DefFilter / *Classify are approximations.

In principle, optimizing do_a_deblock() and dering() should
be enough, as any system with AVX/SSE will be more than fast
enough to do the accurate ones. So after these, and
anything else you like to work on in libpostproc, you could work on
optimizing other parts of the codebase in summer if you like.

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In fact, the RIAA has been known to suggest that students drop out
of college or go to community college in order to be able to afford
settlements. -- The RIAA