[FFmpeg-devel] [Patch] [avfilter] refactor floating point based interpolation and introduce in vf_lenscorrection

Daniel Oberhoff danieloberhoff at gmail.com
Wed Aug 20 13:00:29 CEST 2014


---
 Daniel Oberhoff 
 daniel.oberhoff at gmail.com



On Aug 20, 2014, at 12:51 PM, Michael Niedermayer <michaelni at gmx.at> wrote:

> On Wed, Aug 20, 2014 at 12:20:06PM +0200, Daniel Oberhoff wrote:
>> 
>> ---
>> Daniel Oberhoff 
>> daniel.oberhoff at gmail.com
>> 
>> 
>> 
>> On Aug 20, 2014, at 12:15 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> 
>>> On Wed, Aug 20, 2014 at 11:54:08AM +0200, Daniel Oberhoff wrote:
>>>> 
>>>> ---
>>>> Daniel Oberhoff 
>>>> daniel.oberhoff at gmail.com
>>>> 
>>>> 
>>>> 
>>>> On Aug 20, 2014, at 11:47 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
>>>> 
>>>>> On Wed, Aug 20, 2014 at 08:14:31AM +0200, Daniel Oberhoff wrote:
>>>>>> 
>>>>>> 
>>>>>> Von meinem iPhone gesendet
>>>>>> 
>>>>>>> Am 20.08.2014 um 03:16 schrieb Michael Niedermayer <michaelni at gmx.at>:
>>>>>>> 
>>>>>>>> On Wed, Aug 20, 2014 at 12:12:49AM +0200, Daniel Oberhoff wrote:
>>>>>>>> Hello,
>>>>>>>> 
>>>>>>>> As a follow-up to my last patch I have now factored the floating point based interpolation out of transform.h/transform.c and applied it in the vf_lenscorrection filter I introduced in my last patch. What I did not do is also factor out the fixed point based interpolation used in vf_rotate, vf_perspective and maybe others, which could probably also be done. Nor did I look further for other uses of floating point based interpolation; basically I just wanted to introduce interpolation in vf_lenscorrection without code duplication. As a side note, I also tried to introduce fixed point calculations in vf_lenscorrection, but found myself effectively doing floating point “by hand”: due to the high order of the algorithm (up to 4th order) it is very hard to keep track of the right number of integer and fractional bits for a given step in the algorithm and given parameters, and it felt rather futile after a while.
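
For concreteness, the per-pixel mapping in question looks roughly like the sketch below; the coefficients k1/k2 and the helper itself are purely illustrative, not the actual filter code. The point is only that the radius scale is a polynomial of up to 4th order in the pixel offset.

static void map_pixel(double k1, double k2, int w, int h, int x, int y,
                      double *src_x, double *src_y)
{
    const double cx = w * 0.5, cy = h * 0.5;                      /* image center        */
    const double dx = x - cx,  dy = y - cy;                       /* offset from center  */
    const double r2 = (dx * dx + dy * dy) / (cx * cx + cy * cy);  /* normalized r^2      */
    const double scale = 1.0 + k1 * r2 + k2 * r2 * r2;            /* up to 4th order in r */

    *src_x = cx + dx * scale;   /* where to sample the source image */
    *src_y = cy + dy * scale;
}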
>>>>>>> 
>>>>>>>> Looking forward to reviews :).
>>>>>>> 
>>>>>>> Why did you use the deshake code and not the vf_perspective code?!
>>>>>>> I suspect this will be quite a bit harder to get into shape for a
>>>>>>> common API.
>>>>>>> 
>>>>>> 
>>>>>> As I said, perspective does fixed point calculations. Also, would you care to elaborate on what exactly your woes with this API are?
>>>>> 
>>>>> There were further comments further down in the reply that list
>>>>> some problems.
>>>>> 
>>>>> In addition, being float based is already a problem on its own;
>>>>> we don't really want it to be float based. It's very likely slower, and
>>>>> it's not bitexact, which makes tests harder.
>>>> 
>>>> Well, as I had outlined, fixed point is not a viable option for my algorithm.
>>> 
>>> Blending 4 or 16 8-bit samples linearly together works fine in fixed
>>> point. Please see the vf_perspective code; it is doing exactly
>>> that.
>>> 
>>> If you have a problem with fixed point, please show exactly where this
>>> problem is and I'll try to help.
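
To illustrate what such a fixed-point blend looks like, here is a rough sketch of a bilinear blend of 4 8-bit samples in the spirit of what is described above; the SHIFT value, the rounding, and the function name are illustrative only, not the actual vf_perspective code.

#include <stdint.h>

#define SHIFT 16  /* fractional positions fx, fy are in 0..(1 << SHIFT) */

static uint8_t bilinear_fixed(const uint8_t *src, int linesize,
                              int ix, int iy, int fx, int fy)
{
    const int a = src[ iy      * linesize + ix    ];
    const int b = src[ iy      * linesize + ix + 1];
    const int c = src[(iy + 1) * linesize + ix    ];
    const int d = src[(iy + 1) * linesize + ix + 1];

    /* blend horizontally, then vertically, all in integer arithmetic */
    const int top    = (a * ((1 << SHIFT) - fx) + b * fx + (1 << (SHIFT - 1))) >> SHIFT;
    const int bottom = (c * ((1 << SHIFT) - fx) + d * fx + (1 << (SHIFT - 1))) >> SHIFT;

    return (top * ((1 << SHIFT) - fy) + bottom * fy + (1 << (SHIFT - 1))) >> SHIFT;
}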
>>> 
>>> 
>>>> 
>>>>> Besides, an API that works one sample at a time redundantly recalculates
>>>>> the coefficients for each color plane; also, doing one sample per call
>>>>> isn't very SIMD friendly.
>>>>> 
>>>>> Or, to see this from another point of view:
>>>>> what we would like to have is SSE*/AVX* based generic resampling
>>>>> code using fixed point (with 16-bit coefficients for 8-bit samples, for example).
>>>>> 
>>>>> There's no need to implement SSE/AVX, but the API should be a step
>>>>> toward this, and the patch feels like it's going in the opposite direction.
>>>>> This is even worse as this is common API, so other filters could
>>>>> start using it, increasing the amount of code that would eventually
>>>>> have to be changed/updated/rewritten.
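
One possible shape for an API along those lines, sketched very roughly below: coefficients are computed once per output row and reused for every plane, and the inner loop works on whole rows, which is friendlier to later SIMD work. All names here are hypothetical, not an existing FFmpeg API.

#include <stdint.h>

typedef struct InterpRow {
    int     *src_x;   /* integer source x for each output pixel          */
    int16_t *coef_x;  /* horizontal blend coefficient per pixel, 0..256  */
} InterpRow;

static void interp_row_8bit(uint8_t *dst, const uint8_t *src, int linesize,
                            const InterpRow *row, int src_y, int coef_y,
                            int width)
{
    const uint8_t *line0 = src +  src_y      * linesize;
    const uint8_t *line1 = src + (src_y + 1) * linesize;
    for (int x = 0; x < width; x++) {
        const int sx = row->src_x[x], cx = row->coef_x[x];
        const int top    = (line0[sx] * (256 - cx) + line0[sx + 1] * cx + 128) >> 8;
        const int bottom = (line1[sx] * (256 - cx) + line1[sx + 1] * cx + 128) >> 8;
        dst[x] = (top * (256 - coef_y) + bottom * coef_y + 128) >> 8;
    }
}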
>>>> 
>>>> There is some contradiction in your comments. You want to move towards vectorization (by the way, how would you do that in FFmpeg? I had previously proposed an SSE version using intrinsics and builtin operators, but that was dismissed as not being portable, and I was asked to write assembly instead, which I am not proficient in), but you want to do all planes simultaneously, even though they are far apart in memory. IMHO the gain from being memory local (which is also what you want for vectorization) far outweighs the cost of recalculating the coefficients.
>>> 
>>> I think you misunderstand how CPU caches work.
>>> 
>>> There's no reason why reading from 2 or 3 planes sequentially would
>>> be significantly slower.
>>> Also, I wonder if this filter shouldn't be working in RGB space.
>> 
>> Well, that would mean I would have to significantly rewrite my algorithm, which leads to the question of why this was not brought up before.
> 
> The only difference between a packed RGB plane and planar YUV is
> the step between pixels, which would be 1 for planar YUV and
> 3 or 4 for packed RGB.
> 
> That's not significantly rewriting anything;
> the whole YUV filter code is
> *out++ =  isvalid ? indata[y * inlinesize + x] : 0;
> 
> In RGB you do:
> *out++ =  isvalid ? indata[y * inlinesize + 3*x+0] : 0;
> *out++ =  isvalid ? indata[y * inlinesize + 3*x+1] : 0;
> *out++ =  isvalid ? indata[y * inlinesize + 3*x+2] : 0;
> 
> That's 3 lines.
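
To make the same point as a single self-contained helper (the names are purely illustrative, not the actual filter code): planar YUV uses step = 1 with one component, packed RGB24 uses step = 3 with three components.

#include <stdint.h>

static void copy_mapped_pixel(uint8_t *out, const uint8_t *indata, int inlinesize,
                              int x, int y, int isvalid, int step, int ncomp)
{
    /* step = 1, ncomp = 1 for one planar YUV plane; step = 3, ncomp = 3 for packed RGB24 */
    for (int comp = 0; comp < ncomp; comp++)
        out[comp] = isvalid ? indata[y * inlinesize + step * x + comp] : 0;
}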

As said before, the formats I care about are YUV planar. If you want packed RGB support I will copy/paste the kernel, since making one kernel generic leads to bad performance penalties...

> 
>> 
>>> Camera sensors work in RGB, not YUV, and gamma-correct
>>> interpolation is also an RGB-space thing.
>>> Now, RGB is generally packed pixel, which would avoid the multiple
>>> planes.
>> 
>> But the codecs work in planar YUV, and filtering is most efficiently done there, between the decoder and the encoder, right? Also, modern networked cameras all stream H.264 and not packed RGB.
> 
> Yes, it's best to support both; high quality needs RGB, and somewhat
> lower quality with incorrect gamma interpolation can be done in YUV.
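
For reference, gamma-correct blending means decoding to linear light, blending there, and re-encoding, roughly like the sketch below; the plain 2.2 exponent is a simplification for illustration, and the function name is made up.

#include <math.h>
#include <stdint.h>

static uint8_t blend_gamma_correct(uint8_t a, uint8_t b, double t)
{
    const double la = pow(a / 255.0, 2.2);       /* decode to linear light */
    const double lb = pow(b / 255.0, 2.2);
    const double l  = la * (1.0 - t) + lb * t;   /* blend in linear light  */
    return (uint8_t)(pow(l, 1.0 / 2.2) * 255.0 + 0.5);  /* re-encode */
}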

So basically this is about introducing packed RGB support in vf_lenscorrection now, and then supporting that in the new interpolation API?

> 
> 
>> 
>>> Of course, if someone wants to use the filter as a quick correction
>>> for a random video downloaded somewhere, that is likely in YUV space.
>> 
>> For me, going to YUV made everything around 4x faster...
>> 
>> Anyhow, this starts to feel a little like “perfect is the enemy of good” and makes me want to drop this refactoring and instead hack interpolation into vf_lenscorrection directly...
> 
> What you write sounds a bit like you are trying to avoid making any
> improvements to the filter. Not sure I understand that.
> 
> Of course it's work to add RGB support, of course it's work to get

I can do this; I simply did not understand that all this was about that.

> gamma correction right, of course it's work to implement a fixed point
> bicubic filter, …

This will not help me at all, since the filter cannot use fixed point. I would rather leave this to someone who is more motivated.

> You don't have to do any of this, but it should be kind of a maintainer's goal
> to improve the code. Reading your mails makes me believe
> your goal is to keep the filter at the lowest possible quality ;)

I will simply ignore the childishness in that comment. If packed RGB is deemed important, then I will implement it in a separate patch and then adapt this patch to support it as well (by supplying specialized routines). Still, it would have been nice if the need for packed RGB had been brought up earlier.

Best


