[FFmpeg-devel] [PATCH 20/24] sws: add a function for scaling dst slices

Fri Jun 11 18:01:20 EEST 2021

On Thu, Jun 10, 2021 at 05:49:48PM +0200, Anton Khirnov wrote:
> Quoting Michael Niedermayer (2021-06-01 15:02:27)
> > On Mon, May 31, 2021 at 09:55:11AM +0200, Anton Khirnov wrote:
> > > Currently existing sws_scale() accepts as input a user-determined slice
> > > of input data and produces an indeterminate number of output lines.
> > 
> > swscale() should return the number of lines output;
> > it does "return dstY - lastDstY;"
> 
> But you do not know the number of lines beforehand.
> I suppose one could assume that the line counts will always be the same
> for any run with the same parameters (strictly speaking this is not
> guaranteed) and store them after the first frame, but then the first
> scale call is not parallel. And it would be quite ugly.
> 

> > 
> > 
> > > Since the calling code does not know the amount of output, it cannot
> > > easily parallelize scaling by calling sws_scale() simultaneously on
> > > different parts of the frame.
> > > 
> > > Add a new function - sws_scale_dst_slice() - that accepts as input the
> > > entire input frame and produces a specified slice of the output. This
> > > function can be called simultaneously on different slices of the output
> > > frame (using different sws contexts) to implement slice threading.
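The patch description says each output slice is produced by its own SwsContext in its own thread. The exact signature of sws_scale_dst_slice() is not quoted in this message, so the sketch below covers only the caller-side step that does not depend on it: dividing the output height among N workers. The helper dst_slice_bounds() is hypothetical and not part of libswscale. The alignment parameter is an assumption. It keeps slice starts on even rows for vertically subsampled (4:2:0) chroma, so no chroma row is split between two workers.

```c
#include <assert.h>

/* Hypothetical helper (not part of libswscale): compute the output rows
 * [*y0, *y0 + *h) that worker `idx` of `nb_slices` should produce when a
 * dst_h-row output frame is split into contiguous slices. Slice starts are
 * kept on multiples of `align` (e.g. 2 for 4:2:0 chroma). */
static void dst_slice_bounds(int dst_h, int nb_slices, int align, int idx,
                             int *y0, int *h)
{
    assert(dst_h > 0 && nb_slices > 0 && align > 0);
    assert(idx >= 0 && idx < nb_slices);

    /* Work in units of `align` rows so that every boundary stays aligned. */
    long long units = (dst_h + align - 1) / align;
    long long start = units * idx       / nb_slices * align;
    long long end   = units * (idx + 1) / nb_slices * align;

    /* The last unit may be partial when dst_h is not a multiple of align. */
    if (start > dst_h) start = dst_h;
    if (end   > dst_h) end   = dst_h;

    *y0 = (int)start;
    *h  = (int)(end - start);
}
```

Consecutive slices are contiguous by construction: the end of slice i is the start of slice i + 1, and the last slice ends at dst_h. Each (y0, h) pair would then be handed, together with the full input frame, to one context running in one thread.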
> > 
> > An API that allowed starting before the whole frame is available
> > would have lower latency and better cache locality. Maybe that can
> > be added later too, but I wanted to mention it because the documentation
> > explicitly says "entire input".
> 
> That would require some way of querying how much input is required for
> each line. I do not feel sufficiently familiar with the sws architecture to
> see an obvious way of implementing this. And then making use of this
> information would require a significantly more sophisticated way of
> dispatching work to threads.

Hmm, isn't the filter calculated by initFilter() (for vertical scaling)
basically a list of the input/output relation
(with some special cases like cascaded_context, maybe)?
It's been a while since I worked on swscale, so maybe I am forgetting something.

Maybe that can be (easily) used?
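The idea above can be illustrated with a simplified model. Assume that a vertical filter, such as the one built by initFilter(), records for each output line the first input line it reads (a position) and a fixed number of taps. Under that assumption, the input needed for an output slice is the union of the tap windows of its lines. The VFilter struct and input_range_for_dst_slice() below are hypothetical and do not mirror swscale's internal layout. They also ignore separate luma/chroma filters and the special cases, such as cascaded contexts, mentioned in the message.

```c
#include <assert.h>

/* Hypothetical, simplified model of a vertical scaling filter:
 * output line i is computed from `filter_size` consecutive input lines
 * starting at filter_pos[i]. */
typedef struct VFilter {
    const int *filter_pos;  /* first input line used by each output line */
    int        filter_size; /* number of taps (input lines) per output line */
    int        src_h;       /* input height, used to clamp the range */
} VFilter;

/* Store in [*first, *last] the input lines needed to produce
 * output lines [y0, y0 + h). */
static void input_range_for_dst_slice(const VFilter *f, int y0, int h,
                                      int *first, int *last)
{
    assert(f && h > 0);
    int lo = f->filter_pos[y0];
    int hi = f->filter_pos[y0] + f->filter_size - 1;

    /* Take the union of every output line's tap window. */
    for (int y = y0 + 1; y < y0 + h; y++) {
        int start = f->filter_pos[y];
        int end   = start + f->filter_size - 1;
        if (start < lo) lo = start;
        if (end > hi)   hi = end;
    }

    /* Tap windows at the frame edges may reach past the input. */
    if (lo < 0)            lo = 0;
    if (hi > f->src_h - 1) hi = f->src_h - 1;

    *first = lo;
    *last  = hi;
}
```

With such a range per output slice, a worker could start as soon as input lines up to `*last` have arrived. That is the latency and cache-locality benefit raised earlier in the thread.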

> 
> Or are you proposing some specific alternative way of implementing this?
> 
> > 
> > Also, there are a few tables in the multiple SwsContexts that are
> > identical; it would be ideal if they could be shared between threads.
> > I guess such sharing would need to be implemented before the API is
> > stable; otherwise, adding it later would require applications to be changed.
> 
> In my tests, the differences are rather small. E.g. scaling
> 2500x3000->3000x3000 with 32 threads uses only ~15% more memory than
> with 1 thread.
> 
> And I do not see an obvious way to implement this that would be worth
> the extra complexity. Do you?

Well, don't we, in every case of threading in the codebase,
cleanly split the context into a thread-local and a shared part?
I certainly will not dispute that it's work to do that. But we
did it in every case because it's the "right thing" to do for a
clean implementation. So I think we should aim toward that here too.
But maybe I am missing something?
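One way to realize the shared/thread-local split in C11 is sketched below. Read-only tables are built once, are reference-counted, and are pointed to by every worker. Each worker owns only its scratch buffers. All struct and function names are hypothetical and are not swscale's. Inside FFmpeg, one would more likely reuse an existing refcounting mechanism such as AVBufferRef. The key assumption is that the shared tables are never written after initialization, so concurrent reads need no locking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical shared part: tables built once at init, read-only afterwards,
 * identical for every slice, and freed when the last reference is dropped. */
typedef struct SharedTables {
    atomic_int refcount;
    int       *filter_pos;  /* example of a table that is the same for all slices */
    size_t     nb_entries;
} SharedTables;

/* Hypothetical per-thread part: a reference to the shared tables plus
 * private scratch memory that only this worker writes to. */
typedef struct WorkerCtx {
    SharedTables  *shared;
    unsigned char *scratch;
    size_t         scratch_size;
} WorkerCtx;

static SharedTables *shared_tables_alloc(size_t nb_entries)
{
    SharedTables *s = calloc(1, sizeof(*s));
    if (!s)
        return NULL;
    s->filter_pos = calloc(nb_entries, sizeof(*s->filter_pos));
    if (!s->filter_pos) {
        free(s);
        return NULL;
    }
    s->nb_entries = nb_entries;
    atomic_init(&s->refcount, 1);   /* the creator holds the first reference */
    return s;
}

static SharedTables *shared_tables_ref(SharedTables *s)
{
    atomic_fetch_add(&s->refcount, 1);
    return s;
}

/* Drop one reference; free the tables when it was the last one. */
static void shared_tables_unref(SharedTables **ps)
{
    SharedTables *s = *ps;
    *ps = NULL;
    if (s && atomic_fetch_sub(&s->refcount, 1) == 1) {
        free(s->filter_pos);
        free(s);
    }
}

static int worker_init(WorkerCtx *w, SharedTables *s, size_t scratch_size)
{
    assert(w && s);
    w->scratch = malloc(scratch_size);
    if (!w->scratch)
        return -1;
    w->scratch_size = scratch_size;
    w->shared = shared_tables_ref(s);  /* share, do not copy, the tables */
    return 0;
}

static void worker_uninit(WorkerCtx *w)
{
    free(w->scratch);
    w->scratch = NULL;
    shared_tables_unref(&w->shared);
}
```

With this layout, the per-thread memory overhead is limited to the scratch buffers, and the public API would not need to change later to add table sharing.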

Thanks

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Why not whip the teacher when the pupil misbehaves? -- Diogenes of Sinope