[FFmpeg-devel] PATCH: allow load_input_picture, load_input_picture to be architecture dependent

Tue Jul 24 00:30:38 CEST 2007

Hi

On Mon, Jul 23, 2007 at 06:06:23PM -0400, Robin Getz wrote:
> On Thu 19 Jul 2007 09:11, Michael Niedermayer pondered:
> > On Thu, Jul 19, 2007 at 07:35:55AM -0400, Marc Hoffman wrote:
> > > We would be me ++ folks using Blackfin in real systems that are
> > > waiting for better system performance.
> > 
> > doing the copy in the background like you originally did requires
> > a few more modifications than you did, that is you would have to add
> > checks at several points so that we don't read the buffer before the
> > specific part has been copied, this sounds quite hackish and i am not
> > happy about it
> 
> architecture specific optimisations are never a happy thing.

no, most of them are clean and well separated, but this dma memcpy thing
is a mess and has no chance to reach svn unless someone first shows that
all alternatives are worse (benchmarks absolutely required)
the alternatives are:
 - using the preserve flag and changing ffmpeg.c
 - doing the dma copy but waiting until it is done

if these 2 are slower than a correct implementation with all the needed
checks and locks in place, then we can see if the gain (seen in the
benchmark) is worth the mess (seen in the patch)
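
a rough sketch of how such a comparison could be timed, assuming every
candidate can be wrapped to look like memcpy(). copy_fn and time_copy are
hypothetical names, not existing FFmpeg code, and clock() is only a portable
stand-in for whatever cycle counter the target really offers:

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical: any copy candidate wrapped to match memcpy()'s signature. */
typedef void *(*copy_fn)(void *dest, const void *src, size_t n);

/* Run the candidate iters times and return the elapsed CPU time in seconds.
 * clock() is coarse; a hardware cycle counter would give better numbers. */
static double time_copy(copy_fn fn, void *dest, const void *src,
                        size_t n, int iters)
{
    clock_t start = clock();
    for (int i = 0; i < iters; i++)
        fn(dest, src, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

calling time_copy(memcpy, dst, src, size, iters) and the same with each
alternative, on picture-sized buffers, would give numbers to compare.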

> 
> I would think that with the proper defines
> 
> #ifdef USE_NONBLOCKINGCPY
> /* return types are not stated here; void is assumed */
> extern void non_blocking_memcpy(void *dest, const void *src, size_t n);
> extern void non_blocking_memcpy_done(void *dest);
> #else
> #define non_blocking_memcpy(dest, src, n) memcpy(dest, src, n)
> #define non_blocking_memcpy_done(dest) ((void)0)
> #endif
> 
> it could be made less "hackish" - and still provide the optimisation.

the buffer is needed immediately after the copy, just not the whole
buffer at once; it is used from top to bottom, so with
spin locks or equivalent placed all over the place and some way to figure
out how much has been copied it is possible
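
to make the cost of that concrete, here is a minimal sketch of such a
progress check, assuming the copy completes row by row from the top.
RowCopy and the row_copy_* functions are hypothetical, and row_copy_poll()
only simulates progress by copying one row itself, where real code would
ask the DMA engine how far it got:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical state for one in-flight picture copy (not an FFmpeg API). */
typedef struct RowCopy {
    uint8_t       *dst;
    const uint8_t *src;
    size_t         stride;    /* bytes per row */
    int            rows;      /* total number of rows */
    int            rows_done; /* rows already complete, counted from the top */
} RowCopy;

static void row_copy_start(RowCopy *c, uint8_t *dst, const uint8_t *src,
                           size_t stride, int rows)
{
    c->dst = dst;
    c->src = src;
    c->stride = stride;
    c->rows = rows;
    c->rows_done = 0;
}

/* Stand-in for querying the hardware: here it just copies one more row. */
static void row_copy_poll(RowCopy *c)
{
    if (c->rows_done < c->rows) {
        size_t off = (size_t)c->rows_done * c->stride;
        memcpy(c->dst + off, c->src + off, c->stride);
        c->rows_done++;
    }
}

/* Spin until row y has been copied, then return a pointer to it.
 * Returns NULL if y is outside the picture. */
static const uint8_t *row_copy_get_row(RowCopy *c, int y)
{
    if (y < 0 || y >= c->rows)
        return NULL;
    while (c->rows_done <= y)
        row_copy_poll(c);
    return c->dst + (size_t)y * c->stride;
}
```

every place that reads the destination would have to go through something
like row_copy_get_row(), which is the "all over the place" part.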

also you could change the code more significantly to make the memcpy +
done split possible, but it would add 1 frame of delay and, as said,
require some changes
all in all i do not think this is worth it ...

> 
> > is mpeg4 encoding speed on blackfin really that important?
> 
> There are lots of people waiting for it to get better than it is. (Like me)
> 
> > can't you just optimize memcpy() in a compatible non background way?
> 
> memcpy is already as optimized as it can be
>   - it is already in assembly
>   - doing int (32-bit) copies when possible.
>   - The loop comes down to:
>     MNOP || [P0++] = R3 || R3 = [I1++];
>    Which is a read/write in a single instruction cycle (if things are all
>     in cache). This coupled with zero overhead hardware loops makes things
>     as fast as they can be.
>   
> The things that slow this down are cache misses, cache flushes, external 
> memory page open/close - things you can't avoid. If we could be doing compute
> at the same time - it could make up for some of these stalls.

it should be faster to read several things and then write several, instead
of read 1, write it, read the next, write it, ...
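
in C the idea looks roughly like the sketch below. copy_words_batched is a
hypothetical name, the batch size of 4 is arbitrary, and a compiler is free
to reorder the loads and stores, so the real thing would have to be done in
the assembly loop and measured:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy n 32-bit words, loading a batch of four
 * before storing them, instead of alternating single loads and stores. */
static void copy_words_batched(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        /* read several ... */
        uint32_t a = src[i];
        uint32_t b = src[i + 1];
        uint32_t c = src[i + 2];
        uint32_t d = src[i + 3];
        /* ... then write several */
        dst[i]     = a;
        dst[i + 1] = b;
        dst[i + 2] = c;
        dst[i + 3] = d;
    }

    /* leftover words when n is not a multiple of 4 */
    for (; i < n; i++)
        dst[i] = src[i];
}
```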

also you can do the memcpy() with the DMA thing, you just have to wait for
it to finish before returning
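
a minimal sketch of that, reusing the two function names from the proposal
quoted above. their signatures and behaviour are assumptions, the bodies
below are plain-C stand-ins for the real DMA routines, and
memcpy_dma_blocking is a hypothetical name:

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the platform routine that would start a DMA transfer.
 * Assumed signature; here it simply copies synchronously. */
static void non_blocking_memcpy(void *dest, const void *src, size_t n)
{
    memcpy(dest, src, n);
}

/* Stand-in for the platform routine that would wait for the transfer
 * into dest to finish. Nothing to wait for in this simulation. */
static void non_blocking_memcpy_done(void *dest)
{
    (void)dest;
}

/* Drop-in memcpy() replacement: start the copy, then wait for it,
 * so the caller never sees a partially copied buffer. */
static void *memcpy_dma_blocking(void *dest, const void *src, size_t n)
{
    non_blocking_memcpy(dest, src, n);
    non_blocking_memcpy_done(dest);
    return dest;
}
```

this keeps memcpy() semantics, so none of the callers need extra checks.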

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The misfortune of the wise is better than the prosperity of the fool.
-- Epicurus