[FFmpeg-devel] [PATCH][VAAPI][2/6] Add common data structures and helpers (take 3)
Michael Niedermayer
michaelni
Mon Mar 9 15:10:31 CET 2009
On Mon, Mar 09, 2009 at 11:56:18AM +0100, Gwenole Beauchesne wrote:
> On Sun, 8 Mar 2009, Michael Niedermayer wrote:
>
> >>> realloc() returning NULL means the original block is still there; it
> >>> just failed to reallocate it
> >>
> >> C99 7.20.3.4 indeed confirms what you say, but I couldn't find a single
> >> piece of code in lavc operating that way:
> >>     new_buffer = av_*_realloc(buffer, new_size);
> >>     if (!new_buffer) {
> >>         av_freep(&buffer);
> >>         // do whatever else/return -1
> >>     }
> >>     buffer = new_buffer;
> >>
> >> would this be what you had in mind?
> >
> > yes
> >
> > also it might make sense to add an
> > av_realloc_and_free()
> > that frees the original on failure, and to replace all code that
> > expects these semantics rather ...
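A minimal sketch of the helper suggested above, written against plain `realloc()`/`free()` so it stands on its own. The name `realloc_or_free()` is hypothetical (it is not an existing libavutil function), and always requesting at least one byte is an assumption made here so that a NULL return can only mean failure:

```c
#include <stdlib.h>

/* Hypothetical helper (not part of libavutil): behaves like realloc(),
 * but on failure the original block is freed, so the idiom
 * "p = realloc_or_free(p, n);" never leaks the old buffer. */
static void *realloc_or_free(void *ptr, size_t size)
{
    /* Ask for at least one byte: realloc(ptr, 0) is implementation-defined,
     * and we want NULL to mean "allocation failed" and nothing else. */
    void *new_ptr = realloc(ptr, size ? size : 1);

    /* C99 7.20.3.4: when realloc() fails, the old block is untouched and
     * still owned by the caller, so release it here. */
    if (!new_ptr)
        free(ptr);
    return new_ptr;
}
```

With such a helper, the common `buffer = realloc_or_free(buffer, new_size);` pattern no longer leaks the old block on failure; the caller only has to check the result for NULL.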
>
> Why not make it the default behaviour of av_realloc(), even though a user
> could override that function? I mean, it's generally used as
> buffer = av_realloc(buffer, new_size);
>
> >>> also glibc memcpy() is shit, even more so for copying into non-system
> >>> memory; you should maybe look at MPlayer, which has some memcpy
> >>> written for that.
> >>
> >> Hmmm, I think this statement has not held for some years now. ;-)
> >> Even Agner's doesn't bring much performance gain, if any.
> >> Besides, system libcs generally provide the best memcpy() tuned for
> >> the underlying processor and memory hierarchy (cache geometry,
> >> etc.). This is true for Apple's (commpage-provided functions) and even
> >> glibc's, though it depends on several factors (distributor, architecture).
> >
> > glibc and "best" in the same paragraph makes me want to puke
> >
> > anyway, actual numbers: (done 3x to show that they are stable)
> >
> > k7 : cpu clocks=170059006 = 98361us (1016.663fps) 1525.0MB/s
> > mmx: cpu clocks=293085663 = 169516us (589.915fps) 884.9MB/s
> > sse: cpu clocks=170377116 = 98544us (1014.775fps) 1522.2MB/s
> > c: cpu clocks=195054405 = 112817us (886.391fps) 1329.6MB/s
>
> Good numbers, but:
> 1) those are for the aligned case;
> 2) they seem to be for a large buffer (1.5 MB);
> 3) slices are generally around 20 KB for 720p H.264 and around 60 KB for
> 1080p H.264 in the sample streams I have, and the source buffer is
> aligned on 2-byte boundaries or not at all.
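Point 2 can be checked against Michael's figures: dividing each reported throughput by the matching copy rate gives the amount copied per iteration (the microsecond figures likewise appear to correspond to 100 copies per run, e.g. 100 / 0.098361 s ≈ 1016.663 copies/s). A small sketch, assuming only the fps and MB/s values printed in the k7/mmx/sse/c lines; the struct and function names are illustrative:

```c
#include <stdio.h>

/* Figures copied from the k7/mmx/sse/c benchmark lines quoted above. */
struct result {
    const char *name;
    double fps;  /* copies per second */
    double mbps; /* reported throughput, MB/s */
};

static const struct result results[] = {
    { "k7",  1016.663, 1525.0 },
    { "mmx",  589.915,  884.9 },
    { "sse", 1014.775, 1522.2 },
    { "c",    886.391, 1329.6 },
};

#define N_RESULTS (sizeof(results) / sizeof(results[0]))

/* Megabytes moved by a single copy: throughput divided by copy rate. */
static double mb_per_copy(const struct result *r)
{
    return r->mbps / r->fps;
}

/* Prints "1.500 MB per copy" for every variant. */
static void print_copy_sizes(void)
{
    for (size_t i = 0; i < N_RESULTS; i++)
        printf("%-3s %.3f MB per copy\n", results[i].name,
               mb_per_copy(&results[i]));
}
```

All four variants come out at about 1.5 MB per copy, which is far larger than the 20 to 60 KB slices described in point 3.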
>
> For the case where source and destination are mutually aligned on 8-byte
> boundaries, we have (K8 in 32-bit mode):
>
> (fast_memcpy/sse2)
>  16384  0.09  10204.49  0.95
>  16384  0.09  10210.21  1.00
>  24576  0.09  10539.97  0.97
>  24576  0.09  10660.10  0.99
>  32768  0.09  10705.95  1.00
>  32768  0.09  10687.72  1.00
>  49152  0.09  10482.79  1.02
>  49152  0.09  10566.90  0.99
>  65536  0.09  10539.16  1.00
>  65536  0.09  10546.25  1.00
>  98304  0.18   5264.49  2.00
>  98304  0.18   5266.05  1.00
> 131072  0.18   5281.90  1.00
> 131072  0.18   5282.21  1.00
>
> (memcpy/mmx)
>  16384  0.04  21997.10  1.00
>  16384  0.04  21997.37  1.00
>  24576  0.04  21870.70  1.01
>  24576  0.04  21870.79  1.00
>  32768  0.04  21374.74  1.02
>  32768  0.04  21380.27  1.00
>  49152  0.09  10498.95  2.04
>  49152  0.09  10499.11  1.00
>  65536  0.11   8829.70  1.19
>  65536  0.11   8829.66  1.00
>  98304  0.11   8818.30  1.00
>  98304  0.11   8818.50  1.00
> 131072  0.11   8811.01  1.00
> 131072  0.11   8811.13  1.00
>
> (memcpy/libc)
>  16384  0.05  19301.84  1.01
>  16384  0.05  19301.02  1.00
>  24576  0.05  19479.85  0.99
>  24576  0.05  19479.76  1.00
>  32768  0.05  19214.58  1.01
>  32768  0.05  19215.24  1.00
>  49152  0.11   8563.23  2.24
>  49152  0.11   8563.03  1.00
>  65536  0.13   7217.21  1.19
>  65536  0.13   7217.20  1.00
>  98304  0.13   7201.32  1.00
>  98304  0.13   7203.47  1.00
> 131072  0.13   7197.95  1.00
> 131072  0.13   7197.79  1.00
>
> (memcpy/agner)
>  16384  0.04  22421.78  1.00
>  16384  0.04  22422.20  1.00
>  24576  0.04  22422.01  1.00
>  24576  0.04  22421.31  1.00
>  32768  0.04  22418.09  1.00
>  32768  0.04  22418.23  1.00
>  49152  0.08  12430.12  1.80
>  49152  0.08  12430.02  1.00
>  65536  0.09  10604.25  1.17
>  65536  0.09  10601.27  1.00
>  98304  0.09  10619.99  1.00
>  98304  0.09  10620.01  1.00
> 131072  0.09  10627.03  1.00
> 131072  0.09  10627.05  1.00
>
> Agner's is indeed the best, then fast_memcpy/mmx (the very old one), then
> libc, then fast_memcpy/sse2.
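The four result columns above are not labelled. As an editorial reading only, the (memcpy/libc) rows are consistent with the first column being the copy size in bytes, the third a throughput in MB/s, the fourth the previous row's throughput divided by the current one (so 2.24 marks the drop from 32768 to 49152 bytes), and the second 1000 divided by the third, truncated to two decimals (nanoseconds per byte under that reading). The sketch below checks those hypotheses against the quoted numbers; the struct and function names are illustrative:

```c
#include <stddef.h>

/* Rows copied from the (memcpy/libc) table above. Column meanings are an
 * editorial guess, not documented by the original benchmark. */
struct row {
    unsigned size;           /* presumably bytes per copy */
    double col2, col3, col4; /* as printed */
};

static const struct row libc_rows[] = {
    {  16384, 0.05, 19301.84, 1.01 }, {  16384, 0.05, 19301.02, 1.00 },
    {  24576, 0.05, 19479.85, 0.99 }, {  24576, 0.05, 19479.76, 1.00 },
    {  32768, 0.05, 19214.58, 1.01 }, {  32768, 0.05, 19215.24, 1.00 },
    {  49152, 0.11,  8563.23, 2.24 }, {  49152, 0.11,  8563.03, 1.00 },
    {  65536, 0.13,  7217.21, 1.19 }, {  65536, 0.13,  7217.20, 1.00 },
    {  98304, 0.13,  7201.32, 1.00 }, {  98304, 0.13,  7203.47, 1.00 },
    { 131072, 0.13,  7197.95, 1.00 }, { 131072, 0.13,  7197.79, 1.00 },
};

#define N_ROWS (sizeof(libc_rows) / sizeof(libc_rows[0]))

/* Hypothesis for col4 (valid for i >= 1): previous row's col3 divided by
 * this row's col3, i.e. how much slower this run was than the one before. */
static double predicted_col4(size_t i)
{
    return libc_rows[i - 1].col3 / libc_rows[i].col3;
}

/* Hypothesis for col2: 1000 / col3, truncated (not rounded) to two
 * decimals; e.g. 1000 / 7197.95 = 0.1389 is printed as 0.13. */
static double predicted_col2(size_t i)
{
    return (double)(int)(100000.0 / libc_rows[i].col3) / 100.0;
}
```

Every row after the first matches both hypotheses within the two-decimal rounding of the printed values; the first row's 1.01 presumably refers to a smaller size that was not included in the post.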
Could you post the source used to generate the above numbers?
At least the MPlayer devs might be interested if Agner's version is
faster.
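For illustration only (this is not the program that produced the tables, and `copy_mbps()`, `run_table()` and `copy_fn` are hypothetical names), a minimal harness producing tables of a similar shape might look like the sketch below. It uses standard C `clock()`, which measures processor time, and a `src_offset` parameter to exercise the misaligned-source case raised in point 3:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Any memcpy()-compatible copy routine (libc, MMX, SSE2, ...). */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Copies `size` bytes `iters` times and returns the throughput in MB/s
 * (10^6 bytes per second), or -1.0 on error. `src_offset` shifts the
 * source away from malloc()'s alignment, e.g. 2 for 2-byte alignment. */
static double copy_mbps(copy_fn copy, size_t size, size_t src_offset, int iters)
{
    /* Calling through a volatile pointer stops the compiler from
     * recognising memcpy() and dropping the repeated copies. */
    copy_fn volatile fn = copy;
    unsigned char *src, *dst;
    volatile unsigned char sink;
    clock_t start, end;
    double secs;

    if (size == 0 || iters <= 0)
        return -1.0;
    src = malloc(size + src_offset);
    dst = malloc(size);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0x5a, size + src_offset);
    fn(dst, src + src_offset, size); /* warm-up: fault pages in, fill caches */

    start = clock();
    for (int i = 0; i < iters; i++)
        fn(dst, src + src_offset, size);
    end = clock();
    sink = dst[size - 1]; /* keep the copied data observable */
    (void)sink;

    secs = (double)(end - start) / CLOCKS_PER_SEC;
    if (secs <= 0.0) /* below timer resolution: assume one tick */
        secs = 1.0 / CLOCKS_PER_SEC;
    free(src);
    free(dst);
    return (double)size * iters / secs / 1e6;
}

/* Prints one table: size, MB/s, and the previous row's MB/s divided by
 * this row's (1.00 for the first row). */
static void run_table(const char *label, copy_fn copy, size_t src_offset)
{
    static const size_t sizes[] = { 16384, 24576, 32768, 49152,
                                    65536, 98304, 131072 };
    double prev = 0.0;

    printf("(%s, source offset %zu)\n", label, src_offset);
    for (size_t i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
        double mbps = copy_mbps(copy, sizes[i], src_offset, 20000);
        printf("%7zu %9.2f %5.2f\n", sizes[i], mbps, i ? prev / mbps : 1.00);
        prev = mbps;
    }
}
```

Calling `run_table("memcpy/libc", memcpy, 0)` and then `run_table("memcpy/libc", memcpy, 2)` would compare the aligned and 2-byte-offset cases; other routines can be plugged in through `copy_fn` as long as they share memcpy()'s signature.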
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
He who knows, does not speak. He who speaks, does not know. -- Lao Tsu