[FFmpeg-devel] [PATCH][VAAPI][2/6] Add common data structures and helpers (take 3)

Michael Niedermayer michaelni
Mon Mar 9 15:10:31 CET 2009


On Mon, Mar 09, 2009 at 11:56:18AM +0100, Gwenole Beauchesne wrote:
> On Sun, 8 Mar 2009, Michael Niedermayer wrote:
> 
> >>> realloc() returning NULL means the original is still there; it just
> >>> failed to reallocate it
> >>
> >> C99 7.20.3.4 indeed confirms what you say but I couldn't find a single
> >> piece of code in lavc operating that way:
> >> new_buffer = av_*_realloc(buffer, new_size);
> >> if (!new_buffer) {
> >>      av_freep(&buffer);
> >>      // do whatever else/return -1
> >> }
> >> buffer = new_buffer;
> >>
> >> would this be what you had in mind?
> >
> > yes
> >
> > also it might make sense to add a
> > av_realloc_and_free()
> > that does free the original in case of fail and replace all code that
> > expects these semantics rather ...
> 
> why not make it the default behaviour for av_realloc(), though a user 
> could override that function? I mean, it's generally used as buffer= 
> av_realloc(buffer, new_size);
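
The "free the original on failure" semantics discussed above could be sketched
roughly as follows. This is only an illustration using plain realloc()/free();
realloc_or_free() is a hypothetical name, not an existing libavutil function,
and rejecting a size of 0 is an assumption made here to sidestep the
implementation-defined behaviour of realloc(ptr, 0).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not part of libavutil): resize *pbuf to new_size bytes.
 * On failure the original block is freed and *pbuf is set to NULL, so the
 * caller can neither leak nor reuse it.
 * Returns 0 on success, -1 on failure. */
static int realloc_or_free(void **pbuf, size_t new_size)
{
    void *new_buf;

    assert(pbuf != NULL);

    /* Assumption: treat size 0 as an error, because what realloc(ptr, 0)
     * does with the old block is implementation-defined. */
    if (new_size == 0) {
        free(*pbuf);
        *pbuf = NULL;
        return -1;
    }

    new_buf = realloc(*pbuf, new_size);
    if (!new_buf) {
        /* realloc() failed, so the old block is still valid: release it. */
        free(*pbuf);
        *pbuf = NULL;
        return -1;
    }

    *pbuf = new_buf;
    return 0;
}
```

A caller would then write "if (realloc_or_free(&buffer, new_size) < 0) return -1;"
without a separate av_freep()-style cleanup step.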
> 
> >>> also glibc memcpy() is shit, even more so for copying into non system
> >>> memory
> >>> you should maybe look at mplayer which has some memcpy written for
> >>> that.
> >>
> >> Hmmm, I think this statement has not held for some years now. ;-)
> >> Even Agner's doesn't bring that much, if any performance gain.
> >> Besides, system libcs generally provide the best memcpy() tuned for
> >> the underlying processor and memory hierarchy (cache geometry et
> >> al.). This is true for Apple's (commpage provided functions) and even
> >> glibc, though depending on several factors (distributor, architecture).
> >
> > glibc and "best" in the same paragraph makes me want to puke
> >
> > anyway, actual numbers: (done 3x to show that they are stable)
> >
> > k7 : cpu clocks=170059006 = 98361us  (1016.663fps)  1525.0MB/s
> > mmx: cpu clocks=293085663 = 169516us  (589.915fps)  884.9MB/s
> > sse: cpu clocks=170377116 = 98544us  (1014.775fps)  1522.2MB/s
> > c: cpu clocks=195054405 = 112817us  (886.391fps)  1329.6MB/s
> 
> Good numbers but:
> 1) Those are for an aligned case
> 2) For a large buffer (1.5 MB) it seems
> 3) slices are generally around 20 KB for 720 H.264 and around 60 KB for 
> 1080 H.264 for sample streams I have around, and source buffer aligned on 
> 2-byte boundaries or not at all.
> 
> For the case where both buffers are aligned on 8-byte boundaries, we have
> (K8 in 32-bit mode):
> 
> (fast_memcpy/sse2)
> 16384           0.09            10204.49        0.95
> 16384           0.09            10210.21        1.00
> 24576           0.09            10539.97        0.97
> 24576           0.09            10660.10        0.99
> 32768           0.09            10705.95        1.00
> 32768           0.09            10687.72        1.00
> 49152           0.09            10482.79        1.02
> 49152           0.09            10566.90        0.99
> 65536           0.09            10539.16        1.00
> 65536           0.09            10546.25        1.00
> 98304           0.18            5264.49         2.00
> 98304           0.18            5266.05         1.00
> 131072          0.18            5281.90         1.00
> 131072          0.18            5282.21         1.00
> 
> (memcpy/mmx)
> 16384           0.04            21997.10        1.00
> 16384           0.04            21997.37        1.00
> 24576           0.04            21870.70        1.01
> 24576           0.04            21870.79        1.00
> 32768           0.04            21374.74        1.02
> 32768           0.04            21380.27        1.00
> 49152           0.09            10498.95        2.04
> 49152           0.09            10499.11        1.00
> 65536           0.11            8829.70         1.19
> 65536           0.11            8829.66         1.00
> 98304           0.11            8818.30         1.00
> 98304           0.11            8818.50         1.00
> 131072          0.11            8811.01         1.00
> 131072          0.11            8811.13         1.00
> 
> (memcpy/libc)
> 16384           0.05            19301.84        1.01
> 16384           0.05            19301.02        1.00
> 24576           0.05            19479.85        0.99
> 24576           0.05            19479.76        1.00
> 32768           0.05            19214.58        1.01
> 32768           0.05            19215.24        1.00
> 49152           0.11            8563.23         2.24
> 49152           0.11            8563.03         1.00
> 65536           0.13            7217.21         1.19
> 65536           0.13            7217.20         1.00
> 98304           0.13            7201.32         1.00
> 98304           0.13            7203.47         1.00
> 131072          0.13            7197.95         1.00
> 131072          0.13            7197.79         1.00
> 
> (memcpy/agner)
> 16384           0.04            22421.78        1.00
> 16384           0.04            22422.20        1.00
> 24576           0.04            22422.01        1.00
> 24576           0.04            22421.31        1.00
> 32768           0.04            22418.09        1.00
> 32768           0.04            22418.23        1.00
> 49152           0.08            12430.12        1.80
> 49152           0.08            12430.02        1.00
> 65536           0.09            10604.25        1.17
> 65536           0.09            10601.27        1.00
> 98304           0.09            10619.99        1.00
> 98304           0.09            10620.01        1.00
> 131072          0.09            10627.03        1.00
> 131072          0.09            10627.05        1.00
> 
> Agner's is indeed the best, then fast_memcpy/mmx (the very old one), then 
> libc, then fast_memcpy/sse2.

Could you post the source that was used to generate the above numbers?
At least the mplayer devs might be interested if Agner's version is
faster.
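
For comparison purposes, a minimal harness of the kind that could produce such
numbers might look like the sketch below. It is not the source behind the
figures quoted above; bench_copy() and copy_fn are hypothetical names, timing
uses the coarse standard clock(), and the source offset parameter is there to
mimic the 2-byte-aligned slice buffers mentioned earlier.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Any routine with memcpy()'s signature can be plugged in here. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Copy `size` bytes `iterations` times from a source that starts
 * `src_offset` bytes into its allocation.
 * Returns throughput in MB/s (1 MB = 1e6 bytes), 0.0 if the run was shorter
 * than the clock() resolution, or -1.0 on allocation failure or a bad copy.
 * Note: a serious harness would also have to keep the compiler from
 * eliding the repeated copies. */
static double bench_copy(copy_fn fn, size_t size, size_t src_offset,
                         unsigned iterations)
{
    unsigned char *src_base = malloc(size + src_offset);
    unsigned char *dst = malloc(size);
    double mbps = -1.0;

    if (src_base && dst) {
        unsigned char *src = src_base + src_offset;
        clock_t start, end;
        double secs;
        unsigned i;

        memset(src_base, 0xA5, size + src_offset);

        start = clock();
        for (i = 0; i < iterations; i++)
            fn(dst, src, size);
        end = clock();

        secs = (double)(end - start) / CLOCKS_PER_SEC;
        mbps = secs > 0.0 ? (double)size * iterations / secs / 1e6 : 0.0;

        /* Sanity check: the destination must match the source. */
        if (memcmp(dst, src, size) != 0)
            mbps = -1.0;
    }

    free(src_base);
    free(dst);
    return mbps;
}
```

Something like bench_copy(memcpy, 20 * 1024, 2, 100000) would then exercise a
20 KB copy from a source that is only 2-byte aligned; any other implementation
with the same signature can be passed in place of memcpy.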

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

He who knows, does not speak. He who speaks, does not know. -- Lao Tsu