[FFmpeg-devel] r9017 breaks WMA decoding on Intel Macs
Guillaume Poirier
gpoirier
Fri Jun 1 11:51:31 CEST 2007
Hi,
Michael Niedermayer wrote:
> so I've come up with the following decisions:
> 1. trent please stay out of discussions related to gnu/elf/pic if you
> cannot accept that your views are not shared by the other people
> 2. the code should be changed so the loop statements are in asm and not
> in C; this is the obvious optimal solution
Could you explain why this would get rid of the shitty offset+%pointer
syntax? This may seem like a trivial question to you, but the answer
isn't obvious to me. I'd tend to think that we'd still have to write
something like offset+loop_register+%pointer, or similar.
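
One possible reading of point 2, as a minimal sketch and not the actual FFT code: if the loop counter lives inside the asm block, the pointers can be passed as plain register operands ("r") rather than memory operands ("m"), and each address is spelled out as displacement(base,index). The displacement is then a literal in front of a parenthesised register pair, not an offset glued onto whatever the compiler substitutes for a memory operand. The name add_floats_sse and its contract (n a multiple of 8) are hypothetical; the asm path assumes x86-64 and a GCC-compatible compiler, with a plain C fallback elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not FFmpeg code): dst[k] += src[k] for n floats.
 * n must be a multiple of 8 because each pass handles two 4-float vectors. */
static void add_floats_sse(float *dst, const float *src, size_t n)
{
    assert(n % 8 == 0);
    if (n == 0)
        return;
#if defined(__x86_64__) && defined(__GNUC__)
    /* Negative byte index that counts up to zero; both pointers are
     * biased to the end of their arrays so (base,index) walks forward. */
    ptrdiff_t i = -(ptrdiff_t)(n * sizeof(float));
    __asm__ volatile(
        "1:                            \n\t"
        "movups    (%1,%0), %%xmm0     \n\t"
        "movups  16(%1,%0), %%xmm1     \n\t"
        "movups    (%2,%0), %%xmm2     \n\t"
        "movups  16(%2,%0), %%xmm3     \n\t"
        "addps   %%xmm2, %%xmm0        \n\t"
        "addps   %%xmm3, %%xmm1        \n\t"
        "movups  %%xmm0,   (%1,%0)     \n\t"
        "movups  %%xmm1, 16(%1,%0)     \n\t"
        "add     $32, %0               \n\t"
        "jl      1b                    \n\t"
        : "+r"(i)
        : "r"(dst + n), "r"(src + n)
        : "xmm0", "xmm1", "xmm2", "xmm3", "memory", "cc");
#else
    /* Portable fallback so the sketch still builds on other targets. */
    for (size_t k = 0; k < n; k++)
        dst[k] += src[k];
#endif
}
```

Calling add_floats_sse(a, b, 8) on two 8-element arrays leaves a[k] + b[k] in a[k]. Whether this form assembles cleanly with the OS X toolchain is exactly what would still need testing.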
> 3. as a temporary solution the SSE FFT can be disabled for Mac OS X
> or any other solution like the one proposed by trent can be committed
> but _before_ any solution is committed it should be benchmarked and
> tested with at least current gcc and gcc 2.95
Not benchmarked yet with GCC 2.95, only with the GCC 4.0.1 that ships with OS X:
Guillaume's patch (offset+%num => offset%num syntax):
1 60108 dezicycles in ff_imdct_calc_sse, 8 runs, 0 skips
2 43420 dezicycles in ff_imdct_calc_sse, 16 runs, 0 skips
3 35035 dezicycles in ff_imdct_calc_sse, 32 runs, 0 skips
4 30826 dezicycles in ff_imdct_calc_sse, 64 runs, 0 skips
6 28645 dezicycles in ff_imdct_calc_sse, 127 runs, 1 skips
7 30792 dezicycles in ff_imdct_calc_sse, 229 runs, 27 skips
8 150237 dezicycles in ff_imdct_calc_sse, 485 runs, 27 skips
9 250897 dezicycles in ff_imdct_calc_sse, 996 runs, 28 skips
10 304451 dezicycles in ff_imdct_calc_sse, 2020 runs, 28 skips
11 330404 dezicycles in ff_imdct_calc_sse, 4065 runs, 31 skips
12 343799 dezicycles in ff_imdct_calc_sse, 8159 runs, 33 skips
13 347859 dezicycles in ff_imdct_calc_sse, 16351 runs, 33 skips
14 338554 dezicycles in ff_imdct_calc_sse, 32732 runs, 36 skips
--
1 103317 dezicycles in ff_imdct_calc_sse, 8 runs, 0 skips
2 64707 dezicycles in ff_imdct_calc_sse, 16 runs, 0 skips
3 45353 dezicycles in ff_imdct_calc_sse, 32 runs, 0 skips
4 35913 dezicycles in ff_imdct_calc_sse, 63 runs, 1 skips
5 30643 dezicycles in ff_imdct_calc_sse, 127 runs, 1 skips
6 43838 dezicycles in ff_imdct_calc_sse, 245 runs, 11 skips
7 148116 dezicycles in ff_imdct_calc_sse, 500 runs, 12 skips
8 246907 dezicycles in ff_imdct_calc_sse, 1011 runs, 13 skips
9 305463 dezicycles in ff_imdct_calc_sse, 2035 runs, 13 skips
10 335241 dezicycles in ff_imdct_calc_sse, 4083 runs, 13 skips
11 351301 dezicycles in ff_imdct_calc_sse, 8176 runs, 16 skips
12 353970 dezicycles in ff_imdct_calc_sse, 16365 runs, 19 skips
13 344613 dezicycles in ff_imdct_calc_sse, 32744 runs, 24 skips
14 322969 dezicycles in ff_imdct_calc_sse, 65500 runs, 36 skips
--
1 60271 dezicycles in ff_imdct_calc_sse, 8 runs, 0 skips
2 43298 dezicycles in ff_imdct_calc_sse, 16 runs, 0 skips
3 34705 dezicycles in ff_imdct_calc_sse, 32 runs, 0 skips
4 30188 dezicycles in ff_imdct_calc_sse, 64 runs, 0 skips
5 27809 dezicycles in ff_imdct_calc_sse, 128 runs, 0 skips
6 43635 dezicycles in ff_imdct_calc_sse, 249 runs, 7 skips
7 153052 dezicycles in ff_imdct_calc_sse, 503 runs, 9 skips
8 255035 dezicycles in ff_imdct_calc_sse, 1014 runs, 10 skips
9 308643 dezicycles in ff_imdct_calc_sse, 2038 runs, 10 skips
10 336467 dezicycles in ff_imdct_calc_sse, 4086 runs, 10 skips
11 348994 dezicycles in ff_imdct_calc_sse, 8179 runs, 13 skips
12 353761 dezicycles in ff_imdct_calc_sse, 16367 runs, 17 skips
13 342854 dezicycles in ff_imdct_calc_sse, 32749 runs, 19 skips
14 322213 dezicycles in ff_imdct_calc_sse, 65509 runs, 27 skips
--
1 60563 dezicycles in ff_imdct_calc_sse, 8 runs, 0 skips
2 44468 dezicycles in ff_imdct_calc_sse, 16 runs, 0 skips
3 36408 dezicycles in ff_imdct_calc_sse, 32 runs, 0 skips
4 32254 dezicycles in ff_imdct_calc_sse, 64 runs, 0 skips
5 29918 dezicycles in ff_imdct_calc_sse, 128 runs, 0 skips
6 28938 dezicycles in ff_imdct_calc_sse, 226 runs, 30 skips
7 124161 dezicycles in ff_imdct_calc_sse, 373 runs, 139 skips
8 248707 dezicycles in ff_imdct_calc_sse, 885 runs, 139 skips
9 308152 dezicycles in ff_imdct_calc_sse, 1909 runs, 139 skips
10 333396 dezicycles in ff_imdct_calc_sse, 3956 runs, 140 skips
11 346553 dezicycles in ff_imdct_calc_sse, 8050 runs, 142 skips
12 349341 dezicycles in ff_imdct_calc_sse, 16238 runs, 146 skips
13 340201 dezicycles in ff_imdct_calc_sse, 32617 runs, 151 skips
14 320084 dezicycles in ff_imdct_calc_sse, 65374 runs, 162 skips
-----------------------------------------------------------------------------------------------------
Trent's patch (more clobbers):
1 84272 dezicycles in ff_imdct_calc_sse, 8 runs, 0 skips
2 54421 dezicycles in ff_imdct_calc_sse, 16 runs, 0 skips
3 39548 dezicycles in ff_imdct_calc_sse, 32 runs, 0 skips
4 32168 dezicycles in ff_imdct_calc_sse, 64 runs, 0 skips
5 28367 dezicycles in ff_imdct_calc_sse, 128 runs, 0 skips
6 41320 dezicycles in ff_imdct_calc_sse, 245 runs, 11 skips
7 152034 dezicycles in ff_imdct_calc_sse, 499 runs, 13 skips
8 253791 dezicycles in ff_imdct_calc_sse, 1008 runs, 16 skips
9 315684 dezicycles in ff_imdct_calc_sse, 2031 runs, 17 skips
10 344689 dezicycles in ff_imdct_calc_sse, 4078 runs, 18 skips
11 357826 dezicycles in ff_imdct_calc_sse, 8174 runs, 18 skips
12 362568 dezicycles in ff_imdct_calc_sse, 16363 runs, 21 skips
13 353112 dezicycles in ff_imdct_calc_sse, 32739 runs, 29 skips
--
1 59621 dezicycles in ff_imdct_calc_sse, 8 runs, 0 skips
2 41340 dezicycles in ff_imdct_calc_sse, 16 runs, 0 skips
3 32077 dezicycles in ff_imdct_calc_sse, 32 runs, 0 skips
4 27342 dezicycles in ff_imdct_calc_sse, 64 runs, 0 skips
5 24983 dezicycles in ff_imdct_calc_sse, 128 runs, 0 skips
6 24323 dezicycles in ff_imdct_calc_sse, 226 runs, 30 skips
7 24323 dezicycles in ff_imdct_calc_sse, 226 runs, 286 skips
8 206706 dezicycles in ff_imdct_calc_sse, 546 runs, 478 skips
9 315479 dezicycles in ff_imdct_calc_sse, 1570 runs, 478 skips
10 349916 dezicycles in ff_imdct_calc_sse, 3617 runs, 479 skips
11 364592 dezicycles in ff_imdct_calc_sse, 7712 runs, 480 skips
12 367565 dezicycles in ff_imdct_calc_sse, 15901 runs, 483 skips
13 357175 dezicycles in ff_imdct_calc_sse, 32280 runs, 488 skips
14 335467 dezicycles in ff_imdct_calc_sse, 65038 runs, 498 skips
--
1 65130 dezicycles in ff_imdct_calc_sse, 8 runs, 0 skips
2 45605 dezicycles in ff_imdct_calc_sse, 16 runs, 0 skips
3 35843 dezicycles in ff_imdct_calc_sse, 32 runs, 0 skips
4 30820 dezicycles in ff_imdct_calc_sse, 64 runs, 0 skips
5 28176 dezicycles in ff_imdct_calc_sse, 128 runs, 0 skips
6 29809 dezicycles in ff_imdct_calc_sse, 229 runs, 27 skips
7 146353 dezicycles in ff_imdct_calc_sse, 485 runs, 27 skips
8 251789 dezicycles in ff_imdct_calc_sse, 996 runs, 28 skips
9 312750 dezicycles in ff_imdct_calc_sse, 2019 runs, 29 skips
10 340715 dezicycles in ff_imdct_calc_sse, 4067 runs, 29 skips
11 354814 dezicycles in ff_imdct_calc_sse, 8160 runs, 32 skips
12 359516 dezicycles in ff_imdct_calc_sse, 16348 runs, 36 skips
13 350074 dezicycles in ff_imdct_calc_sse, 32727 runs, 41 skips
14 329592 dezicycles in ff_imdct_calc_sse, 65484 runs, 52 skips
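Looks to me like Trent's patch is slower: the final (~65500-run) averages
are 329592 and 335467 dezicycles with his patch, versus 320084 to 322969
with mine.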
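
To put a single number on that comparison, the last checkpoint of each run can be averaged per patch. This is a minimal sketch, assuming the log lines keep the "<index> <n> dezicycles in <name>, <runs> runs, <skips> skips" shape shown in the tables; parse_dezicycles and mean_dezicycles are hypothetical helper names, and the sample lines are the ~65500-run entries copied from this mail (three for Guillaume's patch, two for Trent's).

```c
#include <stdio.h>

/* Hypothetical helper: extracts the figures from one benchmark line such as
 * "14 322969 dezicycles in ff_imdct_calc_sse, 65500 runs, 36 skips".
 * Returns 1 on success, 0 if the line does not match. */
static int parse_dezicycles(const char *line, unsigned long *dezi,
                            unsigned long *runs, unsigned long *skips)
{
    return sscanf(line, "%*d %lu dezicycles in %*[^,], %lu runs, %lu skips",
                  dezi, runs, skips) == 3;
}

/* Mean of the reported averages over several lines; -1.0 on a bad line
 * or an empty list. */
static double mean_dezicycles(const char *const *lines, int count)
{
    double sum = 0.0;
    for (int k = 0; k < count; k++) {
        unsigned long dezi, runs, skips;
        if (!parse_dezicycles(lines[k], &dezi, &runs, &skips))
            return -1.0;
        sum += (double)dezi;
    }
    return count > 0 ? sum / count : -1.0;
}

/* Final (~65500-run) checkpoints copied from the tables in this mail. */
static const char *const guillaume_final[] = {
    "14 322969 dezicycles in ff_imdct_calc_sse, 65500 runs, 36 skips",
    "14 322213 dezicycles in ff_imdct_calc_sse, 65509 runs, 27 skips",
    "14 320084 dezicycles in ff_imdct_calc_sse, 65374 runs, 162 skips",
};
static const char *const trent_final[] = {
    "14 335467 dezicycles in ff_imdct_calc_sse, 65038 runs, 498 skips",
    "14 329592 dezicycles in ff_imdct_calc_sse, 65484 runs, 52 skips",
};
```

With these inputs, mean_dezicycles(guillaume_final, 3) is about 321755 and mean_dezicycles(trent_final, 2) is 332529.5, so Trent's patch comes out roughly 3.3% slower at that checkpoint. Two or three runs per patch is a small sample, so the gap is indicative only.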
Bench with GCC-2.95 will follow shortly. Well, actually, when I have
time :)
Guillaume