[FFmpeg-devel] r9017 breaks WMA decoding on Intel Macs

Guillaume Poirier gpoirier
Fri Jun 1 11:51:31 CEST 2007


Hi,

Michael Niedermayer wrote:

> so I've come up with the following decisions:
> 1. trent please stay out of discussions related to gnu/elf/pic if you
>    cannot accept that your views are not shared by the other people
> 2. the code should be changed so the loop statements are in asm and not
>    in C; this is the obvious optimal solution

Could you explain why this would get rid of the shitty offset+%pointer
syntax? This may seem like a trivial question to you, but this doesn't
seem obvious to me. I'd tend to think that we'd still have to write
something like offset+loop_register+%pointer, or something along those lines.
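
To make the question concrete, here is a minimal sketch of what I imagine a
loop written entirely in asm could look like. It is not the FFmpeg code:
double_floats is a made-up function, the pointer is passed as a register
operand ("r") instead of a memory operand, and the loop counter is a negative
byte offset that counts up to zero. Under those assumptions the offset becomes
a plain displacement in front of a base+index pair, e.g. 16(%1,%0).

```c
#include <stddef.h>

/* Hypothetical example, not FFmpeg code: doubles n floats in place.
 * n is assumed to be a multiple of 8 (two 16-byte vectors per iteration). */
static void double_floats(float *buf, size_t n)
{
    if (!n)
        return;
#if defined(__x86_64__)
    {
        float *end  = buf + n;                          /* base: one past the end */
        ptrdiff_t i = -(ptrdiff_t)(n * sizeof(float));  /* index: -bytes .. 0     */

        __asm__ volatile(
            "1:                            \n\t"
            "movups   (%1,%0), %%xmm0      \n\t" /* load end[i]              */
            "movups 16(%1,%0), %%xmm1      \n\t" /* load end[i + 16 bytes]   */
            "addps  %%xmm0, %%xmm0         \n\t"
            "addps  %%xmm1, %%xmm1         \n\t"
            "movups %%xmm0,   (%1,%0)      \n\t"
            "movups %%xmm1, 16(%1,%0)      \n\t"
            "add    $32, %0                \n\t" /* advance by two vectors   */
            "jl     1b                     \n\t" /* loop while index < 0     */
            : "+r"(i)
            : "r"(end)
            : "xmm0", "xmm1", "memory", "cc"
        );
    }
#else
    /* Plain C fallback so the sketch still builds on other targets. */
    for (size_t k = 0; k < n; k++)
        buf[k] += buf[k];
#endif
}
```

With a register operand the assembler only ever sees displacement(base,index),
so nothing has to be glued onto whatever text the compiler substitutes for a
memory operand. Whether that is what you have in mind is exactly my question.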



> 3. as a temporary solution the SSE FFT can be disabled for Mac OS X
>    or any other solution like the one proposed by trent can be committed
>    but _before_ any solution is committed it should be benchmarked and
>    tested with at least current gcc and gcc 2.95


Not benchmarked yet with GCC 2.95, just with the GCC 4.0.1 that ships with OS X:

Guillaume's patch (offset+%num => offset%num syntax):

1   60108 dezicycles in ff_imdct_calc_sse, 8 runs, 0 skips
2   43420 dezicycles in ff_imdct_calc_sse, 16 runs, 0 skips
3   35035 dezicycles in ff_imdct_calc_sse, 32 runs, 0 skips
4   30826 dezicycles in ff_imdct_calc_sse, 64 runs, 0 skips
6   28645 dezicycles in ff_imdct_calc_sse, 127 runs, 1 skips
7   30792 dezicycles in ff_imdct_calc_sse, 229 runs, 27 skips
8  150237 dezicycles in ff_imdct_calc_sse, 485 runs, 27 skips
9  250897 dezicycles in ff_imdct_calc_sse, 996 runs, 28 skips
10 304451 dezicycles in ff_imdct_calc_sse, 2020 runs, 28 skips
11 330404 dezicycles in ff_imdct_calc_sse, 4065 runs, 31 skips
12 343799 dezicycles in ff_imdct_calc_sse, 8159 runs, 33 skips
13 347859 dezicycles in ff_imdct_calc_sse, 16351 runs, 33 skips
14 338554 dezicycles in ff_imdct_calc_sse, 32732 runs, 36 skips

1  103317 dezicycles in ff_imdct_calc_sse, 8 runs, 0 skips
2   64707 dezicycles in ff_imdct_calc_sse, 16 runs, 0 skips
3   45353 dezicycles in ff_imdct_calc_sse, 32 runs, 0 skips
4   35913 dezicycles in ff_imdct_calc_sse, 63 runs, 1 skips
5   30643 dezicycles in ff_imdct_calc_sse, 127 runs, 1 skips
6   43838 dezicycles in ff_imdct_calc_sse, 245 runs, 11 skips
7  148116 dezicycles in ff_imdct_calc_sse, 500 runs, 12 skips
8  246907 dezicycles in ff_imdct_calc_sse, 1011 runs, 13 skips
9  305463 dezicycles in ff_imdct_calc_sse, 2035 runs, 13 skips
10 335241 dezicycles in ff_imdct_calc_sse, 4083 runs, 13 skips
11 351301 dezicycles in ff_imdct_calc_sse, 8176 runs, 16 skips
12 353970 dezicycles in ff_imdct_calc_sse, 16365 runs, 19 skips
13 344613 dezicycles in ff_imdct_calc_sse, 32744 runs, 24 skips
14 322969 dezicycles in ff_imdct_calc_sse, 65500 runs, 36 skips

1   60271 dezicycles in ff_imdct_calc_sse, 8 runs, 0 skips
2   43298 dezicycles in ff_imdct_calc_sse, 16 runs, 0 skips
3   34705 dezicycles in ff_imdct_calc_sse, 32 runs, 0 skips
4   30188 dezicycles in ff_imdct_calc_sse, 64 runs, 0 skips
5   27809 dezicycles in ff_imdct_calc_sse, 128 runs, 0 skips
6   43635 dezicycles in ff_imdct_calc_sse, 249 runs, 7 skips
7  153052 dezicycles in ff_imdct_calc_sse, 503 runs, 9 skips
8  255035 dezicycles in ff_imdct_calc_sse, 1014 runs, 10 skips
9  308643 dezicycles in ff_imdct_calc_sse, 2038 runs, 10 skips
10 336467 dezicycles in ff_imdct_calc_sse, 4086 runs, 10 skips
11 348994 dezicycles in ff_imdct_calc_sse, 8179 runs, 13 skips
12 353761 dezicycles in ff_imdct_calc_sse, 16367 runs, 17 skips
13 342854 dezicycles in ff_imdct_calc_sse, 32749 runs, 19 skips
14 322213 dezicycles in ff_imdct_calc_sse, 65509 runs, 27 skips

1   60563 dezicycles in ff_imdct_calc_sse, 8 runs, 0 skips
2   44468 dezicycles in ff_imdct_calc_sse, 16 runs, 0 skips
3   36408 dezicycles in ff_imdct_calc_sse, 32 runs, 0 skips
4   32254 dezicycles in ff_imdct_calc_sse, 64 runs, 0 skips
5   29918 dezicycles in ff_imdct_calc_sse, 128 runs, 0 skips
6   28938 dezicycles in ff_imdct_calc_sse, 226 runs, 30 skips
7  124161 dezicycles in ff_imdct_calc_sse, 373 runs, 139 skips
8  248707 dezicycles in ff_imdct_calc_sse, 885 runs, 139 skips
9  308152 dezicycles in ff_imdct_calc_sse, 1909 runs, 139 skips
10 333396 dezicycles in ff_imdct_calc_sse, 3956 runs, 140 skips
11 346553 dezicycles in ff_imdct_calc_sse, 8050 runs, 142 skips
12 349341 dezicycles in ff_imdct_calc_sse, 16238 runs, 146 skips
13 340201 dezicycles in ff_imdct_calc_sse, 32617 runs, 151 skips
14 320084 dezicycles in ff_imdct_calc_sse, 65374 runs, 162 skips


-----------------------------------------------------------------------------------------------------

Trent's patch (more clobbers):

1   84272 dezicycles in ff_imdct_calc_sse, 8 runs, 0 skips
2   54421 dezicycles in ff_imdct_calc_sse, 16 runs, 0 skips
3   39548 dezicycles in ff_imdct_calc_sse, 32 runs, 0 skips
4   32168 dezicycles in ff_imdct_calc_sse, 64 runs, 0 skips
5   28367 dezicycles in ff_imdct_calc_sse, 128 runs, 0 skips
6   41320 dezicycles in ff_imdct_calc_sse, 245 runs, 11 skips
7  152034 dezicycles in ff_imdct_calc_sse, 499 runs, 13 skips
8  253791 dezicycles in ff_imdct_calc_sse, 1008 runs, 16 skips
9  315684 dezicycles in ff_imdct_calc_sse, 2031 runs, 17 skips
10 344689 dezicycles in ff_imdct_calc_sse, 4078 runs, 18 skips
11 357826 dezicycles in ff_imdct_calc_sse, 8174 runs, 18 skips
12 362568 dezicycles in ff_imdct_calc_sse, 16363 runs, 21 skips
13 353112 dezicycles in ff_imdct_calc_sse, 32739 runs, 29 skips

--

1   59621 dezicycles in ff_imdct_calc_sse, 8 runs, 0 skips
2   41340 dezicycles in ff_imdct_calc_sse, 16 runs, 0 skips
3   32077 dezicycles in ff_imdct_calc_sse, 32 runs, 0 skips
4   27342 dezicycles in ff_imdct_calc_sse, 64 runs, 0 skips
5   24983 dezicycles in ff_imdct_calc_sse, 128 runs, 0 skips
6   24323 dezicycles in ff_imdct_calc_sse, 226 runs, 30 skips
7   24323 dezicycles in ff_imdct_calc_sse, 226 runs, 286 skips
8  206706 dezicycles in ff_imdct_calc_sse, 546 runs, 478 skips
9  315479 dezicycles in ff_imdct_calc_sse, 1570 runs, 478 skips
10 349916 dezicycles in ff_imdct_calc_sse, 3617 runs, 479 skips
11 364592 dezicycles in ff_imdct_calc_sse, 7712 runs, 480 skips
12 367565 dezicycles in ff_imdct_calc_sse, 15901 runs, 483 skips
13 357175 dezicycles in ff_imdct_calc_sse, 32280 runs, 488 skips
14 335467 dezicycles in ff_imdct_calc_sse, 65038 runs, 498 skips

--

1   65130 dezicycles in ff_imdct_calc_sse, 8 runs, 0 skips
2   45605 dezicycles in ff_imdct_calc_sse, 16 runs, 0 skips
3   35843 dezicycles in ff_imdct_calc_sse, 32 runs, 0 skips
4   30820 dezicycles in ff_imdct_calc_sse, 64 runs, 0 skips
5   28176 dezicycles in ff_imdct_calc_sse, 128 runs, 0 skips
6   29809 dezicycles in ff_imdct_calc_sse, 229 runs, 27 skips
7  146353 dezicycles in ff_imdct_calc_sse, 485 runs, 27 skips
8  251789 dezicycles in ff_imdct_calc_sse, 996 runs, 28 skips
9  312750 dezicycles in ff_imdct_calc_sse, 2019 runs, 29 skips
10 340715 dezicycles in ff_imdct_calc_sse, 4067 runs, 29 skips
11 354814 dezicycles in ff_imdct_calc_sse, 8160 runs, 32 skips
12 359516 dezicycles in ff_imdct_calc_sse, 16348 runs, 36 skips
13 350074 dezicycles in ff_imdct_calc_sse, 32727 runs, 41 skips
14 329592 dezicycles in ff_imdct_calc_sse, 65484 runs, 52 skips


Looks to me like Trent's patch is slower.
Bench with GCC-2.95 will follow shortly. Well, actually, when I have
time :)
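
To put a rough number on that, here is a small sketch that averages the
figures reported at about 32.7k runs in each log above (32617 to 32749 runs
for my patch, 32280 to 32739 for Trent's). The helper names are made up for
illustration, and the values are simply copied from the logs; nothing else
was measured.

```c
#include <stddef.h>

/* dezicycles reported at roughly 32.7k runs, copied from the logs above */
static const double guillaume_runs[] = { 338554, 344613, 342854, 340201 };
static const double trent_runs[]     = { 353112, 357175, 350074 };

/* Hypothetical helper: arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Hypothetical helper: how much slower Trent's patch is, in percent,
 * when comparing the two means. */
static double trent_slowdown_percent(void)
{
    double g = mean(guillaume_runs,
                    sizeof(guillaume_runs) / sizeof(guillaume_runs[0]));
    double t = mean(trent_runs,
                    sizeof(trent_runs) / sizeof(trent_runs[0]));
    return (t / g - 1.0) * 100.0;
}
```

trent_slowdown_percent() comes out at roughly 3.5%. The two groups do not
overlap at that row (my slowest run is 344613, Trent's fastest is 350074), but
with only three or four runs each this is an indication, not a proof.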

Guillaume



