[FFmpeg-devel] [PATCH] SSE2 Xvid idct
Alexander Strange
astrange
Sun Apr 6 18:41:03 CEST 2008
On Apr 6, 2008, at 8:58 AM, Michael Niedermayer wrote:
> On Sun, Apr 06, 2008 at 12:19:58AM -0400, Alexander Strange wrote:
>> This adds skal's sse2 idct and uses it as the xvid idct when available.
>>
>> I merged two shuffles into the permutation and changed the zero-skipping
>> some - it's fastest in MMX and not really worth doing for the first three
>> rows. Their right halves are still usually all zero, but adding the branch
>> to check for it is a net loss. The best thing for speed would be switching
>> IDCTs by counting the last nonzero coefficient position, but that's
>> something for later.
>>
>> xvididctheader - makes a new header so I don't add any more extern
>> declarations in .c files.
>> sse2-permute - the new permutation; it might not have a specific enough
>> name, but it should work as well for simpleidct as this if I can get back
>> to that.
>> sse2-xvid-idct.diff + idct_sse2_xvid.c - the IDCT
>
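To make the "switching IDCTs by counting the last nonzero coefficient position" idea concrete, here is a rough sketch in plain C. It is not part of the patch: the enum, the function names and the row-2 threshold are all made up for illustration, and it scans in raster order for simplicity where a real decoder would more likely reuse the last scan position it already has from parsing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant labels for illustration; not FFmpeg identifiers. */
enum idct_kind { IDCT_DC_ONLY, IDCT_TOP_ROWS, IDCT_FULL };

/* Index of the last nonzero coefficient in raster order, or -1 if the
 * block is entirely zero. A real decoder would more likely reuse the
 * last scan position it already knows from parsing the bitstream. */
static int last_nonzero(const int16_t *block)
{
    int i;

    assert(block != NULL);
    for (i = 63; i >= 0; i--)
        if (block[i])
            return i;
    return -1;
}

/* Pick a variant from the last nonzero position (assumed thresholds). */
static enum idct_kind pick_idct(const int16_t *block)
{
    int last = last_nonzero(block);

    if (last <= 0)
        return IDCT_DC_ONLY;   /* empty block or DC coefficient only */
    if (last < 3 * 8)
        return IDCT_TOP_ROWS;  /* everything below row 2 is zero */
    return IDCT_FULL;
}
```

The point of deciding once per block is that the check happens outside the row loop, so it would not reintroduce the per-row branch that turned out to be a net loss.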
> Can you also post dct-test -i 0/1/2 output please!
All the same as xvidmmx (also checked by comparing actual clips):
0
-98 -125 -197 -115 -216 -104 -140 -105
108 117 137 123 107 94 105 109
199 168 110 113 135 74 85 110
-156 -94 -87 -102 -91 -94 -73 -97
-204 -91 -110 -98 -98 -95 -81 -109
114 114 84 104 77 120 117 94
150 125 110 102 115 119 83 99
-128 -91 -78 -87 -83 -81 -95 -126
IDCT XVID-MMX2: err_inf=1 err2=0.00919531 syserr=0.01080000 maxout=260 blockSumErr=5
IDCT XVID-MMX2: 6672.4 kdct/s
-98 -125 -197 -115 -216 -104 -140 -105
108 117 137 123 107 94 105 109
199 168 110 113 135 74 85 110
-156 -94 -87 -102 -91 -94 -73 -97
-204 -91 -110 -98 -98 -95 -81 -109
114 114 84 104 77 120 117 94
150 125 110 102 115 119 83 99
-128 -91 -78 -87 -83 -81 -95 -126
IDCT XVID-SSE2: err_inf=1 err2=0.00919531 syserr=0.01080000 maxout=260 blockSumErr=5
IDCT XVID-SSE2: 7549.3 kdct/s
-
1
318 380 372 344 346 388 356 349
116 152 159 136 127 150 129 137
147 147 144 153 144 149 154 138
-143 -149 -124 -118 -143 -155 -150 -148
-139 -137 -119 -135 -119 -163 -138 -134
211 189 265 233 187 209 206 215
231 208 212 267 193 221 230 213
15 16 63 54 35 51 46 61
IDCT XVID-MMX2: err_inf=1 err2=0.01372969 syserr=0.01940000 maxout=241 blockSumErr=48
IDCT XVID-MMX2: 6665.2 kdct/s
318 380 372 344 346 388 356 349
116 152 159 136 127 150 129 137
147 147 144 153 144 149 154 138
-143 -149 -124 -118 -143 -155 -150 -148
-139 -137 -119 -135 -119 -163 -138 -134
211 189 265 233 187 209 206 215
231 208 212 267 193 221 230 213
15 16 63 54 35 51 46 61
IDCT XVID-SSE2: err_inf=1 err2=0.01372969 syserr=0.01940000 maxout=241 blockSumErr=48
IDCT XVID-SSE2: 9357.7 kdct/s
-
2
0 2474 0 0 0 0 0 2474
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 -2478 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
2474 0 0 0 0 0 0 0
IDCT XVID-MMX2: err_inf=1 err2=0.00773437 syserr=0.12390000 maxout=256 blockSumErr=3
IDCT XVID-MMX2: 6683.2 kdct/s
0 2474 0 0 0 0 0 2474
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 -2478 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
2474 0 0 0 0 0 0 0
IDCT XVID-SSE2: err_inf=1 err2=0.00773437 syserr=0.12390000 maxout=256 blockSumErr=3
IDCT XVID-SSE2: 9362.3 kdct/s
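
For anyone who wants the speed difference as a percentage, the kdct/s lines above can be compared with a few lines of C. This is only a convenience sketch: the struct and function names are made up here, and the figures are copied from the output above.

```c
/* kdct/s figures copied from the dct-test output above
 * (hypothetical struct, for illustration only). */
struct bench {
    int    test;  /* the value passed to dct-test -i */
    double mmx2;  /* IDCT XVID-MMX2 throughput, kdct/s */
    double sse2;  /* IDCT XVID-SSE2 throughput, kdct/s */
};

static const struct bench results[3] = {
    { 0, 6672.4, 7549.3 },
    { 1, 6665.2, 9357.7 },
    { 2, 6683.2, 9362.3 },
};

/* Relative throughput gain of `fast` over `base`, in percent. */
static double speedup_percent(double base, double fast)
{
    return (fast / base - 1.0) * 100.0;
}
```

Calling speedup_percent(results[i].mmx2, results[i].sse2) for each entry works out to roughly 13% for test 0 and about 40% for tests 1 and 2.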