[FFmpeg-devel] SH4: optimization attempts
Guennadi Liakhovetski
g.liakhovetski
Thu Jan 27 10:27:03 CET 2011
Hi all,

For some time I've been trying to optimize video and audio decoding for
some "popular" codecs on SuperH (SH4A) CPUs. The targets have been vp8,
mp3, vorbis, aac and h.264, but I have also tried some others where I
thought I could do somewhat better than the present generic code.

A few words about the SH4A family: it has an FPU, multiply-and-accumulate
instructions for both integers and fp, and double-precision fp. It also
supports some operations on vectors of 4 fp registers, namely inner
product and multiplication by a 4x4 matrix (both vector operations have
28-bit precision), and it has approximate sin and cos operations. The CPU
also supports parallel instruction execution.

Attached are my various attempts. Of them, the mp3 patch is already known:
I've posted a couple of iterations of it to the list and will have to do a
new one. The others are for info. Below is a table of my profiling results
and optimization attempts. I'd be glad to hear any comments on it and any
ideas for further optimization.

Some of the proposed patches are generic, like the vp8 one, which replaces
multiplication by addition in several filter functions, but, as you can
see below, it didn't bring any improvement. Still, it might be better to
switch to those versions, because I think this might make a difference on
other CPUs, and it also looks better.

Further, I'll be attending this year's FOSDEM on the first weekend of
February, so I would be happy to continue discussing any of these topics
there too.
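
To illustrate what replacing multiplication by addition in a filter can
look like, here is a minimal sketch. It is not taken from vp8.diff. The
function names are hypothetical, and the taps (27, 18, 9 with "+ 63" and
">> 7" rounding) are those of the VP8 macroblock-edge filter. Whether the
attached patch does exactly this is an assumption.

```c
#include <assert.h>

/* Hypothetical helper: the three filter taps computed with multiplies. */
static void taps_mul(int w, int out[3])
{
    out[0] = (27 * w + 63) >> 7;
    out[1] = (18 * w + 63) >> 7;
    out[2] = ( 9 * w + 63) >> 7;
}

/* Hypothetical helper: the same taps built from additions only.
 * 9*w is formed by repeated doubling, then 18*w and 27*w by adding. */
static void taps_add(int w, int out[3])
{
    int w2  = w + w;
    int w4  = w2 + w2;
    int w8  = w4 + w4;
    int w9  = w8 + w;    /*  9 * w */
    int w18 = w9 + w9;   /* 18 * w */
    int w27 = w18 + w9;  /* 27 * w */

    out[0] = (w27 + 63) >> 7;
    out[1] = (w18 + 63) >> 7;
    out[2] = (w9  + 63) >> 7;
}

/* Self-check: both variants agree over a range of non-negative inputs. */
static void check_taps(void)
{
    int w, a[3], b[3];

    for (w = 0; w <= 255; w++) {
        taps_mul(w, a);
        taps_add(w, b);
        assert(a[0] == b[0] && a[1] == b[1] && a[2] == b[2]);
    }
}
```

On a CPU with a fast multiplier the two variants may well perform the
same, which would be consistent with a time drop of 0.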
Test cases: decoding of
codec   profile fn                    %   optimization          time-drop, %
MP3     apply_window_mp3_c            53  SUM8_* additions      42
                                          MULH                  17
VP8     vp8_decode_frame              14  *filter*: '*' to '+'  0
        vp8_h_loop_filter16_inner_c   10
        vp8_h_loop_filter8uv_c        10
        vp8_v_loop_filter8uv_c        10
        vp8_v_loop_filter16_c         10
        vp8_v_loop_filter16_inner_c   9
        vp8_h_loop_filter16_c         8
Vorbis  vorbis_residue_decode         25  .vector_fmul()        0
        pass                          16
        ff_imdct_half_c               9
AAC     decode_spectrum_and_dequant   14  VMUL2, VMUL4          0
        ff_imdct_half_c               12
        pass                          11
ALS     decode_var_block_data         30  multiple              15
        get_bits1                     16
        decode_rice                   14
        get_unary                     14
h.264   get_cabac                     9
        put_h264_qpel_v_lowpass       9
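
For reference, MULH in libavcodec's mathops is the high 32 bits of a
signed 32x32-bit product. The sketch below shows only that portable
meaning. The name mulh_sketch is hypothetical, and how mathops.diff
implements the operation on SH4 is not shown in this archive.

```c
#include <stdint.h>

/* Hypothetical name; returns the upper 32 bits of the 64-bit product
 * of two signed 32-bit values, as the generic MULH does. */
static inline int mulh_sketch(int a, int b)
{
    return (int)(((int64_t)a * (int64_t)b) >> 32);
}
```

With fixed-point operands this is one multiply per sample in the mp3
window and synthesis code, so an architecture-specific version is a
natural candidate for optimization.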
Thanks
Guennadi
---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: aac.diff
Type: text/x-diff
Size: 299 bytes
Desc:
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20110127/440d06b3/attachment.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: als.diff
Type: text/x-diff
Size: 10522 bytes
Desc:
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20110127/440d06b3/attachment-0001.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dsputil.diff
Type: text/x-diff
Size: 15594 bytes
Desc:
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20110127/440d06b3/attachment-0002.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vp8.diff
Type: text/x-diff
Size: 3138 bytes
Desc:
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20110127/440d06b3/attachment-0003.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mp3.diff
Type: text/x-diff
Size: 1954 bytes
Desc:
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20110127/440d06b3/attachment-0004.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mathops.diff
Type: text/x-diff
Size: 6867 bytes
Desc:
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20110127/440d06b3/attachment-0005.diff>