[FFmpeg-devel] snow.c optimisations
Reimar Döffinger
Reimar.Doeffinger at gmx.de
Thu Dec 29 14:05:51 CET 2011
On 29 Dec 2011, at 04:07, yann.lepetitcorps at free.fr wrote:
>
> diff --git a/libavcodec/snow.c b/libavcodec/snow.c
> index 0ce9b28..4aae985 100644
> --- a/libavcodec/snow.c
> +++ b/libavcodec/snow.c
> @@ -190,16 +190,26 @@ static void mc_block(Plane *p, uint8_t *dst, const uint8_t *src, int stride, int
> tmp2= tmp2t[1];
>
> if(b&2){
> +
> + int s_1 = (HTAPS_MAX/2-4)*stride;
> + int s0 = (HTAPS_MAX/2-3)*stride;
> + int s1 = (HTAPS_MAX/2-2)*stride;
> + int s2 = (HTAPS_MAX/2-1)*stride;
> + int s3 = (HTAPS_MAX/2-0)*stride;
> + int s4 = (HTAPS_MAX/2+1)*stride;
> + int s5 = (HTAPS_MAX/2+2)*stride;
> + int s6 = (HTAPS_MAX/2+3)*stride;
That does not even remotely fit into the register set on x86, and since the multiplication is by a constant, the precomputed version is probably significantly slower.
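To illustrate what I mean, here is a rough sketch (not code from snow.c; the function names and the 4-tap shape are made up for the example). Both variants compute the same thing. The first leaves the constant multiples of stride in the addressing expressions, where a compiler can usually turn them into shifts and adds or fold them into the address calculation. The second keeps every offset live in its own local, as the patch does. Which is faster depends on compiler and target, so measure before assuming anything.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-tap vertical sum: offsets are written inline as
 * constant multiples of stride. */
static int vsum_inline(const uint8_t *src, ptrdiff_t stride)
{
    return src[0 * stride] + src[1 * stride]
         + src[2 * stride] + src[3 * stride];
}

/* Same sum with the offsets precomputed into locals, in the style of
 * the quoted patch. Each local is a value the register allocator has
 * to keep somewhere while the sum is computed. */
static int vsum_precomputed(const uint8_t *src, ptrdiff_t stride)
{
    const ptrdiff_t s0 = 0 * stride;
    const ptrdiff_t s1 = 1 * stride;
    const ptrdiff_t s2 = 2 * stride;
    const ptrdiff_t s3 = 3 * stride;

    return src[s0] + src[s1] + src[s2] + src[s3];
}
```

With 8 such offsets plus src, dst and the loop counters, 32-bit x86 with its 8 general-purpose registers would have to spill some of them to the stack.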
> const unsigned color = block->color[plane_index];
> const unsigned color4 = color*0x01010101;
> if(b_w==32){
> - for(y=0; y < b_h; y++){
> - *(uint32_t*)&dst[0 + y*stride]= color4;
> - *(uint32_t*)&dst[4 + y*stride]= color4;
> - *(uint32_t*)&dst[8 + y*stride]= color4;
> - *(uint32_t*)&dst[12+ y*stride]= color4;
> - *(uint32_t*)&dst[16+ y*stride]= color4;
> - *(uint32_t*)&dst[20+ y*stride]= color4;
> - *(uint32_t*)&dst[24+ y*stride]= color4;
> - *(uint32_t*)&dst[28+ y*stride]= color4;
> + for(y=0; y < b_h; y++, dst += stride){
> + memset(dst,color4, 32);
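Using color4 with memset makes for rather confusing code IMO: memset converts its value argument to unsigned char, so only the low byte of color4 is used and passing plain color would do the same.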
Also relying on the compiler inlining a suitably optimized variant of memset in performance-critical code is at least risky.
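A small sketch of the memset point (the helper names are hypothetical, not from snow.c). memset takes an int and converts it to unsigned char before filling, so the replicated 32-bit pattern in color4 buys nothing here:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill one 32-byte row the way the patch does.
 * memset only looks at the low byte of color4; the upper three bytes
 * are discarded. (For values above INT_MAX the conversion to int is
 * also implementation-defined, which does not help readability.) */
static void fill_row_color4(uint8_t *dst, unsigned color)
{
    const unsigned color4 = color * 0x01010101u;
    memset(dst, color4, 32);
}

/* Hypothetical helper: the same fill, passing the byte value directly. */
static void fill_row_color(uint8_t *dst, unsigned color)
{
    memset(dst, color, 32);
}
```

Both helpers leave the same 32 bytes in dst, so if memset is used at all, passing color states the intent more clearly.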