[FFmpeg-devel] 4xm idct computation
yann.lepetitcorps at free.fr
Thu Dec 29 02:42:28 CET 2011
> > Perhaps the y*stride could be factored out in ff_snow_pred_block(),
> > since it is very redundant?
> > (The same goes for y*src_stride in ff_snow_inner_add_yblock().)
>
>
> Like this, I now see that the copy of color4 can be done in blocks :)
>
>
> diff --git a/libavcodec/snow.c b/libavcodec/snow.c
> index 0ce9b28..432d1d4 100644
> --- a/libavcodec/snow.c
> +++ b/libavcodec/snow.c
> @@ -288,32 +288,33 @@ static void mc_block(Plane *p, uint8_t *dst, const uint8_t *src, int stride, int
> }
>
> void ff_snow_pred_block(SnowContext *s, uint8_t *dst, uint8_t *tmp, int stride, int sx, int sy, int b_w, int b_h, BlockNode *block, int plane_index, int w, int h){
> +
> if(block->type & BLOCK_INTRA){
> int x, y;
> const unsigned color = block->color[plane_index];
> const unsigned color4 = color*0x01010101;
> if(b_w==32){
> - for(y=0; y < b_h; y++){
> - *(uint32_t*)&dst[0 + y*stride]= color4;
> - *(uint32_t*)&dst[4 + y*stride]= color4;
> - *(uint32_t*)&dst[8 + y*stride]= color4;
> - *(uint32_t*)&dst[12+ y*stride]= color4;
> - *(uint32_t*)&dst[16+ y*stride]= color4;
> - *(uint32_t*)&dst[20+ y*stride]= color4;
> - *(uint32_t*)&dst[24+ y*stride]= color4;
> - *(uint32_t*)&dst[28+ y*stride]= color4;
> + for(y=0; y < b_h; y++, dst += stride){
> + *(uint32_t*)&dst[0]= color4;
> + *(uint32_t*)&dst[4]= color4;
> + *(uint32_t*)&dst[8]= color4;
> + *(uint32_t*)&dst[12]= color4;
> + *(uint32_t*)&dst[16]= color4;
> + *(uint32_t*)&dst[20]= color4;
> + *(uint32_t*)&dst[24]= color4;
> + *(uint32_t*)&dst[28]= color4;
> }
> }else if(b_w==16){
> - for(y=0; y < b_h; y++){
> - *(uint32_t*)&dst[0 + y*stride]= color4;
> - *(uint32_t*)&dst[4 + y*stride]= color4;
> - *(uint32_t*)&dst[8 + y*stride]= color4;
> - *(uint32_t*)&dst[12+ y*stride]= color4;
> + for(y=0; y < b_h; y++, dst += stride){
> + *(uint32_t*)&dst[0]= color4;
> + *(uint32_t*)&dst[4]= color4;
> + *(uint32_t*)&dst[8]= color4;
> + *(uint32_t*)&dst[12]= color4;
> }
> }else if(b_w==8){
> - for(y=0; y < b_h; y++){
> - *(uint32_t*)&dst[0 + y*stride]= color4;
> - *(uint32_t*)&dst[4 + y*stride]= color4;
> + for(y=0; y < b_h; y++, dst += stride){
> + *(uint32_t*)&dst[0]= color4;
> + *(uint32_t*)&dst[4]= color4;
> }
> }else if(b_w==4){
> for(y=0; y < b_h; y++){
We can also add this small optimisation:
@@ -321,8 +322,9 @@ void ff_snow_pred_block(SnowContext *s, uint8_t *dst, uint8_t *tmp, int stride,
}
}else{
for(y=0; y < b_h; y++){
+ const int ystride = y * stride;
for(x=0; x < b_w; x++){
- dst[x + y*stride]= color;
+ dst[x + ystride]= color;
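
To make the idea concrete, here is a minimal standalone sketch of the two ways of filling a block with a constant colour. It is not the snow.c code: the function names fill_indexed() and fill_stepped() are made up for illustration, the stepped version assumes b_w is a multiple of 4, and it uses memcpy() for the 32-bit stores instead of the pointer cast in the patch so that it does not depend on alignment.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reference version: recomputes y*stride for every store. */
static void fill_indexed(uint8_t *dst, int stride, int b_w, int b_h,
                         uint8_t color)
{
    for (int y = 0; y < b_h; y++)
        for (int x = 0; x < b_w; x++)
            dst[x + y * stride] = color;
}

/* Hypothetical stepped version: dst advances by one row per iteration,
 * so no multiplication remains inside the loops.
 * Assumption: b_w is a multiple of 4. */
static void fill_stepped(uint8_t *dst, int stride, int b_w, int b_h,
                         uint8_t color)
{
    /* Replicate the colour byte into all four bytes of a 32-bit word. */
    const uint32_t color4 = color * 0x01010101u;

    for (int y = 0; y < b_h; y++, dst += stride)
        for (int x = 0; x < b_w; x += 4)
            memcpy(dst + x, &color4, sizeof(color4)); /* one 4-byte store */
}
```

Both functions should write exactly the same bytes. Optimising compilers often do this kind of strength reduction on their own, so the gain is worth measuring before committing to it.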
But I see now that only the BLOCK_INTRA case has been modified ...
=> and the other blocks, which go through mc_block(), seem to be the more
important part to optimise :(
==> that is why it seemed too simple :)
See you,
Yannoo