[Ffmpeg-devel] few remarks for h264 decoder

Loren Merritt lorenm
Sat Dec 31 01:00:14 CET 2005

On Fri, 30 Dec 2005, G?bor Kov?cs wrote:

> First of all sorry for not making patches or more detailed descriptions, but 
> I don't
> have more time for these problems and I'am not depending on ffmpeg's h264 
> decoding accuracy. Here are the isseus I think(!) I found:
> 1.
> pred_direct_motion() uses 8bit width fill_rectangle() with ref[0],ref[1] 
> which can be negative numbers causing trouble with val*0x0101 in 
> fill_rectangle()


> 2.
> ff_h264_weight_WxH_mmx2() can overflow. "paddw" should be "paddsw"
> example log2_denom = 6, offset = -120<<log2_denom, weight = -120
> 255 * weight + offset = -38280 (overflows mmx 16bit)
> Probably ff_h264_biweight_WxH_mmx2 is effected as well, but I'am not sure 
> "paddsw" doesn't cause trouble because of the three components added 
> together...
> In most cases b-frames use implicit weights anyway (which shouldn't overflow)

fixed. Yes, it's possible to add the 3 components without intermediate 

> 3.
> pred_direct_motion() fills ref_cache[list][scan8[4]] and 
> ref_cache[list][scan8[12]] before they are used by pred_motion(). They should 
> be PART_NOT_AVAILABLE until pred_motion() finishes.

Nothing uses the mvs in between. Maybe it's faster to avoid interpolating 
direct mvs in 8x8 blocks that aren't direct mode, but otherwise I won't 
change it.

> 4.
> Probably a known problem, but b-frame deblocking bs[] calculation based on 
> mv_cache[] condition is far from standard. I guess it's because b-frames are 
> mostly non reference frames so minor deblocking bug is not a big problem.

Explain. The only known problems are noted as /*FIXME*/.

> There is also a possible problem if the last slice is B_TYPE and it's used in 
> the upper MB row of the next P_TYPE slice for vertical edge deblocking.

Ok, I've never encountered any videos where multiple slices in one frame 
were different types.

> 5.
> I didn't check it throughly but mmx h264_qpel4_hv_lowpass macro uses
> approximation (comment sais ((a-b)/4-b)/4). I know a proper solution
> needs 32bit range, but this is why God created "pmaddwd" :) I think the
> performance loss needed for accurate calculation would be minimal.

> input:  abccba
> what it does: (((a-b)/4-b)/4+c+32)>>6
> reformulated (not bothering for a moment with 16bit limit)
> ((a-b)-b*4+c*16+32*16)>>(6+4)
> (a-5*b+c*16+512)>>10
> what it should be:
> (a-5*b+c*20+512)>>10
> So there are two problems. One is the factor of c (16 vs 20). Other is 
> the loss of precision (lower bits) using the /4 right shifts.

The code is right, the comment was wrong.
I have not performed a detailed error analysis of the loss in precision, 
but since it does not appear to introduce any deviation from JM, I have 
to assume that those lsbs don't actually matter.

> 6.
> direct_ref_list_init() saves the poc of reference frames in the current 
> Picture.
> But what happens if the current frame is decoded with more slices that uses
> decode_ref_pic_list_reordering() to reorder the default reference list or 
> maybe even have different ref_count[]?

Ok, if I ever encounter a file that does that, I'll fix it. This could get 
complicated if you mix spatial and temporal direct modes.

> 7.
> fill_default_ref_list() should be recalled when multiple b-slices have 
> different h->ref_count[]. Other solution is to remove the "index < 
> h->ref_count[list]" condition from fill_default_ref_list's b-frame loop.

fill_default_ref_list() is recalled at the beginning of every slice, 
whether or not anything has changed.

> 8.
> There is a problem with pred_direct_motion() when direct_8x8_inference_flag=1
> In this case the far corners of the macroblock should be used for each 8x8 
> block.
> Currently:
> XoXo
> oooo.
> XoXo
> oooo
> What should be:
> XooX
> oooo
> oooo
> XooX
> Actually the current code calculates all the 16 motion vectors and only 
> sub_mb_type flag tells hl_motion() to use 8x8 blocks. In theory the non used 
> motion vectors still can have invalid values and cause trouble.

direct_8x8_inference_flag=1 guarantees that the 4 mvs in a 8x8 block 
will have the same value, so it doesn't matter which one we use. And they 
are all always valid, unless that ref list wasn't used, in which case none 
are valid.

--Loren Merritt

More information about the ffmpeg-devel mailing list