[Ffmpeg-devel] H.264 encoder
Michael Niedermayer
michaelni
Fri Jun 23 19:00:36 CEST 2006
Hi
On Fri, Jun 23, 2006 at 03:58:24PM +0200, Panagiotis Issaris wrote:
> Hi,
>
> We've just put a new patch on our homepage which addresses
> the remarks made.
>
> Direct link to the patch:
> http://research.edm.uhasselt.be/~h264/edm-20060622T170534-ffmpeg-h264.diff
>
> Michael Niedermayer wrote:
>
> >>>> ...
> >>>> We felt that now was a good time to ask for some additional feedback.
> >>
> >>
> >> it contains a few pretty large tables which could maybe be generated at
> >> runtime ...
> >
>
> Fixed. Except for a table which contains statistically
> gathered data on the relation between the SAE and the resulting
> encoded size for a certain block.
replace by an approximation (someting which fits in <80 chars of source),
your values seem like they are overtrained to some data, thats not usefull
for any other data
[...]
> >> code duplication, ive not checked too carefully but loop filter & motion
> >> compensation at least appear to be duplicated from the h264 decoder
> >
>
> We didn't find a way to easily reuse the loop filter from the decoder, which
> is why we wrote the routine ourselves. The motion compensation code in
hmm, where is the problem?
> libavcodec depends on the MpegEncoderContext, which we're not using. For
> this
> reason, we wrote a simple motion compensation scheme ourselves, using the
> optimized DspContext functions.
hmm, i dont remember any h264 MC functions which depend on MpegEncContext
[...]
> I'm pretty sure our codec is simpler then x264, if only for the simple
> reason
> that it is _much_ smaller in codesize and number of features as we only
> started working on it a few months ago :)
well, is that true even when its compared against old x264 versions?
>
> What specifically needs to be done now to get the patch integrated in
> libavcodec?
alot, the code looks somewhat messy, more details are below,
also note that if there where many similar issues i generally
mentioned just one but for acceptance all need to be fixed
[...]
> diff -Naur ff/libavcodec/dsputil.c ff-ours/libavcodec/dsputil.c
> --- ff/libavcodec/dsputil.c 2006-05-30 07:44:22.000000000 +0200
> +++ ff-ours/libavcodec/dsputil.c 2006-06-21 16:55:41.000000000 +0200
> @@ -3769,6 +3769,243 @@
> dest[0] = cm[dest[0] + ((block[0] + 4)>>3)];
> }
>
> +const int16_t MF00[6] = {13107, 11916, 10082, 9362, 8192, 7282};
> +const int16_t V00[6] = {10*16, 11*16, 13*16, 14*16, 16*16, 18*16};
these exist several times
and for non static variables their names are not that good
also maybe you could add a h264dsp.c for h264 specific code (not generic 4x4
block compare or so, just the really h264 specific stuff) dsputil.c is
already quite huge ...
> +
> +// we'll always work with transposed input blocks, to avoid having to make a distinction between
> +// C and mmx implementations
> +void ff_h264_transform_dct_quant_c(int16_t block[4][4], int QP, int dontscaleDC) // y,x indexing
> +{
> + static const int16_t MF[6][4][4] =
> + {
> + { { 13107, 8066, 13107, 8066}, { 8066, 5243, 8066, 5243}, { 13107, 8066, 13107, 8066}, { 8066, 5243, 8066, 5243} },
> + { { 11916, 7490, 11916, 7490}, { 7490, 4660, 7490, 4660}, { 11916, 7490, 11916, 7490}, { 7490, 4660, 7490, 4660} },
> + { { 10082, 6554, 10082, 6554}, { 6554, 4194, 6554, 4194}, { 10082, 6554, 10082, 6554}, { 6554, 4194, 6554, 4194} },
> + { { 9362, 5825, 9362, 5825}, { 5825, 3647, 5825, 3647}, { 9362, 5825, 9362, 5825}, { 5825, 3647, 5825, 3647} },
> + { { 8192, 5243, 8192, 5243}, { 5243, 3355, 5243, 3355}, { 8192, 5243, 8192, 5243}, { 5243, 3355, 5243, 3355} },
> + { { 7282, 4559, 7282, 4559}, { 4559, 2893, 4559, 2893}, { 7282, 4559, 7282, 4559}, { 4559, 2893, 4559, 2893} }
> + };
> + int32_t qbits = 15 + QP/6;
> + int32_t f = (1<<qbits)/3;
> + int mod = QP%6;
maybe the unused quantize_c() from h264.c could be used or merged?
[...]
> +void ff_h264_transform_inverse_quant_dct_add_c(int16_t block[4][4], int QP, int dontscaleDC, uint8_t *dst, int stride) // y,x indexing
> +{
> + static const int16_t V[6][4][4] =
> + {
> + { { 10*16, 13*16, 10*16, 13*16}, { 13*16, 16*16, 13*16, 16*16}, { 10*16, 13*16, 10*16, 13*16}, { 13*16, 16*16, 13*16, 16*16} },
> + { { 11*16, 14*16, 11*16, 14*16}, { 14*16, 18*16, 14*16, 18*16}, { 11*16, 14*16, 11*16, 14*16}, { 14*16, 18*16, 14*16, 18*16} },
> + { { 13*16, 16*16, 13*16, 16*16}, { 16*16, 20*16, 16*16, 20*16}, { 13*16, 16*16, 13*16, 16*16}, { 16*16, 20*16, 16*16, 20*16} },
> + { { 14*16, 18*16, 14*16, 18*16}, { 18*16, 23*16, 18*16, 23*16}, { 14*16, 18*16, 14*16, 18*16}, { 18*16, 23*16, 18*16, 23*16} },
> + { { 16*16, 20*16, 16*16, 20*16}, { 20*16, 25*16, 20*16, 25*16}, { 16*16, 20*16, 16*16, 20*16}, { 20*16, 25*16, 20*16, 25*16} },
> + { { 18*16, 23*16, 18*16, 23*16}, { 23*16, 29*16, 23*16, 29*16}, { 18*16, 23*16, 18*16, 23*16}, { 23*16, 29*16, 23*16, 29*16} }
> + };
> + DCTELEM elem[4][4];
> + int mod = QP%6;
> +
> + if (QP >= 24)
> + {
> + int shift = QP/6-4;
> +
> + if (dontscaleDC)
> + elem[0][0] = block[0][0];
> + else
> + elem[0][0] = ((int32_t)block[0][0]*V[mod][0][0]) << shift;
> +
> + elem[0][1] = ((int32_t)block[0][1]*V[mod][0][1]) << shift;
> + elem[0][2] = ((int32_t)block[0][2]*V[mod][0][2]) << shift;
> + elem[0][3] = ((int32_t)block[0][3]*V[mod][0][3]) << shift;
> +
> + elem[1][0] = ((int32_t)block[1][0]*V[mod][1][0]) << shift;
> + elem[1][1] = ((int32_t)block[1][1]*V[mod][1][1]) << shift;
> + elem[1][2] = ((int32_t)block[1][2]*V[mod][1][2]) << shift;
> + elem[1][3] = ((int32_t)block[1][3]*V[mod][1][3]) << shift;
> +
> + elem[2][0] = ((int32_t)block[2][0]*V[mod][2][0]) << shift;
> + elem[2][1] = ((int32_t)block[2][1]*V[mod][2][1]) << shift;
> + elem[2][2] = ((int32_t)block[2][2]*V[mod][2][2]) << shift;
> + elem[2][3] = ((int32_t)block[2][3]*V[mod][2][3]) << shift;
> +
> + elem[3][0] = ((int32_t)block[3][0]*V[mod][3][0]) << shift;
> + elem[3][1] = ((int32_t)block[3][1]*V[mod][3][1]) << shift;
> + elem[3][2] = ((int32_t)block[3][2]*V[mod][3][2]) << shift;
> + elem[3][3] = ((int32_t)block[3][3]*V[mod][3][3]) << shift;
maybe the init_dequant* stuff from h264.c could be used?
[...]
> +/**
> + * |ZD(i,j)| = (|YD(i,j)| MF(0,0) + 2 f) >> (qbits + 1)
> + *
> + */
> +void ff_h264_hadamard_quant_4x4_c(DCTELEM Y[4][4], int QP)
> +{
> + int qbits = 15 + QP/6;
> + int f2 = ((1 << qbits) / 3)*2;
> + int shift = (qbits + 1);
> + int mod = QP%6;
% and / are slow, they could be done with tables assuming the whole isnt
replaced by a table
> +
> + int32_t MF = MF00[mod];
> +
> + Y[0][0] = (((ABS(Y[0][0])>>1) * MF + f2) >> shift)*(1-(((Y[0][0])>>14)&2));
shift could be made a constant and added into MF
to change the sign based on the sign of another number you can use:
(a ^ (b>>31)) - (b>>31)
> + Y[0][1] = (((ABS(Y[0][1])>>1) * MF + f2) >> shift)*(1-(((Y[0][1])>>14)&2));
> + Y[0][2] = (((ABS(Y[0][2])>>1) * MF + f2) >> shift)*(1-(((Y[0][2])>>14)&2));
> + Y[0][3] = (((ABS(Y[0][3])>>1) * MF + f2) >> shift)*(1-(((Y[0][3])>>14)&2));
> +
> + Y[1][0] = (((ABS(Y[1][0])>>1) * MF + f2) >> shift)*(1-(((Y[1][0])>>14)&2));
> + Y[1][1] = (((ABS(Y[1][1])>>1) * MF + f2) >> shift)*(1-(((Y[1][1])>>14)&2));
> + Y[1][2] = (((ABS(Y[1][2])>>1) * MF + f2) >> shift)*(1-(((Y[1][2])>>14)&2));
> + Y[1][3] = (((ABS(Y[1][3])>>1) * MF + f2) >> shift)*(1-(((Y[1][3])>>14)&2));
this stuff should eithetr be a loop or a macro or whatever, but not 16 times
near duplicated
[...]
> +//#define DEBUG_H264CAVLC
> +#define H264_CAVLC_CLIPTABLE_SIZE 65536
> +#define H264_CAVLC_CLIPTABLE_OFFSET 32768
> +
> +static int length_table[7][4095];
> +static int code_table[7][4095];
> +int16_t ff_h264_cavlc_cliptable[H264_CAVLC_CLIPTABLE_SIZE];
> +
> +void h264cavlc_generate_cliptable()
> +{
> + int value;
> + for(value=-32768; value<32768; value++)
> + {
> + if (value < -2047)
> + ff_h264_cavlc_cliptable[value + H264_CAVLC_CLIPTABLE_OFFSET] = -2047;
> + else if (value > 2047)
> + ff_h264_cavlc_cliptable[value + H264_CAVLC_CLIPTABLE_OFFSET] = 2047;
> + else
> + ff_h264_cavlc_cliptable[value + H264_CAVLC_CLIPTABLE_OFFSET] = value;
> + }
is it really faster to use that table instead of if(x+2047 > 4094U) x= ...
and cliping is bad, VERY bad, you should change the mb type, qp or whatever
if you cannot encode a coefficient, of coarse if you encode several MB types
and choose the one which has the best rate+distortion at the end then cliping
isnt an issue
> +}
> +
> +void h264cavlc_generate_tables()
> +{
> + int vlcnum, level;
> + for (vlcnum=0; vlcnum<7; vlcnum++)
> + {
> + for(level=-2047; level<2048; level++)
> + {
> + if (vlcnum == 0)
> + {
> + int levabs, sign;
> + int len, inf;
> +
> + if (level < 0)
> + {
> + sign = 1;
> + levabs = -level;
> + }
> + else
> + {
> + sign = 0;
> + levabs = level;
> + }
int sign= level < 0;
int levabs= ABS(level);
[...]
> +static inline int h264cavlc_get_lookup_table(int na, int nb)
> +{
> + int nc = 0;
> + int8_t lookup_table[8] = {0, 0, 1, 1, 2, 2, 2, 2};
> +
> + if (na >= 0 && nb >= 0)
> + {
> + nc = na+nb+1;
> + nc >>= 1;
> + }
> + else
> + {
> + if (na >= 0) // nB < 0
> + nc = na;
> + else if (nb >= 0) // nA < 0
> + nc = nb;
> + }
> +
> + return (nc < 8) ? lookup_table[nc] : 3;
> +}
pred_non_zero_count() could be used for the first part
[...]
> +
> + // Encode the trailing one sign bits
> +
> + for (i = total_coeffs-1, t = trailing_ones ; t > 0 ; i--, t--)
> + {
> + if (levels[i] > 0) // +1
> + put_bits(b,1,0);
> + else // -1
> + put_bits(b,1,1);
put_bits(b,1, levels[i] <= 0);
> + }
> +
> + // Encode levels of the remaining nonzero coefficients
> +
> + if (numlevels > 0)
> + {
> + int level_two_or_higher = 1;
> + int firstlevel = 1;
> + int vlcnum;
> +
> + if (total_coeffs > 3 && trailing_ones == 3)
> + level_two_or_higher = 0;
> +
> + if (total_coeffs > 10 && trailing_ones < 3)
> + vlcnum = 1;
> + else
> + vlcnum = 0;
vlcnum = total_coeffs > 10 && trailing_ones < 3;
> +
> + for (i = numlevels-1 ; i >= 0 ; i--)
> + {
> + int16_t val = levels[i];
> + int16_t level = val; // 'level' will contain the absolute value of 'val'
> +
> + if (level < 0)
> + level = -level;
int16_t level= ABS(val);
[...]
> + for (i = total_coeffs-1 ; i > 0 && total_zeros > 0 ; i--)
> + {
> + int runbefore = zeros[i];
> + int vlcnum = total_zeros-1;
> + if (vlcnum > 6)
> + vlcnum = 6;
int vlcnum= FFMIN(total_zeros-1, 6);
> +/* NAL unit types */
> +#define NAL_SLICE 1
> +#define NAL_DPA 2
> +#define NAL_DPB 3
> +#define NAL_DPC 4
> +#define NAL_IDR_SLICE 5
> +#define NAL_SEI 6
> +#define NAL_SPS 7
> +#define NAL_PPS 8
> +#define NAL_AUD 9
> +#define NAL_END_SEQUENCE 10
> +#define NAL_END_STREAM 11
> +#define NAL_FILLER_DATA 12
> +#define NAL_SPS_EXT 13
> +#define NAL_AUXILIARY_SLICE 19
> +
maybe an enum would be cleaner, maybe not dunno
[...]
> +#define H264_CAVLC_CLIPTABLE_SIZE 65536
> +#define H264_CAVLC_CLIPTABLE_OFFSET 32768
duplicated
[...]
> +#define H264_COPY_4X4BLOCK_TRANSPOSED(xoffset,yoffset,dest,src1,src2) \
> +{ \
> + dest[0][0] = src1[yoffset+0][xoffset+0]-src2[yoffset+0][xoffset+0]; \
> + dest[1][0] = src1[yoffset+0][xoffset+1]-src2[yoffset+0][xoffset+1]; \
> + dest[2][0] = src1[yoffset+0][xoffset+2]-src2[yoffset+0][xoffset+2]; \
> + dest[3][0] = src1[yoffset+0][xoffset+3]-src2[yoffset+0][xoffset+3]; \
> + dest[0][1] = src1[yoffset+1][xoffset+0]-src2[yoffset+1][xoffset+0]; \
> + dest[1][1] = src1[yoffset+1][xoffset+1]-src2[yoffset+1][xoffset+1]; \
> + dest[2][1] = src1[yoffset+1][xoffset+2]-src2[yoffset+1][xoffset+2]; \
> + dest[3][1] = src1[yoffset+1][xoffset+3]-src2[yoffset+1][xoffset+3]; \
> + dest[0][2] = src1[yoffset+2][xoffset+0]-src2[yoffset+2][xoffset+0]; \
> + dest[1][2] = src1[yoffset+2][xoffset+1]-src2[yoffset+2][xoffset+1]; \
> + dest[2][2] = src1[yoffset+2][xoffset+2]-src2[yoffset+2][xoffset+2]; \
> + dest[3][2] = src1[yoffset+2][xoffset+3]-src2[yoffset+2][xoffset+3]; \
> + dest[0][3] = src1[yoffset+3][xoffset+0]-src2[yoffset+3][xoffset+0]; \
> + dest[1][3] = src1[yoffset+3][xoffset+1]-src2[yoffset+3][xoffset+1]; \
> + dest[2][3] = src1[yoffset+3][xoffset+2]-src2[yoffset+3][xoffset+2]; \
> + dest[3][3] = src1[yoffset+3][xoffset+3]-src2[yoffset+3][xoffset+3]; \
use loop or macros please to simplify that (5 lines for a line copy and
4 line copys is better then 16 element copies)
[...]
> +#ifdef H264_DEBUG_WRITE_DECODED_IMAGE
> +static void ff_h264_append_image(uint8_t *data, AVCodecContext *avctx)
> +{
> + int f = open("/tmp/teststream.yuv",O_CREAT|O_WRONLY|O_APPEND,S_IRUSR|S_IWUSR);
> + write(f,data,avctx->width*avctx->height*3/2);
> + close(f);
write() and close() is not ok in a codec
[...]
> + for (i = 0 ; i < rbsplen ; i++)
> + {
> + if (i + 2 < rbsplen && (rbsp[i] == 0 && rbsp[i+1] == 0 && rbsp[i+2] < 4))
> + {
> + dest[destpos++] = rbsp[i++];
> + dest[destpos++] = rbsp[i];
> + dest[destpos++] = 0x03; // emulation prevention byte
> + }
> + else
> + dest[destpos++] = rbsp[i];
> + }
instead of a loop which checks every byte, you could check just every 2nd
also see encode_nal() un h264.c maybe you can use some parts from that
[...]
> +
> +// inblock is transposed, outblock isn't
> +void ff_h264_dct_c(DCTELEM inblock[4][4],DCTELEM outblock[4][4])
> +{
> + DCTELEM pieces[4][4];
> +
> + pieces[0][0] = inblock[0][0]+inblock[1][0]+inblock[2][0]+inblock[3][0];
> + pieces[0][1] = inblock[0][1]+inblock[1][1]+inblock[2][1]+inblock[3][1];
> + pieces[0][2] = inblock[0][2]+inblock[1][2]+inblock[2][2]+inblock[3][2];
> + pieces[0][3] = inblock[0][3]+inblock[1][3]+inblock[2][3]+inblock[3][3];
> +
> + pieces[1][0] = (inblock[0][0]<<1)+inblock[1][0]-inblock[2][0]-(inblock[3][0]<<1);
> + pieces[1][1] = (inblock[0][1]<<1)+inblock[1][1]-inblock[2][1]-(inblock[3][1]<<1);
> + pieces[1][2] = (inblock[0][2]<<1)+inblock[1][2]-inblock[2][2]-(inblock[3][2]<<1);
> + pieces[1][3] = (inblock[0][3]<<1)+inblock[1][3]-inblock[2][3]-(inblock[3][3]<<1);
> +
> + pieces[2][0] = inblock[0][0]-inblock[1][0]-inblock[2][0]+inblock[3][0];
> + pieces[2][1] = inblock[0][1]-inblock[1][1]-inblock[2][1]+inblock[3][1];
> + pieces[2][2] = inblock[0][2]-inblock[1][2]-inblock[2][2]+inblock[3][2];
> + pieces[2][3] = inblock[0][3]-inblock[1][3]-inblock[2][3]+inblock[3][3];
> +
> + pieces[3][0] = inblock[0][0]-(inblock[1][0]<<1)+(inblock[2][0]<<1)-inblock[3][0];
> + pieces[3][1] = inblock[0][1]-(inblock[1][1]<<1)+(inblock[2][1]<<1)-inblock[3][1];
> + pieces[3][2] = inblock[0][2]-(inblock[1][2]<<1)+(inblock[2][2]<<1)-inblock[3][2];
> + pieces[3][3] = inblock[0][3]-(inblock[1][3]<<1)+(inblock[2][3]<<1)-inblock[3][3];
> +
> + outblock[0][0] = pieces[0][0]+pieces[0][1]+pieces[0][2]+pieces[0][3];
> + outblock[0][1] = pieces[1][0]+pieces[1][1]+pieces[1][2]+pieces[1][3];
> + outblock[0][2] = pieces[2][0]+pieces[2][1]+pieces[2][2]+pieces[2][3];
> + outblock[0][3] = pieces[3][0]+pieces[3][1]+pieces[3][2]+pieces[3][3];
> +
> + outblock[1][0] = (pieces[0][0] << 1)+pieces[0][1]-pieces[0][2]-(pieces[0][3]<<1);
> + outblock[1][1] = (pieces[1][0] << 1)+pieces[1][1]-pieces[1][2]-(pieces[1][3]<<1);
> + outblock[1][2] = (pieces[2][0] << 1)+pieces[2][1]-pieces[2][2]-(pieces[2][3]<<1);
> + outblock[1][3] = (pieces[3][0] << 1)+pieces[3][1]-pieces[3][2]-(pieces[3][3]<<1);
> +
> + outblock[2][0] = pieces[0][0]-pieces[0][1]-pieces[0][2]+pieces[0][3];
> + outblock[2][1] = pieces[1][0]-pieces[1][1]-pieces[1][2]+pieces[1][3];
> + outblock[2][2] = pieces[2][0]-pieces[2][1]-pieces[2][2]+pieces[2][3];
> + outblock[2][3] = pieces[3][0]-pieces[3][1]-pieces[3][2]+pieces[3][3];
> +
> + outblock[3][0] = pieces[0][0]-(pieces[0][1]<<1)+(pieces[0][2]<<1)-pieces[0][3];
> + outblock[3][1] = pieces[1][0]-(pieces[1][1]<<1)+(pieces[1][2]<<1)-pieces[1][3];
> + outblock[3][2] = pieces[2][0]-(pieces[2][1]<<1)+(pieces[2][2]<<1)-pieces[2][3];
> + outblock[3][3] = pieces[3][0]-(pieces[3][1]<<1)+(pieces[3][2]<<1)-pieces[3][3];
> +}
h264.c h264_diff_dct_c() looks so much nicer, is this faster?
[...]
> +
> +static inline void ff_h264_hadamard_invquant_2x2(int16_t Y[2][2], int QP)
> +{
> + int32_t V = V00[QP%6];
> + int div = QP/6;
> +
> + Y[0][0] = ((Y[0][0]*V) << div) >> 5;
> + Y[0][1] = ((Y[0][1]*V) << div) >> 5;
> + Y[1][0] = ((Y[1][0]*V) << div) >> 5;
> + Y[1][1] = ((Y[1][1]*V) << div) >> 5;
V <<= div; and no <<div obviously
[...]
> +static inline void ff_h264_neighbour_count_nonzero(MacroBlock *mb, int type, int x, int y, int *nA, int *nB)
> +{
> + if (type == NEIGHBOUR_SUBTYPE_Y) // Y
> + {
> + if (x == 0)
> + {
> + MacroBlock *leftmb = mb->leftblock;
> +
> + if (!leftmb)
> + *nA = -1;
> + else
> + *nA = leftmb->Y_nonzero[y][3];
> + }
> + else
> + *nA = mb->Y_nonzero[y][x-1];
> +
> + if (y == 0)
> + {
> + MacroBlock *topmb = mb->topblock;
> +
> + if (!topmb)
> + *nB = -1;
> + else
> + *nB = topmb->Y_nonzero[3][x];
> + }
> + else
> + *nB = mb->Y_nonzero[y-1][x];
> + }
> + else if (type == NEIGHBOUR_SUBTYPE_U) // U
> + {
> + if (x == 0)
> + {
> + MacroBlock *leftmb = mb->leftblock;
> +
> + if (!leftmb)
> + *nA = -1;
> + else
> + *nA = leftmb->U_nonzero[y][1];
> + }
> + else
> + *nA = mb->U_nonzero[y][x-1];
> +
> + if (y == 0)
> + {
> + MacroBlock *topmb = mb->topblock;
> +
> + if (!topmb)
> + *nB = -1;
> + else
> + *nB = topmb->U_nonzero[1][x];
> + }
> + else
> + *nB = mb->U_nonzero[y-1][x];
> + }
> + else // V
> + {
> + if (x == 0)
> + {
> + MacroBlock *leftmb = mb->leftblock;
> +
> + if (!leftmb)
> + *nA = -1;
> + else
> + *nA = leftmb->V_nonzero[y][1];
> + }
> + else
> + *nA = mb->V_nonzero[y][x-1];
> +
> + if (y == 0)
> + {
> + MacroBlock *topmb = mb->topblock;
> +
> + if (!topmb)
> + *nB = -1;
> + else
> + *nB = topmb->V_nonzero[1][x];
> + }
> + else
> + *nB = mb->V_nonzero[y-1][x];
> + }
> +}
ugh, thats not duplicate thats triplicated ;)
[...]
> + if (chromaDCcount == 0)
> + {
> + if (chromaACcount == 0)
> + CodedBlockPatternChroma = 0;
> + else
> + CodedBlockPatternChroma = 2;
> + }
> + else
> + {
> + if (chromaACcount == 0)
> + CodedBlockPatternChroma = 1;
> + else
> + CodedBlockPatternChroma = 2;
> + }
if(chromaACcount) CodedBlockPatternChroma= 2;
else CodedBlockPatternChroma= !!chromaDCcount;
[...]
> + if (CodedBlockPatternLuma > 0)
> + {
> + for (j = 0 ; j < 4 ; j++)
> + {
> + int X = (j%2)*2;
> + int Y = (j/2)*2;
% and / in more or less inner loops are not ok, especially when this is
just (j&1)*2 and j&2
[...]
> + int x,y;
> +
> + for (y = 0 ; y < 4 ; y++)
> + for (x = 0 ; x < 4 ; x++)
> + mb->Y_nonzero[y][x] = 0;
memset() ?
> + }
> +
> + if (CodedBlockPatternChroma == 0)
> + {
> + int x,y;
> +
> + for (y = 0 ; y < 2 ; y++)
> + {
> + for (x = 0 ; x < 2 ; x++)
> + {
> + mb->U_nonzero[y][x] = 0;
> + mb->V_nonzero[y][x] = 0;
again memset() should be useable here, also look at fill_rectangle() from
h264.c
> + }
> + }
> + return;
> + }
> +
> + if (CodedBlockPatternChroma != 0)
> + {
> + coefficients[0] = UD[0][0];
> + coefficients[1] = UD[0][1];
> + coefficients[2] = UD[1][0];
> + coefficients[3] = UD[1][1];
> + h264cavlc_encode(b,coefficients,4,-1,-1,1); // nA and nB are not used in this case
> +
> + coefficients[0] = VD[0][0];
> + coefficients[1] = VD[0][1];
> + coefficients[2] = VD[1][0];
> + coefficients[3] = VD[1][1];
> + h264cavlc_encode(b,coefficients,4,-1,-1,1); // nA and nB are not used in this case
dulicate -> loop or macro
[...]
> + for (y = 0 ; !done && y < 4 ; y++) for (x = 0 ; !done && x < 4 ; x++) if (residual->part4x4Y[0][0][y][x] != 0) done = 1;
> + for (y = 0 ; !done && y < 4 ; y++) for (x = 0 ; !done && x < 4 ; x++) if (residual->part4x4Y[0][1][y][x] != 0) done = 1;
> + for (y = 0 ; !done && y < 4 ; y++) for (x = 0 ; !done && x < 4 ; x++) if (residual->part4x4Y[1][0][y][x] != 0) done = 1;
> + for (y = 0 ; !done && y < 4 ; y++) for (x = 0 ; !done && x < 4 ; x++) if (residual->part4x4Y[1][1][y][x] != 0) done = 1;
find a new keyboard your enter key is broken
[...]
> +
> +#ifdef H264_ENABLE_QPEL
> +
> +#define H264_QPEL_6TAP(a,b,c,d,e,f) ((a)-5*(b)+20*(c)+20*(d)-5*(e)+(f))
> +#define H264_QPEL_AVG(a,b) (((a)+(b)+1)>>1)
> +
> +static inline void ff_h264_calc_Yqpelpixels(const uint8_t *src, int srcstride, uint8_t *dst, int dststride1,int dststride2)
> +{
> + int b1,h1,s1,m1;
> + int aa,bb,gg,hh;
> + int j1,j,b,h;
> + const uint8_t *pos = src - 2 - 2*srcstride;
> + const uint8_t *pos2 = pos + 2;
> + int s,m;
> + int a,c,d,n,f,i,k,q;
> + int e,g,p,r;
> +
> + aa = H264_QPEL_6TAP((int)pos[0],(int)pos[1],(int)pos[2],(int)pos[3],(int)pos[4],(int)pos[5]);
> + pos += srcstride;
> + bb = H264_QPEL_6TAP((int)pos[0],(int)pos[1],(int)pos[2],(int)pos[3],(int)pos[4],(int)pos[5]);
> + pos += srcstride;
> + b1 = H264_QPEL_6TAP((int)pos[0],(int)pos[1],(int)pos[2],(int)pos[3],(int)pos[4],(int)pos[5]);
> + pos += srcstride;
> + s1 = H264_QPEL_6TAP((int)pos[0],(int)pos[1],(int)pos[2],(int)pos[3],(int)pos[4],(int)pos[5]);
> + pos += srcstride;
> + gg = H264_QPEL_6TAP((int)pos[0],(int)pos[1],(int)pos[2],(int)pos[3],(int)pos[4],(int)pos[5]);
> + pos += srcstride;
> + hh = H264_QPEL_6TAP((int)pos[0],(int)pos[1],(int)pos[2],(int)pos[3],(int)pos[4],(int)pos[5]);
> + h1 = H264_QPEL_6TAP((int)pos2[0],(int)pos2[srcstride],(int)pos2[srcstride*2],(int)pos2[srcstride*3],
> + (int)pos2[srcstride*4],(int)pos2[srcstride*5]);
> + pos2++;
> + m1 = H264_QPEL_6TAP((int)pos2[0],(int)pos2[srcstride],(int)pos2[srcstride*2],(int)pos2[srcstride*3],
> + (int)pos2[srcstride*4],(int)pos2[srcstride*5]);
> +
> + j1 = H264_QPEL_6TAP(aa,bb,b1,s1,gg,hh);
> + b = clip_uint8((b1+16)>>5);
> + h = clip_uint8((h1+16)>>5);
> + j = clip_uint8((j1+512)>>10);
> + s = clip_uint8((s1+16)>>5);
> + m = clip_uint8((m1+16)>>5);
> +
> + a = H264_QPEL_AVG((int)src[0],b);
> + c = H264_QPEL_AVG((int)src[1],b);
> + d = H264_QPEL_AVG((int)src[0],h);
> + n = H264_QPEL_AVG((int)src[srcstride],h);
> + f = H264_QPEL_AVG(b,j);
> + i = H264_QPEL_AVG(h,j);
> + k = H264_QPEL_AVG(j,m);
> + q = H264_QPEL_AVG(j,s);
> +
> + e = H264_QPEL_AVG(b,h);
> + g = H264_QPEL_AVG(b,m);
> + p = H264_QPEL_AVG(h,s);
> + r = H264_QPEL_AVG(m,s);
> +
> + dst[0] = src[0];
> + dst[dststride1] = a;
> + dst[dststride1*2] = b;
> + dst[dststride1*3] = c;
> + dst += dststride2;
> + dst[0] = d;
> + dst[dststride1] = e;
> + dst[dststride1*2] = f;
> + dst[dststride1*3] = g;
> + dst += dststride2;
> + dst[0] = h;
> + dst[dststride1] = i;
> + dst[dststride1*2] = j;
> + dst[dststride1*3] = k;
> + dst += dststride2;
> + dst[0] = n;
> + dst[dststride1] = p;
> + dst[dststride1*2] = q;
> + dst[dststride1*3] = r;
> +}
this looks like it duplicates put_h264_qpel_pixels_tab or similar from dsputil
[...]
> +
> + while (!done && numsteps < MAXSEARCHSTEPS)
> + {
> + int startx = curx - SEARCHWIDTH;
> + int starty = cury - SEARCHWIDTH;
> + int stopx = curx + SEARCHWIDTH + 1;
> + int stopy = cury + SEARCHWIDTH + 1;
> + int foundbetter = 0;
> + int scanx, scany;
> +
> + if (startx < 0)
> + startx = 0;
> + if (starty < 0)
> + starty = 0;
> + if (stopx > t->refframe_width - 16 + 1)
> + stopx = t->refframe_width - 16 + 1;
> + if (stopy > t->refframe_height - 16 + 1)
> + stopy = t->refframe_height -16 + 1;
> +
> + for(scany = starty; scany < stopy; scany++)
> + {
> + for(scanx = startx; scanx < stopx; scanx++)
> + {
> + if (!(curx == scanx && cury == scany))
> + {
> + int xvec = (scanx-x)*4-pred_mvx; // it's actually this difference which will be encoded!
> + int yvec = (scany-y)*4-pred_mvy;
> + int bitsize;
> + int xmod = scanx%2;
> + int ymod = scany%2;
> + int absnum = xmod+ymod*2;
> + int sae = t->dspcontext.pix_abs[0][0](0,targetmb->Y[0],
> + refframe->reconstructed_picture.data[0] + scany * refframe->reconstructed_picture.linesize[0] + scanx,
> + refframe->reconstructed_picture.linesize[0], 16);
> +
> + sae += t->dspcontext.pix_abs[1][absnum](0,targetmb->U[0],
> + refframe->reconstructed_picture.data[1] + (scany/2) * refframe->reconstructed_picture.linesize[1] + scanx/2,
> + refframe->reconstructed_picture.linesize[1], 8);
> + sae += t->dspcontext.pix_abs[1][absnum](0,targetmb->V[0],
> + refframe->reconstructed_picture.data[2] + (scany/2) * refframe->reconstructed_picture.linesize[2] + scanx/2,
> + refframe->reconstructed_picture.linesize[2], 8);
it would be nice if the comparission function where user selectable instead
of hardcoded like in the rest of lavc too
[...]
> +static inline int ff_h264_median(int x, int y, int z)
> +{
> + return x+y+z-FFMIN(x,FFMIN(y,z))-FFMAX(x,FFMAX(y,z));
> +}
mid_pred()
> +
> +// Adjust the values of mvx and mvy based on the prediction from the neighbouring macroblocks
> +static void ff_h264_estimate_motion_vectors(MacroBlock *destmb, int *mvpred_x, int *mvpred_y, int *mvpred_x2, int *mvpred_y2)
> +{
> + int mvAx = 0, mvAy = 0;
> + int mvBx = 0, mvBy = 0;
> + int mvCx = 0, mvCy = 0;
> + int mvDx = 0, mvDy = 0;
> + int Aavail = 0;
> + int Bavail = 0;
> + int Cavail = 0;
> + int Davail = 0;
> +
> + if (destmb->leftblock != NULL && destmb->leftblock->available)
> + {
> + Aavail = 1;
> + mvAx = destmb->leftblock->mv_x;
> + mvAy = destmb->leftblock->mv_y;
> + }
> + if (destmb->topblock != NULL)
> + {
> + MacroBlock *topblock = destmb->topblock;
> +
> + if (topblock->available)
> + {
> + Bavail = 1;
> + mvBx = topblock->mv_x;
> + mvBy = topblock->mv_y;
> + }
> + if (topblock->leftblock != NULL && topblock->leftblock->available)
> + {
> + Davail = 1;
> + mvDx = topblock->leftblock->mv_x;
> + mvDy = topblock->leftblock->mv_y;
> + }
> + if (topblock->rightblock != NULL && topblock->rightblock->available)
> + {
> + Cavail = 1;
> + mvCx = topblock->rightblock->mv_x;
> + mvCy = topblock->rightblock->mv_y;
> + }
> + }
> +
> + if (!Cavail)
> + {
> + Cavail = Davail;
> + mvCx = mvDx;
> + mvCy = mvDy;
> + }
> +
> + if (!Bavail && !Cavail && Aavail)
> + {
> + mvBx = mvAx;
> + mvBy = mvAy;
> + mvCx = mvAx;
> + mvCy = mvAy;
> + }
> +
> + *mvpred_x = ff_h264_median(mvAx,mvBx,mvCx);
> + *mvpred_y = ff_h264_median(mvAy,mvBy,mvCy);
> +
> + if (!Aavail || !Bavail || (Aavail && mvAx == 0 && mvAy == 0) || (Bavail && mvBx == 0 && mvBy == 0))
> + {
> + *mvpred_x2 = 0;
> + *mvpred_y2 = 0;
> + }
> + else
> + {
> + *mvpred_x2 = *mvpred_x;
> + *mvpred_y2 = *mvpred_y;
> + }
> +}
this looks like its a duplicate of h264.c pred_motion()
[...]
> +
> +#define H264_CAVLC_CLIPTABLE_SIZE 65536
i think ive seen this at least twice already ...
[...]
> +static const int sae_mapping[SAE_ENCODED_SIZE_SIZE] = [ ... one 30kb long line snipped ... ]
> +
uhm
[...]
> +
> +/**
> + * Can contain pointers to the relevant starting points in a picture
> + */
> +typedef struct _MacroBlock
stuff begining with _ is reserved in C ...
[...]
[not reviewed mmx code]
> +/**
> + * |ZD(i,j)| = (|YD(i,j)| MF(0,0) + 2 f) >> (qbits + 1)
> + *
> + */
> +void ff_h264_hadamard_quant_2x2_mmx(int16_t Y[2][2], int QP)
> +{
> + int qbits = 15 + QP/6;
> + int f2 = ((1 << qbits) / 3)*2;
> + int shift = qbits+1;
> + int32_t shift1[1];
> + int32_t f22[2];
> + int16_t MF = MF00[QP%6];
> + int16_t MF2[4];
> + MF2[0] = MF;
> + MF2[1] = MF;
> + MF2[2] = MF;
> + MF2[3] = MF;
int16_t MF2[4]= {MF,MF,MF,MF}; is cleaner IMHO
[...]
also remove tabs and trailing whitespace, they arent allowed in ffmpeg svn
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
In the past you could go to a library and read, borrow or copy any book
Today you'd get arrested for mere telling someone where the library is
More information about the ffmpeg-devel
mailing list