[FFmpeg-devel] [PATCH 2/5] truehd: break out part of rematrix_channels into platform-specific callback.

Ben Avison bavison at riscosopen.org
Thu Mar 20 18:59:54 CET 2014


On Thu, 20 Mar 2014 17:03:37 -0000, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Thu, Mar 20, 2014 at 04:06:12PM -0000, Ben Avison wrote:
>> rematrix_channels does (accum >> 14) afterwards though, so unlike
>> mlp_filter_channel (where sometimes the post-accumulate shift is 0) I
>> think you always need the upper 32 bits of the product.
>
> the lsbs of the matrix coeffs are 0 in many cases or at least many of
> the ones ive seen so far, thus the coeffs can be shifted down and
> the result of the accumulation shifted up which is equivalent to
> shifting by less than 14

That would only work for a coefficient whose 14 lsbs are all zero, so it
only applies to 0x4000, 0x8000 and 0xC000 (assuming 0 is already
special-cased). And it only works when matrix_noise_shift==0.
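
To illustrate the first point, here is a minimal sketch in C. It is not
the actual rematrix_channels code: the helper names are made up for this
example, and it looks at a single coefficient/sample term rather than
the whole accumulated row. It compares shifting after the multiply with
shifting the coefficient down beforehand.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only, not FFmpeg functions. */

/* Reference form: full-width product first, then the >> 14 that is
 * applied to the accumulator. Right-shifting a negative value is
 * implementation-defined in C; this assumes the usual arithmetic shift. */
static int64_t mul_then_shift(int32_t coeff, int32_t sample)
{
    return ((int64_t)coeff * sample) >> 14;
}

/* Shortcut form: drop the 14 lsbs of the coefficient before multiplying. */
static int64_t shift_then_mul(int32_t coeff, int32_t sample)
{
    return (int64_t)(coeff >> 14) * sample;
}

/* Nonzero when the 14 lsbs of the coefficient are all zero. */
static int low14_clear(int32_t coeff)
{
    return (coeff & 0x3FFF) == 0;
}
```

For 0x4000, 0x8000 and 0xC000 the two forms agree for any sample; for a
coefficient such as 0x4001 they diverge once the sample is large enough
for the dropped lsb to matter (for example sample = 16384).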

Unless you're thinking of run-time assembly, however, that means the
number of permutations to expand for each matrix row has gone up from
2^8=256 (each of 8 terms may be multiplied or not) to 3^8=6561 (each
coefficient may be multiplied before or after the shift, or not used at
all). Multiplies aren't so slow on modern CPUs that it's worth
testing and branching over them inside the inner loop: a branch
mispredict costs roughly the pipeline length in cycles, so if it's
50/50 whether we multiply or not, the average time would be
(pipeline_length + mul_cycles) / 2, which is typically rather more than
mul_cycles.
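
The arithmetic above can be checked with a small C sketch. The function
names are invented for this example, and the cycle counts you feed in
are assumptions to be replaced with figures for whatever core you care
about; the cost model is just the rough 50/50 estimate given above, not
a measurement.

```c
/* Hypothetical helpers for illustration only. */

/* Number of specialised variants of one matrix row when each of
 * `terms` coefficients can be handled in `choices` different ways. */
static unsigned long variant_count(unsigned choices, unsigned terms)
{
    unsigned long n = 1;
    while (terms--)
        n *= choices;
    return n;
}

/* Rough average cost per term when testing and branching over the
 * multiply, assuming a 50/50 branch: half the time we pay about a
 * pipeline length for the mispredict, half the time the multiply. */
static double branchy_avg_cycles(double pipeline_length, double mul_cycles)
{
    return (pipeline_length + mul_cycles) / 2.0;
}
```

variant_count(2, 8) and variant_count(3, 8) give the 256 and 6561
quoted above. With this model the branching version only comes out
ahead of an unconditional multiply when pipeline_length is smaller than
mul_cycles.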

Ben

