[FFmpeg-devel] [PATCH 2/5] truehd: break out part of rematrix_channels into platform-specific callback.

Thu Mar 20 16:21:04 CET 2014

On Thu, Mar 20, 2014 at 02:55:29PM -0000, Ben Avison wrote:
> On Thu, 20 Mar 2014 02:07:42 -0000, Michael Niedermayer <michaelni at gmx.at> wrote:
> >>I think matrix_coeff is guaranteed to fit in int16_t; this would allow
> >>simplifying the code
> >
> >that is, by using int16, though maybe it's not helpful on ARM
> 
> Yes, Christophe pointed out the same thing earlier. I can't see any way

I think Christophe meant the s->channel_params[channel].coeff in
mlp_filter_channel()

I meant the matrix_coeff in rematrix_channels()

> to take advantage on ARM (including NEON) of the multiplier only being 16
> bits, and in fact packing the matrix_coeff array more tightly would
> actually make things worse.
> 
> >Also, the matrix_coeffs are trivial values for the file I looked at,
> >like 0x3000 or 0x4000,
> >so optimizing for such special cases might be worth it
> 
> Well, the matrices I can see look more like
> 
> / F880, 05C0, 0000, FE40, C000, 0000 \
> | 08E0, F8E0, 00C0, FF80, 1040, C000 |
> | D900, C600, C000, FD00, DB00, CF00 |
> | 0000, C000, D2B0, 0000, 0000, C000 |
> \ C000, 0CD4, DBC4, 0000, C000, 0CD4 /

Is this from a real-world file or from a reference file?
Reference files tend to be poor choices for data-dependent
optimizations, as they often try to cover a large range of cases to
test most of an implementation, even when such choices make no sense
for achieving good compression/quality.

> 
> Only the zeros there look worth considering. But there are 2^6 possible
> patterns for zeros in one row of that matrix (even worse 2^8 for 7.1
> streams), and it doesn't look like any patterns in particular are
> especially common. I could imagine all 2^6 or 2^8 possibilities being
> expanded out, but that would make the binary 256 times bigger, so hurt
> the I cache and branch predictor efficiency, and also need a 1K branch
> table (a big chunk of the D cache). Alternatively, it could do some run
> time assembly.
> 
> But since the matrix can change every frame (typically 40 samples for
> TrueHD) my gut instinct is that we're better off sticking with a fixed
> number of multiplies, even if some of the coefficients are zero.

What I meant was that many of the values look like they might fit into
32-bit arithmetic once the coefficients are shifted down, avoiding the
need to deal with 64-bit results.
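
To make the idea concrete, here is a rough sketch, not code from the
patch. It assumes 2.14 fixed-point coefficients, samples of at most 24
significant bits, and an arithmetic right shift on negative values. The
helper names (dot_ref, common_shift, fits_32bit, dot_32) are made up for
illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FRAC_BITS 14   /* assumption: coefficients are 2.14 fixed point */

/* Baseline: one output sample using a 64-bit accumulator. */
int32_t dot_ref(const int32_t *samples, const int16_t *coeffs, int n)
{
    int64_t accum = 0;
    for (int i = 0; i < n; i++)
        accum += (int64_t)samples[i] * coeffs[i];
    /* relies on arithmetic right shift for negative values */
    return (int32_t)(accum >> FRAC_BITS);
}

/* Number of low bits that are zero in every coefficient of the row,
 * capped at FRAC_BITS (an all-zero row yields FRAC_BITS). */
int common_shift(const int16_t *coeffs, int n)
{
    unsigned bits = 0;
    int shift = 0;
    for (int i = 0; i < n; i++)
        bits |= (uint16_t)coeffs[i];
    while (shift < FRAC_BITS && !(bits & (1u << shift)))
        shift++;
    return shift;
}

/* Conservative bound: with |sample| <= 2^23 (assumed), the 32-bit sum
 * cannot overflow if the scaled magnitudes add up to at most 255,
 * since 255 * 2^23 < 2^31. */
int fits_32bit(const int16_t *coeffs, int n, int shift)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += abs(coeffs[i] / (1 << shift));   /* exact: low bits are 0 */
    return sum <= 255;
}

/* 32-bit path; only valid if fits_32bit() returned nonzero for the
 * same row and shift. */
int32_t dot_32(const int32_t *samples, const int16_t *coeffs, int n,
               int shift)
{
    int32_t accum = 0;
    assert(shift >= 0 && shift <= FRAC_BITS);
    for (int i = 0; i < n; i++)
        accum += samples[i] * (coeffs[i] / (1 << shift));
    /* relies on arithmetic right shift for negative values */
    return accum >> (FRAC_BITS - shift);
}
```

Under these assumptions a row such as { 0x3000, 0x4000, 0, 0, 0, 0 } has
a common shift of 12 and passes the bound check, so dot_32() gives the
same result as dot_ref(). The first row of the matrix quoted above has a
common shift of only 6 and fails this (deliberately conservative) check,
so a 64-bit fallback would still be needed for rows like that.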

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Avoid a single point of failure, be that a person or equipment.