[FFmpeg-devel] [PATCH] lavfi/hue: use lookup tables

Reimar Döffinger Reimar.Doeffinger at gmx.de
Wed Aug 21 19:00:33 CEST 2013


On Tue, Aug 20, 2013 at 12:58:10PM +0200, Michael Niedermayer wrote:
> On Tue, Aug 20, 2013 at 09:43:17AM +0000, Paul B Mahol wrote:
> > On 8/20/13, Michael Niedermayer <michaelni at gmx.at> wrote:
> > > On Mon, Aug 19, 2013 at 08:50:44PM +0000, Paul B Mahol wrote:
> > >> On 8/19/13, Michael Niedermayer <michaelni at gmx.at> wrote:
> > >> > On Mon, Aug 19, 2013 at 08:28:14PM +0000, Paul B Mahol wrote:
> > >> >> On 8/19/13, Michael Niedermayer <michaelni at gmx.at> wrote:
> > >> >> > On Mon, Aug 19, 2013 at 07:12:04PM +0000, Paul B Mahol wrote:
> > >> >> >> Signed-off-by: Paul B Mahol <onemda at gmail.com>
> > >> >> >> ---
> > >> >> >>  libavfilter/vf_hue.c | 76
> > >> >> >> +++++++++++++++++++++++++++++++---------------------
> > >> >> >>  1 file changed, 45 insertions(+), 31 deletions(-)
> > >> >> >
> > >> >> > breaks -vf hue=90
> > >> >> >
> > >> >> > also i dont see how this could work, rotation is not newu=f(oldu)
> > >> >>
> > >> >> obviously, i need bigger luts, but do you consider lut approach worth
> > >> >> it?
> > >> >
> > >> > if its faster sure, but iam not sure SIMD without LUT wont beat it
> > >>
> > >> shouldn't lut with SIMD be fastest?
> > >
> > > if theres a CPU and a architecture that has a fast multiple lookup
> > > instruction
> > 
> > The way how you said it, it appears there is none.
> 
> i dont know if theres one (not counting FPGAs here)

Well, I think there are SIMD instructions that can use 1 or 2 other
registers as a "LUT" (byte shuffles/permutes), but that is too small
for this purpose.
Some architectures, like GPUs, can kind of do that by merging memory
requests from multiple threads.
But in the general case, a SIMD LUT is something that would be complex
to implement and doesn't really make much sense.
A SIMD instruction that ends up causing 16 memory requests would
simply not be all that much faster than doing one memory request at a
time: it would either end up stalling the SIMD pipeline for a long
time or need huge data paths to be able to do these memory requests
within just a few cycles.
Reasons along those lines are why it's usually an either/or choice
between SIMD and LUTs.
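
To put rough numbers on the "too small" part, here is a minimal sketch
(not the code from the patch; rotate_uv, clip_fixed, build_luts and the
table names are made up for illustration, and the sign convention of
the rotation is an assumption). It shows that the new U depends on both
the old U and the old V, so a table that covers the rotation needs one
entry per (u, v) pair, i.e. 65536 bytes per output plane, while a byte
shuffle gets 16 or 32 bytes of "table":

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not taken from vf_hue.c. */

/* Clamp a 16.16 fixed-point value to 0..255 and drop the fraction. */
static uint8_t clip_fixed(int32_t t)
{
    if (t < 0)
        t = 0;
    if (t > (256 << 16) - 1)
        t = (256 << 16) - 1;
    return (uint8_t)(t >> 16);
}

/* Rotate one chroma pair around the neutral point (128, 128).
 * c and s are cos and sin of the hue angle in 16.16 fixed point,
 * so 90 degrees is c = 0, s = 65536. */
static void rotate_uv(uint8_t u, uint8_t v, int32_t c, int32_t s,
                      uint8_t *new_u, uint8_t *new_v)
{
    const int32_t du = u - 128, dv = v - 128;
    const int32_t bias = (1 << 15) + (128 << 16); /* rounding + recentre */

    /* Both outputs mix du and dv: new_u is not a function of u alone. */
    *new_u = clip_fixed(c * du - s * dv + bias);
    *new_v = clip_fixed(s * du + c * dv + bias);
}

/* A table that covers the rotation is indexed by the (u, v) pair:
 * 256 * 256 = 65536 entries for each output plane. */
static uint8_t lut_u[256][256], lut_v[256][256];

static void build_luts(int32_t c, int32_t s)
{
    for (int u = 0; u < 256; u++)
        for (int v = 0; v < 256; v++)
            rotate_uv((uint8_t)u, (uint8_t)v, c, s,
                      &lut_u[u][v], &lut_v[u][v]);
}
```

With u fixed at 200 and a 90 degree rotation, v = 100 and v = 160 give
different new U values, which is the "rotation is not newu=f(oldu)"
problem from earlier in the thread. The arithmetic in rotate_uv is just
a few multiplies and adds per pixel, so it is the kind of thing that
vectorizes directly without any table.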

