[FFmpeg-devel] [patch][OpenHEVC]added ASM functions for epel + qpel

Ronald S. Bultje rsbultje at gmail.com
Tue Mar 4 13:55:23 CET 2014


Hi,

I want to rephrase something:

On Tue, Mar 4, 2014 at 7:29 AM, Ronald S. Bultje <rsbultje at gmail.com> wrote:

> > +%macro MC_PIXEL_COMPUTE16_8 0
> > +    MC_PIXEL_COMPUTE2_8
> > +    punpckhbw         m2, m0, m15
> > +    psllw             m2, 6
> > +%endmacro
> > +
> > +%macro MC_PIXEL_COMPUTE2_10 0
> > +    psllw             m1, m0, 4
> > +%endmacro
>
> So, these are only used for full-pixel MC, right? I think for these kind
> of situations, you can see that splitting MC and weighting in two functions
> is just not a good idea. I'm going to guess that the large majority of
> calls doesn't actually use weighting in practice, so you're effectively
> (for e.g. a still) unpacking to words, just to pack back to bytes
> afterwards.
>
> I'd strongly advise to look into merging weighting and MC together. I
> understand it's more code, but most of your code here is boilerplate and I
> feel it can be simplified a lot further. A lot of the extra code is just
> "stuff" that's not really needed.
>

So, as for this, I'm going to leave the final decision to you. But (looking
at your H/V/HV subpel stuff now) remember that when I suggest merging it, I
don't mean it has to be unrolled for all cases. What do I mean by that?
Well, right now you have one function for H, V or HV subpel and one for
weighting, with a store/load set in between. I suggest merging the
_interface_ so that a subset of implementations can be combined.

What I'm suggesting is that you have a DSP interface that does MC for h, v,
h+v or fullpel, with optional weighting. You can also use this interface for
bidir MC with averaged or custom bidir weighting. But that doesn't mean
each function needs a full assembly implementation; most can be a C wrapper
that calls 1, 2 or 3 actual asm implementations that do the truly gritty
stuff, while the C wrapper does the annoying stuff. What you want to prevent
is your assembly doing silly stuff like upshifting by 6 just to fit the
interface. To me, that shows the interface can be improved.
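
To make that concrete, here is a rough C sketch of such a merged interface for
the 8-bit fullpel case. All names (MCWeight, mc_fullpel_8 and the three
"kernel" functions) are made up for illustration and are not from the patch;
the kernels are plain C stand-ins for what would be asm, and the weighting
formula is a simplified one chosen for the example. The point is only that the
unweighted path never touches the 16-bit intermediate, so there is no upshift
by 6 followed by a pack back to bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical weighting parameters; not taken from the patch. */
typedef struct MCWeight {
    int log2_denom; /* weight denominator, expressed as a shift */
    int wx;         /* weight */
    int ox;         /* offset added after weighting */
} MCWeight;

static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Stand-in for an asm kernel: unweighted fullpel is a plain byte copy. */
static void put_fullpel_8(uint8_t *dst, ptrdiff_t ds,
                          const uint8_t *src, ptrdiff_t ss, int w, int h)
{
    for (int y = 0; y < h; y++, dst += ds, src += ss)
        memcpy(dst, src, (size_t)w);
}

/* Stand-in for an asm kernel: fullpel into a 16-bit intermediate (<< 6). */
static void fullpel_to_tmp_8(int16_t *tmp, ptrdiff_t ts,
                             const uint8_t *src, ptrdiff_t ss, int w, int h)
{
    for (int y = 0; y < h; y++, tmp += ts, src += ss)
        for (int x = 0; x < w; x++)
            tmp[x] = (int16_t)(src[x] << 6);
}

/* Stand-in for an asm kernel: weight the intermediate back down to bytes. */
static void weight_tmp_8(uint8_t *dst, ptrdiff_t ds,
                         const int16_t *tmp, ptrdiff_t ts, int w, int h,
                         const MCWeight *wt)
{
    int shift = wt->log2_denom + 6;   /* undo the << 6 plus the denominator */
    int round = 1 << (shift - 1);

    for (int y = 0; y < h; y++, dst += ds, tmp += ts)
        for (int x = 0; x < w; x++)
            dst[x] = clip_u8(((tmp[x] * wt->wx + round) >> shift) + wt->ox);
}

/* C wrapper: one interface, weighting optional (wt == NULL means none). */
void mc_fullpel_8(uint8_t *dst, ptrdiff_t ds,
                  const uint8_t *src, ptrdiff_t ss, int w, int h,
                  const MCWeight *wt)
{
    int16_t tmp[64 * 64];

    if (!wt) {                        /* common case: no intermediate at all */
        put_fullpel_8(dst, ds, src, ss, w, h);
        return;
    }
    assert(w <= 64 && h <= 64);       /* tmp is sized for blocks up to 64x64 */
    fullpel_to_tmp_8(tmp, 64, src, ss, w, h);
    weight_tmp_8(dst, ds, tmp, 64, w, h, wt);
}
```

A caller passes NULL for unweighted prediction and a filled-in MCWeight
otherwise; only the second case pays for the word-sized intermediate.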

(Not saying vp9's asm is perfect, but I do think it does this part better,
because we have like 1/20th of the assembly that you have for mc.)

I'll finish reviewing the h/v/hv assembly first to make sure I'm getting it
all correct, but for an example of what that looks like, see how the vp9
MC code combines h and v asm functions into hv C wrappers. The advantage
here is _very_ little actual assembly, and most of the C wrappers (for w>16
or h+v subpel interpolation) can be written as trivial macros. But for
fullpel in both dimensions (i.e. no h or v subpel), you'll see we actually
unroll the full width all the way up to w=64. Why? Because here it's worth
it.
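
As a rough illustration of the wrapper idea (not the actual vp9 code), the
sketch below builds w=32 and hv functions out of two w=16 kernels using
macros. Everything here is hypothetical: the function names, the toy 2-tap
filter (real codecs use longer filters), the fraction given in 1/64 units,
and the 8-bit intermediate between the h and v passes, which is assumed only
to keep the example short. The w=16 functions are C stand-ins for what would
be the asm.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy 2-tap filter over 16 pixels per row. step = 1 filters horizontally,
 * step = src_stride filters vertically. frac is in 1/64 units (0..63). */
static void filter16_1d(uint8_t *dst, ptrdiff_t ds,
                        const uint8_t *src, ptrdiff_t ss,
                        int h, ptrdiff_t step, int frac)
{
    for (int y = 0; y < h; y++, dst += ds, src += ss)
        for (int x = 0; x < 16; x++)
            dst[x] = (uint8_t)((src[x] * (64 - frac) +
                                src[x + step] * frac + 32) >> 6);
}

/* Stand-ins for the two asm kernels; only these would need assembly. */
void put_h16(uint8_t *dst, ptrdiff_t ds, const uint8_t *src, ptrdiff_t ss,
             int h, int frac)
{
    filter16_1d(dst, ds, src, ss, h, 1, frac);
}

void put_v16(uint8_t *dst, ptrdiff_t ds, const uint8_t *src, ptrdiff_t ss,
             int h, int frac)
{
    filter16_1d(dst, ds, src, ss, h, ss, frac);
}

/* w > 16: run the w=16 kernel on adjacent 16-pixel columns. */
#define WIDE_FN(name, w)                                                  \
void name##w(uint8_t *dst, ptrdiff_t ds, const uint8_t *src,              \
             ptrdiff_t ss, int h, int frac)                               \
{                                                                         \
    for (int x = 0; x < w; x += 16)                                       \
        name##16(dst + x, ds, src + x, ss, h, frac);                      \
}

/* hv: h pass into a temp buffer (one extra row for the 2-tap v filter),
 * then v pass from the temp buffer into dst. Requires h <= 64. */
#define HV_FN(w)                                                          \
void put_hv##w(uint8_t *dst, ptrdiff_t ds, const uint8_t *src,            \
               ptrdiff_t ss, int h, int mx, int my)                       \
{                                                                         \
    uint8_t tmp[64 * 65];                                                 \
    put_h##w(tmp, 64, src, ss, h + 1, mx);                                \
    put_v##w(dst, ds, tmp, 64, h, my);                                    \
}

WIDE_FN(put_h, 32)
WIDE_FN(put_v, 32)
HV_FN(16)
HV_FN(32)
```

With this layout, adding w=64 is two more WIDE_FN lines and one HV_FN line.
The source block must provide one extra column and one extra row beyond the
w x h area, since the 2-tap filter reads the next pixel in each direction.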

Ronald

