[FFmpeg-devel] What new instructions would you like?

James Darnley james.darnley at gmail.com
Sat Feb 1 13:53:28 EET 2020


On 30/12/2019, Lauri Kasanen <cand at gmx.com> wrote:
> Hi,
>
> For the Libre RISC-V project, I'm going to research the popular codecs
> and design new instructions to help speed them up. With ffmpeg being
> home to lots of asm folks for many platforms, I also want to ask your
> opinion.
>
> What new instructions would you like? Anything particular you find
> missing in existing ISAs, slow, or cumbersome?

Do you mean SIMD instructions?  I have no idea what exists in RISC-V
already or what capabilities or limitations it has, so I am going to
use x86 language and terms such as byte, word, dword, qword.

Things I have found missing in old(er) x86 instruction sets are
element sizes and signed/unsigned variants of existing operations.
Some operations may have byte and word variants but lack dword and
qword ones, or there might be a signed version but not an unsigned
version (or vice versa).  A couple of things I had to emulate:
* packed absolute value of dwords
* packed maximum unsigned words
* packed max and min signed dwords (I might have really wanted
unsigned for this)
* arithmetic right shift of qwords
* pack dwords to words with unsigned saturation
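
To make the list above concrete, here is a rough scalar C sketch of
the kind of workaround four of those items need, one lane at a time.
It is only a model: the function names are made up for illustration,
and real SIMD code would apply the same steps to whole registers using
the instructions mentioned in the comments.

```c
#include <assert.h>
#include <stdint.h>

/* Absolute value of a dword: (x ^ m) - m, where m is all ones for a
 * negative x (what "psrad x, 31" produces).  The result is returned
 * as a bit pattern, so INT32_MIN maps to 0x80000000. */
static uint32_t abs_dword(int32_t x)
{
    uint32_t ux = (uint32_t)x;
    uint32_t m  = (x < 0) ? 0xFFFFFFFFu : 0u;
    return (ux ^ m) - m;
}

/* Unsigned saturating subtract of words, as psubusw does per lane. */
static uint16_t subs_uword(uint16_t a, uint16_t b)
{
    return (uint16_t)(a > b ? a - b : 0);
}

/* Maximum of unsigned words: saturating (a - b), then add b back. */
static uint16_t max_uword(uint16_t a, uint16_t b)
{
    return (uint16_t)(subs_uword(a, b) + b);
}

/* Arithmetic right shift of a qword built from a logical shift plus
 * a manual sign fill.  Returns the two's complement bit pattern. */
static uint64_t sra_qword(int64_t x, int n)
{
    assert(n >= 0 && n < 64);
    uint64_t ux   = (uint64_t)x;
    uint64_t sign = (x < 0) ? ~(uint64_t)0 : 0;
    uint64_t r    = ux >> n;
    if (n)
        r |= sign << (64 - n);   /* copy the sign into the vacated bits */
    return r;
}

/* Pack a signed dword to a word with unsigned saturation: clamp the
 * value to the range 0..65535. */
static uint16_t packus_dword(int32_t x)
{
    if (x < 0)
        return 0;
    if (x > 65535)
        return 65535;
    return (uint16_t)x;
}
```

Each of these costs two or more instructions per vector when the
native variant is missing, which is why having every size and
signedness available matters.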

Shuffle instructions.  pshufb is very useful and I think I read on IRC
that arm/aarch64/neon does not have an equivalent.  (Or was that other
shuffles?)  It allows for arbitrary reordering of bytes and setting
bytes to 0.  On x86 it takes the shuffle pattern from another SIMD
register but I usually use it with a constant pattern that gets loaded
from memory.  An interesting improvement would be if you can encode 17
* 16 (or however long your vectors might be) values in an immediate
value so it doesn't require another register.
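
For anyone not familiar with pshufb, a scalar C model of the 16 byte
version looks roughly like this.  The function name is made up for
illustration and this is a sketch of the semantics, not how anyone
would implement it: each control byte either zeroes the output byte
(top bit set) or picks a source byte with its low 4 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a 16 byte pshufb: dst[i] is 0 if bit 7 of ctl[i]
 * is set, otherwise src[ctl[i] & 15]. */
static void pshufb_model(uint8_t dst[16], const uint8_t src[16],
                         const uint8_t ctl[16])
{
    assert(dst && src && ctl);
    uint8_t tmp[16];   /* temporary so dst may alias src */

    for (int i = 0; i < 16; i++)
        tmp[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x0F];
    for (int i = 0; i < 16; i++)
        dst[i] = tmp[i];
}
```

A control pattern of 15, 14, ..., 0 reverses the bytes, and setting
any control byte to 0x80 clears that output byte.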

Good documentation.  The Intel instruction manual has pretty good
explanations of what the instructions do.  The old instructions from
around the time of MMX and SSE had excellent diagrams, though these
might have been mostly for shuffle operations.  I need to look and jog
my memory.  I think punpcklbw is an example of what I mean.  The entry
in the manual for it has a good diagram IMO.  (At least in the version
I am currently looking at.)

No stupid lane stuff.  AVX2 brought us a SIMD vector length extension
from 16 to 32 bytes.  Good, except for the stupid lanes the vectors
were split into, which make it hard to "mix" data from the low bytes
(0-15) and the high bytes (16-31).
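
As an illustration of the lane problem, here is a rough scalar C model
of the 32 byte byte shuffle (vpshufb on ymm registers).  The function
name is made up for illustration.  The point is that the index only
has 4 usable bits and is always relative to the start of the output
byte's own 16 byte lane, so no control value can pull a byte across
lanes.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the 32 byte vpshufb: each output byte can only
 * select from the 16 source bytes in its own lane. */
static void vpshufb256_model(uint8_t dst[32], const uint8_t src[32],
                             const uint8_t ctl[32])
{
    assert(dst && src && ctl);
    uint8_t tmp[32];   /* temporary so dst may alias src */

    for (int i = 0; i < 32; i++) {
        int lane_base = i & 16;   /* 0 for bytes 0-15, 16 for 16-31 */
        tmp[i] = (ctl[i] & 0x80) ? 0 : src[lane_base + (ctl[i] & 0x0F)];
    }
    for (int i = 0; i < 32; i++)
        dst[i] = tmp[i];
}
```

With every control byte set to 0, the low half of the result is filled
with src[0] and the high half with src[16]; there is no way to ask for
src[0] in the high half without an extra cross-lane instruction.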

I forgot about this email for a month.  Sorry about that.  Seeing
RISC-V in the schedule at FOSDEM reminded me about this.

