[FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

Wed Oct 26 17:21:14 EEST 2016

On Wed, Oct 26, 2016 at 3:54 PM, Michael Niedermayer
<michael at niedermayer.cc> wrote:
> On Tue, Oct 25, 2016 at 12:00:01AM +0200, Hendrik Leppkes wrote:
>> On Mon, Oct 24, 2016 at 10:31 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
>> > Hi,
>> >
>> > On Mon, Oct 24, 2016 at 4:26 PM, Henrik Gramner <henrik at gramner.com> wrote:
>> >
>> >> On Mon, Oct 24, 2016 at 9:59 PM, Ronald S. Bultje <rsbultje at gmail.com>
>> >> wrote:
>> >> > Good idea to reference Hendrik Gramner here, who keeps insisting we get
>> >> rid
>> >> > of all MMX code in ffmpeg (at least as an option) for future Intel CPUs
>> >> in
>> >> > which MMX will be deprecated.
>> >>
>> >> Replacing MMX with SSE2 is indeed the most "proper" fix in my opinion,
>> >> but it's a fair amount of work and not done in an evening.
>> >>
>> >> The fact that a lot of assembly lacks unit tests is certainly not
>> >> helping in that regard.
>> >>
>> >> Some MMX instructions are slower than the equivalent SSE2 code on
>> >> Skylake. Intel hasn't officially commented on (as far as I know at
>> >> least) if we should expect this trend to continue, but they certainly
>> >> seem to treat MMX as legacy.
>> >>
>> >> I doubt they would completely remove support for it though, backwards
>> >> compatibility is a big selling-point for x86.
>> >
>> >
>> > Well, it gives us another way of fixing this issue (on x86-64 only): have
>> > sse2 implementations for all code that has a mmx (register) path right now.
>> >
>>
>> I don't think the argument for pre-sse2 CPUs is that strong on 32-bit
>> systems, either.
>
> SSE2 was initially not faster than MMX as CPUs implemented it as 2
> MMX operations internally not having a full width SIMD unit for SSE*
> so there would be a performace loss on some x86-32 CPUs if MMX was
> replaced by "half-width SSE2" there
>

You can add "not caring about first-gen sse2 CPUs" to the list as
well, if you want. Those are way old as well.
There is going to be a performance loss either way, except that emms
slows it down everywhere, while using sse2 is likely to be fine on
modern CPUs. So IMHO thats the better course to take, if any.

- Hendrik