[FFmpeg-devel] [PATCH 3/3] ac3enc: add SIMD-optimized shifting functions for use with the fixed-point AC3 encoder

Tue Mar 8 22:19:34 CET 2011

On 8 Mar 2011, at 20:38, Justin Ruggles <justin.ruggles at gmail.com> wrote:
> On 03/08/2011 02:15 PM, Michael Niedermayer wrote:
> 
>> On Tue, Mar 08, 2011 at 01:59:44PM -0500, Ronald S. Bultje wrote:
>>> Hi,
>>> 
>>> On Tue, Mar 8, 2011 at 1:18 PM, Justin Ruggles <justin.ruggles at gmail.com> wrote:
>>>> +%macro AC3_LSHIFT_INT16 1
>>>> +cglobal ac3_lshift_int16_%1, 3,3,5, src, len, shift
>>>> +    test   shiftd, shiftd
>>>> +    jz .end
>>>> +    movd       m0, shiftd
>>>> +    ALIGN 8
>>>> +.loop:
>>>> +    AC3_SHIFT_4MM srcq, psllw, m0
>>>> +    sub      lend, mmsize*2
>>>> +    ja .loop
>>> 
>>> If it's not a multiple of mmsize*2, this will loop forever. Making
>>> this jg (= signed) would fix that.
>> 
>> really?
>> i want that CPU on which ja behaves like that
> 
> in testing, it did not cause an infinite loop.  i think because ja also
> checks the carry flag?

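How about: if you compare unsigned numbers, the unsigned conditional jump instruction of your CPU hopefully does the right thing? (I am assuming it is known that subtraction and comparison are the same except for the result writeback, so "sub lend, mmsize*2" sets the same flags as "cmp lend, mmsize*2".) ja is therefore taken only while lend, before the subtraction, is larger than mmsize*2 as an unsigned value, and the loop also terminates when len is not a multiple of mmsize*2.
jg would incorrectly exit immediately for lend >= 2^31 (since lend would then be negative when interpreted as a signed value, and certainly not larger than mmsize*2).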
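
To make the flag argument concrete, here is a small C model of the loop tail. It is only a sketch, not part of the patch: sub32(), ja_taken(), jg_taken() and count_iters() are names made up for illustration, the register is assumed to be 32 bits wide, and the step of 16 in the examples assumes the MMX case (mmsize = 8).

```c
#include <stdbool.h>
#include <stdint.h>

/* Flags left behind by a 32-bit x86 `sub a, b` (hypothetical model). */
typedef struct {
    uint32_t result;
    bool cf, zf, sf, of;
} sub_flags;

static sub_flags sub32(uint32_t a, uint32_t b)
{
    sub_flags f;
    f.result = a - b;                                 /* wraps modulo 2^32 */
    f.cf = a < b;                                     /* unsigned borrow   */
    f.zf = f.result == 0;
    f.sf = (f.result >> 31) & 1;                      /* sign of result    */
    f.of = (((a ^ b) & (a ^ f.result)) >> 31) & 1;    /* signed overflow   */
    return f;
}

/* ja: jump if above (unsigned a > b), i.e. CF = 0 and ZF = 0. */
static bool ja_taken(sub_flags f) { return !f.cf && !f.zf; }

/* jg: jump if greater (signed a > b), i.e. ZF = 0 and SF = OF. */
static bool jg_taken(sub_flags f) { return !f.zf && f.sf == f.of; }

/* Number of passes through a loop whose tail is
 *     sub len, step
 *     ja/jg .loop
 * The body always runs at least once, as in the patch. */
static unsigned count_iters(uint32_t len, uint32_t step, bool use_jg)
{
    unsigned n = 0;
    for (;;) {
        sub_flags f = sub32(len, step);
        len = f.result;
        n++;
        if (!(use_jg ? jg_taken(f) : ja_taken(f)))
            return n;
    }
}
```

With len = 40 and step = 16 both variants run three passes and stop, because the last subtraction borrows (CF set for ja, SF != OF for jg). With len = 0x80000020 the ja condition holds after the first pass, while the jg condition fails and the loop exits after a single pass.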