[FFmpeg-devel] libavcodec/lossless_videodsp : add add_bytes AVX2
Paul B Mahol
onemda at gmail.com
Wed Oct 25 23:08:30 EEST 2017
On 10/25/17, Martin Vignali <martin.vignali at gmail.com> wrote:
> 2017-10-25 21:53 GMT+02:00 Paul B Mahol <onemda at gmail.com>:
>
>> On 10/25/17, Martin Vignali <martin.vignali at gmail.com> wrote:
>> > 2017-10-25 9:43 GMT+02:00 Paul B Mahol <onemda at gmail.com>:
>> >
>> >> On 10/21/17, Martin Vignali <martin.vignali at gmail.com> wrote:
>> >> > Hello,
>> >> >
>> >> > Attached are patches to add an AVX2 version of add_bytes.
>> >> >
>> >> > 0001-libavcodec-lossless_videodsp-add-add_bytes-avx2-vers :
>> >> > add AVX2 version
>> >> >
>> >> > Passes the FATE tests for me (os 10.12, x86_64).
>> >> >
>> >> > checkasm results (Kaby Lake; run 10 times, fastest run
>> >> > reported):
>> >> > checkasm: all 2 tests passed
>> >> > add_bytes_c: 108.7
>> >> > add_bytes_sse2: 26.5
>> >> > add_bytes_avx2: 15.5
>> >> >
>> >> >
>> >> > 0002-libavcodec-lossless_video_dsp-cosmetic-add-better-se:
>> >> > cosmetic only.
>> >> > The way the reference C function declaration is given is not
>> >> > consistent between the asm files, and I think a clearer
>> >> > separator between functions makes the file easier to read.
>> >> >
>> >> > This patch also adds the C declaration of add_bytes as a comment.
>> >> >
>> >> >
>> >> > Martin
>> >> >
>> >>
>> >> Are you sure 32-byte alignment is actually enforced?
>> >>
>> >>
>> > Hello,
>> >
>> > I think the data used by add_bytes is always aligned,
>> > because dst and src are the start of a line of an AVFrame.
>>
>> Yes, but try a width that's not a multiple of 32.
>>
>>
> Sorry, I'm not sure I understand.
> According to the documentation, AVFrame->linesize is a multiple of the
> maximum alignment,
>
> and in the asm, the loop repeats as long as val < width.
>
> Can you point me to the part that you think is not OK?
I dunno. You should test it with widths not divisible by 32.
Also try encoding a cropped video.
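
As a rough illustration of the kind of test meant here, the sketch below
models a loop that always handles 32 bytes per iteration and repeats while
the offset is below the width, then compares it with a scalar reference for
every width from 1 up to a limit. It does not call FFmpeg or the AVX2 code
from the patch: add_bytes_ref, add_bytes_blocks, align_up, check_width,
count_failures, VEC and MAXLINE are all hypothetical names made up for this
example, and the assumption that each line is padded to a multiple of 32
bytes is taken from the linesize argument quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VEC 32       /* bytes handled per iteration, as with one AVX2 register */
#define MAXLINE 256  /* largest width this sketch checks */

/* Hypothetical helper: round x up to the next multiple of VEC. */
static size_t align_up(size_t x)
{
    return (x + VEC - 1) & ~(size_t)(VEC - 1);
}

/* Scalar reference: dst[i] += src[i] for every i < w. */
static void add_bytes_ref(uint8_t *dst, const uint8_t *src, size_t w)
{
    for (size_t i = 0; i < w; i++)
        dst[i] += src[i];
}

/* Model of a vector loop: whole VEC-byte blocks, repeated while i < w.
 * For widths that are not a multiple of VEC it touches bytes past w,
 * up to align_up(w). */
static void add_bytes_blocks(uint8_t *dst, const uint8_t *src, size_t w)
{
    for (size_t i = 0; i < w; i += VEC)
        for (size_t j = 0; j < VEC; j++)
            dst[i + j] += src[i + j];
}

/* Returns 1 if the first w bytes match the reference and no byte at or
 * beyond the assumed padded line size, align_up(w), was modified. */
static int check_width(size_t w)
{
    uint8_t src[MAXLINE + VEC], a[MAXLINE + VEC], b[MAXLINE + VEC];
    size_t total = sizeof(src), linesize = align_up(w);

    assert(w >= 1 && w <= MAXLINE);
    for (size_t i = 0; i < total; i++) {
        src[i] = (uint8_t)(i * 7 + 3);
        a[i] = b[i] = (uint8_t)(i * 13 + 1);
    }
    add_bytes_ref(a, src, w);
    add_bytes_blocks(b, src, w);

    return memcmp(a, b, w) == 0 &&
           memcmp(a + linesize, b + linesize, total - linesize) == 0;
}

/* Counts the widths in 1..max_w for which check_width() fails. */
static int count_failures(size_t max_w)
{
    int failures = 0;
    for (size_t w = 1; w <= max_w; w++)
        failures += !check_width(w);
    return failures;
}
```

Calling count_failures(MAXLINE) from a small main() exercises every width,
including those not divisible by 32. In this model the overrun stays inside
the padded line, so the result only carries over to real code if the
buffers really are padded that way, which is the point in question.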