[FFmpeg-devel] [PATCH] FFV1 rectangular slice multithreading

Jason Garrett-Glaser darkshikari
Thu Oct 14 22:16:06 CEST 2010


On Thu, Oct 14, 2010 at 8:09 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Thu, Oct 14, 2010 at 06:33:08AM -0700, Jason Garrett-Glaser wrote:
>> On Thu, Oct 14, 2010 at 5:59 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> > Hi
>> >
>> > The following patchset makes ffv1.2 much faster on multiprocessor systems
>> > (requires version to be set to 2, which needs you to edit the source if you
>> > want to try, as the 1.2 bitstream is not finalized yet)
>> >
>> > Compression-wise, 4 slices with foreman and large GOPs (300 frames) perform
>> > slightly better (0.05% IIRC) than 1 slice.
>> > With small GOPs (50 frames) compression is worse by 0.8% with the rangecoder
>> > and the large context model, otherwise better too.
>> > (it's quite obvious why it's worse in that case and I'll be working on that ...)
>> >
>> > Comments welcome, bikesheds not, and I'll apply this soon
>>
>> >+    if(f->num_h_slices > 256U || f->num_v_slices > 256U){
>>
The maximum slice count is 256, but this allows for up to 65,536,
which doesn't seem right.
>
> oops, fixed locally with
> +    if(f->num_h_slices > 256U || f->num_v_slices > 256U || f->num_h_slices*f->num_v_slices > MAX_SLICES){
> -    if(f->num_h_slices > 256U || f->num_v_slices > 256U){

Isn't the former check unnecessary?  If either num_h_slices or
num_v_slices exceeds 256, the latter check will also be triggered,
unless one of the slice counts is 0 (which is equally invalid).

Also while you're playing with FFV1, I mucked with the contexts a
while back.  I was never able to get a very large improvement, but
here's some of the ideas I tried:

1.  Base some (or all) of the contexts on previous residual values
instead of neighboring pixels (it was almost equivalent in
compression, but I'm curious how much a combination of that and the
current approach could help).  The bonus of this method is that you
can combine it with FFV2's pixel ordering to allow SIMD median
prediction in both the encoder and decoder.

2.  Base some of the contexts on the actual pixel values, e.g. an
extra [8] dimension based on the quantized luma range of the
neighboring pixels.

3.  "Blended" contexts -- in addition to reading the relevant context,
read all the "neighboring" contexts too, and do a weighted average of
some sort.  This is equivalent to blending on context update, IIRC.

Obviously no issue if you don't have time to try any of this, but I'm
just throwing it out there for people to play with.

Dark Shikari
