[FFmpeg-devel] [PATCH] FFV1 rectangular slice multithreading

Michael Niedermayer michaelni
Thu Oct 14 22:58:01 CEST 2010


On Thu, Oct 14, 2010 at 01:16:06PM -0700, Jason Garrett-Glaser wrote:
> On Thu, Oct 14, 2010 at 8:09 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Thu, Oct 14, 2010 at 06:33:08AM -0700, Jason Garrett-Glaser wrote:
> >> On Thu, Oct 14, 2010 at 5:59 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> >> > Hi
> >> >
> >> > The following patchset makes ffv1.2 much faster on multiprocessor systems
> >> > (requires version to be set to 2, which means editing the source if you
> >> > want to try it, as the 1.2 bitstream is not finalized yet)
> >> >
> >> > Compression-wise, 4 slices with foreman and large GOPs (300 frames) perform
> >> > slightly better (0.05% IIRC) than 1 slice.
> >> > With small GOPs (50 frames) compression is worse by 0.8% with the range coder
> >> > and the large context model, otherwise better too.
> >> > (It's quite obvious why it's worse in that case, and I'll be working on that ...)
> >> >
> >> > Comments welcome, bikesheds not, and I'll apply this soon
> >>
> >> >+    if(f->num_h_slices > 256U || f->num_v_slices > 256U){
> >>
> >> The max slices is 256, but this allows for up to 65,536, which doesn't
> >> seem right.
> >
> > oops, fixed locally with
> > +    if(f->num_h_slices > 256U || f->num_v_slices > 256U || f->num_h_slices*f->num_v_slices > MAX_SLICES){
> > -    if(f->num_h_slices > 256U || f->num_v_slices > 256U){
> 
> Isn't the former check unnecessary?  If either num_h_slices or
> num_v_slices exceeds 256, the latter check will also be triggered,
> unless one of the slice counts is 0 (which is equally invalid).

Integer overflow ... now I could have cast to uint64_t, but I wanted to throw
this out and store x/y/w/h per slice. Not that I am planning to do anything
with that, but the overhead is small and it is more flexible for the
bitstream.
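
For illustration, a rough sketch of the combined check (validate_slice_counts
is an invented name and MAX_SLICES is assumed to be 256; this is not the
actual ffv1.c code). With each count capped at 256 first, the product below
cannot wrap, so the MAX_SLICES comparison never sees an overflowed value:

#include <stdint.h>

#define MAX_SLICES 256 /* assumed value, for illustration */

/* Without the per-dimension caps, two large 32-bit counts could make
 * the product wrap around and slip past the MAX_SLICES comparison. */
static int validate_slice_counts(uint32_t num_h_slices, uint32_t num_v_slices)
{
    if (num_h_slices > 256U || num_v_slices > 256U ||
        num_h_slices * num_v_slices > MAX_SLICES)
        return -1; /* invalid */
    return 0;
}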


> 
> Also while you're playing with FFV1, I mucked with the contexts a
> while back.  I was never able to get a very large improvement, but
> here are some of the ideas I tried:
> 
> 1.  Base some (or all) of the contexts on previous residual values
> instead of neighboring pixels (it was almost equivalent in
> compression, but I'm curious how much a combination of that and the
> current approach could help).  The bonus of this method is you can
> combine it with FFV2's pixel ordering to allow decode/encode SIMD of
> the median prediction.
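
(For reference, the median prediction mentioned above is the MED/LOCO-I
predictor; a plain-C sketch, with median_predict an invented name rather
than the actual ffv1.c function:)

/* Clamp the gradient guess left+top-topleft to the range spanned by
 * left and top; equivalent to the median of the three candidates. */
static inline int median_predict(int left, int top, int topleft)
{
    int grad = left + top - topleft;
    int lo   = left < top ? left : top;
    int hi   = left < top ? top  : left;
    return grad < lo ? lo : grad > hi ? hi : grad;
}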


>
> 2.  Base some of the contexts on the actual pixel values, e.g. an [8]
> based on the quantized luma range of the average neighbor value.

That should be easy to try.
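
A minimal sketch of that idea (quantize_neighbor_avg is an invented helper;
the 8 buckets and 8-bit samples are assumptions, not part of the patch):

/* Extra context component: quantize the average of the left and top
 * neighbors into 8 buckets. Assumes 8-bit samples. */
static inline int quantize_neighbor_avg(int left, int top)
{
    int avg = (left + top + 1) >> 1; /* 0..255 */
    return avg >> 5;                 /* buckets 0..7 */
}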


>
> 3.  "Blended" contexts -- in addition to reading the relevant context,
> read all the "neighboring" contexts too, and do a weighted average of
> some sort.  This is equivalent to blending on context update, IIRC.

In the large context model we have 5 quantized inputs; with naive
bilinear-style interpolation extended to higher dimensions, that means
interpolating in a 5D hypercube with 32 corner points, and I am not sure how
fast that would be.
Also, if we consider a 1D context covering -10..0 and the next covering
1..11, and 99% of the actual values are 0, then the simple blending I was
thinking of really could behave poorly if 1..11 is quite different from
-10..0.
But I don't know what you had in mind exactly ...
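
In the 1D case, the kind of simple blending meant here might look like the
following (blended_state, the flat state[] layout and the 1/2-1/4-1/4
weights are illustrative assumptions, not code from the patchset):

#include <stdint.h>

/* Read a context state blended with its two 1D neighbors,
 * clamping at the ends of the context range. */
static int blended_state(const uint8_t *state, int ctx, int nb_ctx)
{
    int lo = ctx > 0          ? state[ctx - 1] : state[ctx];
    int hi = ctx < nb_ctx - 1 ? state[ctx + 1] : state[ctx];
    return (2 * state[ctx] + lo + hi + 2) >> 2;
}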

Anyway, I'll apply the patchset soon ...

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The real ebay dictionary, page 3
"Rare item" - "Common item with rare defect or maybe just a lie"
"Professional" - "'Toy' made in china, not functional except as doorstop"
"Experts will know" - "The seller hopes you are not an expert"