[FFmpeg-devel] [PATCH] RoQ video encoder
Eric Lasota
riot
Fri Jun 1 21:39:22 CEST 2007
*Michael Niedermayer wrote:*
> a patch to fix this is welcome (a sample which shows some artifacts with
> the current decoder but works correctly with the ID decoder would also
> be welcome if you have such a sample)
>
http://icculus.org/~riot/bf2introseg.bz2
Try that. Q3A plays it fine. Treating MOT as mcomp(0, 0) should cause
minor artifacts with the jet flyby, and severe artifacts with the muzzle
flashes.
> you don't seem to understand how the buffer handling in libavcodec works,
> there is no guarantee that the frame you get from get_buffer() will be
> the same as the last one you released with release_buffer()
> you can use reget_buffer() to achieve that ...
>
Yeah, I misread that part.
> could you explain how the RoQ encoder decides on the choice it makes
> for encoding each block? I've not reviewed the high-level design of it
> carefully yet (I of course will before any patch with it gets
> accepted), I was just confused a little by all the sorting ...
>
RoQ only has 259 possible encodings of each 8x8 macroblock (assuming the
parameters are the same, and there is no reason for them not to be), and
the encoder always spent most of its time in the codebook generation
and matching phases, so I figured an encoder could be written that just
brute-forces everything to aggressively squeeze quality out of the bit
budget:
- The best result for each type of encoding is evaluated (i.e. the
best codebook entry for each macroblock, the best motion vector, etc.),
and the error of each possible encoding mix is determined.
- All macroblocks are assigned a possibility list, which contains all
possible encodings of the macroblock.
- The list is sorted by bit usage in descending order, and then by
error in ascending order.
- Possibilities with the same bit usage as the next possibility are
discarded, ensuring that only the lowest-error possibility for each bit
consumption remains. The list is re-sorted to push the invalidated
entries to the end of the list.
- Possibilities that consume more bits than another possibility without
having less error are discarded. The result of this is a list where
plist[n+1] always consumes fewer bits and has more error than plist[n].
The sort operations are mainly to push discarded entries to the end of
the list.
- A list of reductions is created, one for each macroblock. A reduction
represents a change in the encoding of a macroblock that reduces its
size. It is stored as a pointer to a macroblock, an index into its
possibility list, and the error increase and bit reduction of going from
plist[index] to plist[index+1].
- The reduction list is sorted by error per bit saved, in ascending
order. Reductions where plist[index+1] does not exist are forced to the
end of the list.
- Reduction loop: the reduction earliest in the list is processed: its
index is incremented, the error and bit cost changes of the next index
in that possibility list are evaluated, and the reduction is
bubble-sorted to its new proper location in the list. If the size of
the encoded frame falls below the desired value, or no more reductions
can be performed, the loop is exited. (A sketch of this loop follows
the list.)
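To make the bookkeeping concrete, here is a minimal C sketch of the
reduction loop. The type and function names (RoQPossibility, RoQBlock,
reduction_cost, reduce_frame) are invented for illustration; this is
the shape of the algorithm described above, not the code from the
patch:

/* RoQPossibility/RoQBlock are invented names, not the patch's own. */
typedef struct {
    int bits;     /* bits this encoding consumes */
    double error; /* distortion this encoding produces */
} RoQPossibility;

typedef struct {
    RoQPossibility *plist; /* plist[n+1]: fewer bits, more error */
    int count;
    int index;             /* currently selected possibility */
} RoQBlock;

/* Error added per bit saved by stepping from plist[index] to
 * plist[index+1]; "infinite" when no further reduction exists.
 * Bits strictly decrease along plist, so the divisor is positive. */
static double reduction_cost(const RoQBlock *b)
{
    if (b->index + 1 >= b->count)
        return 1e30;
    return (b->plist[b->index + 1].error - b->plist[b->index].error) /
           (b->plist[b->index].bits - b->plist[b->index + 1].bits);
}

static void reduce_frame(RoQBlock **red, int nblocks,
                         int *frame_bits, int budget)
{
    int i;

    /* red[] is kept sorted by reduction_cost(), cheapest entry first */
    while (*frame_bits > budget && reduction_cost(red[0]) < 1e30) {
        RoQBlock *b = red[0];

        *frame_bits -= b->plist[b->index].bits -
                       b->plist[b->index + 1].bits;
        b->index++;

        /* bubble the updated reduction down to its new position */
        for (i = 0; i + 1 < nblocks &&
                    reduction_cost(red[i]) > reduction_cost(red[i + 1]);
             i++) {
            RoQBlock *tmp = red[i];
            red[i]        = red[i + 1];
            red[i + 1]    = tmp;
        }
    }
}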
A flaw with this is that during the possibility list creation, it is
possible that, for example, going from plist[0] to plist[2] may produce
less error-per-bit than going from plist[0] to plist[1]. The update I
added discards any entry which is a less-efficient use of the bit budget
than an entry further down the list. Mileage varies, but I noted some
pretty significant quality improvements in high-action frames.
In retrospect, though, given that end result, the entire list could be
processed in one step, so I'll have to make a new patch with a better
method: find the lowest-error possibility (on a tie, pick the one with
the fewest bits consumed) and add it to the new list; then find the
entry with the least error added per bit reduced and add it to the
list; repeat until no more bit reductions are available. A sketch of
that construction follows.
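Here is what that one-step construction might look like, reusing the
invented RoQPossibility struct from the sketch above; again, an
illustration of the method, not the patch itself:

static int build_frontier(const RoQPossibility *in, int n,
                          RoQPossibility *out)
{
    int i, cur = 0, count = 0;

    /* start from the lowest-error entry, fewest bits on a tie */
    for (i = 1; i < n; i++)
        if (in[i].error < in[cur].error ||
            (in[i].error == in[cur].error && in[i].bits < in[cur].bits))
            cur = i;
    out[count++] = in[cur];

    /* repeatedly add the entry with the least error added per bit
     * reduced, relative to the last entry added */
    for (;;) {
        int best = -1;
        double slope, best_slope = 0.0;

        for (i = 0; i < n; i++) {
            if (in[i].bits >= out[count - 1].bits)
                continue; /* must actually reduce bits */
            slope = (in[i].error - out[count - 1].error) /
                    (out[count - 1].bits - in[i].bits);
            if (best < 0 || slope < best_slope) {
                best       = i;
                best_slope = slope;
            }
        }
        if (best < 0)
            break; /* no more bit reductions available */
        out[count++] = in[best];
    }
    return count;
}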
A major flaw with this has also been the chicken-and-egg problem of
codebook generation. That is, whether motion compensation is used
instead of the codebook depends on how good the codebook is, but the
codebook would be better if it didn't use parts of the image that get
skipped for motion compensation. Regenerating the codebook isn't really
an option, because that screws with the rate control (the size of the
codebook is a fairly significant factor in evaluating frame size, and
often a good portion of the 4x4 entries are unused). I wouldn't say
that using all of the 4x4 entries is really a goal either, because
codebook entries are not necessarily a good use of the bit budget.
The best option would really be to use the codebook only where motion
compensation and backbuffer skip both go over error thresholds (there
should be a separate one for each, with a higher threshold for SKIP
than for MCOMP), but picking those thresholds is tricky, as they can
vary heavily with scene complexity. It MIGHT be possible to pick an
error threshold by doing two passes per frame, but I never really
entertained that possibility because the codebook generator is
agonizingly slow.
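Just to make the idea concrete, the gate would look something like
this; the constants are entirely made-up placeholders, and in practice
they would have to adapt to scene complexity as noted above:

/* Hypothetical threshold gate: only feed a block to the codebook
 * generator when neither cheap encoding is good enough. */
#define SKIP_ERROR_THRESHOLD  400.0 /* higher bar for SKIP  */
#define MCOMP_ERROR_THRESHOLD 200.0 /* lower bar for MCOMP */

static int block_needs_codebook(double skip_error, double mcomp_error)
{
    return skip_error  > SKIP_ERROR_THRESHOLD &&
           mcomp_error > MCOMP_ERROR_THRESHOLD;
}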
> you can make qsort behave binary identical by simply never returning a
> 0 from the comparison function
>
I was concerned with the fact that qsort could still do either cmp(a, b)
or cmp(b, a), and forcing a non-zero return wouldn't fix that, but since
the elements are pointers, I'll just compare addresses if all else is
equal.
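Something along these lines, with an invented element type, is what I
mean; because the elements are pointers to blocks that never move, the
address tie-break stays consistent whichever argument order qsort uses:

#include <stdlib.h>

/* SortElem is an invented stand-in for the real element type. */
typedef struct {
    double error;
    int    bits;
} SortElem;

static int elem_cmp(const void *va, const void *vb)
{
    const SortElem *a = *(const SortElem *const *)va;
    const SortElem *b = *(const SortElem *const *)vb;

    if (a->error != b->error)
        return a->error < b->error ? -1 : 1;
    if (a->bits != b->bits)
        return a->bits - b->bits;

    /* never return 0: tie-break on the stable block addresses */
    return a < b ? -1 : 1;
}

/* usage, for a SortElem **elems array:
 *     qsort(elems, nelems, sizeof(SortElem *), elem_cmp);
 */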