[FFmpeg-devel] [PATCH] encoder for adobe's flash ScreenVideo2 codec

Joshua Warner joshuawarner32
Thu Jul 23 04:19:41 CEST 2009


On Wed, Jul 22, 2009 at 4:05 PM, Vitor Sessak <vitor1001 at gmail.com> wrote:
> Joshua Warner wrote:
>>
>> Hi,
>>
>> I fixed the issues you guys have commented on (tell me if I
>> accidentally missed one), and the revised patch is attached.
>
> I'll give a second batch of comments...
>
>> +/**
>> + * @file libavcodec/flashsv2enc.c
>> + * Flash Screen Video Version 2 encoder
>> + * @author Joshua Warner
>> + */
>> +
>> +/* Differences from version 1 stream:
>> + * NOTE: Currently, the only player that supports version 2 streams is Adobe Flash Player itself.
>> + * * Supports sending only a range of scanlines in a block,
>> + *     indicating a difference from the corresponding block in the last keyframe.
>> + * * Supports initializing the zlib dictionary with data from the corresponding
>> + *     block in the last keyframe, to improve compression.
>> + * * Supports a hybrid 15-bit rgb / 7-bit palette color space.
>> + */
>> +
>> +/* TODO:
>> + * Don't keep Block structures for both current frame and keyframe.
>> + * Make better heuristics for deciding stream parameters (optimum_* functions).  Currently these return constants.
>> + * Figure out how to encode palette information in the stream, choose an optimum palette at each keyframe.
>> + * Figure out how the zlibPrimeCompressCurrent flag works, implement support.
>> + * Find other sample files (that weren't generated here), develop a decoder.
>> + */
>> +
>> +#include <stdio.h>
>> +#include <stdlib.h>
>
> Are both includes needed?
>
>> +#include "avcodec.h"
>> +#include "put_bits.h"
>> +#include "bytestream.h"
>
> Is bytestream.h used?

I'll make sure to remove the unneeded headers.

>
>> +static av_cold void cleanup(FlashSV2Context * s)
>> +{
>> +    if (s->encbuffer)
>> +        av_free(s->encbuffer);
>
> No need to check if s->encbuffer is null, av_free() already does that.
>
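(For reference: since av_free() handles NULL itself, the whole body of that
check can shrink to a plain

    av_free(s->encbuffer);

with the surrounding if dropped.)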
>> +static av_cold int flashsv2_encode_init(AVCodecContext * avctx)
>> +{
>> +    FlashSV2Context *s = avctx->priv_data;
>> +
>> +    s->avctx = avctx;
>> +
>> +    s->comp = avctx->compression_level;
>> +    if (s->comp == -1)
>> +        s->comp = 9;
>> +    if (s->comp < 0 || s->comp > 9) {
>> +        av_log(avctx, AV_LOG_ERROR,
>> +               "Compression level should be 0-9, not %d\n", s->comp);
>> +        return -1;
>> +    }
>> +
>> +
>> +    if ((avctx->width > 4095) || (avctx->height > 4095)) {
>> +        av_log(avctx, AV_LOG_ERROR,
>> +               "Input dimensions too large, input must be max 4096x4096 !\n");
>> +        return -1;
>> +    }
>> +
>> +    if (avcodec_check_dimensions(avctx, avctx->width, avctx->height) < 0)
>> +        return -1;
>> +
>> +
>> +    s->last_key_frame = 0;
>
> This is unneeded, the context is already alloc'ed with av_mallocz().
>
>> +static inline unsigned int chroma_diff(unsigned int c1, unsigned int c2)
>> +{
>> +    unsigned int t1 = (c1 & 0x000000ff) + ((c1 & 0x0000ff00) >> 8) + ((c1 & 0x00ff0000) >> 16);
>> +    unsigned int t2 = (c2 & 0x000000ff) + ((c2 & 0x0000ff00) >> 8) + ((c2 & 0x00ff0000) >> 16);
>> +
>> +    return abs(t1 - t2) + abs((c1 & 0x000000ff) - (c2 & 0x000000ff)) +
>> +        abs(((c1 & 0x0000ff00) >> 8) - ((c2 & 0x0000ff00) >> 8)) +
>> +        abs(((c1 & 0x00ff0000) >> 16) - ((c2 & 0x00ff0000) >> 16));
>> +}
>
> Would using the square instead of abs() be faster and/or look better?
>
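For illustration, a squared-error variant of the metric above might look like
the following (chroma_diff_sq is just a hypothetical name; whether it is
faster or looks better than the abs() version would have to be measured on
samples):

    /* Hypothetical squared-error variant of chroma_diff(); not part of the
     * patch.  Mirrors the abs() version: a total-intensity term plus the
     * three per-channel terms, each squared instead of abs()'ed. */
    static inline unsigned int chroma_diff_sq(unsigned int c1, unsigned int c2)
    {
        int db = (int)( c1        & 0xff) - (int)( c2        & 0xff);
        int dg = (int)((c1 >>  8) & 0xff) - (int)((c2 >>  8) & 0xff);
        int dr = (int)((c1 >> 16) & 0xff) - (int)((c2 >> 16) & 0xff);
        int dt = db + dg + dr;  /* plays the role of t1 - t2 above */

        return dt * dt + db * db + dg * dg + dr * dr;
    }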
>> +static int optimum_use15_7(FlashSV2Context * s)
>> +{
>> +#ifndef FLASHSV2_DUMB
>> +    double ideal = ((double)(s->avctx->bit_rate * s->avctx->time_base.den * s->avctx->ticks_per_frame)) /
>> +        ((double) s->avctx->time_base.num) * s->avctx->frame_number;
>> +    if (ideal + use15_7_threshold < s->total_bits) {
>> +        return 1;
>> +    } else {
>> +        return 0;
>> +    }
>> +#else
>> +    return s->avctx->global_quality == 0;
>> +#endif
>> +}
>
> I think if you were trying to encode optimally (if it's worth the price of
> being 2x slower), I'd suggest, for each (key?)frame:
>
> 1- Encode with 15_7 and see how many bits are consumed (after zlib) and how
> much distortion (measured, for ex., using chroma_diff()) you get.
> 2- Encode with bgr and see both the number of bits consumed after zlib and
> the distortion.

The only problem with this is that because it is a screen sharing
codec, it is important to be able to force it to use lossless mode.
Switching modes creates unacceptable artifacts, for not much benefit.
At most, I would do this with a number of different s->dist values.
This is a good idea for future improvements, but for now, I really
need to be able to wrap up this project quickly - I don't have much
more time to work on it before I have to begin other projects.  Don't
think that I'm asking you to accept low-quality code, however.
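For the record, a trial over a few dist values along the lines Vitor
describes might look roughly like the sketch below.  measure_rate() and
measure_distortion() are placeholders (not functions from the patch) standing
in for a trial encode (bits after zlib) and a chroma_diff()-based distortion
sum:

    /* Hypothetical sketch, not part of the patch: pick the dist value that
     * minimizes distortion + lambda * rate, as described in
     * doc/rate_distortion.txt.  measure_rate() and measure_distortion() are
     * placeholders for a trial encode and a distortion measurement. */
    static int optimum_dist_rd(FlashSV2Context * s, const AVFrame * p, int64_t lambda)
    {
        static const int candidates[] = { 0, 2, 4, 8, 16 };
        int i, best_dist = candidates[0];
        int64_t best_cost = -1;

        for (i = 0; i < (int)(sizeof(candidates) / sizeof(candidates[0])); i++) {
            int64_t rate = measure_rate(s, p, candidates[i]);       /* R, in bits */
            int64_t dist = measure_distortion(s, p, candidates[i]); /* D */
            int64_t cost = dist + lambda * rate;

            if (best_cost < 0 || cost < best_cost) {
                best_cost = cost;
                best_dist = candidates[i];
            }
        }
        return best_dist;
    }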

>
> Then, you choose the one that has the smallest quantity (distortion +
> lambda*rate). The reasoning behind that is better explained at
> doc/rate_distortion.txt. The parameter lambda is found in frame->quality and
> is passed from the command line by "-qscale" ("-qscale 2.3" =>
> frame->quality == (int) 2.3*FF_LAMBDA_SCALE). It is also a good starting
> point for implementing rate control in the future (using VBR with a given
> average bitrate gives better quality than CBR).
>
> Note that what is explained in rate_distortion.txt is already what you are
> doing with the s->dist parameter (s->dist == 8*lambda), so this "solves" the
> problem of finding the optimum dist.
>
> If trying both methods is not worth the speed loss, I think that
> s->use15_7 should be chosen based on frame->quality (by testing on a few
> samples to find from what quality value using bgr starts being optimal
> on average).
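For concreteness, that simpler heuristic could be as small as the sketch
below; USE15_7_QUALITY_CUTOFF is a made-up constant that would have to be
tuned on samples, not anything from the patch:

    /* Hypothetical sketch, not part of the patch: switch to 15_7 once the
     * requested quality (frame->quality == lambda * FF_LAMBDA_SCALE) passes
     * an empirically chosen cutoff. */
    static int use15_7_from_quality(const AVFrame * p)
    {
        return p->quality > USE15_7_QUALITY_CUTOFF;
    }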
>
> Unfortunately, the rate distortion method does not solve the problem of
> finding the optimal block size. How much do quality/bitrate depend on it?

Quality doesn't depend at all on the block size, but the bit rate
depends a lot on it (I have seen the bit rate change by a factor of 4
between different block sizes).  As I said before, I have tried
different formulas for estimating the optimal block size, but none of
them has worked consistently better than the 64x64 default.  Brute
force would be the obvious next step, but I think that would be
prohibitively expensive, because several frames (probably at least 10%
of the inter-key-frame distance) would have to be trial-encoded to
make a good assessment.
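To make that cost concrete, a brute-force search would look roughly like the
sketch below; trial_encode_frames() is a placeholder for encoding a sample of
frames at a given block size and returning the total bits used, not a
function from the patch:

    /* Hypothetical sketch, not part of the patch: trial-encode a sample of
     * frames at each candidate block size and keep the size that used the
     * fewest bits.  trial_encode_frames() is a placeholder. */
    static int optimum_block_size_brute(FlashSV2Context * s, const AVFrame ** sample, int nframes)
    {
        static const int sizes[] = { 16, 32, 48, 64, 96, 128 };
        int i, best_size = 64;
        int64_t best_bits = -1;

        for (i = 0; i < (int)(sizeof(sizes) / sizeof(sizes[0])); i++) {
            int64_t bits = trial_encode_frames(s, sample, nframes, sizes[i]);

            if (best_bits < 0 || bits < best_bits) {
                best_bits = bits;
                best_size = sizes[i];
            }
        }
        return best_size;
    }

Even with a small sample of frames, that is one extra trial encode per
candidate size per sampled frame, which is where the cost becomes
prohibitive.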

>
> -Vitor
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at mplayerhq.hu
> https://lists.mplayerhq.hu/mailman/listinfo/ffmpeg-devel
>


