[FFmpeg-devel] a64 encoder 7th round

Thu Jan 29 13:47:52 CET 2009

> Surely the addition of P frames does not help in every case, one always can
> just encode unrelated frames but where it does help the smaller P frames
> allow the freed up space and bitrate to be used by something else to improve
> quality ...

In multicol mode I can only improve quality by either having a bigger
charset (impossible, 256 chars is already the maximum the c64 can
handle) or sending a charset more often. A lifetime of 4 frames,
however, is already a good tradeoff between quality and framerate.
Sending two screens and thus making chars 8x4 pixels big would also be
an option. But all of that would make a single frame much bigger, so I
can only take advantage of it if I can load/decompress/whatever the
frames faster _even_ in the worst case. So a delta/RLE/whatever scheme
must always, under any circumstances, be faster than plain loading and
thus have no crossover point with it. If not, the framerate will drop,
or plain loading will be enough for the same quality and speed.
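
To make the worst-case argument concrete, here is a minimal sketch in C.
It assumes a naive RLE scheme that stores each run as a (count, value)
byte pair; this is only an illustration and not the frame format the
encoder uses, and the name rle_pair_size is made up. It shows that data
without any runs grows to twice its size, so in that case loading it
cannot be faster than plain loading.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: size in bytes of a naive RLE encoding that stores
 * each run as a (count, value) pair, with runs capped at 255 bytes.
 * The scheme itself is an assumption made for illustration only. */
static size_t rle_pair_size(const uint8_t *buf, size_t len)
{
    size_t out = 0;
    size_t i = 0;

    while (i < len) {
        size_t run = 1;
        /* extend the run while the next byte repeats and the count
         * still fits into a single byte */
        while (i + run < len && buf[i + run] == buf[i] && run < 255)
            run++;
        out += 2;   /* one count byte plus one value byte */
        i += run;
    }
    return out;
}
```

Eight identical bytes shrink to 2 bytes here, while eight different
bytes grow to 16, which is the crossover a worst-case frame would hit.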

> If this picture was the input then the use of the specific colors is optimal
> because that is how it is supposed to look.
> This really very strongly points towards something being wrong with how you
> choose the colors.

Sure, I can interlace #000000 and #ffffff to get a nice gray tone, but
well, try that at a 50Hz refresh rate: it flickers like hell and
really hurts your eyes :-) It even flickers on my TFT when I watch
things in an emulator. Also, there is another reason for transforming
the color space: I get smoother gradients that way, with colors that
mix well. If I stuck to the colors that represent the original pic
best, I'd end up in a big flickering hell again, and gradients would
need more dither steps. As for green, for example, there is just a
normal green and a light green available. The light green is okay to
mix with white, but flickers quite a bit when mixed with the normal
green, and mixing any of the colors with black results in *heavy*
flicker. Of course none of that shows on the example jpegs I have
shown, but on a real machine it is a big thing to take care of. Sure,
in multicolor mode I don't do any interlacing, so things don't hurt as
much there, but dithering with pixels of high luminance difference
still doesn't look too nice either (pixels are just too big, so single
pixels become very prominent, as they do when they form a vertical or
horizontal line).
Here, of course, I could also just choose a normal gray gradient, but
for better comparison I took that brown/pink/yellow color gradient, so
don't get confused by that; the resulting multicol video can of course
be displayed in plain gray tones. Luckily there are black, dark gray,
middle gray, light gray and white available among the 16 colors.
But for color modes like the ecmh mode, that is exactly why I do that
transform via HSV and then a lookup in color lines. Sure, the colors
don't match the original colors anymore, but it improves the viewing
experience. Also, none of the screens are calibrated (nor can they be),
and each c64 displays colors a bit differently. What you saw so far
were excellent palettes from an emulator.
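
The HSV step can be sketched in C for the gray case. This is a minimal
illustration and not the encoder's code: the names hsv_value and
gray_line_lookup are made up, and splitting the value range evenly
across the five entries is an assumption. The palette indices are the
usual c64 ones (0 black, 11 dark gray, 12 middle gray, 15 light gray,
1 white).

```c
#include <stdint.h>

/* The five gray tones of the c64 palette, ordered from dark to light:
 * black, dark gray, middle gray, light gray, white. */
static const uint8_t gray_line[5] = { 0, 11, 12, 15, 1 };

/* V component of HSV for 8-bit RGB: the largest of the three channels. */
static uint8_t hsv_value(uint8_t r, uint8_t g, uint8_t b)
{
    uint8_t v = r > g ? r : g;
    return v > b ? v : b;
}

/* Hypothetical lookup: split the 0..255 value range into five equal
 * bands and return the c64 color index for the band (assumed mapping). */
static uint8_t gray_line_lookup(uint8_t r, uint8_t g, uint8_t b)
{
    return gray_line[hsv_value(r, g, b) * 5 / 256];
}
```

A color line for a hue would work the same way, only with a different
table per hue range, chosen so that neighbouring entries mix well.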

> You simply have to simulate how something will look and compare that with a
> good comparission function against the input picture.

That is exactly what I am doing, except for the good compare function.
That is where I see potential, but it is not as easy as one might
think, as the attributes of the cells need to be respected, and some
attributes apply to smaller cells, some to bigger cells that in turn
contain several smaller cells. Thinking all of that through really
causes a headache at times.

> iam not sure what these links are supposed to proof, they arent compareing
> error diffusion against ordered dither
> rather you could look at:

The links show how dithering is usually done on a c64. It is state of
the art on that platform to do it like this; that simply is how things
have developed over the past decades, and it turned out that this looks
best on a c64. I just can't help it. I have no doubt that your examples
look good as well, but on a PC.
Also, I don't see any need for proving anything. The encoder makes it
possible to do something that was not possible so far, and the results
are way better than I ever expected, and still I made improvements. If
someone is able to do it even better, I am open to that, but I don't
see the necessity of proving that everything I have done is the best
thing ever. It is just the best I could achieve so far, until someone
finds a better solution. But finding a better solution takes more than
just throwing a bunch of concepts into the discussion and stating that
they will perform better, thus pushing me into a defensive position.
Replying with long mails, under the urge to prove things, really costs
me a big amount of valuable time (yes, there is work to do, I have a
family to share time with, and if there is time left I prefer to spend
it programming on one of my projects rather than on such discussions).
People should feel free to contribute codewise.

> no, not at all, first you could use loops like:
> (i hope my wild guesses for the ASM are understandable)
> 
>     ldx #<dest
>     stx a1+1 ;set highbyte of dest in code
>     stx a2+1
>     stx a3+1
>     stx a4+1
>     ldx $de01
>  loop
>     lda $de00
>  a1 sta $0000,x
>  a2 sta $0004,x
>  a3 sta $0008,x
>  a4 sta $000C,x
>     ldx $de01
>     bne loop
> 
> to write 4 equal bytes, and a similar loop to write 4 different ones

I corrected your example a bit:

      ldx #>dest
      stx a1+2 ;set highbyte of dest in code
      stx a2+2
      stx a3+2
      stx a4+2
      ;so far only the highbytes are set; if you want to change the
      ;lowbytes (0,4,8,c) too, additional code is needed here. We can
      ;only set 8 bits of a 16 bit address at once. The operand is
      ;little endian, so +1 is the lowbyte and +2 the highbyte.

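      ldy #$00     ;not used in this variant

      ldx $de00    ;read byte 1: the count
      stx count
      lda $de01    ;read byte 2: the value to store (latch increases)
      ldx #$00     ;start at offset 0, set once outside the loop
loop
a1   sta $0000,x
a2   sta $0004,x
a3   sta $0008,x
a4   sta $000c,x
      inx
      cpx count
      bne loop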

However, this would only make sense for a count<4. Also note that $de00
and $de01 need to be read alternately. Example:

lda $de00 ;read byte 1
...
lda $de01 ;read byte 2 (after that the internal latch increases and the
          ;next two bytes are offered on $de00/01, there is no way to
          ;read byte 1 or 2 again)
...
lda $de00 ;read byte 3
...
lda $de01 ;read byte 4 (latch increases)
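
The read order above can be modelled with a small C sketch. It is a toy
model of the behaviour as described in this mail, not of the real chip;
the struct and function names are made up, and the caller is assumed to
supply an even number of bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of the two data registers; names are made up for this
 * sketch. */
struct latch {
    const uint8_t *buf;  /* bytes waiting to be read (even count) */
    size_t pos;          /* index of the byte pair currently offered */
};

/* Reading $de00 returns the first byte of the pair and leaves the
 * latch where it is. */
static uint8_t read_de00(const struct latch *l)
{
    return l->buf[l->pos];
}

/* Reading $de01 returns the second byte and advances to the next pair,
 * so neither byte of the old pair can be read again. */
static uint8_t read_de01(struct latch *l)
{
    uint8_t v = l->buf[l->pos + 1];
    l->pos += 2;
    return v;
}
```

In this model a second read of $de01 without a read of $de00 in between
silently skips a byte, which is why the two reads have to alternate.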

Also, you need to advance the lowbytes for the 4 sta after each loop,
and an increment is expensive. Indirect Y indexed addressing may help
here, but sta is even more expensive then, and so is changing the
pointer. That is why I chose this self-modifying code, as it is usually
faster, though more complex.

Indirect Y indexed looks like this:

lda #<destaddr ;lowbyte of destaddr
sta $fb
lda #>destaddr ;highbyte of destaddr
sta $fc
ldy #offset

sta ($fb),y

stores at destaddr+offset

inc $fb ;inc lowbyte
bne *+4 ;need to inc highbyte?
inc $fc ;inc highbyte

this does a destaddr++;

Usually, however, you increment the offset until it reaches 0xff; then
it wraps around and you just increment $fc. Still, it is expensive:
each inc costs 5 cycles and each sta ($fb),y costs 6 cycles, versus 5
for the sta $0000,x form.
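
For readers more at home in C, the destaddr++ sequence and the indirect
Y indexed store address can be modelled like this. It is only a sketch
of the arithmetic; the struct and function names are made up.

```c
#include <stdint.h>

/* Toy model of the zero-page pointer at $fb/$fc; names are made up. */
struct zp_ptr {
    uint8_t lo;  /* $fb, low byte */
    uint8_t hi;  /* $fc, high byte */
};

/* Mirrors inc $fb / bne *+4 / inc $fc: the high byte is only
 * incremented when the low byte wraps around to zero. */
static void zp_ptr_inc(struct zp_ptr *p)
{
    p->lo++;
    if (p->lo == 0)
        p->hi++;
}

/* Effective address of sta ($fb),y: the 16 bit pointer plus Y. */
static uint16_t zp_ptr_addr(const struct zp_ptr *p, uint8_t y)
{
    return (uint16_t)(((p->hi << 8) | p->lo) + y);
}
```

What is a single pointer increment in C takes two or three 6502
instructions here, which is where the extra cycles per byte come from.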

You see, things might appear easier and less complex than they are. I'd
also wish there were random access to the network chip buffer (for
example when sending and filling in the checksum or size information),
but it is not available. There is always a bunch of restrictions we
have to live with and work around. That is exactly what makes things so
fun and challenging on that platform :-)

So I'd be happy if you trusted my skills in that matter, because
explaining the whole c64/6502 world would be endless, and I guess we
would just lose focus in such discussions, leading nowhere but into
more confusion and a big waste of time. The time I invested in making
these modes work on the c64 at all, and in producing encoded material
that suits those modes, was already endless.
So let's rather focus on how to speed up encoding or match pictures
even better in the encoder than on how to display and best load them on
a c64. The ELBG thing, for example, was a good thing, as it speeds
things up a lot for a similar result. But as for the format of a frame
and the restrictions imposed by the c64 hardware, there is nothing we
can change. Also, I have a good sense for what looks good on the real
machine, as I can try it on the real machine, and I know what aspects
to take care of.

So, is there anything codewise that still needs to be fixed to get the
multi modes submitted first of all? Is there real interest in getting
it included in ffmpeg? Because after that I'd focus on the muxer and
the next mode, to suit the ffmpeg requirements.

Kindest regards,

Toby