[FFmpeg-devel] a64 encoder 7th round

Sun Feb 1 12:11:15 CET 2009

>> As for 5col mode, I am not sure anyway whether it is the nicest thing to 
>> either lift the darkest area or lower the brightest area if both occur in 
>> a single block. But that is nothing that helps regarding the framesize 
>> and loading times :-)
>>     
>
> you could always include white and black and switch a middle one, assuming
> there's no odd limitation that makes that impossible
>   
I am sorry to admit that we again have to cope with odd limitations, as 
usual :-) Three of the colors can each be defined as one color out of 16; 
in b/w mode these are 11, 12 and 15, all colors I can't display with 
colorram, so I am lucky to have them this way. Black and white can both 
be displayed with colorram and, what is more, are far away from each 
other, so if I lift or lower one of them (as I can only use one at a 
time) the contrast should still be sufficient. That was my thought when 
expanding to a 5th color.

> you don't load them
> I did mention this, didn't I? Reuse data from the previous frame or the 4th
> previous, depending on mem layout, like P frames.
> you have 1000 bytes of this colorram thing storing the 5th color, why do you
> change it every frame?
> change it every 2 or 4, and ideally let the encoder decide when to update
> within some limit instead of hardcoding an every-4-frames update.
>   
All those solutions also put quite some overhead on the displayer and 
make the frame size vary, which makes handling the buffers rather 
complex. I request the desired packet size in advance and then get 
exactly as many bytes sent as requested. It is all such a pain for no 
gain. I have already made quite a few attempts to speed things up; 
larger frames brought an improvement (before, I had a maximum frame size 
of 0x100), but none of the rest (e.g. the RLE attempt) resulted in a 
better framerate.

Loading happens roughly like this (lifetime is 4, so a frame is 0x600 + 
colorram):
- request another packet with 0x200 bytes
- receive a packet with 0x400 bytes of data (the request for it was 
already sent in the last round)
- read the data from the received packet while the PC is already sending 
the next data, as I requested it beforehand to save time
- request 0xxxx for colorram
- read the 0x200 bytes left from the previous request
- request the next frame data, 0x400
- wait for the 2nd vsync
- read the colorram data and write it immediately to colorram
- loop
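
The ordering of the steps above can be sketched as a small Python
simulation. This is only an illustration of the request pipelining, not
the actual C64 or server code: the names load_frames, request and read
are made up, the queue stands in for the network, and the size of the
colorram request is left open because it is not given above.

```python
from collections import deque

FRAME_DATA = 0x600  # frame payload without colorram, as stated above


def load_frames(num_frames):
    """Return the ordered (action, what, size) steps for num_frames frames."""
    log = []
    in_flight = deque()  # requests the PC has not yet been read back for

    def request(what, size):
        in_flight.append((what, size))
        log.append(("request", what, size))

    def read():
        what, size = in_flight.popleft()  # answers arrive in request order
        log.append(("read", what, size))

    # Prime the pipeline: the first 0x400 request goes out before the loop.
    request("frame", 0x400)
    for _ in range(num_frames):
        request("frame", 0x200)      # rest of the current frame
        read()                       # 0x400 packet requested last round
        request("colorram", None)    # size not specified in the text
        read()                       # the 0x200 bytes still outstanding
        request("frame", 0x400)      # first part of the next frame
        log.append(("wait", "2nd vsync", None))
        read()                       # colorram data, written straight away
    return log


steps = load_frames(2)
frame_bytes = sum(size for action, what, size in steps
                  if action == "read" and what == "frame")
```

Each loop iteration always has one request outstanding while the
previous answer is being read, which is the point of requesting ahead.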

> also what's this 18700 cycles thing?
> 4col mode can be done in 2vsync
> 5col needs 3vsync
> the difference is 25*40=1000 bytes per frame, for which our 8 entry LUT
> would need ~10 cycles per byte, that's 10k cycles not 18.7k
> and as you said your normal copy is not optimal either so there must
> be more headroom if 4col works at full speed currently.
>   
Updating must start synced to vsync and be at most as fast as the raster 
beam, or else you will see the update on the screen. The screens and the 
charset, in contrast, I can update in the buffer that is not shown and 
then switch the display to the new buffer with a single STA to a 
register. As for colorram, there is only one colorram, and it is always 
displayed. Another option: I update an area off-screen and copy it to 
colorram when the next frame is about to be shown, but this is even more 
expensive. So what I do is load as usual, wait for the 2nd vsync to 
happen, then load colorram, and display starts on the 3rd vsync. How are 
you going to reduce that to 2 vsyncs by optimizing on the colorram side? 
If anything, you need to speed up the loading of the 0x600 bytes 
beforehand. Managing that in 1 vsync is at least theoretically possible, 
but leaving no room for the slightest delay in the delivery of data over 
the net will lead to a varying framerate again.
There can always be reasons for things taking a bit longer on the net or 
on the PC side. At the moment things work fine, even when the server and 
the network are under heavy load.
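
To illustrate the difference between the double-buffered screens and the
single colorram, here is a minimal Python model. It is a sketch only:
the class Display and its methods are hypothetical, the attribute
`shown` stands in for the register that selects the visible buffer, and
25*40 cells are assumed for both screen and colorram.

```python
CELLS = 40 * 25  # 1000 character cells


class Display:
    def __init__(self):
        self.screens = [bytearray(CELLS), bytearray(CELLS)]
        self.shown = 0                    # which screen buffer is visible
        self.colorram = bytearray(CELLS)  # only one copy, always visible

    def hidden_screen(self):
        return self.screens[self.shown ^ 1]

    def visible_screen(self):
        return self.screens[self.shown]

    def flip(self):
        self.shown ^= 1  # the single store that switches buffers


display = Display()
display.hidden_screen()[0] = 0x2A          # update is not visible yet
before_flip = display.visible_screen()[0]
display.flip()                             # now it is
after_flip = display.visible_screen()[0]
display.colorram[0] = 1                    # visible at once, no flip possible
```

Screen updates can take as long as they like before the flip, while
every colorram write lands directly in what is being displayed, so it
has to be timed against vsync.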

> also with codecs like mpeg4 a 0.1% loss from what is achievable means
> rejection, you argue that a 30% reduction in filesize is negligible ...
> And that is a 30% reduction that at the same time is faster than your
> current code, even if it's not fast enough to make the next vsync it
> means more free time that could be used for other things.
>   
So far there are not many other things I need to do, except checking 
whether some keys are being pressed and, if I add audio at some point, 
updating some registers on the sound chip. There is enough time left for 
that, as well as enough time to cope with delays on the server/net side; 
I tested that.
So, for the sake of making this endless discussion stop, I'll pack the 
nibbles in colorram together. That is a good tradeoff: it saves some 
space and does not add much overhead code-wise (on both sides), though 
it neither hurts nor improves speed.
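
Packing the colorram nibbles could look roughly like the following
Python sketch. The function names and the nibble order (even index in
the low nibble, odd index in the high nibble) are assumptions for
illustration, not what the encoder necessarily does.

```python
def pack_nibbles(colors):
    """Pack 4-bit values pairwise: even index -> low nibble, odd -> high."""
    if len(colors) % 2:
        raise ValueError("need an even number of values")
    if any(c < 0 or c > 15 for c in colors):
        raise ValueError("values must fit in 4 bits")
    return bytes(lo | (hi << 4) for lo, hi in zip(colors[0::2], colors[1::2]))


def unpack_nibbles(packed):
    """Inverse of pack_nibbles: two 4-bit values per input byte."""
    out = []
    for b in packed:
        out.append(b & 0x0F)  # low nibble first
        out.append(b >> 4)    # then high nibble
    return out


colorram = [i % 16 for i in range(1000)]  # 25*40 cells of dummy data
packed = pack_nibbles(colorram)
```

With 1000 colorram cells this gives 500 bytes per frame instead of 1000,
at the cost of one mask and one shift per byte on the displayer side.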

Kindest regards,

Toby