[FFmpeg-devel] a64 encoder 7th round

Bitbreaker/METALVOTZE bitbreaker
Sun Feb 1 22:26:46 CET 2009

> also there are
> 1. the 1000 byte of chars
> 2. the 2048 bytes of the charset
> 3. the 1000 byte of the colorram
> you write 2. split in 512 byte blocks so each frame gets
> 1000 + 512 + 1000 to copy and charset is updated once every 4 frames
> also, we know that your not perfectly optimized 1000+512 byte 4col code
> can do 2vsync
> so at least the following would be possible:
> each frame contains 6 256 byte blocks with a type in front

That would then mean 6 request packets to send again, which costs some 
time as well, though I have no idea how much exactly; I'll have to do 
measurements for that first. Roughly, a request means writing a MAC + IP 
+ UDP header + 2 bytes to the network card; during that, the length 
fields are calculated as well, and I subtract the length from the 
precalculated checksum to save time.

> types 0-7  could point to the 8 256 byte parts of the charset
> types 8-15 could point to the 8 256 byte parts of the charset with a
> flip of the charset (assuming this can be triggered seperately)
> types 16-19 could point to the 4 256 byte chars
> types 20-23 could point to the 4 256 byte chars with a page flip if
> this is possible
> 24-27 could point to the 4 256 byte parts of the colorram (stored
> compressed to 32 byte)

So there's no possibility to use all colors at once, just black and white? :-(

> * this would not increase the amount of copying by a byte (you had padding
>   that can be used for the 6 control bytes)

It can be, but I already transported audio data successfully in there. 
However, I have commented the audio out so far, because the result is 
still not too satisfying. Most likely I'll need 5 bytes per audio sample 
and two samples per vsync. But that is going to be another chapter, on 
how to represent audio with a mixed signal of either triangle, sawtooth, 
white noise or rectangle with variable pulse width in 3 channels :-)

> * it would be equivalent to your 4col mode if the colorram update where not
>   used
> * it would give the encoder alot more flexibility in what to update
> and of course if you could squeeze another 256 byte copy in per frame that
> would improve the choices the encoder had. Similarly if blocks where 128
> byte that would mean more flexibility.

But that would mean 12 requests and even more overhead. So you have to 
find the tradeoff between overhead and speed. Remember that I already 
introduced bigger frames to increase speed.

Overall nice plans, but for the future, if at all, I think. If loading 
time is not the limit, my own resources will be :-) But who knows, maybe 
the modes implemented so far are teaser enough to find more people and 
add more modes. So I suggest splitting all that off from the existing 
modes.

First of all I'll do another test with the 5col mode and see if I can 
squeeze it into 2 vsyncs. (Using 0x500 byte blocks is still below the 
MTU and saves me another request + 1 packet header to read + some more 
speedcode + some more interleaving; let's see.) Also I could tune the 
server a bit and let it split requests into the right size if they are 
above the MTU, or even better, let it accept requests like: send 2 times 
0x500 bytes. That would at least save the overhead of sending requests, 
though I still need to skip 2 headers before the payload appears. But 
let's see what speeds I can achieve by that. I also need to see how many 
packets/data the network card buffer can hold. The datasheet 
unfortunately doesn't mention the buffer size at all :-(

If your concepts really work out, then they should IMO be implemented in 
a mode that is able to display more colors and details. The multicolor 
mode will not profit a lot from all that, I'm afraid. And see above: 
let's split that off, or I might fiddle around for a lifetime and never 
be able to submit anything to FFmpeg.
Also, if you need moving blocks, the use of sprites could be considered. 
There are 8 of them, 24x21 pixels in size, using 0x40 bytes each, either 
in multicolor or hires (with colors definable independently from the 
multicolor charset). There can be 8 in a row, and with some tricks and 
overhead they can be multiplexed. However, they consume additional CPU 
time, as the video chip does more bus accesses while they are shown. 
Also, there is a bug when the first sprite's y-position is higher than 
or the same as that of the other sprites: in that case the video chip 
collides with the network card's bus access and data is trashed (the 
network card designer's fault :-( ). Did I already mention that the 
network card's design sucks in several other points as well? :-)
But alas, it is the fastest way to load so far and a good substitute for 
the 5.25" floppy discs ;-)

Kindest regards,

