[FFmpeg-devel] a64 encoder 7th round
Bitbreaker/METALVOTZE
bitbreaker
Sun Feb 1 22:26:46 CET 2009
> also there are
> 1. the 1000 bytes of chars
> 2. the 2048 bytes of the charset
> 3. the 1000 bytes of the colorram
>
> you write 2. split in 512 byte blocks so each frame gets
> 1000 + 512 + 1000 to copy and charset is updated once every 4 frames
>
> also, we know that your not perfectly optimized 1000+512 byte 4col code
> can do 2vsync
>
> so at least the following would be possible:
> each frame contains 6 256 byte blocks with a type in front
That would then mean sending 6 request packets again, which takes some
time as well. I have no idea how much exactly; I have to do measurements
for that first. Roughly, each request means writing a MAC + IP + UDP
header + 2 bytes to the network card. While doing that, the length
fields are calculated as well, and I subtract the length from the
precalculated checksum to save time.
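The checksum shortcut mentioned above can be sketched roughly as follows. This is only an illustration in Python, not the actual C64 code: the header field values, the addresses, and the names `make_header`, `PARTIAL_SUM` and `fast_checksum` are all assumptions. The idea is that every IPv4 header field except the total length is constant, so the one's-complement sum over the constant words can be computed once and only the length has to be folded in per packet. Since the stored checksum is the complement of that sum, adding the length to the sum is the same thing as subtracting it from the precalculated checksum.

```python
import struct

def fold(total):
    """Fold a sum into 16 bits with one's-complement end-around carry."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

def make_header(total_length, checksum=0):
    """Build a 20-byte IPv4 header; all values except the length are
    hypothetical constants chosen for this sketch."""
    return struct.pack("!BBHHHBBH4s4s",
                       0x45, 0, total_length,   # version/IHL, TOS, total length
                       0, 0,                    # identification, flags/fragment
                       64, 17, checksum,        # TTL, protocol (UDP), checksum
                       bytes([192, 168, 0, 64]),
                       bytes([192, 168, 0, 1]))

def ip_checksum(header):
    """Reference implementation: sum all 16-bit words, fold, complement."""
    words = struct.unpack("!%dH" % (len(header) // 2), header)
    return ~fold(sum(words)) & 0xFFFF

# Computed once: sum over the header with length and checksum set to zero.
PARTIAL_SUM = fold(sum(struct.unpack("!10H", make_header(0))))

def fast_checksum(total_length):
    """Per-packet work: fold the length into the precalculated sum."""
    return ~fold(PARTIAL_SUM + total_length) & 0xFFFF
```

For any length, `fast_checksum(n)` should agree with `ip_checksum(make_header(n))`, so the per-request cost shrinks to one 16-bit add with carry plus a complement.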
> types 0-7 could point to the 8 256 byte parts of the charset
> types 8-15 could point to the 8 256 byte parts of the charset with a
> flip of the charset (assuming this can be triggered separately)
> types 16-19 could point to the 4 256 byte chars
> types 20-23 could point to the 4 256 byte chars with a page flip if
> this is possible
> 24-27 could point to the 4 256 byte parts of the colorram (stored
> compressed to 32 bytes)
So no possibility to use all colors at once, just black and white? :-(
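To make the proposed type numbering concrete, here is a minimal sketch of how a decoder could map the type byte in front of each block to a destination. The function name `decode_block_type` and the tuple layout are made up for illustration; only the ranges 0-27 and the block sizes come from the proposal quoted above. Note that 32 bytes for 256 colorram cells leaves 1 bit per cell, which is presumably where the two-color limitation comes from.

```python
def decode_block_type(t):
    """Map a block type to (target, part_index, flip, payload_bytes).

    Ranges follow the quoted proposal; the return layout is hypothetical.
    """
    if 0 <= t <= 7:
        return ("charset", t, False, 256)       # 8 parts of the 2048-byte charset
    if 8 <= t <= 15:
        return ("charset", t - 8, True, 256)    # same, plus a charset flip
    if 16 <= t <= 19:
        return ("chars", t - 16, False, 256)    # 4 parts of the char screen
    if 20 <= t <= 23:
        return ("chars", t - 20, True, 256)     # same, plus a page flip
    if 24 <= t <= 27:
        return ("colorram", t - 24, False, 32)  # 256 cells packed into 32 bytes
    raise ValueError("unknown block type %d" % t)

# Bits available per colorram cell in the compressed form.
BITS_PER_COLOR_CELL = 32 * 8 // 256
```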
> * this would not increase the amount of copying by a byte (you had padding
> that can be used for the 6 control bytes)
Could be, but I have already transported audio data successfully in
there. However, I have commented the audio out for now, because the
result is still not very satisfying. Most likely I'll need 5 bytes per
audio sample and two samples per vsync. But that is going to be another
chapter: how to represent audio with a mixed signal of triangle,
sawtooth, white noise or rectangle with variable pulse width in
3 channels :-)
> * it would be equivalent to your 4col mode if the colorram update were not
> used
> * it would give the encoder a lot more flexibility in what to update
>
> and of course if you could squeeze another 256 byte copy in per frame that
> would improve the choices the encoder had. Similarly if blocks were 128
> bytes that would mean more flexibility.
But that would mean 12 requests and even more overhead. So you have to
find the tradeoff between overhead and speed. Remember that I already
introduced bigger frames to increase speed.
Overall nice plans, but for the future, if at all, I think. If the
loading time is not the limit, my own resources will be :-) But who
knows, maybe the modes implemented so far are teaser enough to find more
people to do more modes. So I suggest splitting all that off from the
existing modes.
First of all I'll do another test with the 5col mode and see if I can
squeeze it into 2 vsyncs. (Using 0x500 byte blocks is still below the
MTU and saves me another request + 1 packet header to read + some more
speedcode + some more interleaving, let's see.) Also I could tune the
server a bit and let it split requests into the right size if they are
above the MTU, or even better let it accept requests like: send 2 times
0x500 bytes. That would at least save the overhead of sending requests,
though I still need to skip 2 headers before the payload appears. But
let's see what speeds I can achieve with that. Also I need to see how
many packets/how much data the network card buffer can hold. The
datasheet unfortunately doesn't mention the buffer size at all :-(
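The server-side splitting idea could look roughly like this. It is a sketch under assumptions: a standard Ethernet MTU of 1500 bytes, IPv4 without options plus UDP (28 header bytes in total), and a hypothetical helper `split_request` that does not exist in the current server. With those numbers a 0x500 (1280) byte block fits into a single packet with room to spare.

```python
MTU = 1500                        # standard Ethernet MTU (assumption)
IP_UDP_HEADERS = 20 + 8           # IPv4 without options + UDP
MAX_PAYLOAD = MTU - IP_UDP_HEADERS

def split_request(length, chunk=MAX_PAYLOAD):
    """Split a requested byte count into per-packet payload sizes."""
    sizes = []
    while length > 0:
        n = min(chunk, length)
        sizes.append(n)
        length -= n
    return sizes

# One 0x500 block fits in a single packet.
print(split_request(0x500))
# "Send 2 times 0x500 bytes": one request, two equally sized replies.
print(split_request(2 * 0x500, chunk=0x500))
# Splitting purely by MTU would give uneven packets instead.
print(split_request(2 * 0x500))
```

Fixed-size chunks are probably friendlier to unrolled speedcode on the client than MTU-sized ones, since every reply then has the same layout.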
If your concepts really work out, then IMO they should be implemented in
a mode that is able to display more colors and details. The multicolor
mode will not profit a lot from all that, I'm afraid. And see above:
let's split that off, or I might fiddle around for a lifetime and not be
able to submit anything to ffmpeg.
Also, if you need moving blocks, the use of sprites could be considered.
There are 8 blocks of 24x21 pixels in size, using 0x40 bytes each,
either in multicolor or hires (colors definable independently of the
multicolor charset). There can be 8 in a row, and with some tricks and
overhead they can be multiplexed. However, they consume additional CPU
time, as the video chip has more bus accesses when they are shown. Also,
there is a bug when the first sprite's y-position is higher than or the
same as that of the other sprites. In that case the video chip collides
with the network card's bus access and data is trashed (the network card
designer's fault :-( ). Did I already mention that the network card's
design sucks in several other points as well? :-)
But alas, it is the fastest way to load so far and a good substitute for
the 5.25" floppy disks ;-)
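For the sprite numbers above, the 0x40 bytes per sprite can be checked with a little arithmetic. This is just a sketch with made-up names: a hires sprite row is 24 pixels at 1 bit each, a multicolor row is 12 double-wide pixels at 2 bits each, so both come to 3 bytes per row and 63 bytes of data, padded to a 64-byte slot.

```python
SPRITE_ROWS = 21
SPRITE_SLOT = 0x40   # bytes reserved per sprite

def sprite_data_bytes(multicolor=False):
    """Bytes of pixel data in one sprite."""
    pixels_per_row = 12 if multicolor else 24
    bits_per_pixel = 2 if multicolor else 1
    row_bytes = pixels_per_row * bits_per_pixel // 8
    return row_bytes * SPRITE_ROWS

# Data to transfer if all 8 sprites were updated in one frame.
ALL_SPRITES_BYTES = 8 * SPRITE_SLOT
```

So updating all 8 sprites would cost 512 bytes per frame, about the same as one 512 byte charset block.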
Kindest regards,
Toby