[FFmpeg-devel] a64 encoder 7th round

Bitbreaker/METALVOTZE bitbreaker
Tue Feb 3 20:28:10 CET 2009

Michael Niedermayer wrote:
> On Tue, Feb 03, 2009 at 03:22:21PM +0100, Bitbreaker/METALVOTZE wrote:
>>> ldx $de00
>>> lda lut+0,x
>>> sta dest,y
>>> lda lut+1,x
>>> sta dest+64,y
>>> lda lut+2,x
>>> sta dest+128,y
>>> lda lut+3,x
>>> sta dest+192,y
>>> iny
>> Just tested something similar to reconstruct the "compressed" colorram. 
>> However it spoils of course my option of linear writing and thus things 
>> need to happen even faster as i write at the lower and upper end of the 
>> colorram area at the same time. It works out tightly however when i 
>> start 44 lines before screen ends. Writing endures until i enter the 
>> upper area again, but ends luckily fast enough (4 lines) before the last 
>> line of the first 0x100 block of colorram is displayed. So i have to 
>> take care that i cross no 0x100 border codewise and indexwise, as that 
>> would add extra cycles and thus trash display. I could however place the 
>> LUT into zeropage where no extra cycles apply on those conditions. Would 
>> make things more stable, but wastes 64 nice favourite places to store 
>> values when running out of registers ;-)
> 19 not 64 (see my previous reply for the actual table)
> you need just 2^n + n - 1 not 2^n * n with overlapping entries
What i do is stuffing 4 bits of each $0100 block together codec wise
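
As an aside on the 19-versus-64 arithmetic quoted above: with n = 4 selector bits, a table with overlapping entries needs 2^4 + 4 - 1 = 19 entries instead of 2^4 * 4 = 64. One way to get such an overlap is a binary de Bruijn sequence. Whether the table from the earlier reply was built this way is an assumption here, and the helper names below are made up for illustration. A minimal Python sketch, working on bits rather than real colour values:

```python
def de_bruijn_binary(n):
    """Return one cyclic binary de Bruijn sequence of order n (length 2**n)."""
    a = [0] * (2 * n)
    seq = []

    def db(t, p):
        # Standard Lyndon-word construction, specialised to a 2-symbol alphabet.
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, 2):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq


def overlapping_table(n):
    """Unroll the cyclic sequence so every n-wide window is readable linearly."""
    cyc = de_bruijn_binary(n)
    return cyc + cyc[:n - 1]


def window_offsets(table, n):
    """Map each n-bit pattern to the table offset where it starts."""
    return {tuple(table[x:x + n]): x for x in range(len(table) - n + 1)}


if __name__ == "__main__":
    n = 4
    table = overlapping_table(n)
    offsets = window_offsets(table, n)
    print(len(table), "entries instead of", (2 ** n) * n)
    print(len(offsets), "distinct 4-bit patterns covered")
```

In terms of the quoted 6502 code, each offset would be the x value used with lda lut+0,x through lda lut+3,x.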

On the c64 i copy the 64-byte LUT to $0100 (this is the stack page) because 
i was fed up with wasting so many cycles just for reading a table. Then i 
can suddenly do:

ldy #$00
tsx
stx $40 ;save stack pointer
ldx $de00
txs ;point stack pointer at the LUT entry
pla
sta $d800,y
pla
sta $d900,y
pla
sta $da00,y
pla
sta $db00,y

ldx $de01

when finished, restore stack pointer

This allows me to save 2 more cycles per 4-byte lookup, as i can just pull 
data from the stack within 3 cycles and even get the stack pointer 
incremented for free by that. Too bad that the stack area is fixed, else 
i'd do it the other way round by pushing bytes on the stack.
On the one hand this is dirty, but it works fine: the real stack pointer 
and stack data are far away from the LUT i placed in the stack page, so no 
collisions are expected. My lookup table consists of 16 entries of 4 bytes 
each. The size does not hurt, the 6502 code grew by 0x500 bytes anyway with 
the latest optimizations (twice the size now) ;-)
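
The stack trick described above can be modelled roughly as follows. This is only a sketch in Python, not the actual player code: the Cpu class, the LUT contents and the selector_for() bias are assumptions for illustration. The one hardware fact it relies on is that PLA increments the stack pointer first and then reads from $0100+S, so a selector has to point one byte below the entry it wants.

```python
STACK_PAGE = 0x0100
COLOR_BLOCKS = (0xD800, 0xD900, 0xDA00, 0xDB00)


class Cpu:
    """Tiny model: 64K of memory plus an 8-bit stack pointer."""

    def __init__(self):
        self.mem = bytearray(0x10000)
        self.s = 0xFF

    def pla(self):
        # PLA pre-increments S (wrapping at 8 bits), then reads $0100+S.
        self.s = (self.s + 1) & 0xFF
        return self.mem[STACK_PAGE + self.s]


def install_lut(cpu, lut):
    """Copy 16 entries of 4 bytes each to $0100..$013f."""
    assert len(lut) == 16 and all(len(entry) == 4 for entry in lut)
    for i, entry in enumerate(lut):
        start = STACK_PAGE + 4 * i
        cpu.mem[start:start + 4] = bytes(entry)


def selector_for(entry_index):
    """Hypothetical bias: S must sit one below the first byte of the entry."""
    return (4 * entry_index - 1) & 0xFF


def unpack_column(cpu, selector, y):
    saved_s = cpu.s                    # tsx / stx $40
    cpu.s = selector                   # ldx $de00 / txs
    for base in COLOR_BLOCKS:
        cpu.mem[base + y] = cpu.pla()  # pla / sta base,y
    cpu.s = saved_s                    # ldx $40 / txs


if __name__ == "__main__":
    cpu = Cpu()
    # Made-up LUT: byte k of entry i is bit k of i.
    lut = [[(i >> k) & 1 for k in range(4)] for i in range(16)]
    install_lut(cpu, lut)
    unpack_column(cpu, selector_for(5), y=7)
    print([cpu.mem[base + 7] for base in COLOR_BLOCKS])
```

The model also shows why the real stack contents are untouched as long as they live above $013f: the four pulls only ever read inside the 64-byte LUT.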

LUT looks like:


Tested all on the real machine as well this evening, works fine so far.

Kindest regards,

