[FFmpeg-devel] a64 encoder 7th round
Bitbreaker/METALVOTZE
Tue Feb 3 20:28:10 CET 2009
Michael Niedermayer wrote:
> On Tue, Feb 03, 2009 at 03:22:21PM +0100, Bitbreaker/METALVOTZE wrote:
>
>>> ldx $de00
>>> lda lut+0,x
>>> sta dest,y
>>> lda lut+1,x
>>> sta dest+64,y
>>> lda lut+2,x
>>> sta dest+128,y
>>> lda lut+3,x
>>> sta dest+192,y
>>> iny
>>>
>> I just tested something similar to reconstruct the "compressed" colorram.
>> However, it of course spoils my option of linear writing, so things
>> need to happen even faster, as I write at the lower and upper end of the
>> colorram area at the same time. It works out, though tightly, when I
>> start 44 lines before the screen ends. Writing continues until I enter the
>> upper area again, but luckily ends early enough (4 lines) before the last
>> line of the first 0x100 block of colorram is displayed. So I have to
>> take care that I cross no 0x100 boundary, codewise or indexwise, as that
>> would add extra cycles and thus trash the display. I could, however, place
>> the LUT into zeropage, where no extra cycles apply under those conditions.
>> That would make things more stable,
>>
>
>
>> but wastes 64 nice favourite places to store
>> values when running out of registers ;-)
>>
>
> 19 not 64 (see my previous reply for the actual table);
> you need just 2^n + n - 1, not 2^n * n, with overlapping entries
>
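For n = 4 the two sizes in the quoted remark are 2^4 + 4 - 1 = 19 and
2^4 * 4 = 64. The following is only a sketch of what such an overlapping
table could look like, written in Python rather than 6502 code. It assumes
a de Bruijn sequence is what is meant (the table from the earlier reply is
not reproduced here), and it borrows the values 8 and 9 from the LUT
listed further down. The names BITS, table and offset_of are made up for
the illustration.

```python
# One de Bruijn sequence B(2, 4): every 4-bit pattern occurs exactly once
# when the 16 symbols are read cyclically. (Assumed choice of sequence.)
BITS = "0000100110101111"
N = 4

# Unroll the cycle into a linear table: 16 + (4 - 1) = 19 symbols.
linear = BITS + BITS[:N - 1]

# Expand each bit to a colour value (8 or 9, as in the LUT in this mail).
table = [8 + int(b) for b in linear]

# offset_of[pattern] is the x for which table[x], table[x+1], table[x+2],
# table[x+3] hold the four expanded bits of that pattern (MSB first).
offset_of = {int(linear[x:x + N], 2): x for x in range(2 ** N)}

if __name__ == "__main__":
    print(len(table), "bytes instead of", 2 ** N * N)
    for pattern in range(2 ** N):
        x = offset_of[pattern]
        print(format(pattern, "04b"), "-> offset", x, table[x:x + N])
```

With such a table the encoder would have to emit the offset x instead of
the raw 4-bit pattern, since the entries no longer sit at multiples of 4.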
On the codec side I pack 4 bits together, one from each $0100 block.
On the C64 I copy the 64-byte LUT to $0100 (this is the stack), because I
was fed up with wasting so many cycles just reading a table. Then I can
suddenly do:
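ldy #$00
tsx
stx $40 ;save stack pointer
ldx $de00 ;fetch LUT index
txs ;stack pointer now selects the LUT entry
pla ;each pla increments S, then reads $0100+S
sta $d800,y
pla
sta $d900,y
pla
sta $da00,y
pla
sta $db00,y
iny
ldx $de01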
...
When finished, restore the stack pointer.
This allows me to save 2 more cycles per 4-byte lookup, as I can just pull
data from the stack within 3 cycles and even get the stack pointer
incremented for free. Too bad that the stack area is fixed, else I'd
do it the other way round by pushing bytes onto the stack.
This is admittedly dirty, but it works fine: the real stack pointer and
its data are far away from the LUT I placed in the stack page, so no
collisions are expected. My lookup table consists of 16 entries of 4 bytes
each. The size does not hurt; the 6502 code grew by 0x500 bytes anyway with
the latest optimizations (twice the size now) ;-)
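To make the pull sequence easier to follow, here is a rough Python model
of "txs" followed by four "pla". It relies on one documented 6502 detail:
pla first increments S and then reads $0100+S. So the byte loaded into S
has to be one less than the offset of the entry's first byte (or the table
has to start at $0101). Which of the two the real setup uses is not stated
here; index_for below is a hypothetical encoder-side bias, and the table
contents assume the bit pattern of the entry number, MSB first.

```python
STACK_BASE = 0x0100  # the 6502 stack page

def pla(mem, sp):
    """Model PLA: increment S (8-bit wrap), then read the byte at $0100+S."""
    sp = (sp + 1) & 0xFF
    return mem[STACK_BASE + sp], sp

def pull_entry(mem, index_byte):
    """Model 'ldx $de00 / txs / pla / pla / pla / pla' for one index byte."""
    sp = index_byte  # txs
    pulled = []
    for _ in range(4):
        value, sp = pla(mem, sp)
        pulled.append(value)
    return pulled

def index_for(entry):
    """Hypothetical bias so that the first PLA lands on byte 0 of the entry."""
    return (4 * entry - 1) & 0xFF

# 64-byte LUT at $0100: entry e holds its 4 bits (MSB first) as 8 or 9.
mem = [0] * 0x10000
for e in range(16):
    for k in range(4):
        mem[STACK_BASE + 4 * e + k] = 8 + ((e >> (3 - k)) & 1)

if __name__ == "__main__":
    for e in range(4):
        print(e, hex(index_for(e)), pull_entry(mem, index_for(e)))
```

Without the bias, an index byte of 4 would pull the bytes at $0105-$0108,
which straddle entries 1 and 2.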
LUT looks like:
8,8,8,8
8,8,8,9
8,8,9,8
8,8,9,9
...
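As a sketch of how the whole table can be generated, assuming the rows
not listed simply continue the binary counting pattern of the four rows
shown: each entry number is written out MSB first, with 8 for a 0 bit and
9 for a 1 bit. build_lut and pack are names invented for this example;
pack shows the matching encoder-side step of folding one colour from each
$0100 block into an entry number.

```python
def build_lut(base=8):
    """Return 16 entries of 4 bytes; byte k of entry e is base + bit k of e (MSB first)."""
    lut = []
    for entry in range(16):
        for k in range(4):
            bit = (entry >> (3 - k)) & 1
            lut.append(base + bit)
    return lut

def pack(c0, c1, c2, c3, base=8):
    """Fold four colours (each base or base+1), one per $0100 block, into an entry number."""
    entry = 0
    for colour in (c0, c1, c2, c3):
        entry = (entry << 1) | (colour - base)
    return entry

lut = build_lut()
rows = [lut[i:i + 4] for i in range(0, len(lut), 4)]

if __name__ == "__main__":
    for row in rows:
        print(",".join(str(v) for v in row))
```

The entry for the colours 9,8,9,8 is then rows[pack(9, 8, 9, 8)].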
I also tested all of this on the real machine this evening; it works fine so far.
Kindest regards,
Toby