Fun with VPOKE

May 22, 2019
491
251
63
Finally, VPOKE

VPOKE has 3 arguments: the bank, the index, and the value.

Since VERA (the video chip) has 1MB of RAM, it's easier for us to access that memory 64K at a time. So the bank is used to decide which 64K section we're looking at with VPOKE and VPEEK. VPEEK(1,0) actually reads from location 65536 in VERA memory, and VPOKE 1,0,v writes to 65536.
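To make the bank arithmetic concrete, here is a small sketch that splits an absolute VERA address into the bank and offset that VPOKE and VPEEK expect. The example address (65536+10) and the variable names are only there for illustration.

```basic
10 REM SPLIT AN ABSOLUTE VERA ADDRESS INTO BANK AND OFFSET
20 A=65536+10:REM EXAMPLE ADDRESS, CHOSEN FOR ILLUSTRATION
30 B=INT(A/65536):REM BANK = WHICH 64K SECTION
40 F=A-B*65536:REM OFFSET WITHIN THAT SECTION
50 PRINT "BANK";B;"OFFSET";F
60 REM VPOKE B,F,V WOULD THEN WRITE V TO ADDRESS A
```

This should report bank 1 and offset 10, which is the same byte that VPEEK(1,10) reads.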

There is another way to push data into VERA that might be a bit faster, depending on the situation. Since the CPU talks to VERA through an 8-byte data channel, every access means setting the three 1-byte address registers and then reading or writing the data register.

We can shorten this path a little by priming the address registers with the first VPOKE, then writing directly to the data port. Successive writes cause VERA to increment the address internally.

Look at this table:
1568422975539.png

We can actually talk to VERA directly by setting the address in registers 0-2 and then writing data to register 3. Furthermore, if we set bits 4-6 of VERA_ADDR_HI to a value of 2, VERA will add 2 to the current address each time we write to the data register.
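The value to poke into VERA_ADDR_HI for a given increment setting is easy to work out. This is only a sketch, assuming the increment field starts at bit 4 (as described above) and the lower bits of the register are left at zero.

```basic
10 REM POKE VALUE FOR VERA_ADDR_HI FOR EACH INCREMENT SETTING
20 REM ASSUMES THE FIELD STARTS AT BIT 4 AND LOWER BITS ARE 0
30 FOR S=0 TO 2
40 PRINT "SETTING";S;"POKE VALUE";S*16
50 NEXT
```

Setting 2 gives 32, the value the examples in this thread use for a step of 2, and setting 1 gives 16.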

So consider this simple example:

Code:
10 PRINT CHR$(19+128)
20 POKE $9F25,0
30 POKE $9F20,32 : REM BIT 5=32
40 POKE $9F21,0
50 POKE $9F22,10
60 FOR I=1 TO 10
70 POKE $9F23,I
80 NEXT
1568422007068.png

But I'd like to do a larger block of text on the screen, so consider this:

Code:
10 PRINT CHR$(19+128)
20 POKE $9F25,0
30 POKE $9F20,32
40 FOR Y=0 TO 15
50 POKE $9F21,Y
60 POKE $9F22,10
70 FOR I=0 TO 15
80 POKE $9F23,I
90 NEXT
100 NEXT
1568422497103.png

Next, I want to change the color of each cell:

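First I change line 30 to POKE $9F20,16, which makes the address advance by 1 instead of 2, and I poke two values on line 80: POKE $9F23,I:POKE $9F23,C. The first write sets the character and the second sets its color.

But oops, we don't have a "C" yet. Let's fix that by adding line 85: C=C+1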

1568422882472.png


Code:
10 PRINT CHR$(19+128)
20 POKE $9F25,0
30 POKE $9F20,16
40 FOR Y=0 TO 15
50 POKE $9F21,Y
60 POKE $9F22,10
70 FOR I=0 TO 15
80 POKE $9F23,I:POKE $9F23,C
85 C=C+1
90 NEXT
100 NEXT
(more to follow)
 
So, in essence, we can use VERA's address registers as a row and a column: $9F21 is the row, and the low address register is the column ($9F22 in the original register layout, $9F20 in r31 and newer, which is what the listing below uses).
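As a rough sketch of that row/column idea: assuming 256 bytes per row and two bytes per cell (character first, then color), the register values for a given cell can be computed like this. The row and column are arbitrary example values.

```basic
10 REM REGISTER VALUES FOR ONE SCREEN CELL
20 REM ASSUMES 256 BYTES PER ROW, 2 BYTES PER CELL
30 R=5:C=10:REM EXAMPLE ROW AND COLUMN
40 M=R:REM MIDDLE ADDRESS BYTE = ROW
50 L=C*2:REM LOW ADDRESS BYTE = CHARACTER BYTE OF THE CELL
60 PRINT "MID";M;"LO";L;"(ADD 1 TO LO FOR THE COLOR BYTE)"
```

By the same arithmetic, a low byte of 9 is the color byte of column 4 (4*2+1). That would explain why the COLORS listing starts each row at 9 and steps by 2, so only color bytes are touched.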

Let's apply this to COLORS:

Lines 10-150 stay exactly the same. We will modify 160 to the end....

Code:
10 PRINT CHR$(147);"    ";
20 FOR X=0 TO 15
30 PRINT RIGHT$("   "+STR$(X),3);" ";
40 NEXT
50 PRINT:PRINT
60 FOR Y=0 TO 15
70 PRINT
80 C=Y*16
90 PRINT RIGHT$("   "+STR$(C),3);" ";
100 FOR X=0 TO 15
110 C=Y*16+X
120 PRINT RIGHT$("   "+STR$(C),3);" ";
130 NEXT
140 PRINT:PRINT
150 NEXT
160 C=0
170 POKE $9F22,32
180 FOR Y=2 TO 49 STEP 3
190 FOR Y1=0 TO 2
200 POKE $9F21,Y+Y1
210 POKE $9F20,9
220 FOR X=0 TO 15
230 FOR I=0 TO 3
240 POKE $9F23,C
250 NEXT I
260 C=C+1
270 NEXT X
280 C=C-16
290 NEXT Y1
300 C=C+16
310 NEXT Y
and the end result:

1568423874827.png
 
Finally, a performance test...

The first version of COLORS takes 316 ticks, or 5.27 seconds.
The second version takes 241 ticks, or 4.02 seconds.
That is about 24% less time (put the other way, the first version takes about 31% longer), just from using POKE rather than VPOKE.
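For anyone who wants to redo the arithmetic: TI counts ticks of 1/60 second, so the figures can be re-derived from the two tick counts with a few lines. The variable names are only for illustration.

```basic
10 REM CONVERT TICK COUNTS TO SECONDS AND COMPARE THEM
20 A=316:B=241:REM TICKS FOR THE FIRST AND SECOND VERSION
30 PRINT "SECONDS:";A/60;B/60
40 PRINT "TIME SAVED BY SECOND VERSION (%):";(A-B)/A*100
50 PRINT "EXTRA TIME FOR FIRST VERSION (%):";(A-B)/B*100
```

This works out to roughly 5.27 and 4.02 seconds, about 23.7% less time for the second version, or about 31.1% more time for the first.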
 

DigitalStefan

New Member
Sep 15, 2019
5
1
3
It's possible to get your other, non-tricksy version of this down to 233 ticks by unrolling the inner loop.

Thanks though :)

I downloaded the emulator yesterday and it's really interesting to play with.
 
Yes, there is a lot of optimization possible in the first example. The intent was more to show how to use the VERA registers directly than to make a serious attempt at optimizing the program.
 
FYI, guys... it looks like the VERA register layout may have changed since I wrote this.

I'll try to edit and re-upload the BASIC example above, but note that the order of the VERA address registers has reversed.
1569036313043.png

$9F20 and $9F22 have swapped places. I've updated the BASIC demo above to account for this. Note that this example now only works in r31 (or newer) of the emulator.
 
Since the question arose of whether the improvements in the second program were due to the algorithm or the VPOKE, I thought I'd write a test program with exactly the same algorithm (as closely as possible, anyway) and VPOKE for one process and POKE for the other.

Here is what I came up with....

Code:
0 TI$="000000"
10 POKE $9F22,32
50 FOR Y=0 TO 59
60 POKE $9F21,Y
80 POKE $9F20,2
90 FOR X=1 TO 79
100 POKE $9F23,0
110 NEXT
120 NEXT
130 T1=TI
200 TI$="000000"
210 N=0
220 FOR Y=0 TO 59
230 FOR X=0 TO 78
240 VPOKE 0,N,128
250 N=N+2
260 NEXT
270 N=N+98
280 NEXT
290 T2=TI
300 PRINT CHR$(147)
310 PRINT "POKE: ";T1/60
320 PRINT "VPOKE:";T2/60
The first pass pokes the row and column values directly into the address registers (a row is 256 bytes long, which allows for easy row/column management).
The second pass uses VPOKE and computes the address by addition. Each row is 256 bytes long, and the X loop advances 158 bytes (79 writes, 2 bytes apart), so to get to the next row I add 98.

I only did 79 columns in each loop to confirm I wasn't writing past the right side edge. This is just a little sanity check I added in to make sure both versions match the exact number of bytes written.
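The row-advance arithmetic in the VPOKE pass can be double-checked with a short sketch, assuming the 256-byte row length and the 2-byte step described above.

```basic
10 REM BYTES ADVANCED BY THE X LOOP AND THE GAP TO THE NEXT ROW
20 W=256:REM BYTES PER ROW
30 N=79*2:REM 79 WRITES, 2 BYTES APART
40 PRINT "X LOOP ADVANCES";N
50 PRINT "ADD TO REACH NEXT ROW";W-N
```

This gives 158 and 98, matching the N=N+98 on line 270 of the test program.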