Question about BASIC instructions' speed

LRFLEW

New Member
Sep 23, 2019
17
4
3
So I was messing around with different programs and how fast they can render to the screen. I modified the emulator to time how long it takes between writes to the Vera address $10000 (first byte of bank $1) to perform the performance tests (measured using the emulated CPU's clock cycles). After reading through the discussion on the post "VPOKE Banks?", I noticed that my experiment would be affected by a problem brought up. Basically, in that post, it was brought up that, on real hardware, the current implementation of POKE in BASIC will result in the increment being applied twice when writing to a Vera data port on actual hardware. This issue doesn't appear in the emulator, so I hadn't noticed anything wrong (and the discussion seems to indicate that the issue will be fixed before hardware is release), but went ahead and tried updating the test programs to use VPOKE instead. I was expecting a slight hit to the performance, as VPOKE requires more writes to memory and more value processing. What I found, however, surprised me. Here are two sample programs, both write $4000 (16384) 0-bytes to the Vera starting at $10000, and loops indefinitely (the GOTO on line 80).

POKE Program:
Code:
10 POKE $9F25, $00
20 POKE $9F22, $11
30 POKE $9F20, $00
40 POKE $9F21, $00
50 FOR I = 0 TO $3FFF
60 POKE $9F23, $00
70 NEXT
80 GOTO 30
Time between loops: 14.125 seconds

VPOKE Program:
Code:
50 FOR I = 0 TO $3FFF
60 VPOKE $1, I, $00
70 NEXT
80 GOTO 50
Time between loops: 12.268 seconds

Any idea why this is the case? I don't have much experience with BASIC prior to working with the X16 emulator, so I don't really know any of the "classic" optimization secrets (other than a few things I read online, such as that smaller line numbers are better). Looking at the assembly in the ROM, the VPOKE command should take longer than the POKE command (I think), but it isn't. I feel so confused about this.
 
May 22, 2019
486
250
43
This is part of the discussion I've been having with someone in a Github pull request.

The issue is that BASIC has to parse the command parameters every time, and it takes longer to read $9F23 as it does to read "I". So your second loop is faster because there's less text to parse.

Let me run a quick test with your two code blocks and my standard test rig:
So the first loop also runs in 14.1 seconds for me.
1571295405320.png

Now that we have a control, let's try adding a variable.

1571295558538.png

So the number parsing is the issue.

Finally... I went back and implemented the VPOKE version exactly as you wrote it:
1571295824445.png

So VPOKE is significantly slower, by about 48% in this test.
 
May 22, 2019
486
250
43
so I don't really know any of the "classic" optimization secrets (other than a few things I read online, such as that smaller line numbers are better)
This one is a myth - mostly.

Line numbers in BASIC are stored as binary integers (unlike numeric literals in the code itself), so the length of the line number is irrelevant. It's always 2 bytes in memory.

Consider this:
1571296322419.png

So why is this "mostly" a myth? Because GOTO statements still need to be parsed, and if you have a lot of GOTO statements with long line numbers, you are adding a few microseconds to each GOTO.

There are a lot of things you can do to make BASIC code faster, and most of it comes down to reducing the amount of text that needs to be parsed. If you want to start a BASIC optimization thread, I'm sure people will have other suggestions to throw in there.
 

LRFLEW

New Member
Sep 23, 2019
17
4
3
The issue is that BASIC has to parse the command parameters every time, and it takes longer to read $9F23 as it does to read "I".
Interesting.

I was able to confirm it with my timing emulator with some tests. Taking my first program, I made these changes and looked at the new times.

Code:
60 POKE 40739, $00
Time: 18.300

Code:
60 POKE $9F23, $00
Time: 14.125 (same as before)

Code:
5 A=$9F23
60 POKE A, $00
Time: 8.625

I would have tried it with A%, but the actual address overflows the signed 16-bit integers.

This does seem to confirm my theory that hexadecimal is faster to parse than decimal. Maybe I should try Binary as well. I will also mess with the 0 value and ways to represent that. I may make a new thread specifically about optimization like you suggested for that, though.

EDIT: Figured I'd add a few more results I found with more testing

Code:
60 POKE %1001111100100011, $00
Time: 31.628
So... don't use binary if you want speed I guess.

Code:
60 POKE A, $0
Time: 7.563
Removing the superfluous zero saved a decent amount of time, apparently.

Code:
60 POKE A, 0
Time: 7.743
So hexadecimal is faster, even for single digit numbers. Good to know.

Code:
60 POKE A, .
Time: 6.642
I read that this was supposed to be the fastest way to represent zero, and it appears that's correct.
 
Last edited:
  • Like
Reactions: BruceMcF

Wertyloo

Member
Sep 15, 2019
72
7
8
The issue is that BASIC has to parse the command parameters every time, and it takes longer to read $9F23
why isn't the tokenizer store the hexa on 3 byte: .byte "$",$23,$9f
when decode, the interpreter only need to read 3 byte,and on the last 2 doesnt need to decode either from ascii
when list run it knows from the "$" how to display the integer after it: "$9F23"
 
May 22, 2019
486
250
43
why isn't the tokenizer store the hexa on 3 byte: .byte "$",$23,$9f
when decode, the interpreter only need to read 3 byte,and on the last 2 doesnt need to decode either from ascii
when list run it knows from the "$" how to display the integer after it: "$9F23"
probably because there wasn’t room in an 8K interpreter for that.There should be room in this 16K interpreter, so we are hoping this gets added.
 

LRFLEW

New Member
Sep 23, 2019
17
4
3
why isn't the tokenizer store the hexa on 3 byte: .byte "$",$23,$9f
BASIC treats (most) numeric values as floating point, right? Why not make the tokenized version of any literal number (hex, dec, or binary) be the actual binary float value? It would be longer at 6 bytes (a "start sequence" byte and 5 bytes for the value), but shouldn't require any interpretation at all, as it can be passed directly to the floating point functions by address. The only problem I foresee is that removing type information like this would mean that LIST won't know which way to print back the value, but maybe using an extra byte for "type" (or incorporating it into in the first byte) would fix that.
 
May 22, 2019
486
250
43
BASIC treats (most) numeric values as floating point, right? Why not make the tokenized version of any literal number (hex, dec, or binary) be the actual binary float value? It would be longer at 6 bytes (a "start sequence" byte and 5 bytes for the value), but shouldn't require any interpretation at all, as it can be passed directly to the floating point functions by address. The only problem I foresee is that removing type information like this would mean that LIST won't know which way to print back the value, but maybe using an extra byte for "type" (or incorporating it into in the first byte) would fix that.
That’s the plan. The value would be stored as a float in memory with a single byte prefix as a “this is a number” token.

however, this can only work reliably well on integers, as floating point numbers don’t always align well with base 10, and you will get transcription errors listing and re-parsing irrational numbers. So I would probably only use this for integer literals, skipping the encoding process for any fractional value.

This system would also need separate tokens for numbers written out as hex, decimal, and binary literals. So it's not quite as simple as just "turn it into a binary number".

At least one variant of BASIC (I want to say BBC) actually saves both the encoded binary and the string representation in the program, but that method does use more memory.

Personally, I'd probably go a step further and actually directly compile BASIC programs in memory to a more efficient P-Code format, which would be kind of halfway between machine code and the interpreted form of BASIC.
 
Last edited:

Wertyloo

Member
Sep 15, 2019
72
7
8
oookay, what i proposed early-er is like:
.byte 'poke-token',"$",$ac,$12,"+","i",",","%",$ab
and lists as: poke $12ac+i,%10101011
 

BruceMcF

Active Member
May 19, 2019
139
42
28
Since the one byte tokens for ROM Basic are used up or reserved, I proposed that the $FF value for the extension tokens be reserved for this, so that makes three bytes overhead before data: $FE $FF <descriptor> [data0] ... so between four bytes (for a one-byte integer) to eight bytes (for a floating point value).

But the thing about transcription errors is that if you remember the width in a descriptor byte before the data, most of them go away.

And if someone is entering a lot of 8 digit decimal fractions, and are imagining they are being entered "precisely", perhaps finding out that what they type in and what the computer receives are a little different is something they ought to learn.
 
Last edited:

Wertyloo

Member
Sep 15, 2019
72
7
8
And if someone is entering a lot of 8 digit decimal fractions, and are imagining they are being entered "precisely", perhaps finding out that what they type in and what the computer receives are a little different ...
thats actually a very good -and important- learning, that they may input decimal numbers, but what the programs using is a _floating_point_representation_ of that dec. number
hmm,actually is a way on x16 to use double precision floats? or you need to do 'tricks'
 

BruceMcF

Active Member
May 19, 2019
139
42
28
thats actually a very good -and important- learning, that they may input decimal numbers, but what the programs using is a _floating_point_representation_ of that dec. number

hmm,actually is a way on x16 to use double precision floats? or you need to do 'tricks'
I think the trick would be loading a Basic that has standard singles and doubles ... there's been more clamoring for direct support of integer operations than for doubles. For one thing with extended singles it's not quite as urgent. And the extension of the range of the exponent would be largely lost for most CX16 Basic applications.
 
May 22, 2019
486
250
43
I'll go a step further... if I was designing a BASIC-like language (call it SIMPLE) for an 8-bit computer, I would not use binary floats at all. Instead, I would use arbitrary length BCD numbers and binary integers. BCD math is not only straightforward, but 100% compatible with decimal numbers, so if I enter 100.3 in the program, that exact value gets encoded as something like $01 FF 1003. (4 digits, -1 exponennt, digits are 1 0 0 3).

Or if I enter just 1003, the binary value $EB03 get saved.

Of course, SIMPLE would also allow inline assembly, subroutines, functions, and compound types. I guess my SIMPLE is a lot more like Quick BASIC than BASIC. :)

Man, now I want to prototype this on my PC...
 
  • Like
Reactions: BruceMcF
May 22, 2019
486
250
43
hmm,actually is a way on x16 to use double precision floats? or you need to do 'tricks'
Commodore BASIC uses a 5-byte float. It has a full 32-bit mantissa (so it can represent a 32-bit integer with full precision), and an 8 bit exponent. However, it has no designated INF or NaN value.

A single precision float in the IEE-754 standard is actually 32 bits, with a 24 bit mantissa and an 8 bit exponent. So Commodore's floats actually have more precision than the standard IEE single. A Double has a 53-bit mantissa and 11 bit exponent.

So the Commodore float is about halfway between a Single and Double in modern systems. (Which is now encoded in hardware, thanks to the standard inclusion of the FPU in the 80486DX and later CPUs.)
 
  • Like
Reactions: BruceMcF

LRFLEW

New Member
Sep 23, 2019
17
4
3
So Commodore's floats actually have more precision than the standard IEE single.
For normal values, yes. You already mentioned how Commodore BASIC's 5-byte floats lack Inf and NaN values, but it's also lacking subnormals. When the exponent is 0, Commodore BASIC handles the value as ±0 for any mantissa. For IEEE floats, the 0 exponent indicates subnormal values, allowing for more values closer to 0. The smallest positive number in Commodore BASIC should be around 1.175E-38 2.94E-39, while IEEE single precision can represent 1.40E-45 using subnormals.

EDIT: Forgot that the exponent works differently for BASIC floats. Adjusted the smallest positive value accordingly.
 
Last edited:
  • Like
Reactions: BruceMcF

BruceMcF

Active Member
May 19, 2019
139
42
28
For normal values, yes. You already mentioned how Commodore BASIC's 5-byte floats lack Inf and NaN values, but it's also lacking subnormals.
This is important for a number of areas of scientific programming ... especially subnormals of doubles.

But it doesn't seem like there will be a lot of REAL scientific programming done on the CX16 ...
... and for teaching principles of scientific programming, and electing to use the built in Basic over one of the variety of options which appear likely to be available, you can set up problems to work within the limitations of extended singles without subnormals.
 

BruceMcF

Active Member
May 19, 2019
139
42
28
I'll go a step further... if I was designing a BASIC-like language (call it SIMPLE) for an 8-bit computer, I would not use binary floats at all. Instead, I would use arbitrary length BCD numbers and binary integers. BCD math is not only straightforward, but 100% compatible with decimal numbers, so if I enter 100.3 in the program, that exact value gets encoded as something like $01 FF 1003. (4 digits, -1 exponennt, digits are 1 0 0 3).

Or if I enter just 1003, the binary value $EB03 get saved.

Of course, SIMPLE would also allow inline assembly, subroutines, functions, and compound types. I guess my SIMPLE is a lot more like Quick BASIC than BASIC. :)

Man, now I want to prototype this on my PC...
Yes, arbitrary length decimal floating point and signed 32bit integers covers a heck of a lot of ground. For one thing, you don't need to worry about unsigned integers for addresses and signed integers as ... well, as integers if all addresses lie within the positive range of the signed integer you are using.
 
Last edited: