Working on Sweeter16 for the CX16

BruceMcF

Active Member
May 19, 2019
206
63
28
I don't even have the CX16 emulator installed yet, but I continue plugging away at getting stuff up and running for it.

David Murray mentioned Steve Wozniak's Sweet16 virtual machine. But while the code is widely available, it still IS copyright Apple. Also, it is far from the most efficient VM model, since it heavily prioritizes code space over operating speed.

So two to three weeks ago, I started working on a fresh implementation of a new virtual machine to execute Sweet16 code, based on a JMP (abs,X) execution model with direct callback jumps and branches, rather than a "LP: JSR DOTOKEN: JMP LP" model. At the cost of some code space, I figured I could trim over 2/3 of the VM engine overhead. The parser has much less that can be squeezed out, but I squeeze a little more out in the version that uses a page table to parse the high nybble, without requiring a save/restore or a fresh read of the opcode for the low nybble.
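A minimal Python model of the opcode split that the dispatch relies on may help here. It assumes Woz's published Sweet16 encoding: the high nybble selects a register op and the low nybble the register, while a high nybble of 0 marks a non-register op (RTN and the branches). The X value for a `JMP (abs,X)` through a table of 2-byte handler addresses is the high nybble times two. The function names are illustrative only, not routines from the Sweeter16 source.

```python
# Woz's Sweet16 register ops, keyed by the opcode's high nybble ($1n-$Fn).
REGISTER_OPS = {
    0x1: "SET", 0x2: "LD", 0x3: "ST", 0x4: "LD @", 0x5: "ST @",
    0x6: "LDD @", 0x7: "STD @", 0x8: "POP @", 0x9: "STP @",
    0xA: "ADD", 0xB: "SUB", 0xC: "POPD @", 0xD: "CPR",
    0xE: "INR", 0xF: "DCR",
}

# Non-register ops $00-$0C. $0D-$0F are unassigned in Woz's version;
# Sweeter16 puts ADJ0, ADJS and CALL there (exact slots not modeled here).
NON_REGISTER_OPS = ["RTN", "BR", "BNC", "BC", "BP", "BM", "BZ",
                    "BNZ", "BM1", "BNM1", "BK", "RS", "BS"]


def decode(opcode):
    """Return (mnemonic, register) for one opcode byte; register is None
    for non-register ops."""
    high, low = opcode >> 4, opcode & 0x0F
    if high == 0:
        if low < len(NON_REGISTER_OPS):
            return NON_REGISTER_OPS[low], None
        return "unassigned", None
    return REGISTER_OPS[high], low


def jump_index(opcode):
    """Value X would hold for JMP (table,X): the high nybble * 2,
    i.e. (opcode AND $F0) shifted right three times."""
    return (opcode & 0xF0) >> 3


def register_offset(opcode):
    """Byte offset of Rn in a file of 2-byte registers:
    (opcode AND $0F) shifted left once."""
    return (opcode & 0x0F) << 1


if __name__ == "__main__":
    for op in (0x25, 0xA7, 0x07):
        print(f"${op:02X}", decode(op), jump_index(op), register_offset(op))
```

For example, `$A7` (ADD R7) gives X = 20 into the handler table and a register offset of 14.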

It's certainly far from bug-free, but today it passed the first three test routines I threw together for it, and since I have some other things to do for the next day or two, I decided to put it up on GitHub and make it available to others to bang on.

It's CC0, so feel free to share it in other places (e.g., 6502.org) if you want to gather feedback from further afield.

https://github.com/BruceMcF/Sweeter16

I don't have assembled CX16 binaries, but I do have binaries assembled to $CC00 in the C64 Golden Page ... since at present I am testing it via VICE emulation of the SuperCPU (xscpu.exe).

I should note that it adds three new instructions in the slots where Woz's version has NULs: ADJ0 and ADJS, which add a sign-extended 8-bit offset to the accumulator and the stack index respectively, and CALL, which calls a 65c02 machine language routine at the address in Register 11 (since R15 is the IP, R14 is Status, R13 is the comparison result register and R12 is the stack index).

All three are relatively cheap instructions ... the first two just require the signed offset in the BR instruction to be X-indexed, and then enter that routine after the point where X is pointed at the IP register, and the last one is just: "JSR +: JMP BR: + JMP(R11)". I am free to add them because my routines' start addresses are not tied to a single page, but they have to have a low code-space cost, because I have a self-imposed limit of 2 pages for the smaller version and 3 pages for the bigger version.
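The arithmetic that ADJ0 and ADJS perform can be sketched in Python: take the signed 8-bit offset byte (the same kind of operand a branch uses), sign-extend it, and add it to a 16-bit register with wrap-around. This assumes the offset is the single byte following the opcode, as with BR; `sign_extend8` and `adjust` are hypothetical names for this sketch.

```python
def sign_extend8(byte):
    """Read an unsigned byte (0-255) as a two's-complement value (-128..127)."""
    return byte - 0x100 if byte & 0x80 else byte


def adjust(register, offset_byte):
    """Add a sign-extended 8-bit offset to a 16-bit register value,
    wrapping at 64K, as ADJ0 would do to R0 or ADJS to R12."""
    return (register + sign_extend8(offset_byte)) & 0xFFFF


if __name__ == "__main__":
    print(hex(adjust(0x1000, 0x10)), hex(adjust(0x1000, 0xFE)))
```

An offset byte of `$FE` moves the register back by 2 (`$1000` becomes `$0FFE`), which is the same sign-extended add a branch applies to the IP.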
 


rje

New Member
Nov 6, 2019
27
12
3
Thank you for doing this. I'm going to dive in and see if I can grok it. It would be nice if some version of Sweet16 made it into the X16 ROM.

Since I do have the X16 emulator, I can try it out there and see what happens.

Something to think about: there are "virtual registers" R0-R15 in the X16 already, starting at 0x02, to support the X16 ABI.


And hey -- what are the benefits of the 3-page version versus the 2-page version? Ah, you said there's a modest speed gain. How much is modest?? I tend to feel that any speed-up is worth 256 bytes on the X16.


PPS - I like your term "Swift16" there in the comments.
 
May 22, 2019
597
303
63
So if I understand this, Bruce, this extends the 6502's opcodes by adding additional pseudo-operations that can be executed inline?

I'd love to see a more detailed manual on the use of the S16 opcodes, along with some example code. If you can get this included in the CX16 kernel, it will also give us a nice set of options for 16-bit math. Most importantly, it will give us a common pattern to work from, not to mention a way to smoothly transition to a math coprocessor, if someone chooses to build one in the future.
 
I'm also going to drop a forwarding link over in the Commodore 64 section, since this is technically a Commodore 64 program at the moment (until you get it running on the Commander).
 

BruceMcF

> So if I understand this, Bruce, this extends the 6502's opcodes by adding additional pseudo-operations that can be executed inline?
The CALL adds ONE 65c02 call to an address stored in R11. But if that routine is:
LDY #1
LDA (IP),Y
TAX
JMP (extended,X)

... then yes, that would be up to 128 extended op codes if you wish. In that case, the call would be:
CALL 1,extendedop

... or more than 1, if the extendedop took embedded data:

!byte CALL,3,op_printstring
!word buffer1
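Here is a small Python sketch of the byte layout in that last example, and of how the displacement after CALL skips the embedded operands once the machine language routine returns through `JMP BR`. It assumes, from the examples above, that the byte after the CALL opcode is a forward displacement counted from just after itself (like a branch), and that the extended-op byte is an even index into a table of 2-byte handler addresses, which is where the limit of 128 comes from. The CALL opcode value, the `op_printstring` index and the `buffer1` address are placeholders, not Sweeter16's real values.

```python
CALL = 0x0F            # placeholder: the real CALL opcode value is not given here
OP_PRINTSTRING = 0x04  # placeholder even index into the extended-op table


def encode_call(ext_op, operand=b""):
    """Assemble CALL, <skip>, <ext_op>, <operand...>.

    The skip byte counts the extended-op byte plus any embedded data, so
    the branch taken after the routine returns lands just past them.
    """
    if ext_op % 2 or not 0 <= ext_op <= 0xFE:
        raise ValueError("extended op must be an even byte (2-byte table entries)")
    return bytes([CALL, 1 + len(operand), ext_op]) + bytes(operand)


def resume_address(call_addr, code):
    """Where Sweet16 execution continues: the skip is counted from the
    byte after the skip byte itself, like a branch displacement."""
    return call_addr + 2 + code[1]


if __name__ == "__main__":
    buffer1 = 0x1234  # hypothetical buffer address
    code = encode_call(OP_PRINTSTRING, buffer1.to_bytes(2, "little"))
    print(code.hex(), hex(resume_address(0x0900, code)))
```

Even indexes 0, 2, ..., 254 give the 128 possible extended ops, and the skip of 3 in the `op_printstring` example steps over the op byte and the two-byte `buffer1` address.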
 

BruceMcF

> And hey -- what are the benefits of the 3-page version versus the 2-page version? Ah, you said there's a modest speed gain. How much is modest?? I tend to feel that any speed-up is worth 256 bytes on the X16.
Parsing a main operation in the original approach (but using 65c02 opcodes and X-indexed execution, for a fair comparison) is something like:
LDA (IP)
TAY
AND #$F0
BEQ BRANCH
LSR
LSR
LSR
TAX
TYA
AND #$0F
ASL
STA STATUS
; +28 clocks?

With the tabled parse, it is something like:
LDA (IP)
BIT #$F0
BEQ BRANCH
TAY
LDX Sweeter16,Y
AND #$0F
ASL
STA STATUS
; +23 clocks?

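So I think that saves about 5 clocks, mostly because the tabled parse is non-destructive (BIT #$F0 tests the high nybble without changing A, so there is no TYA to restore the opcode) ... but it's the instruction dispatch, so the saving applies to every Sweet16 instruction.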
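The clock estimates above can be checked by summing standard 65C02 cycle counts, as in this Python sketch. It assumes IP and STATUS are zero-page locations (so `LDA (IP)` is the 5-cycle `(zp)` mode), that the branch to BRANCH is not taken, and that the `Sweeter16,Y` table simply holds the high nybble times two. Whether `LDX abs,Y` pays the one-cycle page-crossing penalty depends on where the table sits, so both cases are computed.

```python
# 65C02 cycle counts for the instructions in the two parse sequences,
# assuming IP and STATUS are zero-page locations.
CYCLES = {
    "LDA (zp)": 5, "TAY": 2, "TAX": 2, "TYA": 2,
    "AND #": 2, "BIT #": 2,
    "BEQ (not taken)": 2,  # 3 if taken
    "LSR A": 2, "ASL A": 2, "STA zp": 3,
    "LDX abs,Y": 4,        # 5 if the indexed address crosses a page
}

SHIFT_PARSE = ["LDA (zp)", "TAY", "AND #", "BEQ (not taken)",
               "LSR A", "LSR A", "LSR A", "TAX",
               "TYA", "AND #", "ASL A", "STA zp"]

TABLE_PARSE = ["LDA (zp)", "BIT #", "BEQ (not taken)", "TAY",
               "LDX abs,Y", "AND #", "ASL A", "STA zp"]


def cycles(sequence, page_cross=False):
    """Total cycles for one parse, optionally adding the page-cross penalty."""
    penalty = 1 if page_cross and "LDX abs,Y" in sequence else 0
    return sum(CYCLES[op] for op in sequence) + penalty


# The 256-entry table the tabled parse indexes with Y = opcode, assuming each
# entry holds what the shift sequence computes: (opcode AND $F0) >> 3.
X_TABLE = bytes((op & 0xF0) >> 3 for op in range(256))

if __name__ == "__main__":
    print(cycles(SHIFT_PARSE), cycles(TABLE_PARSE), cycles(TABLE_PARSE, True))
```

Under these assumptions the shift parse is 28 cycles and the tabled parse 22, or 23 when the table lookup crosses a page, so the saving is 5 to 6 cycles per instruction, consistent with the estimate above.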

Added to the ~24 clocks saved in the dispatch, that makes about 29 clocks saved per instruction. There may be a few more saved compared to the original, depending on which op is called, because my re-use of ops is not as aggressive ... I don't call DCR as a subroutine, I inline it, so that is 11 clocks saved in bytewise ops that do (--Rn), 22 in double-byte ops that decrement Rn twice, etc.
 

BruceMcF

> Thank you for doing this. I'm going to dive in and see if I can grok it. It would be nice if some version of Sweet16 made it into the X16 ROM.
If something is unclear, go ahead and ask ... it may be that I need a comment, or a better one ... or it may be a bug and I need to fix it!

> Something to think about: there are "virtual registers" R0-R15 in the X16 already, starting at 0x02, to support the X16 ABI.
That's exactly where I put the Sweeter16 virtual registers.
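A quick Python sketch of where the registers land, assuming each is a little-endian 16-bit pair starting at $02 as in the X16 ABI's r0-r15 (so R15 ends at $21). The role labels restate the assignments given earlier in the thread, plus R0 as Sweet16's accumulator.

```python
REGISTER_BASE = 0x02  # X16 ABI r0 occupies $02/$03


def register_address(n):
    """Zero-page address of the low byte of Sweet16 register Rn (0-15)."""
    if not 0 <= n <= 15:
        raise ValueError("Sweet16 has registers R0-R15")
    return REGISTER_BASE + 2 * n


ROLES = {0: "accumulator", 11: "CALL address", 12: "stack index",
         13: "comparison result", 14: "status", 15: "IP"}

if __name__ == "__main__":
    for n, role in ROLES.items():
        print(f"R{n:<2} ${register_address(n):02X}  {role}")
```

So the special-purpose registers R11 through R15 sit at $18 through $20, at the top of the ABI register block.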

> PPS - I like your term "Swift16" there in the comments.
It's version 0.0.x ... any naming feedback/suggestions accepted! Better names for ADJ0 and ADJS, for instance.

I hesitate to call the 3-page version Swift16 VERSUS the 2-page version ... I thought it would be a bigger speed improvement when I came up with it. But maybe Swift16 could be the name for the whole VM with the X-indexed jump. "Sweeter16" is a bit cheeky, but I don't think Woz would mind.
 

BruceMcF

> I'd love to see a more detailed manual on the use of the S16 opcodes, along with some example code.
I have Woz's 1977 Byte article, but I am trying to be scrupulous about copyright with this repository, so I can only include it by reference. I do have the start of an operation table up in the main branch now. If anyone has the copyright OK to distribute explanatory material, feel free to submit a pull request.

I'll leave it to someone better at ACME macros than me to define ACME macros for it, and since so many people are using the cc65 assembler, macros for that assembler will also be needed.
 

rje

> I have Woz's 1977 Byte article, but I am trying to be scrupulous about copyright with this repository, so I can only include that by reference.
There is also:

And in particular, this old web page on 6502.org on how to port Sweet16 is very helpful:

> While writing Apple BASIC, I ran into the problem of manipulating the 16 bit pointer data and its arithmetic in an 8 bit machine...

> SWEET16 should be employed only when code is at a premium or execution is not. As an example of her usefulness, I have estimated that about 1K byte could be weeded out of my 5K byte Apple-II BASIC interpreter with no observable performance degradation by selectively applying SWEET16.

Theoretically, then, if Commodore BASIC were re-engineered by some nutjob whose particular insanity is BASIC interpreters, then Bruce's variation on SWEET16 would be something to look at.


... of course, I just found this, "Create your own Version of Microsoft BASIC for 6502" (https://www.pagetable.com/?p=46), written by none other than Michael Steil.
 

BruceMcF

If I can get copyright permission for anything in that set, I will happily include it ... otherwise I will refer to it by link.

I finished a couple of bug fixes today, and they are now committed to the main branch. I've now exercised most of the register ops and the main logic for the branch ops. Once I get test code up for any register op I may have missed and for the branches other than BNZ, and fix any problems that show up, my work on the main code will be pretty much done and I can look at getting the NMOS version done.
 

BruceMcF

By the way, look in the development branch for the most recent SweetCX16 file and binary. The version assembled to $CC00 for the SuperCPU C64 emulator seems to work, so this should be good to go. You can trim off the sample Sweet16 code and put your own there, you'll have a little over 256 bytes in the Golden RAM to play with. https://github.com/BruceMcF/Sweeter16/tree/development
 

BruceMcF

OK, the NMOS version for the C64 and the 65C816 version have both passed the (rudimentary) test code I wrote, so I've merged them back into the main branch. I'll leave it there for anyone who wants to pick it up.