Commodore 64 vs ZX Spectrum Performance

artik

New Member
Jan 5, 2020
9
0
1
There are lots of discussions about the performance of the Zilog Z80 vs the 6502. One has a higher clock and more registers, but the other needs fewer cycles per operation.

Recently I implemented a simple program for the ZX Spectrum that performs machine learning on handwritten digits (deep learning/neural networks):

http://blog.cppcms.com/post/125

I decided to "port" the C code, which uses fixed-point computations, to the C64. Basically: some tweaks to compile under cc65, remove the custom assembly for the Z80, and run it.

https://github.com/artyom-beilis/zx_spectrum_deep_learning/blob/16_bit/train_commandore.c
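
For readers unfamiliar with fixed-point arithmetic, here is a minimal sketch of the idea in C. It assumes a Q8.8 format (8 integer bits, 8 fractional bits) stored in a 16-bit integer; that format and the names `fix16_t`, `FIX_SHIFT`, `fix_from_int` and `fix_mul` are illustrative assumptions, not necessarily what the linked repository uses.

```c
#include <stdint.h>

/* Hypothetical Q8.8 fixed-point type: real value = raw / 256. */
typedef int16_t fix16_t;
#define FIX_SHIFT 8
#define FIX_ONE   (1 << FIX_SHIFT)

/* Convert a small integer to fixed point (no overflow check). */
static fix16_t fix_from_int(int v)
{
    return (fix16_t)(v * FIX_ONE);
}

/* Multiply two Q8.8 values: widen to 32 bits, multiply, then drop the
   extra 8 fractional bits. Overflow of the 16-bit result is not handled. */
static fix16_t fix_mul(fix16_t a, fix16_t b)
{
    int32_t wide = (int32_t)a * (int32_t)b;
    return (fix16_t)(wide / FIX_ONE);
}
```

The 32-bit intermediate is the costly part on both CPUs: every fixed-point multiply is a 16x16 multiply with a wider result, and neither the Z80 nor the 6502 has a hardware multiply instruction.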

To my surprise, one training cycle took about 461s = 7m 41s, while on the ZX Spectrum exactly the same code (without custom assembly), built with z88dk, took only 101s = 1m 41s.
That is more than a 4.5x difference. In fact, the same variant of the program using floating-point numbers on the Spectrum was still faster than the C64 fixed-point version.

I'm not that surprised: the Zilog Z80 is considered a very good processor. For example, I could write code that multiplies two 16-bit ints into a 32-bit result without touching memory, using registers only. So it is reasonably more powerful, but since I'm almost certain that many would disagree, I want to ask:

Is there any better C compiler for the Commodore 64? (Also, it seems that cc65 is a quite recent one.)

(Note: the tests were done on the VICE and Fuse emulators; both provide cycle-accurate simulation.)


Edit: I analyzed the code that zcc (the Zilog Z80 compiler) produced, and it wasn't good at all. I can't say anything about cc65 since I'm not familiar with 6502 assembly, but at least for the Z80 the output certainly wasn't very optimized.
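
As an illustration of the 16-bit by 16-bit multiply with a 32-bit result mentioned above, here is a sketch of the classic shift-and-add algorithm in C. This is not the author's Z80 assembly; the function name `mul16x16` is hypothetical. On a Z80, a loop like this can keep its working values in register pairs, which is the point being made.

```c
#include <stdint.h>

/* Shift-and-add multiply: 16-bit x 16-bit -> 32-bit result.
   The loop runs once per remaining bit of the multiplier (at most 16). */
static uint32_t mul16x16(uint16_t a, uint16_t b)
{
    uint32_t result = 0;
    uint32_t addend = a;          /* multiplicand, shifted left each step */

    while (b != 0) {
        if (b & 1u) {
            result += addend;     /* add when the current multiplier bit is set */
        }
        addend <<= 1;
        b >>= 1;
    }
    return result;
}
```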
 
May 22, 2019
538
282
63
I don't think you're going to find a better (or much better) C compiler. The problem is that the 65x CPUs are notoriously bad for C code, because there simply aren't enough registers. So while the 6502 has 2.4x the raw throughput of the 8080 at the same clock speed, the 8080 and Z80 are arguably better processors. (Personally, I prefer coding for the 8080 over the 6502, for that reason.)

This is complicated by Commodore's decision to nearly completely fill the Zero Page area, leaving very little space for user code to use ZP as additional registers.
 

BruceMcF

Active Member
May 19, 2019
174
52
28
Note that over half of the zero page is available on a C64 when you don't use BASIC... but C on a 6502 will be terrible in any event, because reentrant calls are expensive on the 6502 and 16-bit ops are expensive, and C relies heavily on both. With cc65, you are programming the cc65 VM more than the 6502, and you pay a heavy clock penalty for it. PL/M-80 would be a better fit, with a true unsigned 8-bit type and calls. I don't know of one for the 6502, though.
 

artik

This is actually similar to the situation with the Zilog Z80 and the Spectrum's BASIC ROM. A program cannot freely use the IY register, because the ROM relies on it, and with a changed IY virtually every ROM routine becomes unusable.

The Z80 has two 16-bit index registers, IX and IY. They are similar to BP/DI/SI in the x86 architecture and are very useful. C programs basically use IX as the stack frame pointer (like BP on x86), while IY, which could be used for indexing arrays and other structures, is unavailable.

I can try changing all variables to static to avoid stack use, use more global variables, and see if it helps.
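
A minimal sketch of the "make the locals static" idea, for illustration only. The function names `dot_auto` and `dot_static` are hypothetical and not taken from the linked code. With static locals, the compiler can address the variables at fixed locations instead of going through a software stack, at the cost of the function no longer being reentrant.

```c
#include <stdint.h>

/* Version with automatic locals: sum and i live on the C stack. */
static int16_t dot_auto(const int8_t *x, const int8_t *w, uint8_t n)
{
    int16_t sum = 0;
    uint8_t i;

    for (i = 0; i < n; ++i) {
        sum += (int16_t)x[i] * w[i];
    }
    return sum;
}

/* Same computation with static locals: fixed addresses, no stack
   indexing, but not reentrant (do not call it recursively). */
static int16_t dot_static(const int8_t *x, const int8_t *w, uint8_t n)
{
    static int16_t sum;
    static uint8_t i;

    sum = 0;
    for (i = 0; i < n; ++i) {
        sum += (int16_t)x[i] * w[i];
    }
    return sum;
}
```

cc65 also has a `--static-locals` (`-Cl`) option that applies this transformation to a whole translation unit; whether it closes much of the gap here would need to be measured.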

I'm still puzzled by the missing floating-point support, or at least the lack of a library that can help with it.
 

BruceMcF

I'm still puzzled by the missing floating-point support, or at least the lack of a library that can help with it.
That may well be a matter of the kinds of programs that people have been looking to write with cc65 ... juggling turning the Basic ROM on and off to get access to the not-quite-standard floating point routines they contain isn't something that's going to attract a lot of attention when the most efficient algorithms for the processor are more likely to use integer or fixed point than floating point math.
 

artik

... juggling turning the Basic ROM on and off to get access to the not-quite-standard floating point routines they contain isn't something that's going to attract a lot of attention...
The z88dk toolchain I use for the Zilog Z80 provides its own custom FP libraries, offering IEEE float as well as ROM-based implementations.

I agree that fixed point is frequently better for such a system. In fact, my neural network implementation works well using fixed-point computation on the Spectrum, but it is tricky to get there, especially when the computations are complicated.

I did lots of physics simulations on my Spectrum when I was in school. It was a very good educational tool, especially with its easy access to graphics from BASIC (as I understand it today, that is not something that comes out of the box on the C64).
 

artik

BTW, on the same topic: I managed to improve my basic implementation, mostly by unrolling loops in the convolution code (small loops of 3 or 4 iterations), which reduced the overall running time by ~50%. The gap between the Spectrum and the C64 closed to ~1.3x in favor of the C64, instead of 2.0x.
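
The kind of unrolling described above might look like the following sketch. It assumes a 3-tap kernel and 16-bit values small enough not to overflow; the names `conv3_rolled` and `conv3_unrolled` are hypothetical and the actual convolution code in the repository may be structured differently.

```c
#include <stdint.h>

/* Rolled version: a tiny inner loop, where the loop counter, compare
   and branch cost a noticeable share of the time on an 8-bit CPU. */
static int16_t conv3_rolled(const int16_t *in, const int16_t *k)
{
    int16_t acc = 0;
    uint8_t j;

    for (j = 0; j < 3; ++j) {
        acc += in[j] * k[j];
    }
    return acc;
}

/* Unrolled version: the three taps are written out, so there is no
   loop overhead and every index is a compile-time constant. */
static int16_t conv3_unrolled(const int16_t *in, const int16_t *k)
{
    return (int16_t)(in[0] * k[0] + in[1] * k[1] + in[2] * k[2]);
}
```

Both functions compute the same value; only the amount of loop bookkeeping differs.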
 

BruceMcF

I have seen arguments that, for integer-intensive processes programmed in assembly language without an extremely large number of multiplies, the 6502 comes out roughly 2.5 times faster when the two processors are clocked at the same speed ... the Z80 gains back some of the difference between T cycles and M cycles through instruction set efficiencies ... but if there are a lot of multiplies, or if you need a stack of over 256 bytes, that can rapidly head toward parity, or reverse.

Obviously if you can clock the Z80 processor at twice the speed of the memory bus with a single wait state on the memory access cycle, that means the two processors would be in the same neighborhood for much code, and the Z80 would run up the score whenever it hit a lot of indirect address referencing or multiplication.

And AFAICT, one thing that the original 6502 does less well than a Z80 is run VMs, so in a C compiler supported by a runtime VM to fill in the gaps for what C assumes and the processor doesn't provide, not only is the 6502 running to the VM for a lot more pseudo-ops, but it's executing them a lot less efficiently when it does so.

That's part of why Wozniak's Sweet16, while a marvel of memory-frugal spaghetti coding, runs so slow. I expect it could be sped up by 20%-40% if it was disentangled and re-implemented for a 65C02 ... though it would get less memory frugal in the process, and might require in the neighborhood of 512-640 bytes rather than 380 or so.
 

artik

That's part of why Wozniak's Sweet16, while a marvel of memory-frugal spaghetti coding, runs so slow
Actually, something similar exists on the ZX Spectrum. It has a stack-based floating-point calculator that uses reverse Polish notation, like Forth, for example:

rst $28 ; enter the calculator (calls $0028)
defb $04, $0f ; calculator codes: multiply, then add
defb $38 ; end of calculator code, return to machine code

Given a and b as the top two values on the calculator stack, with c below them, this calculates a*b+c.

It is something like an 8087 coprocessor, but in software.

Very useful for performing math and working in assembly.
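
To show how such a byte-coded RPN calculator works, here is a toy interpreter in C. It reuses the three op codes from the snippet above ($04 multiply, $0F add, $38 end), but everything else is an assumption for illustration: it works on `long` integers rather than the Spectrum's 5-byte floats, and the names `rpn_run`, `OP_MUL`, `OP_ADD` and `OP_END` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Op codes borrowed from the snippet above; all other codes are ignored here. */
enum { OP_MUL = 0x04, OP_ADD = 0x0F, OP_END = 0x38 };

/* Run a byte-coded RPN program on a value stack.
   stack[0] is the bottom and stack[depth - 1] is the top.
   Returns the stack depth after the program finishes. */
static size_t rpn_run(const uint8_t *code, long *stack, size_t depth)
{
    long rhs;

    for (; *code != OP_END; ++code) {
        if (depth < 2) {
            break;                     /* not enough operands: stop early */
        }
        if (*code == OP_MUL) {
            rhs = stack[--depth];      /* pop the top value */
            stack[depth - 1] *= rhs;   /* combine it with the new top */
        } else if (*code == OP_ADD) {
            rhs = stack[--depth];
            stack[depth - 1] += rhs;
        } else {
            break;                     /* unknown op code in this toy version */
        }
    }
    return depth;
}
```

With the stack holding 5, 3, 4 (4 on top), the program `{OP_MUL, OP_ADD, OP_END}` first replaces 3 and 4 with 12 and then adds 5, leaving a single value, 17.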