The lesser known CPUs: CoolRISC 816


Looking back into the past

Hello, everyone. Good to see you all here!

I'm going to tell you a story about a neat thing that mostly belongs to the past. We can still learn from it, even though we probably won't be using it in practice.

When you hear the word "CR816", you probably aren't going to have much of an idea what is being talked about. I think that this is a shame. Most notably CR816 seems to be a battery, a pump tester device, and a type of a car tire.

Those are not the CR816 I am going to talk about. I am going to talk about the CoolRISC 816, an adorable microcontroller CPU that really deserves a little bit more time in the spotlight.

Programming for the CR816 is like writing poetry. The machine is predictable, elegant, consistent, and does not cause difficulties when you try to accomplish something. At the same time, despite being elegant the CPU is at least a little bit odd. I find its strangeness endearing. Half of the joy in working with this thing is figuring out the adorable quirks of the design.

It isn't my favourite architecture, but my favourite architecture does at least see some use and isn't at a risk of falling into the memory hole.

If you'd like to read along, there are datasheets out there. The most descriptive one can be found by looking for the document number, R0105-078. There is a whole bunch more, but they are about as unhelpful as it gets for learning about the CPU. I'm sure there are some more, probably modernized, versions of this PDF out there, but I wasn't able to find them easily available on the Internet.

Let's dig right in!

Strange codefellows

The first unusual feature of the CR816 is the instruction length. Instructions are 22 bits long. This is not that uncommon in the land of microcontrollers, but it is mildly surprising to the people who insist that all things that a CPU deals with must come in units of 8 bits. Nope.

The CR816 is a Harvard architecture machine. This means that the instructions and the data are completely separate from each other. The address zero for instructions and for data will have different patterns of bits in them. You have no means of using a program to write to the instruction memory. Want to change the program? You need to do this from the outside of the chip, if the thing is reprogrammable at all. Not unusual for a microcontroller, considering you aren't likely to be programming it in the field.

The CR816 claims to be a RISC machine. Sure enough, all instructions execute in one clock cycle. Surprisingly, this includes branches, regardless of if taken or not. No branch or load delay slots exist either, getting rid of a common kind of concern with programming for some RISC processors. There never are any pipeline stalls either. Counting cycles is trivial, just count the instructions.

With no delay slots, no stalls, no branch prediction and no branching in two cycles the people who know a bit about circuit design may start to suspect the CR816 of either being slow or cheating just a little bit. They would be right. More on that later.

The architecture is fairly register-rich, with sixteen 8-bit registers. (This is where the name CR816 comes from. Cool RISC. Eight bits. Sixteen registers. CR 8 16. CR816.)

Two registers are not freely usable, the status register and the accumulator. The accumulator isn't much like the accumulators of other microcontrollers that feature an accumulator. The accumulator constantly updates to hold the last result that came out of the ALU, making it a great place for (very) temporary storage and utterly useless for anything else. You can make any instruction do nothing but update the flags by having it use the accumulator as the destination.

The other fourteen registers are more or less for general use. We'll get to them when we talk about addressing modes on this machine.

The status flags are pretty sparse. You have the standard set of a zero flag and of a negative flag. There is also an overflow flag, for signed operations. That's it, but it isn't a big deal either for a microcontroller. The zero flag seems to be tied to the content of the accumulator, but not having seen the design itself I have no certainty.

The instruction set overview

There are 41 instructions listed in the datasheet, as well as 8 assembler aliases. For the most part this is standard stuff.

You have your typical additions and subtractions, with and without carry. Logical operations have OR, XOR, CPL1 (aka NOT) and AND. There are some bitwise operations. There are also single shifts present. Rotations are also there, done through the carry bit.

This is when things get a little bit interesting. There is, as a standard feature, an 8-bit multiplier! Because multiplying two 8-bit numbers can have a 16-bit result, the designers had to get a little bit clever. One part of the result gets put in the accumulator and the other one goes to a chosen register.

The multiplier allows for one more trick. You can do multiple shifts, by multiplying through special constants. There is some limit of the multiple shift capability when it comes to addressing modes, but it isn't anything that's crippling. The assembler takes care of translating the multiple shifts to a multiplication instruction with the correct constant.

We also have two conditional move instructions, both dependent on the carry flag's status. Quite handy.

There is a reverse subtraction instruction as well. You already have SUBS R0, X which means R0 := R0 - X. CR816 also gives you SUBD, in which SUBD R0, X means R0 := X - R0. This redundancy may seem odd, but is explained by...

The addressing modes

This is where things get really interesting.

CR816 has six addressing modes, available for use with most operations.

The basic mode is an operation on three 8-bit registers.

OP REG1, REG2, REG3
REG1 := REG2 OP REG3

This is the typical three address code that you may be familiar with from many RISC processors. You get to specify the two source registers and the target.

Using this mode, you can also get standard two operand operations. If your target register and the first source register are the same, then you can represent this by a bit of shorthand.

OP REG1, REG1, REG2
OP REG1, REG2
REG1 := REG1 OP REG2

So far, so good. Perfectly nice machine, though you possibly wouldn't expect three-address code in a microcontroller, even though CR816 isn't the only one to have it. Moving on.

The second mode specifies an operation, a target register and an eight-bit constant.

OP REG, #DATA
REG := REG OP DATA

Again, nothing that surprising here.

The third mode specifies an operation, a target register and a constant location somewhere in the first 256 bytes of the data memory. The data item will be fetched from the location and the operation will proceed as in the second mode.

OP REG, LOCATION
data := fetch(LOCATION)
REG  := REG OP data

If you are familiar with zero-page addressing from the 6502 then you should feel right at home. This is exactly what this is. Hmm... I don't think we are starting to stretch the whole "RISC" thing a little bit at this point. We aren't a load-store architecture. Fine, you don't need to be a strict load-store machine to be RISC. Well, I'm sure it won't get much worse. Wait, there are three more modes? Uh-oh...

The fourth mode is an indexed access mode with an optional constant offset. You can skip specifying the offset if it is zero.

OP REG, (I_REG, OFFSET)
OP REG, (I_REG)
addr := I_REG + OFFSET
data := fetch(addr)
REG  := REG OP data

Yeah. Not only we are now doing an ALU operation and a data access, we are also doing an addition for the memory address. Powerful.

This is a good time to mention that CR816 can address 64 kilobytes of memory. How does an 8-bit machine accomplish that? Easy! Each index register actually maps to two registers from the register file. Your i1 index register is actually a combination of the i1l and i1h registers. A similar story happens with instruction memory indexing, where the ip register is actually made out of ipl and iph registers paired up. There is one register for indexing into the instruction memory and four for indexing data memory.

Moving right along, other modes await...

The fifth mode is a lot like the fourth mode, but it uses the r3 register instead of a constant bit of data.

OP REG, (INDEX, r3)
addr := INDEX + r3
data := fetch(addr)
REG  := REG OP data

We can only use r3 for the offset, but it still is a handy operation.

Well, what could possibly the final mode be, then?

OP REG, (INDEX, OFFSET)+
addr  := INDEX
data  := fetch(addr)
REG   := REG OP data
INDEX := addr + OFFSET
OP REG, -(INDEX, OFFSET)
addr  := INDEX - OFFSET
data  := fetch(addr)
REG   := REG OP data
INDEX := addr

Yeah, you gotta be kidding me. This is about as RISC as I am a pink elephant riding a bicycle made out of fish on a rainy Sunday morning in July. We are performing a 16-bit arithmetic operation to get an address from an index and an offset, we fetch the data from the memory, perform the ALU operation, write the result of the operation to the register file and also update the index register.

What's not so great is that OP r0, (i0, 10) and OP r0, (i0, 10)+ will access completely different memory locations. In the first case the index and the offset get added to get the address. In the second case the access will head to the memory location given by i0 alone, without factoring in the offset. The offset does eventually get added to i0 after the ALU operation, for what's that worth. If you plan ahead this won't be a big deal, but I don't care much for the two very similar bits of syntax meaning two quite different things. The behaviour itself is completely fine, mapping closely to common C constructs. It is just the syntax similarity I find bad.

In the memory-utilizing addressing modes at most one of the two ALU operands can come from the data memory, the other being taken from the destination register. The reverse subtraction allows extra flexibility for the only ALU operation in which the order matters. We can have either the subtrahend or the minuend be the thing loaded from memory, depending on if we are doing regular or reverse subtraction.

As a reminder, the CR816 runs all of the above in one clock cycle each, without ever stalling. I have only one word here: How!?

The odd bits

I think it is high time we talk about the things that might make you shake your head a little bit. Every machine has those and they are what makes life interesting. We already had the first, the oddities of the syntax for the final addressing mode. But wait, there's more!

Next up is the pipeline. The claim is that the pipeline is running in three stages. This is a little bit arguable. The page 1-9 of the datasheet (the 20th page of the PDF) to me looks like a five-stage pipeline. The trick lies in the fact that two stages of the pipeline run per clock cycle, one on the positive edge and one on the negative. This is... unusual and I'm not sure if I'd call this a three-stage pipeline. The first stage of the pipeline also does not really do any processing, making this probably the only microcontroller with what amounts to a dedicated "drive/synchronize/delay" pipeline stage. Well, it worked for Pentium IV, must work for the CR816, right?

The most obvious negative consequence of a double-edged pipeline is that the design might not be able to reach high clock speeds. Sure enough, the CR816 datasheets that I have seen around the place top out at 20 megahertz, though usually less. This isn't so bad for the use case, considering that power consumption would grow very quickly with increases in clock speed.

On the bright side, given enough pipeline bypasses you never need to stall or have a delay slot. Good and clean user experience design.

Other thing that come to mind as odd are things around the CPU flags. Not all MOVE instructions update the flags in the same way. There does exist an instruction to save the flags, which don't reside in the status register and need it to be saved. Sadly, the SFLAG instruction not only does not save, but outright destroys the zero flag. This could have been easily avoided. There is a workaround, by saving the value of the accumulator first.

There is also an event processing mode. An event is like an interrupt, but it does not change the execution flow. You need to explicitly check for an event to be able to handle it. Events do wake up the CPU from sleep modes.

Finally, the pipeline bypasses aren't perfect. The pipeline will never stall, but you will get delays in some cases. Event and interrupt settings take effect with a slight delay. You also need to insert a delay between using the multiplier to store something to an index register and using the index register as a source of an address.

The other kid on the block

So, why aren't we hearing more about the mighty CR816? The ISA, aside from the few warts, is a dream to work with. There even exists a GCC port for this architecture, if you don't want to get your hands messy with writing assembly! You can get the microcontrollers from a handful of places (search for "XE8000" or "EM6812"), but you would have to dig pretty deeply for it. Datasheets for the CPU core are somewhat hard to find by using Google. There isn't that much presence there.

I partially blame the competition. While I was not able to pinpoint the exact date of the launch, The MSP430 was out at most a year after what I believe to be the first CR816 paper. If I were to permit myself some guesswork, I'd say that it was probably mass-produced before a mass-produced CR816 came into being.

MSP430, like the CR816, has sixteen registers (twelve freely usable) and is designed for low power consumption. The major difference is that the MSP430 is a 16-bit machine, making many operations easier. Almost every register can be used for an index and registers don't need to be paired up. The MSP430 also has TI backing it. CR816 had Xemics, not exactly a company that I'd think of as notable (and not one that really survives to this day, being acquired by Semtech in mid '00s).

The MSP430 has much more powerful addressing modes. For instance, it can perform memory-memory operations. There is the same behaviour of one addressing mode doing a post-increment, but it is not an issue due to using a completely different bit of syntax (It looks like this: @R12+), removing the confusion between (A, X) and (A, X)+.

Both were advertised as RISC machines, though neither really is. Gotta get your buzzwords in, I guess.

CR816MSP430
Usable registers1412
Register width816
Index registers4 + 112 + special cases
Instruction width22 bitsVariable, 16+ bits
Addressable memory64K64K
Pipeline depth"3"/53
Worst CPI16(!)
Stack(Hard/soft)wareSoftware
MultiplierYesOptional peripheral
Is named "Cool"Totally!Nope!
Claims to be a RISCIn the nameAwkwardly
Actually is a RISCNot reallyNo, no, no and NO!

Pretty comparable machines, all told. The CR816 seems to be at least a bit competitive. Another point of comparison could be the AVR, but I'll leave that one for someone else.

Closing thoughts

Corrections are welcome. This isn't my field, so I may be wrong somewhere.

RISC is a buzzword.

Cool stuff never gets used.

At least I got to tell a tale of something that I found cute.

I need a new bike.

If you want me to keep on writing, give me some coins to keep me motivated!


Past: Fun with brute-force testcase reduction