The lesser known CPUs: dsPIC33F

Power overwhelming

If with great power comes great responsibility, then the dsPIC33F is probably not for the irresponsible. (Rumour has it that Colin Furze is legally forbidden from approaching within a nautical mile of one of those.) Therefore please don't use it.

Nah, I'm just kidding. We're totally gonna use it, if at all possible.

Let's talk about the most insanely overpowered (yet still breadboardable) device that I have ever gotten my hands on. And good news: Unlike the CR816, you can actually get your hands on this lovely wonder. Don't worry, this will probably be the most common of the CPUs that I will ever say a word about in this series. We'll go more obscure next week!

The machine itself is quirky as all hell, but in a good way. Most of the charm comes from shaking your head about how much stuff they managed to pack into those devices. Come along with me, I'll show you!

As usual, you can read along: look for document DS70157F.

It does not start out so badly

A new microcontroller moves next door. Being of curious nature, you visit them to say hello.

At first glance the dsPIC33F does not seem that strange. Instructions are 24 bits each, so at least a multiple of eight bits. The data buses and registers are sixteen bits wide, very nice. Sixteen registers, fifteen usable freely in assembly. Awesome to have so many available. This could be interesting. On the other hand the register names are a bit odd, using the letter W instead of the more usual R. This comes from the registers being Working registers. W0, W1, ... W15. W0 is also called WREG. From our perspective this is just a strange naming scheme and we don't have to worry about it.

The flags are plentiful. You have the standard set: Z, C, N and OV. Zero, carry, negative and overflow. Then there are the other, less expected flags: OA, OB, SA, SB, OAB, SAB and DC. The DC flag is used for decimal operations. The rest tend to be used in some serious number crunching tasks. Clearly we are dealing with a colorful character.

Upon striking up a conversation it becomes apparent that the dsPIC33F is a bit pompous. They insist on being called a "Digital Signal Controller". Hmm... They are also honest. There is not a single claim to them being a RISC. They are a proper CISC machine. WAIT! DON'T GO! They are actually quite nice. Just give them a chance! Pretty please!

Oh, and another nice thing. They are available! You can actually get your hands on this beast. Some of them even come in PDIP packages. It gets even better! Some are even in the sample program, if you are lucky enough to be permitted sampling from Microchip. Otherwise they aren't exactly cheap, but at least they are still obtainable.

Umm... I'm not so sure about this

The dsPIC33F can be a bit... intimidating. You see, the manual that describes the CPU is 502 pages long. This does not describe a complete microcontroller, just the CPU core. The peripherals are extra.

You must understand that what we are dealing with isn't your garden variety microcontroller here. This thing is meant for a kind of data crunching called Digital Signal Processing (DSP). The pompous naming is not just for show. To support this crunching there are 84 instructions claimed to be present in the CPU core. This is quite a bit more than expected. At the same time, there aren't really that many useless instructions present. The space is actually being used well.

Despite having a lot of instruction types, everything is pretty efficient. Most instructions execute in just one cycle, which is quite amazing considering how much an instruction can accomplish. Branches can take up to three cycles, though usually just two, and untaken branches take just one. There are pipeline stalls, though you would have to be trying pretty hard to hit one of them. Nothing that special here, though it isn't the CR816 when it comes to elegance of instruction timing. At least there are no delay slots to worry about.

The pipeline itself is pretty short, if the diagrams that I managed to gather are to be trusted, though the exact details aren't easy to find. There is a funny quirk, common to some other microcontrollers: the input clock to the machine has to be twice as fast as the rate at which you want the instructions to execute. When talking about the dsPIC33F's speed, the materials use the term "instruction cycle", which is equal to two clock cycles. Interesting.

Oh, and the thing can run at a speedy 40 MIPS. Intimidating indeed.
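To put numbers on that clock quirk, here is a minimal Python sketch. It assumes the two-clocks-per-instruction-cycle ratio described above and, at best, one instruction per instruction cycle. The function names are made up for illustration.

```python
# Assumption from the text above: one instruction cycle = two clock cycles.
CLOCKS_PER_INSTRUCTION_CYCLE = 2


def required_clock_hz(mips):
    """Input clock needed for a given peak instruction rate in MIPS."""
    return round(mips * 1_000_000 * CLOCKS_PER_INSTRUCTION_CYCLE)


def instruction_cycle_ns(mips):
    """Length of one instruction cycle in nanoseconds."""
    return 1000.0 / mips
```

Under those assumptions 40 MIPS wants an 80 MHz clock, and each instruction cycle lasts 25 ns.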

There's an instruction for that

The PIC has the standard arithmetic set of instructions. There are also all the standard logic operations. The shifts, rotates and branches are very comprehensive. This is to be expected. There is more, however.

Imagine something you'd like to have some help with. Picture some task that you hate doing in assembly. Odds are that there is an instruction or feature that will help.

Would you like to repeat a single instruction a bunch of times? Just use the REPEAT instruction and it will do that for you. No loop overhead here, just one instruction being fed into the pipeline again and again in a very tight loop.

Maybe you'd like to have a bigger loop than one instruction. You haven't changed from the previous paragraph, you still really hate to keep checking for that end condition. The DO instruction is a hardware loop that avoids the overhead of repeated checking and branching.

Nested hardware loops? The DO instruction has shadow registers, allowing you to nest the thing once without any worries, just run a second DO instruction within a DO loop. You can nest even deeper, if you do a few bookkeeping tasks in software first.
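How much does a hardware loop actually save? Here is a rough back-of-the-envelope model in Python, not something taken from the manual. It assumes every body instruction takes one cycle, that a software loop pays for a one-cycle counter decrement plus a conditional branch (two cycles taken, one untaken, as quoted earlier), and that the hardware loop has a small one-off setup cost. The setup numbers are my guess, so treat them as a parameter.

```python
def software_loop_cycles(iterations, body_cycles, branch_taken=2, branch_untaken=1):
    """Body, a one-cycle counter decrement, and a conditional branch each pass."""
    if iterations <= 0:
        return 0
    per_pass = body_cycles + 1  # body plus the decrement
    # Every pass but the last takes the branch back to the top.
    return iterations * per_pass + (iterations - 1) * branch_taken + branch_untaken


def hardware_loop_cycles(iterations, body_cycles, setup_cycles=2):
    """One-off setup cost (assumed), then only the body; no per-pass check or branch."""
    if iterations <= 0:
        return 0
    return setup_cycles + iterations * body_cycles
```

For a one-instruction body repeated 100 times this model gives 399 cycles for the software loop and about 100 for the hardware one. The exact figures matter less than the shape: the per-pass overhead simply disappears.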

Multiplying numbers? There's a multiplier (17x17!), several multiplication instructions, and even two 40-bit accumulators for large results. Oh, did I mention that this 16-bit machine can do some higher-precision operations? Well, now I have.
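Why would you want 40 bits on a 16-bit machine? A small Python model of a multiply-accumulate shows the point. This is only a sketch: it assumes plain signed integer operands, wraps instead of saturating, and ignores the fractional modes of the real hardware. All names here are mine.

```python
def wrap_signed(value, bits):
    """Wrap an integer into a signed two's-complement field of `bits` bits."""
    mask = (1 << bits) - 1
    value &= mask
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def mac(acc, a, b, acc_bits=40):
    """acc + a*b with 16-bit signed operands, wrapped to the accumulator width."""
    product = wrap_signed(a, 16) * wrap_signed(b, 16)
    return wrap_signed(acc + product, acc_bits)


def dot(xs, ys, acc_bits=40):
    """Dot product built from repeated multiply-accumulate steps."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y, acc_bits)
    return acc
```

Summing 256 copies of 32767 * 32767 stays exact in the 40-bit model, while the same loop with acc_bits=32 wraps around and gives garbage. Those eight extra bits are headroom for long sums.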

Need multiple shifts, but aren't in the mood for a REPEAT instruction? The barrel shifter awaits your command.

There's a divider too, kinda. It does eat up several cycles, but at least you have one, if multiplying by the inverse isn't an option.

An interrupt! What should you do? Often you'd want to save the flags and the registers before you start handling the interrupt. You don't have to do that by hand, because you have shadow registers. The PUSH.S instruction saves some of the flags as well as four of your working registers. Since it does not touch the software stack, it runs in one cycle. The only drawback is that you can do this only once, as the shadow registers are just one level deep. POP.S restores the state.

Multitasking and locking? There's what looks like an atomic Test-and-Set instruction for that, BTSTS. Anyone want to try to write an OS for this architecture?

I think you get the idea.

Addressing that does not suck

If you have dealt with PIC microcontrollers you may be a bit anxious at the point of talking about addressing modes. The 8-bit models are accumulator machines and pretty much everything has to touch the accumulator on its way through the processor. This makes working with accumulator machines a bit annoying. The 16-bit PICs aren't like that.

Yes, there is an accumulator, but it is mostly there for backward compatibility. The interface exposed to the assembly language programmer is actually quite nice.

You have your typical three-address code for register to register operations.

OP reg1, reg2, reg3
reg3 := reg1 OP reg2

You aren't limited to all three operands just being data stored in registers, however. In this case both reg2 and reg3 can be used as pointers to memory, possibly with post/pre-modification. This can create some pretty interesting instructions.

ADD W4, [W5++], [W6--]

This one gets some data from memory, using the address in W5. W5 is incremented. The data that has been read is added to the content of W4. This in turn is stored at a location given by W6. W6 is decremented.

Each of those modifications to W5 and W6 could have happened before or after the address was used to access the memory. You can mix and match.
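If you prefer to see that spelled out, here is a Python sketch of what ADD W4, [W5++], [W6--] does, following the description above. It models word-sized accesses with memory as a dictionary keyed by byte address, and it assumes the pointers step by two bytes per word, which is my understanding rather than a quote. Flags are not modelled at all.

```python
WORD_BYTES = 2  # assumed pointer step for a 16-bit word access


def add_post_modify(regs, memory):
    """Model of ADD W4, [W5++], [W6--]: read via W5, add W4, store via W6."""
    operand = memory[regs["W5"]]          # fetch from the address in W5
    regs["W5"] += WORD_BYTES              # post-increment the source pointer
    result = (regs["W4"] + operand) & 0xFFFF
    memory[regs["W6"]] = result           # store to the address in W6
    regs["W6"] -= WORD_BYTES              # post-decrement the destination pointer
    return result
```

With W4 = 10, W5 = 0x1000, W6 = 0x2000 and the value 32 sitting at 0x1000, the result 42 lands at 0x2000, W5 ends up at 0x1002 and W6 at 0x1FFE. One instruction.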

Next up, you have the literal mode. The instructions for the dsPIC are pretty large, so you can cram in a ten-bit literal easily and still not have to pay the cost of a dedicated instruction that just loads data. This saves you an instruction.

OP #constant, reg
reg := reg OP constant

Can you cut down your literals a little bit more, down to five bits? If so, there is another instruction format available to you.

OP reg1, #constant, reg2
reg2 := reg1 OP constant

Quite handy, but this mode has one more ace in the hole. The destination can be a form of indirect access, including the usual post/pre-inc/decrementation.
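As a quick rule of thumb for picking between the two literal forms, here is a hypothetical Python helper. It assumes the literals are unsigned, so five bits cover 0 to 31 and ten bits cover 0 to 1023; check the manual for the instruction you actually care about.

```python
def literal_form(value):
    """Pick the smallest literal form that can hold `value` (unsigned assumed)."""
    if 0 <= value < (1 << 5):
        return "lit5"      # fits the three-operand form: OP reg1, #constant, reg2
    if 0 <= value < (1 << 10):
        return "lit10"     # fits the two-operand form: OP #constant, reg
    return "register"      # too big or negative: load it into a register first
```

So a constant like 31 can ride along in the three-operand form, 32 up to 1023 needs the two-operand form, and anything else has to go through a register.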

Then there is the direct addressing mode. There are CPUs which allow for convenient access to the first 256 bytes of the memory space, usually to make peripheral access simple. The direct mode on the dsPIC is 32 times that size, a full eight kilobytes. This convenient memory is called the "near" memory.

OP location
location := WREG OP location
OP location, WREG
WREG := WREG OP location

The one big downside of the near memory direct addressing is that the only working register you can use is the accumulator, WREG. This does not mean that the architecture is useless though, just maybe a little bit less convenient. On the bright side, there are also plenty of instructions that perform a read-modify-write operation that does not need a register. Want to decrease a data item somewhere in memory by one? DEC the location and you are done. No registers used at all. One cycle.

(Did I mention that there are dedicated instructions to inc/decrement by two? There are, INC2 and DEC2. They love the direct mode.)

What is even more interesting is that the very comprehensive bit manipulation operations can operate on any bit in the near memory. Want to toggle the 5th bit of the 50th data item in your memory? That's still just one cycle. Great for peripheral access, where you often may want to just set or reset a single bit.
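Here is a small Python sketch of those register-free read-modify-write operations on near memory. It only models the behaviour described above: a decrement and a bit toggle on a 16-bit word, restricted to the first eight kilobytes. Memory is a dictionary keyed by byte address and every name is made up for the example.

```python
NEAR_BYTES = 8 * 1024  # size of the near memory, per the text above


def check_near(address):
    """Reject addresses outside the directly addressable near memory."""
    if not 0 <= address < NEAR_BYTES:
        raise ValueError("address is outside near memory")


def dec_word(memory, address):
    """Decrement a 16-bit word in place, wrapping at zero."""
    check_near(address)
    memory[address] = (memory[address] - 1) & 0xFFFF


def toggle_bit(memory, address, bit):
    """Flip a single bit (0..15) of a 16-bit word in place."""
    check_near(address)
    if not 0 <= bit < 16:
        raise ValueError("bit must be in 0..15")
    memory[address] ^= 1 << bit
```

Toggling bit 5 of a zeroed word gives 0x0020, toggling it again gives zero back, and no working register is involved in the model at any point.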

If you run out of room in the near memory then you still can access the remaining part of your memory through MOV instructions or indirect means.

In subtraction the order of the minuend and the subtrahend matters, and this has been solved just like in the CR816: there is a reverse subtraction instruction in case it happens to be more convenient to do things that way from an addressing mode point of view.

I'm going to stop here. I haven't even talked about the accumulators, the data prefetching, the computed function calls... I'm not here to paraphrase the whole manual, you can take a look yourself.

I will, however, leave you with this example, taken straight from the CPU manual, of how insane a single instruction can get:

MOVSAC A, [W9]-=2, W4, [W11+W12], W6, [W13]+=2

I'm sure that the kitchen sink is an implicit operand somewhere in there...

It isn't just the core that loves you

The CPU core is kinda amazing, but it does not live in isolation. I'm not going to spend too much time here, but let's take a look at what supports that CPU in getting stuff done.

The pandering to your sloth and convenience continues. Need to just move some data from one point to another, but would rather someone else do it for you? There's DMA as a peripheral. Yes, this microcontroller can have a DMA peripheral with up to four channels.

Or maybe you got sloppy and forgot that working registers are in an unknown state at system start. No problem, read one of those registers and the machine helpfully resets. Hey, beats giving wrong results, no?

Ring buffers? There is a modulo addressing feature, so you don't have to execute the modulo operation yourself to stay within the buffer. This includes oddball buffer sizes. Want to have fast operations over a buffer of one hundred bytes? No problem.
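To show what the hardware is saving you from, here is the wrap-around logic as a Python sketch. It assumes the buffer is described by an inclusive start and end address and that the pointer steps by a fixed number of bytes. Any alignment rules the real hardware imposes on the buffer are not modelled, and the function names are mine.

```python
def modulo_post_increment(pointer, step, start, end):
    """Advance a pointer by `step` bytes, wrapping inside [start, end] inclusive."""
    pointer += step
    if pointer > end:
        pointer = start + (pointer - end - 1)
    return pointer


def walk(start, length_bytes, step, count):
    """Addresses visited by `count` post-incremented accesses over the buffer."""
    end = start + length_bytes - 1
    pointer = start
    visited = []
    for _ in range(count):
        visited.append(pointer)
        pointer = modulo_post_increment(pointer, step, start, end)
    return visited
```

Walking a one hundred byte buffer at 0x1000 in two-byte steps visits fifty distinct addresses and then lands back on 0x1000, with no power-of-two size required. In software that is a compare and a conditional fix-up on every access. With modulo addressing it costs you nothing.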

There are more goodies to do with interrupts too. You can have over a hundred sources of interrupts, though often they aren't all used. The interrupts are vectored, which means that each source of an interrupt can cause your code execution to transfer to a different location. No need to go and check what caused the interrupt in software, the dsPIC already knows and has done the branching for you. Oh, one interrupt vector table not enough for you? There are some provisions for alternate interrupt vector tables, to be used for debugging.

There is also support for bit-reversed addressing, though that is pretty specialized. If you ever need to work with Fourier transforms, you will be glad to have this. Really neat to see it in hardware.
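If you have never met bit reversal, this is the index shuffle that many FFT implementations need, sketched in Python. It shows the general technique only, not how the dsPIC's address generator is configured.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_order(size):
    """Bit-reversed visiting order for a buffer whose size is a power of two."""
    if size < 1 or size & (size - 1):
        raise ValueError("size must be a power of two")
    bits = size.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(size)]
```

For eight items the order comes out as 0, 4, 2, 6, 1, 5, 3, 7. Doing that in software costs a loop per index, which is why having the addressing hardware do it is such a treat.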

It goes up to E-leven.

If the dsPIC33F did not blow your mind, there is another architecture that actually improved things even further. The changes are not a revolution, but do give you a little bit of extra room to manoeuvre. Good!

The dsPIC33E (that's an E at the end, not an F) runs even faster. Where the 33F series maxes out at 40 MIPS, for the 33E the max speed is 70 MIPS. This was accomplished by what looks like deepening of the pipeline, so taken branches and interrupts will hurt more. As long as your code isn't all branches, you can go faster. Another issue is that the nice read-modify-write operations can now take two cycles, depending on the location you are operating on.

Here are some highlights of the improvements:

Are you too lazy to write two instructions, one to compare and one to branch? Yup, we have some new instructions that do a compare and branch in one go. (Though you will pay five instruction cycles if this does actually branch.)

Better DO instruction, with hardware nesting up to four levels in total? Check.

Extended precision multiplication improvements? Sure.

Shadow registers not enough for you? Enjoy your alternate register sets that can be automatically switched in during interrupts.

Oh, and those interrupts? Apparently there were not enough sources. The E can theoretically handle up to 246 sources of interrupts. Plus eight non-maskable interrupts as well. I pity the person who has the job that needs more than a few sources, let alone going into hundreds. How can you live with that? I hope that the possibility of so many interrupt sources is largely theoretical... Oh, and you still can have the alternate vectors as well.

The 33E is available in breadboardable packages too, if you are insane enough to operate a circuit that runs off a 140 megahertz clock on a breadboard. (Good luck!)

The heartbreak

There is just one small thing that makes the dsPIC33F an impossible choice.

The assembly is written with the destination operand on the right.

I'm sorry. Please stop crying. I know it could have been great, but there is no reconciliation when this comes into play.

They were an expensive pick too, at least compared to smaller machines. Otherwise it could have been perfect.

The use case

You may be wondering: "Why would I want a machine for data crunching?"

I would not want to use this machine for that. I'm happy to stick it into jobs that could be called "grunt work". I tend to run very few tasks on my breadboards and they don't tend to require a beefy chip. I want to get stuff done and move on to write my textbooks or to go for a walk. I need something that will get the job done conveniently, run fast enough, if I need it to run fast, and in general act like an extension of my mind. The dsPIC33F does that for me.

If you want something cheaper then there are many other good options out there. If, however, you aren't going into mass production and just want to play around, then using this massively overkilled machine is just about what your inner sloth ordered. Much like a race car, this feels good, even though it may not be that practical for everyday commutes.

Of course, if you do want to do some number crunching, the dsPIC is there for you. Just because you don't initially find something useful does not mean that you won't need it later. Let's say that one day you discover that you need to deal with some fractions. At that point you'd be happy to know that there is, in fact, native support for fractions. It isn't even a surprise at this point. After all, why wouldn't there be? And that's why I love the 33F.
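For the curious, here is roughly what "fractions" means on a 16-bit fixed-point machine, as a Python sketch. I'm assuming the common Q15 layout (one sign bit, fifteen fraction bits, values from -1 up to just under 1). The helper names are mine, and the hardware's rounding and saturation options are not modelled beyond a simple clamp.

```python
Q15_ONE = 1 << 15          # scale factor: 1.0 would be 32768, one past the maximum
Q15_MAX = Q15_ONE - 1
Q15_MIN = -Q15_ONE


def clamp_q15(value):
    """Saturate an integer to the signed 16-bit range."""
    return max(Q15_MIN, min(Q15_MAX, value))


def to_q15(x):
    """Convert a float in roughly [-1, 1) to a Q15 integer."""
    return clamp_q15(round(x * Q15_ONE))


def from_q15(q):
    """Convert a Q15 integer back to a float."""
    return q / Q15_ONE


def q15_mul(a, b):
    """Multiply two Q15 values; the shift drops the extra fraction bits."""
    return clamp_q15((a * b) >> 15)
```

Half times half gives a quarter: to_q15(0.5) is 16384, and q15_mul of that with itself is 8192, which from_q15 turns back into 0.25. It is all integer arithmetic underneath, and that is exactly what a multiplier with wide accumulators is good at.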

Closing thoughts

Corrections welcome. It has been a very long time since I used one of these and I'm a bit rusty.

I'm not doing much circuit stuff lately, which is a shame.

In case you are wondering, there actually are two 32-bit architectures from Microchip as well. They are boring, one being an ARM and the other a MIPS.

This was designed by people who love what they are doing.

Does everything that is made with love turn out great?

I wonder why more processors aren't so user-friendly.

Next week: Let's look at the FPGA processors, for machines that are tiny and still get stuff done.

I love what I'm doing. It feels good, which is more important than anything else.

