Programming the Arduino, you’ll soon find yourself surrounded by strange data-type names, such as int, char, byte, long long, uint32_t and other oddities. What do they mean? Why are there so many of them? Which ones should you use? Let’s find out!
Tell It As It Is
Back when programming languages such as C were being invented, computers were really, really dumb.
Well, they still are. But today, they are at least fast enough to figure some things out for themselves in reasonable time. The C language, developed in the days of sloooow, needs you to tell it in advance what everything is – and not only what it is, but also how big it should be. For instance, you may want a small variable that occupies only eight bits (binary digits) in memory, or a fancy show-off 32-bit variable.
The Day More Bits Arrived
The designers of languages such as C and Pascal invented some data-type names to help programmers make themselves clear on this point. For example, the “native” integer (=number without fractional part) type for a given system would be called “int” by most C compilers.
“Native” here means it’s the size the CPU is most comfortable with; in a 16-bit computer, that would be 16 bits, and in a 32-bit computer, 32 bits. Easy. But this may cause serious trouble and confusion if you try to move code between these two systems. Unfortunately, by the time 32-bit systems arrived, these programming languages were already too established and widespread to simply take out the confusing definitions. So they were left in, and new ones were added.
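To see how this bites, here is a minimal sketch, not taken from any particular program. The helper names `intBits` and `fitsInInt` are made up for illustration; only `INT_MIN`, `INT_MAX` and `CHAR_BIT` come from the standard `<limits.h>` header. The same source gives different answers depending on how wide the compiler makes `int`.

```cpp
#include <limits.h>  // INT_MIN, INT_MAX, CHAR_BIT

// Number of bits the current compiler uses for a plain int.
// (intBits is a hypothetical helper name.)
unsigned intBits(void) {
    return (unsigned)(sizeof(int) * CHAR_BIT);
}

// True if the value can be stored in a plain int on this target.
// (fitsInInt is a hypothetical helper name.)
bool fitsInInt(long value) {
    return value >= INT_MIN && value <= INT_MAX;
}
```

With a 16-bit `int`, `fitsInInt(40000)` is false, because the largest `int` is 32,767; with a 32-bit `int` it is true. Code that quietly assumes one of the two is what breaks when it moves to the other.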
To Make Things Worse
This issue really hits the fan when it comes to programming microcontrollers. When writing a C program for a PC, sometimes you don’t even care what the exact number of bits is, as long as you know it’s enough to store the values you might need. On the microcontroller, however, every bit counts in terms of storage space, processing speed, bit manipulation (we’ll cover that in the future) etc., so you have to be 100% precise. One method is to memorize how the specific compiler, say for the Arduino, treats its integer types. Shall we have a go at it?
Here’s the table for the classic 8-bit (AVR-based) Arduino boards such as the Uno; on 32-bit boards, “int” grows to 32 bits. In the type name column, anything in square brackets is optional, and the word “or” separates names you can use alone or together. For example, “a or b” could be written in the code as “a”, as “b” or as “a b”. A name after a semicolon (“byte”, “word”) is an Arduino-specific alias for the same type.
Type name(s) | Bits | Value range (inclusive) |
---|---|---|
[signed] char | 8 | -128 to 127 |
unsigned char; byte | 8 | 0 to 255 |
[signed] short or int | 16 | -32,768 to 32,767 |
unsigned short or int; word | 16 | 0 to 65,535 |
[signed] long [int] | 32 | -2,147,483,648 to 2,147,483,647 |
unsigned long [int] | 32 | 0 to 4,294,967,295 |
[signed] long long [int] | 64 | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
unsigned long long [int] | 64 | 0 to 18,446,744,073,709,551,615 |
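If you would rather ask the compiler than trust your memory, something like the sketch below reports the widths it actually uses. The struct `IntegerWidths` and the function `measureIntegerWidths` are hypothetical names invented for this example; `sizeof` and `CHAR_BIT` are standard.

```cpp
#include <limits.h>  // CHAR_BIT: number of bits in one byte

// Widths, in bits, of the classic integer types on the current target.
// (IntegerWidths is a hypothetical name used for illustration.)
struct IntegerWidths {
    unsigned charBits;
    unsigned shortBits;
    unsigned intBits;
    unsigned longBits;
    unsigned longLongBits;
};

// sizeof gives the size in bytes, so multiply by the bits per byte.
IntegerWidths measureIntegerWidths(void) {
    IntegerWidths w;
    w.charBits     = (unsigned)(sizeof(char) * CHAR_BIT);
    w.shortBits    = (unsigned)(sizeof(short) * CHAR_BIT);
    w.intBits      = (unsigned)(sizeof(int) * CHAR_BIT);
    w.longBits     = (unsigned)(sizeof(long) * CHAR_BIT);
    w.longLongBits = (unsigned)(sizeof(long long) * CHAR_BIT);
    return w;
}
```

On an Arduino you could call this from `setup()` and print the fields with `Serial.println()`. Assuming an Uno-class 8-bit board, the five numbers should come out as 8, 16, 16, 32 and 64; on a PC or a 32-bit board the `int` entry will typically be 32 instead.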
Did you enjoy it? I didn’t think so. That’s why they invented-
A Better Way
You’ll have to memorize at least some of the table above to understand and interface with other people’s code; but for your own programming, here’s an extremely simple 4-step process to fit all your integer needs:
- Do you want your variable to be able to contain negative numbers? If NOT, type ‘u’ (for “Unsigned”)
- Type “int”
- Type the number of bits you want it to have (8, 16, 32 or 64)
- Type “_t”
And you’re done! Here are two examples:
    // An unsigned 8-bit integer
    uint8_t myIncome;

    // A signed 64-bit integer
    int64_t myDebts;
With the exception, perhaps, of the ubiquitous “char” data type (on 8-bit Arduino boards, the same size and range as int8_t), when programming the Arduino this method is really helpful and convenient. Try it and you’ll never look back. One performance reminder, though: given that most Arduino boards have an 8-bit microcontroller with very little RAM, use low-bit-count integers wherever possible.
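To put a number on that RAM reminder, here is a small sketch. The buffer names and the helper functions are hypothetical, chosen only for this example; the array length of 100 is arbitrary.

```cpp
#include <stddef.h>  // size_t
#include <stdint.h>  // uint8_t, int32_t

// Two hypothetical buffers of 100 readings each.
uint8_t smallReadings[100];  // one byte per reading
int32_t largeReadings[100];  // four bytes per reading

// RAM used by each buffer, in bytes.
size_t smallBufferBytes(void) { return sizeof(smallReadings); }
size_t largeBufferBytes(void) { return sizeof(largeReadings); }

// The price of a small type: an unsigned 8-bit counter
// wraps around from 255 back to 0.
uint8_t nextCount(uint8_t count) {
    return (uint8_t)(count + 1);
}
```

The small buffer takes 100 bytes and the large one 400. Assuming an Uno-class board with 2 KB of RAM, those 300 bytes are a noticeable slice of everything you have. The flip side is the wraparound in `nextCount`: pick the smallest type that still covers every value you expect, not simply the smallest type.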
Type name(s) | Bits | Value range (inclusive) |
---|---|---|
int8_t | 8 | -128 to 127 |
uint8_t | 8 | 0 to 255 |
int16_t | 16 | -32,768 to 32,767 |
uint16_t | 16 | 0 to 65,535 |
int32_t | 32 | -2,147,483,648 to 2,147,483,647 |
uint32_t | 32 | 0 to 4,294,967,295 |
int64_t | 64 | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
uint64_t | 64 | 0 to 18,446,744,073,709,551,615 |
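The ranges in the table matter most when a value moves from a wide type into a narrow one. As a rough sketch, with limits copied from the table and helper names (`fitsInUint8`, `clampToInt16`) that are made up for illustration, a range check before narrowing could look like this:

```cpp
#include <stdint.h>  // int16_t, int32_t

// True if the value fits the uint8_t row of the table (0 to 255).
// (fitsInUint8 is a hypothetical helper name.)
bool fitsInUint8(int32_t value) {
    return value >= 0 && value <= 255;
}

// Squeeze a 32-bit value into the int16_t range (-32,768 to 32,767),
// pinning it to the nearest limit instead of letting it overflow.
// (clampToInt16 is a hypothetical helper name.)
int16_t clampToInt16(int32_t value) {
    if (value < -32768L) return (int16_t)(-32768L);
    if (value > 32767L)  return (int16_t)32767L;
    return (int16_t)value;
}
```

For instance, `clampToInt16(40000)` gives 32767 rather than a garbage value, and `fitsInUint8(256)` is false. Whether clamping or rejecting is the right reaction depends on your program; this only shows where the table’s numbers come into play.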
“int16_6” *raised eyebrow*
So int16_t (and the like) wasn’t enough? Ya hadda go making even NEWER new ones? *laugh* (typos will get ya every time.)
And yeah, there’s been plenty of times I’ve sat there and worked like……you know, “that place”…just to save a couple of bytes here and a couple of bytes there. Leaving a break out of a switch/case statement can save two bytes on an 8-bit controller. But leaving it out can also use several bytes MORE. I’ve got one program where I took out the breaks one at a time (16 values for the case: lines) and re-compiled between each removal and about half of them would save 2 bytes and the other half would increase the code size when removed. I guess it’s all in the compiler optimization.
Heh, int16_6 must be the 6th version of int16, with security patches and everything 😀 Good catch, I fixed the table now – thanks!
Leaving out breaks is an interesting concept… looks like it requires as much effort from the programmer as from the compiler’s optimizer, just to make sure cases don’t get mixed up! At face value, it makes sense that a jump instruction will be saved and the code will be smaller, but indeed, strange things happen under the hood.
One other note about skipping the break statement: for the one program I was doing, there were 16 case statements (reading from a 4-pin DIP switch to then set one of 16 different patterns on a WS2812 string) so there was no need for a default: section. If there WAS a need for such, then skipping the break statements in the other case: sections would probably cause the default: section to be run as well, but as long as all possibilities are covered, then it’s safe to do so (just not a good programming habit to get into.)
But… a 16-case switch for a 4-bit value? Why not just use a lookup table?
I probably could do something like a structured-list but the code was written by someone else and I’m just modifying it for my own purposes (I did get it down to 8072 bytes so it’s small enough now.)
I’m making a hat for a Pi for a kid to learn Scratch on and it has a 19-LED WS2812 strip on it (the craft sticks I used to make a frame for the strip limited the length. thus the odd number and not an even 20 or such.) To help her learn binary, it also has a 4-pole DIP switch and an ATTINY85 to drive the strip. The original author had 19 different patterns and used a push button to cycle them (each button press incremented a counter, saved it to EEPROM and then reset the MCU. Upon start, it’d then use the value for the switch statement) so I modified that coding to:
A) dropped it down to 16 patterns.
B) removed the reset/EEPROM code (the Pi can reset it, however. In fact, it’ll just have a function to turn on and off the strip and will do so by holding the reset pin low. The EEPROM code used quite a bit of space anyways.)
C) reading the 4-bit switch to select one of 16 patterns as it loops.
The hat has another ATTINY85 to drive a “police siren”, complete with a RGB LED that’ll flash red/blue and then two momentary switches, one on either side. Then three separate red, yellow, green LEDs to one side on the hat and a blue and a white one to the other side, all under the Pi’s control.
Just adopt the notation – brilliant!
I like it a lot.
The reflexive use of int, because, well, the “tuts” all demo that modality, drives me bats. (So, when they see me using byte there’s dis-ease and even dead panic.)