Misusing “and” in C++
The following is valid C++ code:
int x = 0;
int and y = x + 1;
Huh? What does int and y mean?
Well, a logical AND is normally written as &&, so let’s rewrite that line:
int && y = x + 1;
Hm, it turns out y is actually an rvalue reference, and there’s no logical operator at all. Weird.
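If you want to confirm this yourself, here’s a minimal compile-time check, assuming a C++11-or-later compiler (the extra variable z is added only for comparison). decltype applied to a variable’s name gives its declared type, so the static_asserts refuse to compile if y isn’t an int&&:

```cpp
#include <type_traits>

int x = 0;
int and y = x + 1;  // "and" is an alternative spelling of "&&"
int && z = x + 1;   // the same declaration, written the usual way

// decltype on a variable name yields its declared type.
static_assert(std::is_same<decltype(y), int&&>::value, "y is an rvalue reference");
static_assert(std::is_same<decltype(y), decltype(z)>::value, "both spellings match");
```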
What about this?
int x = 0;
int bitand y = x;
Maybe it’s a bitwise AND operator? That’s normally written as &, so let’s rewrite that second line:
int & y = x;
This time it’s a “regular” (lvalue) reference.
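A quick sketch to confirm that y really is just another name for x. The function name write_through_alias is made up for this example, and it’s constexpr (which assumes C++14 or later) so the check happens at compile time:

```cpp
// Hypothetical helper: returns x after writing through a "bitand" reference.
constexpr int write_through_alias() {
    int x = 0;
    int bitand y = x;  // same as: int & y = x;
    y = 42;            // assigning through y changes x
    return x;
}

static_assert(write_through_alias() == 42, "y refers to x");
```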
We can take this further:
int x = 0;
int * y = bitand x;
Here, bitand x is just & x, which results in a pointer to x. As it turns out, this last example works in C as well! What’s going on here?
This is a historical artifact related to text encodings. With the rise of Unicode as the “universal standard” for text on computers, it’s easy to forget that there were lots of different text encodings in common use, based on the needs of computer users in different countries speaking different languages. This means that different computers could interpret the same bytes as different characters, depending on what text encoding they used.
One of the first efforts to standardize this was ISO 646. This was based on ASCII but allowed certain characters to be substituted in “national variants.” So, you could count on most, but not all, characters in ASCII to be present. When C was first standardized by ANSI, the committee looked to ISO 646 as a baseline for text encodings. They realized that there were a few characters that weren’t guaranteed to be present, but were nonetheless part of the language syntax. There were nine in all: # [ \ ] ^ { | } ~
If these characters were substituted with different ones in the encoding you were using, then you’d have trouble writing C code.
The committee’s first solution to this was trigraphs, which are three-character “escape sequences” for those characters that weren’t guaranteed by ISO 646. For example, ??= stands for #, so you could write ??=include <stdio.h>. Later, they introduced slightly more readable alternatives to the trigraphs, namely digraphs (two-character sequences) for some key syntax characters and macros for characters used in operators (including the aforementioned “and” and “bitand”). Then, when C++ was standardized, it included all of these different workarounds from C. (Trigraphs have since been removed from both languages, in C++17 and in C23, but digraphs and the alternative operator spellings remain.)
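Here’s a small sketch of digraphs in action, assuming a C++14-or-later compiler. It sticks to digraphs because current C++ compilers generally no longer accept trigraphs; the function name second_element is just for illustration:

```cpp
%:include <cstddef>  // %: is the digraph for #

// <% %> stand for { }, and <: :> stand for [ ].
constexpr int second_element() <%
    int arr<:3:> = <%10, 20, 30%>;
    std::size_t index = 1;
    return arr<:index:>;
%>
```

The compiler sees exactly the same tokens as if you had written #include, braces, and square brackets.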
(There’s a slight difference between C and C++ here: In C, you need to include the iso646.h header to use the operator macros, but in C++ they are actual keywords and no header is necessary.)
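To see all eleven C++ operator spellings in one place, here’s a sketch that uses each of them without any #include. The function names are made up, and one caveat applies: some compilers, notably MSVC in its default mode, may only recognize these spellings with a conformance switch such as /permissive-.

```cpp
// bitand == &, or_eq == |=, xor == ^, and_eq == &=,
// compl == ~, xor_eq == ^=, bitor == |
constexpr unsigned bitwise_demo(unsigned a, unsigned b) {
    unsigned r = a bitand b;  // r = a & b
    r or_eq (a xor b);        // r |= a ^ b
    r and_eq compl 0u;        // r &= ~0u (all bits set, so r is unchanged)
    r xor_eq 1u;              // r ^= 1u (flips the lowest bit)
    return r bitor 0u;        // r | 0u
}

// and == &&, or == ||, not == !
constexpr bool logical_demo(bool p, bool q) {
    return (p and q) or (not p and not q);  // true when p and q agree
}

// not_eq == !=
constexpr bool differ(int a, int b) {
    return a not_eq b;
}

// 1100 & 1010 = 1000; 1000 | 0110 = 1110; 1110 ^ 0001 = 1111
static_assert(bitwise_demo(12u, 10u) == 15u, "bitwise spellings");
```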
In all cases, these are just different ways to write the same characters, regardless of what they mean, which is how we got int and y at the beginning. The && there doesn’t refer to a logical AND, but we can write it as “and” just the same.
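One way to convince yourself that the compiler doesn’t care about meaning: in this sketch (all names hypothetical, C++14 or later assumed), “bitand” and “and” declare lvalue- and rvalue-reference parameters that overload resolution tells apart, and “bitand” also acts as the unary address-of operator from earlier.

```cpp
// Unnamed parameters of type int& and int&&, spelled with the alternative tokens.
constexpr int value_category(int bitand) { return 1; }  // int &
constexpr int value_category(int and) { return 2; }     // int &&

constexpr bool overload_demo() {
    int x = 0;
    return value_category(x) == 1        // an lvalue binds to int &
       and value_category(x + 1) == 2;   // a temporary binds to int &&
}

constexpr bool address_of_demo() {
    int x = 7;
    int * p = bitand x;  // unary &: p points to x
    return p == &x and *p == 7;
}

static_assert(overload_demo(), "overloads are selected by value category");
static_assert(address_of_demo(), "bitand x is &x");
```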
You can, of course, use “and” as a logical operator, as it was originally intended:
if (x > 0 and y > 0) {
// ...
}
But let’s be clear: You probably shouldn’t do this. Remember that this is intended as a workaround for text encoding issues, which aren’t really a concern nowadays. Even if you like the more “Python-like” operators, you should stick to conventional syntax because this helps other people understand your code with less effort.
Bonus: There’s a similar quirk with destructors:
class my_class {
public:
    compl my_class() {}
};
The bitwise complement operator is ~, which can be written as compl. Since a destructor’s name is ~ followed by the class name, compl my_class() declares the destructor.
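If you’d like proof that compl my_class() really is the destructor, here’s a small sketch. The global counter destructor_calls and the function count_destructions are hypothetical additions for this check, not part of the original snippet:

```cpp
#include <cassert>

int destructor_calls = 0;  // hypothetical counter, bumped by the destructor

class my_class {
public:
    compl my_class() { ++destructor_calls; }  // same as: ~my_class()
};

int count_destructions() {
    destructor_calls = 0;
    {
        my_class a;
        my_class b;
        assert(destructor_calls == 0);  // both objects are still alive here
    }  // a and b go out of scope, so the destructor runs for each
    return destructor_calls;
}
```

Calling count_destructions() returns 2, one call per object.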
(Note: You may have noticed that & isn’t one of the characters in ISO 646 that can be substituted, so technically you wouldn’t need any workaround for it if you were using an ISO 646-compliant text encoding. I assume the committee added the AND operator macros to be consistent with the OR operator macros, “or” and “bitor”, which do require a workaround.)
(This post is based on code I previously posted on GitHub Gist.)