Preface

The Evolution of C++: from classical to modern

Since its inception as "C with classes", C++ has experienced numerous significant revisions and improvements. The language is now standardized by ISO JTC1/SC22/WG21, a working group composed of C++ experts from various countries. The first standardized version of C++ was ISO/IEC 14882:1998, commonly known as C++98. The next edition, ISO/IEC 14882:2003, was a minor revision that addressed issues found in C++98.

The true revolution of C++ arrived with ISO/IEC 14882:2011, known as C++11 and, during its development, as C++0x. Because the release slipped past 2009, developers joked that the x must be a hexadecimal digit and dubbed it C++0B, with B standing for 11, the release year. C++11 is considered a watershed moment in the language's evolution, marking the transition from classical to modern C++. It introduced many important additions to both the core language and the standard library, including rvalue references and move semantics, auto type deduction, uniform initialization syntax using {} lists, lambdas, variadic templates, expression SFINAE (an extension of the existing SFINAE rules), and smart pointer classes such as std::unique_ptr and std::shared_ptr, among other valuable features for crafting robust C++ programs.

A small extension to C++11 was introduced in ISO/IEC 14882:2014 (C++14). This was followed by another major revision, ISO/IEC 14882:2017 (C++17), which added notable features like the std::any, std::variant, and std::optional classes to the standard library.

C++20, i.e., ISO/IEC 14882:2020, was officially published on 15 December 2020 and was the latest major revision at the time of writing. The most welcomed core language features of C++20 include concepts for constraining generic types, modules for expressing the physical structure of programs, and coroutines for cooperative (non-preemptive) multitasking. Among the many new standard library features, the ranges library is particularly exciting, as it enables a functional programming style with "pipeable" functions similar to F#, my favorite .NET language.

Given the impact and changes brought about by C++11/14/17/20, it's clear that pre-2011 C++ and post-2011 C++ are fundamentally different languages. This distinction is reflected in the terms "Classical C++" represented by C++98 and "Modern C++" represented by C++11 and later. Learning the reimagined modern C++ as a new language is necessary, whether it's approached with enthusiasm or apprehension.

C++ was designed with backward compatibility to C, allowing developers to use C-style programming constructs such as raw pointers, arrays, and null-terminated strings. As C++ has evolved, the focus has shifted towards reducing the reliance on C-style idioms and sticking to the "zero overhead" principle. Modern C++ is simpler, safer, more elegant, and retains its speed.

Who this book is for

This book expects readers to have a basic knowledge of C++ and a genuine interest in evolving their skills in modern C++. Most chapters are beginner-friendly, while some need extra focus. Advanced template metaprogramming topics may require multiple readings but can be skipped initially. Beginners should refer to other C++ books for fundamental guidance.

What this book covers

This book focuses on helping readers understand the rationale behind new C++11 to C++20 features, discussing past C++ limitations, and examining how these features address and optimize those issues. Wherever necessary, it also explains how new features are implemented in compilers. Code samples are tested using GCC, Clang, and MSVC.

Fundamental Data Types

The fundamental types in C++ include integer types, character types, and floating-point types. These types are considered fundamental because they are built into the language itself and can be used to create more complex data structures and objects. Additionally, they are the building blocks for other C++ data types, such as arrays, structures, and classes.

The following table lists the type specifiers of the fundamental data types in C++.

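| Character Types | Integer Types      | Floating-Point Types |
|-----------------|--------------------|----------------------|
| char            | bool               | float                |
| wchar_t         | short              | double               |
| char16_t        | int                | long double          |
| char32_t        | long               |                      |
| char8_t         | long long          |                      |
|                 | unsigned short     |                      |
|                 | unsigned int       |                      |
|                 | unsigned long      |                      |
|                 | unsigned long long |                      |
|                 | signed char        |                      |
|                 | unsigned char      |                      |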

void

void is considered a fundamental type in C++. It represents the absence of a value: it is used as the return type of functions that return nothing, and void* serves as a generic pointer to an object of any type. It cannot be used to declare variables because it is an incomplete type with no size or storage, but it is an important part of the C++ language and is often used in conjunction with other data types.

bool

bool is classified as an integral type in C++, but it is not one of the standard signed or unsigned integer types, and it is often treated as a separate category due to its Boolean semantics.

signed char and unsigned char

In C++, the char type is a distinct type used to represent individual characters in text strings. It is not one of the standard signed or unsigned integer types, but it is an integral type whose values are the codes of the execution character set (usually an ASCII-compatible encoding), which allows it to be used for integer calculations in some contexts.

When signed or unsigned is applied to char, the result is a type for small integers: with the usual 8-bit char, unsigned char holds values from 0 to 255, and signed char holds values from -128 to 127. Therefore, signed char and unsigned char are both considered integer types; they are the smallest standard signed and unsigned integer types.

Note that char is a distinct type from signed char and unsigned char, and it is not guaranteed to be signed or unsigned. The signedness of char is implementation-defined, and it can vary depending on the platform and the compiler.
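These rules can be checked at compile time with the standard type traits. The following minimal sketch asserts the properties that hold on every conforming implementation and merely records the implementation-defined signedness of plain char in a constant (plain_char_is_signed is a name chosen for this example):

```cpp
#include <limits>
#include <type_traits>

// char, signed char and unsigned char are always three distinct types,
// even though char shares its range and representation with one of the others.
static_assert(!std::is_same_v<char, signed char>);
static_assert(!std::is_same_v<char, unsigned char>);
static_assert(!std::is_same_v<signed char, unsigned char>);

// signed char and unsigned char have a fixed signedness.
static_assert(std::is_signed_v<signed char>);
static_assert(std::is_unsigned_v<unsigned char>);

// The signedness of plain char is implementation-defined: typically true
// with x86 toolchains and false on, for example, Linux on ARM.
constexpr bool plain_char_is_signed = std::numeric_limits<char>::is_signed;

// Whatever the answer, char has the same range as one of the other two types.
static_assert(plain_char_is_signed
                  ? std::numeric_limits<char>::min() == std::numeric_limits<signed char>::min()
                  : std::numeric_limits<char>::min() == 0);
```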

Type qualifiers and cv-correctness

Type specifiers can be combined with type qualifiers. In C++, there are two type qualifiers: const and volatile.

  • const indicates that a variable's value cannot be modified after it has been initialized.
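  • volatile indicates that a variable's value can change in ways the compiler cannot see, for example through hardware (such as a memory-mapped device register) or through other processes sharing the memory. Every read and write of a volatile variable must actually be performed, so volatile is sometimes applied to a variable to prevent the compiler from optimizing such accesses away.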

CV-correctness is a programming concept in C++ that involves using the const and volatile type qualifiers to ensure that functions and data members behave correctly in the presence of const and volatile objects.

For example, a member function that does not modify the state of the object it operates on should be declared const. This ensures that the function can be called on const objects, and that it does not modify the state of the object.

class Example {
public:
    // Declared const because it does not modify the object state
    int getValue() const; 
private:
    int value_;
};

int Example::getValue() const {
    return value_;
}

A data member can also be declared const if it should never be modified after construction. A const data member must be initialized in the constructor's member initializer list:

class Example {
public:
    Example(int value) : value_(value) {}
    int getValue() const {
        // Cannot be modified because getValue is const
        return value_; 
    }
private:
    // Declared const to ensure it cannot be modified
    const int value_; 
};

The volatile qualifier can be applied to variables that can be changed by external factors, such as hardware or other processes. This ensures that the compiler does not optimize away accesses to the variable, which could cause incorrect behavior.

volatile int* ptr; // Pointer to a volatile int

Using CV-correctness can help prevent errors and improve code safety by ensuring that functions and data members behave correctly in the presence of const and volatile objects.

mutable

In C++, mutable is a specifier (grammatically, one of the storage class specifiers) that can be applied to a non-static data member so that the member can be modified even if the containing object is const. This is useful when the member represents a cache or other internal bookkeeping that does not affect the observable state of the object.

class Example {
public:
    int getValue() const {
        // Marked const, so it cannot modify any non-mutable members.
        // However, it can modify mutable members such as cachedValue_.
        if (cachedValue_ == 0) {
            cachedValue_ = someExpensiveCalculation();
        }
        return cachedValue_;
    }

private:
    // Stand-in for a costly computation (placeholder for illustration).
    int someExpensiveCalculation() const { return 42; }

    // Declared mutable to allow modification even
    // if Example object is const. 0 means "not computed yet".
    mutable int cachedValue_ = 0;
};

In this example, cachedValue_ is declared as mutable, which allows it to be modified even if the containing object is declared const. The getValue() function is declared const, which means it cannot modify any non-mutable members of the Example object, but it can modify the mutable member cachedValue_.

Integer Types

Common integer types

C++ supports several integer types with varying sizes and ranges. Here is a list of the most commonly used integer types in C++, available since the earlier versions of the language. Note that char is treated as an integer type here for practical reasons, though technically it is a character type rather than one of the standard integer types.

| Type Name          | Typical Size (in bytes) | Range |
|--------------------|-------------------------|-------|
| bool               | 1      | Boolean literal true or false (added in C++98) |
| char               | 1      | [-128, 127] or [0, 255], depending on signedness |
| short              | 2      | [-32,768, 32,767] |
| int                | 4      | [-2,147,483,648, 2,147,483,647] |
| long               | 4 or 8 | [-2,147,483,648, 2,147,483,647] or [-9,223,372,036,854,775,808, 9,223,372,036,854,775,807], depending on platform |
| long long          | 8      | [-9,223,372,036,854,775,808, 9,223,372,036,854,775,807] |
| unsigned char      | 1      | [0, 255] |
| unsigned short     | 2      | [0, 65,535] |
| unsigned int       | 4      | [0, 4,294,967,295] |
| unsigned long      | 4 or 8 | [0, 4,294,967,295] or [0, 18,446,744,073,709,551,615], depending on platform |
| unsigned long long | 8      | [0, 18,446,744,073,709,551,615] |

The C++ standard does not specify the exact size of these integer types; it only imposes the following constraints:

sizeof(char)      == 1                  // Rule 1
sizeof(char)      <= sizeof(short)      // Rule 2
sizeof(short)     <= sizeof(int)        // Rule 3
sizeof(int)       <= sizeof(long)       // Rule 4
sizeof(long)      <= sizeof(long long)  // Rule 5
sizeof(char)      *  CHAR_BIT >= 8      // Rule 6
sizeof(short)     *  CHAR_BIT >= 16     // Rule 7
sizeof(int)       *  CHAR_BIT >= 16     // Rule 8
sizeof(long)      *  CHAR_BIT >= 32     // Rule 9
sizeof(long long) *  CHAR_BIT >= 64     // Rule 10

CHAR_BIT, defined in <climits>, represents the number of bits in a char, that is, in a byte. Although most modern architectures use 8-bit bytes, this is not always the case: some digital signal processors, for example, use 16-bit or 32-bit chars, and the standard only requires CHAR_BIT to be at least 8 (Rule 6). Under Rule 4, C and C++ allow long and int to have the same size, but long must be at least 32 bits according to Rule 9.
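Code that depends on these guarantees, or on stronger assumptions of its own, can state them as compile-time checks so that a port to an unusual platform fails loudly instead of misbehaving. A minimal sketch; bit_width_of is a hypothetical helper introduced for this example, and it counts storage bits, which on mainstream platforms equal the value bits:

```cpp
#include <climits>

// The guarantees listed above, checked at compile time.
static_assert(sizeof(char) == 1);                    // Rule 1
static_assert(sizeof(char) <= sizeof(short));        // Rule 2
static_assert(sizeof(short) <= sizeof(int));         // Rule 3
static_assert(sizeof(int) <= sizeof(long));          // Rule 4
static_assert(sizeof(long) <= sizeof(long long));    // Rule 5
static_assert(CHAR_BIT >= 8);                        // Rule 6
static_assert(sizeof(short) * CHAR_BIT >= 16);       // Rule 7
static_assert(sizeof(int) * CHAR_BIT >= 16);         // Rule 8
static_assert(sizeof(long) * CHAR_BIT >= 32);        // Rule 9
static_assert(sizeof(long long) * CHAR_BIT >= 64);   // Rule 10

// A project that relies on a stronger, non-guaranteed assumption can
// document it the same way, for example:
// static_assert(sizeof(int) * CHAR_BIT == 32, "this code assumes a 32-bit int");

// Hypothetical helper: number of storage bits occupied by an object of type T.
template <typename T>
constexpr int bit_width_of() {
    return static_cast<int>(sizeof(T) * CHAR_BIT);
}
```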

Fixed size integer types

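The C++11 standard introduced fixed-width integer types in the <cstdint> header, such as std::int8_t, std::int16_t, std::int32_t, and std::int64_t, as well as their unsigned counterparts, std::uint8_t, std::uint16_t, std::uint32_t, and std::uint64_t. Where an implementation provides these types, they have exactly the specified width with no padding bits, and the signed ones use two's complement representation. They are, however, optional: a platform without a suitable native type, such as one with 9-bit bytes, does not define them. The std::int_leastN_t and std::int_fastN_t families (for example, std::int_least16_t) are always available and guarantee at least N bits.

The following table summarizes the fixed-width integer types. The intN_t and uintN_t types have exactly N bits, where N is 8, 16, 32, or 64.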

| Type     | Size (in bytes) | Range |
|----------|-----------------|-------|
| int8_t   | 1 | [-128, 127] |
| uint8_t  | 1 | [0, 255] |
| int16_t  | 2 | [-32,768, 32,767] |
| uint16_t | 2 | [0, 65,535] |
| int32_t  | 4 | [-2,147,483,648, 2,147,483,647] |
| uint32_t | 4 | [0, 4,294,967,295] |
| int64_t  | 8 | [-9,223,372,036,854,775,808, 9,223,372,036,854,775,807] |
| uint64_t | 8 | [0, 18,446,744,073,709,551,615] |
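One practical pitfall is worth showing. On mainstream implementations (GCC, Clang, and MSVC), std::int8_t and std::uint8_t are aliases for signed char and unsigned char, so inserting them into a stream prints a character rather than a number. The sketch below, with the hypothetical helper print_int8, shows the usual workaround and checks a few of the table's limits at compile time:

```cpp
#include <cstdint>
#include <iostream>
#include <limits>

// On mainstream implementations std::int8_t is an alias for signed char, so
// operator<< treats it as a character. Converting to int prints the number.
void print_int8(std::int8_t v) {
    std::cout << "streamed directly: " << v << '\n'                    // typically a character
              << "as int:            " << static_cast<int>(v) << '\n'; // the numeric value
}

// The limits match the table above wherever the types are provided.
static_assert(std::numeric_limits<std::int8_t>::min() == -128);
static_assert(std::numeric_limits<std::uint16_t>::max() == 65535);
static_assert(std::numeric_limits<std::uint32_t>::max() == 4294967295u);
static_assert(std::numeric_limits<std::int64_t>::max() == 9223372036854775807LL);
```

For example, print_int8(65) typically prints A on the first line and 65 on the second.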

128-bit integer types

The C++ standard does not define a 128-bit integer type, as of the latest version C++20.

However, some compilers and libraries provide 128-bit integer types as extensions. For example, the GCC and Clang compilers provide an __int128 type (on targets that support it, typically 64-bit ones), which is a 128-bit signed integer type. The Boost Multiprecision library provides both arbitrary-precision and fixed-precision integer types, including boost::multiprecision::int128_t.

| Type Name         | Library/Compiler     | Description |
|-------------------|----------------------|-------------|
| __int128          | GCC, Clang           | A 128-bit signed integer type |
| unsigned __int128 | GCC, Clang           | A 128-bit unsigned integer type |
| int128_t          | Boost Multiprecision | A 128-bit signed integer type |
| uint128_t         | Boost Multiprecision | A 128-bit unsigned integer type |

It's important to note that the availability and behavior of non-standard integer types may vary depending on the platform and compiler used.
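As an illustration of the GCC/Clang extension, the following sketch computes the upper 64 bits of a 64-bit by 64-bit multiplication, a typical use of __int128, and converts a 128-bit value to text by hand, because std::to_string and the standard stream operators do not accept __int128. The function names mul_high_u64 and to_string_u128 are hypothetical, and the code compiles only where the compiler predefines __SIZEOF_INT128__ (GCC and Clang on 64-bit targets):

```cpp
#include <cstdint>
#include <string>

#if defined(__SIZEOF_INT128__)  // predefined by GCC and Clang when __int128 is available

// High 64 bits of the full 128-bit product of two 64-bit values.
constexpr std::uint64_t mul_high_u64(std::uint64_t a, std::uint64_t b) {
    return static_cast<std::uint64_t>((static_cast<unsigned __int128>(a) * b) >> 64);
}

// std::to_string and the standard streams have no overloads for __int128,
// so the value is converted to decimal text digit by digit.
std::string to_string_u128(unsigned __int128 v) {
    std::string digits;
    do {
        digits.insert(digits.begin(), static_cast<char>('0' + static_cast<int>(v % 10)));
        v /= 10;
    } while (v != 0);
    return digits;
}

#endif
```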

Integer Type long long

History

Before long long was officially added to the C++11 standard in 2011, C++ programmers had known the long long integer type for a long time. It has been part of the C language since the C99 standard, and many major C++ compilers supported long long for compatibility with C.

As early as 1995, Roland Hartinger first proposed to add long long to C++. At the time, the C committee had not yet considered this type. As a result, the C++ committee was reluctant to add a fundamental type that was not also in C. After long long had been added to C99, Stephen Adamczyk proposed to reconsider its addition to C++ in 2005. Finally, long long was accepted as part of C++ in 2011, more than ten years after it was first included in the C standard.

Bit size

The C++ standard defines long long as an integer type that is at least 64 bits long, but it does not guarantee that long long will always be 64 bits on all platforms. The size of long long can depend on the architecture and the compiler being used. However, most modern platforms do support a 64-bit long long type. To ensure portability and avoid any potential issues, it's best to use the sizeof operator to determine the size of long long on a specific platform.

Remember that in C++, long long is a signed data type, and its corresponding unsigned data type is unsigned long long. It's important to note that long long int and unsigned long long int have the same meaning as long long and unsigned long long, respectively, with the latter forms being shorthand for the former ones.

Literal suffix

The C++ standard defines LL and ULL as literal suffixes for long long and unsigned long long, respectively. When initializing a long long type variable, you can write it like this:

long long x = 65536LL;

The literal suffix LL can be omitted here with the same result, because the int value 65536 is implicitly converted to long long:

long long x = 65536;

When working with large integer values in C++, it is important to use literal suffixes to ensure that the code runs as intended. For example:

long long x = 65536 << 16; // int arithmetic: 2^32 overflows a 32-bit int
std::cout << "x = " << x << std::endl;
long long y = 65536LL << 16;
std::cout << "y = " << y << std::endl;

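In long long x = 65536 << 16, the literal 65536 has type int, so the left shift is performed in int arithmetic before the result is converted to long long. With the usual 32-bit int, the mathematical result 4,294,967,296 (2^32) cannot be represented: before C++20 this is undefined behavior, and since C++20 the result is defined to wrap around to 0. Either way, x does not receive the intended value; in practice the program typically prints x = 0, and compilers usually emit a warning.

To prevent the overflow, we should use the LL literal suffix so that the left operand, and therefore the whole shift, has type long long, as in long long y = 65536LL << 16. This yields the intended value 4294967296.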
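The type rules behind this behavior can be verified at compile time. The following sketch assumes a 32-bit int and a 64-bit long long, as on all mainstream platforms:

```cpp
#include <type_traits>

// Assumes a 32-bit int and a 64-bit long long, as on mainstream platforms.

// An unsuffixed decimal literal that fits in int has type int...
static_assert(std::is_same_v<decltype(65536), int>);

// ...while with the LL suffix the shift is evaluated in long long arithmetic,
// because the result of << has the (promoted) type of its left operand.
static_assert(std::is_same_v<decltype(65536LL << 16), long long>);
static_assert((65536LL << 16) == 4294967296LL);

// Casting the left operand is an alternative to the suffix.
static_assert((static_cast<long long>(65536) << 16) == 4294967296LL);

// A decimal literal too large for int automatically gets a wider type
// (long or long long, depending on the platform), so it needs no suffix.
static_assert(sizeof(4294967296) >= 8);
```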

Numerical limits

C code obtains the maximum and minimum values of integer types from macros in <climits> (<limits.h> in C), which an implementation typically defines along these lines:

#define LLONG_MAX 9223372036854775807LL        // long long max value
#define LLONG_MIN (-9223372036854775807LL - 1) // long long min value
#define ULLONG_MAX 0xFFFFFFFFFFFFFFFFULL       // unsigned long long max value

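We should avoid these macros as much as possible. Instead, we should use std::numeric_limits from <limits>, which is type-safe, usable in constant expressions, and works generically in templates: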

#include <climits>   // LLONG_MAX, LLONG_MIN, ULLONG_MAX
#include <cstdio>
#include <iostream>
#include <limits>

int main(int argc, char *argv[])
{
    // Avoid these!
    std::cout << "LLONG_MAX = "  
            << LLONG_MAX  
            << std::endl;

    std::cout << "LLONG_MIN = "  
            << LLONG_MIN  
            << std::endl;

    std::cout << "ULLONG_MAX = " 
            << ULLONG_MAX 
            << std::endl;

    std::printf("LLONG_MAX  = %lld\n", LLONG_MAX);  // format specifier %lld
    std::printf("LLONG_MIN  = %lld\n", LLONG_MIN);  // format specifier %lld
    std::printf("ULLONG_MAX = %llu\n", ULLONG_MAX); // format specifier %llu

    // Use std::numeric_limits
    std::cout << "std::numeric_limits<long long>::max() = " 
            << std::numeric_limits<long long>::max() 
            << std::endl;

    std::cout << "std::numeric_limits<long long>::min() = "
            << std::numeric_limits<long long>::min()
            << std::endl;

    std::cout << "std::numeric_limits<unsigned long long>::max() = "
            << std::numeric_limits<unsigned long long>::max() 
            << std::endl;
}
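One reason to prefer std::numeric_limits is that it works in generic code, where no single macro such as LLONG_MAX applies. The following sketch of an overflow-checked addition (checked_add is a hypothetical helper, not a standard function) works for any integer type and never performs the overflowing addition itself:

```cpp
#include <limits>
#include <optional>
#include <type_traits>

// Adds two integers of the same type, returning std::nullopt instead of
// overflowing. The bounds come from std::numeric_limits<T>, so the same code
// serves int, long long, unsigned types, and so on.
template <typename T>
constexpr std::optional<T> checked_add(T a, T b) {
    static_assert(std::is_integral_v<T>, "checked_add requires an integer type");
    if (b > 0 && a > std::numeric_limits<T>::max() - b) {
        return std::nullopt;  // a + b would exceed the maximum
    }
    if (b < 0 && a < std::numeric_limits<T>::min() - b) {
        return std::nullopt;  // a + b would fall below the minimum
    }
    return a + b;
}
```

For example, checked_add(std::numeric_limits<int>::max(), 1) returns an empty std::optional instead of triggering signed overflow, which is undefined behavior.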

Character Types

In C++, char is never the same type as signed char, although on many platforms the two have the same range and representation.

The C++ standard defines char, signed char, and unsigned char as three distinct types. It does not specify whether char is signed or unsigned; this is implementation-defined.

On x86 and x86-64 platforms, char is typically signed and has the same range of representable values as signed char. On other platforms, such as Linux on ARM or PowerPC, char is typically unsigned and has the same range as unsigned char.

So, while char often behaves like signed char, this is not guaranteed by the standard. To make code that relies on the signedness of char portable, use signed char or unsigned char explicitly.
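A common place where the signedness of char matters is converting a byte to a number or passing it to the <cctype> functions. The following sketch (byte_value and to_upper_copy are hypothetical helpers) converts through unsigned char, which gives the same result whether plain char is signed or unsigned:

```cpp
#include <cctype>
#include <string>

// Returns the numeric value of a byte in the range 0..255, whether plain char
// is signed or unsigned on this platform. A plain static_cast<int>(c) would
// yield a negative number for bytes >= 0x80 where char is signed.
constexpr int byte_value(char c) {
    return static_cast<unsigned char>(c);
}

// The <cctype> functions require an argument representable as unsigned char
// (or EOF). Passing a negative char, as happens for bytes >= 0x80 when char
// is signed, is undefined behavior, so each char is converted first.
std::string to_upper_copy(std::string s) {
    for (char& c : s) {
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return s;
}
```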

Issue with wchar_t

wchar_t is a character type in C++ that is used to represent wide characters. It was introduced into C++ with the C++98 standard. Many Windows API functions have a wide character version that takes wchar_t strings as arguments. The wide character version of these functions has a suffix of W added to the function name. For example, the function CreateFile() in the Windows API has a wide character version named CreateFileW().

The C++ standard specifies that a string literal with an L prefix creates a wide character string literal.

#include <windows.h>

int main()
{
    LPCWSTR fileName = L"C:\\example\\test.txt";
    HANDLE hFile = CreateFileW(fileName, 
                               GENERIC_READ, 
                               FILE_SHARE_READ, 
                               NULL, 
                               OPEN_EXISTING, 
                               FILE_ATTRIBUTE_NORMAL, 
                               NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        // Handle error
        return 1;
    }
    // Do something with the file handle
    CloseHandle(hFile);
    return 0;
}

The issue with wchar_t is that its size is implementation-defined, which means that it can vary across different systems and compilers. The C++ standard does not specify the size of wchar_t, leaving it up to the implementation to decide. For example, on Windows systems, wchar_t is 16 bits (2 bytes), while on Unix-like systems, it is typically 32 bits (4 bytes).

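This lack of standardization causes portability issues in cross-platform code. Code that relies on wchar_t may behave differently when compiled on a system with a different wchar_t size. For example, a character outside the Basic Multilingual Plane, such as an emoji, occupies one wchar_t on Linux but two (a UTF-16 surrogate pair) on Windows, so string lengths and indices differ. Binary data containing wchar_t values also cannot be exchanged directly between such systems, because the element size and possibly the byte order differ.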

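To address this issue, the C++11 standard introduced new character types, char16_t and char32_t, whose sizes match std::uint_least16_t and std::uint_least32_t (16 and 32 bits on all mainstream platforms) and whose u and U string literals are always encoded in UTF-16 and UTF-32, respectively. These types are recommended for use in portable code, rather than wchar_t.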
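The difference between the fixed-layout char16_t and char32_t literals and the implementation-defined wchar_t literals can be seen with a character outside the Basic Multilingual Plane, here U+1F600. The following compile-time sketch relies only on what the standard guarantees for the u and U prefixes; the final constant simply records this platform's wchar_t layout:

```cpp
#include <cstddef>

// U+1F600 lies outside the Basic Multilingual Plane.
constexpr char16_t smile16[] = u"\U0001F600";  // surrogate pair + terminator
constexpr char32_t smile32[] = U"\U0001F600";  // one code unit + terminator

static_assert(sizeof(smile16) / sizeof(char16_t) == 3);
static_assert(smile16[0] == 0xD83D && smile16[1] == 0xDE00);
static_assert(sizeof(smile32) / sizeof(char32_t) == 2);
static_assert(smile32[0] == 0x1F600);

// The same literal with the L prefix has an implementation-defined layout:
// typically 3 elements on Windows (16-bit wchar_t holding UTF-16) and
// 2 elements on Linux (32-bit wchar_t holding UTF-32).
constexpr std::size_t wide_len = sizeof(L"\U0001F600") / sizeof(wchar_t);
```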

Character Sets and Encodings

Character set

A character set, also known as a character repertoire, is a collection of characters and symbols that are used to represent written language in computing. Each character in a character set is assigned a unique code point, which is a numerical value that represents that character in digital form.

Character sets can include characters from many different writing systems and languages, such as the Latin alphabet used in English, or the Chinese characters used in Mandarin Chinese. Some character sets are designed for specific languages or scripts, while others are designed to be universal and include characters from many different languages.

Examples of character sets include ASCII, which includes characters commonly used in the English language, and Unicode, which is a universal character set that can represent all characters used in modern computing, including characters from many different writing systems.

Code point

A code point is a numerical value that represents a single character or symbol in a character set. Each character in a character set is assigned a unique code point, which is a specific number that identifies that character.

Code points are typically expressed as hexadecimal numbers, which means that they use a base-16 numbering system. For example, the code point for the letter "A" in the ASCII character set is 0x41, while the code point for the Greek letter "α" in the Unicode character set is 0x03B1.

Unicode comprises 1,114,112 code points in the range [0, 1,114,111]. The maximum value of Unicode code point is 1,114,111 (0x10FFFF).

Encodings

Encoding involves mapping each code point to a specific sequence of bits or bytes that can be used to represent that character in digital form.

The Unicode standard defines a code space of 1,114,112 code points (not all of which are assigned to characters) and provides several encoding forms: UTF-8 and UTF-16, which represent a code point with a variable number of bytes, and UTF-32, which always uses four bytes.

UTF-8 encoding

UTF-8 is a variable-length encoding scheme. It works by mapping each Unicode code point to a sequence of 1 to 4 bytes, depending on the code point value.

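| Code Point Range                       | Number of Bytes | Binary Format |
|----------------------------------------|-----------------|---------------|
| 0 to 127 (U+0000 to U+007F)            | 1 byte          | 0xxx'xxxx |
| 128 to 2047 (U+0080 to U+07FF)         | 2 bytes         | 110x'xxxx 10xx'xxxx |
| 2048 to 65535 (U+0800 to U+FFFF)       | 3 bytes         | 1110'xxxx 10xx'xxxx 10xx'xxxx |
| 65536 to 1114111 (U+10000 to U+10FFFF) | 4 bytes         | 1111'0xxx 10xx'xxxx 10xx'xxxx 10xx'xxxx |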

Here's how UTF-8 encoding works:

  • If the code point value is between 0 and 127 (inclusive), the code point is represented as a single byte with the same value. This means that ASCII characters (which have code point values between 0 and 127) can be represented in UTF-8 encoding using a single byte.

  • If the code point value is between 128 and 2047 (inclusive), the code point is represented as 2 bytes. The first byte starts with the binary value 110, followed by 5 bits that represent the most significant bits of the code point value. The second byte starts with the binary value 10, followed by 6 bits that represent the least significant bits of the code point value.

  • If the code point value is between 2048 and 65535 (inclusive), the code point is represented as 3 bytes. The first byte starts with the binary value 1110, followed by 4 bits that represent the most significant bits of the code point value. The second and third bytes start with the binary value 10, followed by 6 bits each that represent the remaining bits of the code point value.

  • If the code point value is between 65536 and 1114111 (inclusive), the code point is represented as 4 bytes. The first byte starts with the binary value 11110, followed by 3 bits that represent the most significant bits of the code point value. The second, third, and fourth bytes start with the binary value 10, followed by 6 bits each that represent the remaining bits of the code point value.
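The four rules above translate directly into code. The following is a minimal sketch of a UTF-8 encoder for a single code point; encode_utf8 and Utf8Bytes are hypothetical names, and rejecting surrogate code points (U+D800 to U+DFFF) and values above U+10FFFF by returning a size of 0 is one possible error-handling policy among several:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Up to four UTF-8 bytes plus the number of bytes actually used.
struct Utf8Bytes {
    std::array<std::uint8_t, 4> bytes{};
    std::size_t size = 0;  // 0 means the input was not an encodable code point
};

constexpr Utf8Bytes encode_utf8(char32_t cp) {
    Utf8Bytes out{};
    if (cp <= 0x7F) {                      // 0xxx'xxxx
        out.bytes[0] = static_cast<std::uint8_t>(cp);
        out.size = 1;
    } else if (cp <= 0x7FF) {              // 110x'xxxx 10xx'xxxx
        out.bytes[0] = static_cast<std::uint8_t>(0xC0 | (cp >> 6));
        out.bytes[1] = static_cast<std::uint8_t>(0x80 | (cp & 0x3F));
        out.size = 2;
    } else if (cp <= 0xFFFF) {             // 1110'xxxx 10xx'xxxx 10xx'xxxx
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return out;                    // surrogates are not characters
        }
        out.bytes[0] = static_cast<std::uint8_t>(0xE0 | (cp >> 12));
        out.bytes[1] = static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F));
        out.bytes[2] = static_cast<std::uint8_t>(0x80 | (cp & 0x3F));
        out.size = 3;
    } else if (cp <= 0x10FFFF) {           // 1111'0xxx, then three 10xx'xxxx bytes
        out.bytes[0] = static_cast<std::uint8_t>(0xF0 | (cp >> 18));
        out.bytes[1] = static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F));
        out.bytes[2] = static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F));
        out.bytes[3] = static_cast<std::uint8_t>(0x80 | (cp & 0x3F));
        out.size = 4;
    }
    return out;                            // size stays 0 above U+10FFFF
}
```

For example, encode_utf8(0x4F60), the code point of "你", yields the three bytes 0xE4 0xBD 0xA0.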

By using a variable-length encoding scheme, UTF-8 encoding can represent all Unicode code points using a sequence of 1 to 4 bytes. This allows UTF-8 to be a compact and efficient encoding scheme. UTF-8 is a superset of ASCII and fully compatible with it.

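UTF-8 is also self-synchronizing. The first byte of each sequence has a distinctive bit pattern that also tells how many bytes the sequence has, and every trailing (continuation) byte has the fixed pattern 10xx'xxxx. This makes it easy to validate a UTF-8 sequence and, after jumping to an arbitrary byte position, to find where a character starts by skipping back over at most three continuation bytes.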

UTF-16 encoding

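| Code Point Range                       | Number of Code Units   | Binary Format |
|----------------------------------------|------------------------|---------------|
| 0 to 65535 (U+0000 to U+FFFF)          | 1 code unit (2 bytes)  | xxxxxxxx xxxxxxxx |
| 65536 to 1114111 (U+10000 to U+10FFFF) | 2 code units (4 bytes) | 110110yy yyyyyyyy 110111xx xxxxxxxx |

  • For code points in the range 0 to 65535, UTF-16 represents each code point using a single 16-bit code unit with the same value. The range 0xD800 to 0xDFFF is reserved for surrogates and does not represent characters.
  • For code points in the range 65536 to 1114111, UTF-16 represents each code point using a pair of 16-bit code units, known as a surrogate pair. First, 0x10000 is subtracted from the code point, leaving a 20-bit value. Its upper 10 bits (the y bits) are added to 0xD800 to form the first code unit, known as the high surrogate, with a value in the range 0xD800 to 0xDBFF; its lower 10 bits (the x bits) are added to 0xDC00 to form the second code unit, known as the low surrogate, with a value in the range 0xDC00 to 0xDFFF.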
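The surrogate-pair calculation described above can be sketched as follows; encode_utf16, Utf16Units, and decode_surrogates are hypothetical names, and invalid input is signaled by a size of 0:

```cpp
#include <array>
#include <cstddef>

// One or two UTF-16 code units plus the number actually used.
struct Utf16Units {
    std::array<char16_t, 2> units{};
    std::size_t size = 0;  // 0 means the input was not an encodable code point
};

constexpr Utf16Units encode_utf16(char32_t cp) {
    Utf16Units out{};
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return out;                              // surrogate values are not characters
    }
    if (cp <= 0xFFFF) {                          // BMP: a single code unit
        out.units[0] = static_cast<char16_t>(cp);
        out.size = 1;
    } else if (cp <= 0x10FFFF) {                 // supplementary planes: surrogate pair
        const char32_t v = cp - 0x10000;         // 20-bit value
        out.units[0] = static_cast<char16_t>(0xD800 + (v >> 10));    // high: upper 10 bits
        out.units[1] = static_cast<char16_t>(0xDC00 + (v & 0x3FF));  // low: lower 10 bits
        out.size = 2;
    }
    return out;
}

// Reverse direction: combine a high and a low surrogate into a code point.
constexpr char32_t decode_surrogates(char16_t high, char16_t low) {
    return 0x10000 + ((static_cast<char32_t>(high) - 0xD800) << 10)
                   + (static_cast<char32_t>(low) - 0xDC00);
}
```

For U+1F600, 0x1F600 - 0x10000 = 0xF600, whose upper 10 bits are 0x3D and lower 10 bits are 0x200, giving the surrogate pair 0xD83D 0xDE00.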

UTF-32 encoding

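| Code Point Range                    | Number of Code Units  | Binary Format |
|-------------------------------------|-----------------------|---------------|
| 0 to 1114111 (U+0000 to U+10FFFF)   | 1 code unit (4 bytes) | 00000000 000xxxxx xxxxxxxx xxxxxxxx |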

UTF-32 encoding represents each code point using a single 32-bit code unit, which means that every Unicode code point is represented using exactly 4 bytes of memory.

Why not UTF-24 encoding

Although it is theoretically possible to create a fixed-length encoding scheme using 3 bytes to represent each Unicode code point, such a scheme would not provide any significant advantages over existing ones like UTF-8, UTF-16, or UTF-32 in terms of processing or space efficiency. Many software systems and programming languages are optimized for these standard Unicode encoding schemes, making them more convenient and widely supported.

Furthermore, most of the commonly used Unicode code points are smaller than 65536, which means that using three bytes per code point would result in unnecessary wastage of space. Therefore, despite the theoretical possibility of a 3-byte fixed-length encoding scheme, it is not practical to use it in most real-world scenarios.

Byte order mark

The Unicode encoding of a text file can be determined by examining the byte order mark (BOM) at the beginning of the file, or by analyzing the byte sequences of the file.

| Encoding | Byte Order Mark |
|----------|-----------------|
| UTF-8    | EF BB BF (optional) |
| UTF-16   | FE FF (big-endian) or FF FE (little-endian) |
| UTF-32   | 00 00 FE FF (big-endian) or FF FE 00 00 (little-endian) |

Code page

The legacy term "code page" originated from IBM's EBCDIC-based mainframe systems. Originally, the code page numbers referred to the page numbers in the IBM standard character set manual.

Vendors that use a code page system allocate their own code page number to a character set and its encoding, even if it is better known by another name; for example, UTF-8 has been assigned page numbers 1208 at IBM, 65001 at Microsoft, and 4110 at SAP.

The following table lists Windows code pages used by Microsoft in its own Windows operating system.

| Microsoft Code Page | Code Page Number | Description |
|---------------------|------------------|-------------|
| Windows-1252 | 1252  | Western European languages |
| Windows-1250 | 1250  | Central and Eastern European languages |
| Windows-1251 | 1251  | Cyrillic languages |
| Windows-1253 | 1253  | Greek language |
| Windows-1254 | 1254  | Turkish language |
| Windows-1255 | 1255  | Hebrew language |
| Windows-1256 | 1256  | Arabic language |
| Windows-1257 | 1257  | Baltic languages |
| Windows-1258 | 1258  | Vietnamese language |
| UTF-8        | 65001 | 8-bit Unicode |
| UTF-16LE     | 1200  | 16-bit Unicode, little endian |
| UTF-16BE     | 1201  | 16-bit Unicode, big endian |
| UTF-32LE     | 12000 | 32-bit Unicode, little endian |
| UTF-32BE     | 12001 | 32-bit Unicode, big endian |
| UTF-7        | 65000 | 7-bit Unicode |

New Character Types

Why char is not good for UTF-8

In C++, char is a fundamental type that represents a byte-sized unit of data. Historically, it has been used to represent both ASCII characters and other narrow character sets, depending on the execution environment.

Suppose we have the following C++ code (in C++11), with the source file saved as UTF-8 text:

// The literal is encoded in the compiler's execution character set, which is
// not necessarily UTF-8 (for example, Windows-1252 under MSVC's default
// settings on a Western European system).
const char* utf8_str = "你吃饭了吗?"; 

The problem is that a narrow string literal is not stored in the encoding of the source file but in the compiler's execution character set, and its type, const char[N], carries no information about which encoding was used. Even if the source file containing "你吃饭了吗?" is saved as UTF-8, a compiler whose execution character set is Windows-1252 must convert the literal to Windows-1252, which cannot represent Chinese characters.

For example, the Chinese character "你" (U+4F60) is encoded in UTF-8 as the three bytes 0xE4 0xBD 0xA0. If MSVC decodes the source file as UTF-8 (for example, because the file starts with a UTF-8 BOM) while its execution character set is Windows-1252, none of the Chinese characters can be represented, so each of them is replaced with a question mark (0x3F) and the compiler typically emits warning C4566. utf8_str then points to a string of question marks instead of the intended UTF-8 bytes, and nothing in the type const char* reveals the loss. The mismatched data can cause unexpected results and errors in the program.

Execution environment explained

The "execution character set" is the character encoding that the compiler uses to encode narrow character and string literals in the compiled program, and that the program's char-based text is therefore assumed to use.

In C and C++, the execution character set determines how characters are represented in the char data type. The specific character set used can vary depending on the platform, compiler, and locale settings.

For example, with MSVC on Windows, the default execution character set is the system's active code page, such as Windows-1252 on Western European systems, which is a superset of ASCII that includes characters for European languages. With GCC and Clang, which are typically used on Unix-like systems, the default execution character set is UTF-8.

char8_t was introduced in C++20 to provide a distinct type for the 8-bit code units of UTF-8 encoded text: u8 literals have type const char8_t[N], and char8_t has the same size, signedness, and alignment as unsigned char. Because char8_t is a separate type, UTF-8 data can no longer be silently mixed with char strings of unknown encoding, and functions can be overloaded to treat UTF-8 text differently. This helps avoid errors such as passing UTF-8 bytes to code that expects the execution character set, or the reverse.

In the following code, utf8_str will hold the correct UTF-8 code units regardless of the execution character set of the platform, provided that the compiler decodes the source file with the correct encoding.

// char8_t is a new C++20 type. The "u8" prefix makes sure the string literal is
// encoded as UTF-8 and, since C++20, gives it the type const char8_t[N].
// Without the "u8" prefix, the literal would have type const char[N], which
// cannot be converted to const char8_t*, so the initialization would not compile.
const char8_t* utf8_str = u8"你吃饭了吗?"; 
// std::cout << utf8_str << std::endl; // This won't compile

In C++20, there are no char8_t-aware I/O streams: the std::ostream operator<< overloads for char8_t, char16_t, and char32_t characters and strings are explicitly deleted. This gap may be addressed in a future standard.

char16_t and char32_t were introduced in C++11 to provide support for Unicode text encoding. char16_t represents a 16-bit code unit of UTF-16 encoded Unicode text, while char32_t represents a 32-bit code unit of UTF-32 encoded Unicode text.

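| Type     | Introduced in | Main Reason for Introduction | Literal Prefix | Sample Code |
|----------|---------------|------------------------------|----------------|-------------|
| char8_t  | C++20 | UTF-8 encoding  | u8 | const char8_t* str = u8"吃了吗"; |
| char16_t | C++11 | UTF-16 encoding | u  | const char16_t* str = u"吃了吗"; |
| char32_t | C++11 | UTF-32 encoding | U  | const char32_t* str = U"吃了吗"; |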

The string literal prefixes u8, u, and U were introduced in C++11, and the u and U prefixes could be applied to character literals from the start. The u8 prefix, however, could not be used with a character literal until C++17, so the following code does not compile as C++11 or C++14:

char utf8c = u8'a'; // Error in C++11/14; OK in C++17 (type char) and C++20 (type char8_t, converted to char)

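The following code fails to compile even in C++17 and C++20, because a u8 character literal must fit in a single UTF-8 code unit, while 好 requires three:

char utf8c = u8'好'; // Error: not encodable in a single UTF-8 code unit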

std::cout cannot print char8_t strings directly. A workaround is to reinterpret the UTF-8 bytes as char and print them with printf. On Windows, remember to set the active code page of the command-line console to UTF-8 by running the chcp command first:

chcp 65001

The following code uses printf to output a UTF-8 string.

#include <cstdio>
#include <iostream>
#include <string>

using namespace std;

// Remember to run Windows commandline command "chcp 65001" first to set the active
// code page to UTF-8.

int main() {
  // Null terminator automatically appended.
  char8_t utf8Chars[] = u8"你好世界";
  // Will have two null terminators. 
  char8_t utf8CharsWithNull[] = u8"你好世界\0"; 

  // length() counts code units up to the first null terminator, so both
  // calls return 12 (4 characters, 3 UTF-8 bytes each).
  auto len_1 = std::char_traits<char8_t>::length(utf8Chars);
  auto len_2 = std::char_traits<char8_t>::length(utf8CharsWithNull);

  cout << "length(utf8Chars) = " 
       << len_1 
       << endl; // output 12

  cout << "length(utf8CharsWithNull) = " 
       << len_2 
       << endl; // output 12

  cout << "sizeof(char8_t) = " 
       << sizeof(char8_t) 
       << endl; // output 1
  
  // std::cout << utf8Chars << std::endl; // This would fail compiling.
  // Reinterpret the UTF-8 bytes as char so that printf can write them out.
  std::printf("%s\n", reinterpret_cast<const char*>(utf8Chars));

  /*
  for (std::size_t i = 0; i < len_1; i++) {
    std::cout << utf8Chars[i] << '\n'; // This would fail compiling.
  }
  */

  return 0;
}

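C++20 deprecates the std::codecvt specializations that convert between char16_t/char32_t and char (the <codecvt> header was already deprecated in C++17), so the standard library no longer offers a recommended way to transcode. To display a UTF-8 string on the Windows command-line console, we can use the platform-specific MultiByteToWideChar function provided by Windows. It converts the UTF-8 text to UTF-16 wide characters, which can then be output using std::wcout. The same conversion helps when we need to access a particular character by its position: after conversion, each wchar_t element holds one UTF-16 code unit, which is one whole character for text in the Basic Multilingual Plane, such as the Japanese text below.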

#include <iostream>
#include <locale>
#include <string>
#include <Windows.h>

using namespace std;

// Remember to run Windows commandline command "chcp 65001" first to set the active
// code page to UTF-8.

int main() {
    u8string my_string = u8"こんにちは";

    // my_string[0] is the byte value of the UTF-8 text at byte position 0.
    // The actual character could have multiple bytes.
    // std::cout << my_string[0] << std::endl; would fail compiling.

    // Get the required buffer size  
    int len = MultiByteToWideChar(CP_UTF8,
                                  0, 
                                  reinterpret_cast<const char*>(my_string.data()), 
                                  static_cast<int>(my_string.size()), 
                                  nullptr, 
                                  0);

    // Create a buffer of the required size
    wstring my_wstring(len, 0);

    // Convert to UTF-16 
    MultiByteToWideChar(CP_UTF8, 
                        0, 
                        reinterpret_cast<const char*>(my_string.data()), 
                        static_cast<int>(my_string.size()), 
                        &my_wstring[0], 
                        len); 

    locale::global(locale("en_US.UTF-8"));

    // Output the string
    wcout << my_wstring << endl; 

    for (int i = 0; i < len; i++) {
       wcout << my_wstring[i] << endl;    
    }

    return 0;
}

Automatic String Literal Concatenation

Automatic concatenation of adjacent string literals is a feature present in both C and C++ programming languages. It allows the compiler to automatically merge two or more string literals that are placed next to each other, without any explicit concatenation operator. This can be useful for breaking long strings into shorter, more manageable pieces, while still treating them as a single string constant.

Here is an example:

const char* my_string = "Hello,"
                        "World!";

The compiler will automatically concatenate the two string literals, resulting in the following:

const char* my_string = "Hello,World!";

This feature has its roots in the C programming language. It was inherited by C++ in the early 1980s.

Notes on automatic string literal concatenation

Some nuances and caveats of using automatic concatenation of adjacent string literals:

Whitespace not strictly required

Adjacent string literals are usually separated by whitespace, such as a space, a tab, or a newline. However, whitespace between the literals is not strictly required, so the following is still valid in both C and C++:

const char* my_string = "Hello,""World";

The compiler will automatically concatenate the adjacent string literals, resulting in the following:

const char* my_string = "Hello,World";

It's a good practice to include whitespace between adjacent string literals for better readability and maintainability.

Compile time concatenation

The concatenation happens at compile time (during translation), not at run time, so it has no runtime overhead.

Variables or expressions not allowed

Automatic concatenation can only be used with string literals, not with variables or other expressions.
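One consequence worth showing: a macro that expands to a string literal does take part in concatenation, because macros are expanded before adjacent literals are joined, while a variable never does. GREETING and greeting_message are illustrative names.

```cpp
// The preprocessor replaces GREETING with a string literal first,
// so the compiler then sees two adjacent literals.
#define GREETING "Hello, "

const char* greeting_message = GREETING "World!";   // "Hello, World!"
static_assert(sizeof(GREETING "World!") == 14);     // 13 characters + terminator

// A variable, even a const one, is not a literal:
// const char* const prefix = "Hello, ";
// const char* broken = prefix "World!";            // error
```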

Mixed encodings

Be aware that concatenating string literals with different encoding prefixes may lead to compilation errors or unexpected behavior. For example, the following code results in a compiler error such as "concatenation of string literals with conflicting encoding prefixes":

const char8_t* utf8Chars = u8"Hello," 
                           L"World!";

If one of the string literals has no prefix, it is treated as having the same prefix as the others, so the following is valid:

const char8_t* utf8Chars = u8"Hello," 
                           "World!"; // Equivalent to u8"World!"

The + operator

Using the + operator for concatenation works differently than automatic concatenation of adjacent string literals. In C++, the + operator can be used to concatenate std::string objects, or a std::string object and a string literal. However, the + operator cannot be used to concatenate two string literals directly: both operands decay to const char* pointers, and adding two pointers is not defined.

Here is an example:

#include <iostream>
#include <string>

int main() {
    std::string str1 = "Hello, ";
    std::string str2 = "World!";
    
    std::string result = str1 + str2 + "Oh Yeah"; // Valid in C++
    
    std::cout << result << std::endl;
    return 0;
}

In the example above, the + operator concatenates two std::string objects and then appends a string literal. However, trying to do this with string literals alone leads to a compilation error:

const char* result = "Hello, " + "World!" + "Oh Yeah"; // NOT valid in C++ (or C)
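A minimal sketch of the usual fixes, with illustrative function names: make the left-most operand a std::string, use the C++14 s literal suffix from std::string_literals, or drop + altogether and let adjacent literals be concatenated at compile time.

```cpp
#include <cassert>
#include <string>

using namespace std::string_literals;  // enables the s suffix (C++14)

// The left-most operand is a std::string, so each + returns a std::string
// and the remaining literals are appended one by one.
std::string joined_with_constructor() {
    return std::string("Hello, ") + "World!" + "Oh Yeah";
}

// "Hello, "s is already a std::string.
std::string joined_with_suffix() {
    return "Hello, "s + "World!" + "Oh Yeah";
}

// No + at all: adjacent literals are merged by the compiler.
const char* joined_at_compile_time = "Hello, " "World!" "Oh Yeah";
```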

C does not have the std::string class, and its + operator cannot concatenate strings. Use functions like strcat or strncat from the <string.h> header to concatenate character arrays (null-terminated strings). Remember to allocate enough memory for the concatenated result and ensure that the destination string is null-terminated.

Here's an example of using strcat and strncat functions in C:

#include <stdio.h>
#include <string.h>

int main() {
    char str1[20] = "Hello, ";
    char str2[] = "world!";
    char str3[20] = "I am a string.";

    // Using strcat
    strcat(str1, str2);
    printf("str1 after strcat: %s\n", str1);

    // Using strncat
    strncat(str3, str2, 4);
    printf("str3 after strncat: %s\n", str3);

    return 0;
}

In the above code, we have used two different functions for concatenating strings.

  • strcat function concatenates str2 to the end of str1 and modifies str1. After the strcat operation, str1 will contain the concatenated string.

  • strncat function appends at most the specified number of characters (in this case, 4) from str2 to the end of str3, followed by a terminating null character, and modifies str3. After the strncat operation, str3 will contain the concatenated string.

The output of the above code will be:

str1 after strcat: Hello, world!
str3 after strncat: I am a string.worl

Library Support

Deprecated library support

Component | Purpose | Status
std::codecvt<InternT, ExternT, StateT> (header <locale>) | Facet template for converting between character encodings | The std::codecvt<char16_t, char, std::mbstate_t> and std::codecvt<char32_t, char, std::mbstate_t> specializations deprecated in C++20 (replaced by char8_t-based ones)
<codecvt> header | Provides templates for character encoding conversion, including std::codecvt_utf8, std::codecvt_utf16 and std::codecvt_utf8_utf16 | Deprecated in C++17
std::wstring_convert | Higher-level interface for converting between wide strings (e.g. std::wstring) and byte strings (std::string) | Deprecated in C++17

New string types

String type | Description | Basic definition | Introduced in
std::u8string | A string of char8_t code units encoded in UTF-8 | std::basic_string<char8_t> | C++20
std::u16string | A string of char16_t code units encoded in UTF-16 | std::basic_string<char16_t> | C++11
std::u32string | A string of char32_t code units encoded in UTF-32 | std::basic_string<char32_t> | C++11

std::pmr::u8string

std::pmr::u8string is an alias for std::basic_string<char8_t, std::char_traits<char8_t>, std::pmr::polymorphic_allocator<char8_t>>: a UTF-8 string whose memory comes from a user-supplied memory resource. It belongs to the Polymorphic Memory Resource library (std::pmr), which was introduced in C++17; the u8string alias itself was added in C++20 together with char8_t.

To use std::pmr::u8string, include the <string> and <memory_resource> headers and create an object of a class derived from std::pmr::memory_resource, for example std::pmr::monotonic_buffer_resource. Then create an instance of std::pmr::u8string by passing a pointer to that resource as the last constructor argument; the pointer is converted to a std::pmr::polymorphic_allocator.

Here's an example of how to use std::pmr::u8string:

#include <cstdio>
#include <memory_resource>
#include <string>

int main()
{
    // create a memory pool using std::pmr::monotonic_buffer_resource
    std::pmr::monotonic_buffer_resource pool(1024);

    // create an std::pmr::u8string using the memory pool
    std::pmr::u8string str(u8"Hello, world!", &pool);

    // print the string to the console (never pass the text itself as the format string)
    std::printf("%s\n", reinterpret_cast<const char*>(str.c_str()));

    return 0;
}

C11 way

Function | Description
mbrtoc16 | Converts a multibyte sequence to a 16-bit wide character (char16_t)
c16rtomb | Converts a 16-bit wide character (char16_t) to a multibyte sequence
mbrtoc32 | Converts a multibyte sequence to a 32-bit wide character (char32_t)
c32rtomb | Converts a 32-bit wide character (char32_t) to a multibyte sequence

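These are C11 functions, declared in <uchar.h>. C++11 makes them available as std::mbrtoc16 and so on in the <cuchar> header.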

In the function name mbrtoc16, "mb" indicates that the input is a multibyte character sequence, "r" stands for "restartable" (the conversion state is kept in a caller-supplied mbstate_t object, so a conversion can resume across calls), "to" indicates the direction of the conversion, and "c16" indicates that the output is a 16-bit character (char16_t).

Here's an example of using the mbrtoc16 function to convert a multibyte sequence to a 16-bit wide character:

#include <stdio.h>
#include <uchar.h>
#include <locale.h>
#include <wchar.h>

int main() {
    setlocale(LC_ALL, "en_US.UTF-8");

    char mbstr[] = "Hello, world!"; // Note: C only gained char8_t in C23 (<uchar.h>).
    char16_t wc16;
    mbstate_t state = { 0 };
    size_t res = mbrtoc16(&wc16, mbstr, sizeof(mbstr), &state);
    if (res == (size_t)-1 || res == (size_t)-2) {
        printf("Error: invalid multibyte sequence\n");
        return 1;
    }
    printf("The first character is: %lc\n", (wint_t)wc16);

    return 0;
}

Namespace

C++ namespaces provide a way to group related declarations and definitions, such as classes, functions, and variables, under a common name. This helps to avoid naming conflicts between different parts of a program or different libraries that may be used together.

Namespaces were introduced into the C++ standard with the release of C++98. The syntax for declaring and defining namespaces is similar to that used for classes. Here's an example:

// Declaration of a namespace
namespace MyNamespace {
    extern int x; // declaration only; a plain "int x;" here would already be a definition
    void foo();
}

// Definition of the namespace's contents
namespace MyNamespace {
    int x = 42;
    void foo() {
        // Implementation of the function
    }
}

In this example, MyNamespace is declared and defined to contain an integer variable x and a function foo(). The namespace's contents can be accessed using the scope resolution operator ::, like this:

int main() {
    MyNamespace::x = 10;
    MyNamespace::foo();
    return 0;
}

Inline Namespace

What is inline namespace

When a namespace is declared as inline, it means that its members are automatically injected into the enclosing parent namespace, as if they were defined directly in the parent namespace. This allows clients of the namespace to refer to its members without needing to qualify them with the namespace name.

For example, consider the following code:

namespace outer {
    inline namespace inner {
        void foo() {}
    }
}

Here, inner is an inline namespace that is declared within the outer namespace. This means that foo() can be accessed either as outer::inner::foo() or simply as outer::foo().
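The equivalence can be checked at compile time. In this sketch foo() is given a constexpr int return value, an assumption added only so that the calls can appear in static_assert:

```cpp
namespace outer {
    inline namespace inner {
        constexpr int foo() { return 42; }
    }
}

// Both spellings name the same function.
static_assert(outer::foo() == 42);
static_assert(outer::inner::foo() == 42);
static_assert(&outer::foo == &outer::inner::foo);
```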

Use case

C++ inline namespaces were introduced in the C++11 standard to provide a mechanism for versioning and incremental updates of libraries, without breaking backward compatibility.

An inline namespace can be used to provide an updated version of a library's interface, while still allowing old code to use the previous version. By using an inline namespace, the new version of the library can be introduced without breaking the existing code that depends on the old version.

Here is an example of how an inline namespace can be used:

#include <iostream>

/*
// Initial version of the library
namespace MyLib {
    void foo() {
        std::cout << "Hello, world!" << std::endl;
    }
}
*/

// Versioned library: v1 remains the default (inline namespace), v2 is opt-in
namespace MyLib {
    inline namespace v1 {
        void foo() {
            std::cout << "Hello, World!" << std::endl;
        }
    }
    
    namespace v2 {
        void foo() {
            std::cout << "Hello, C++11!" << std::endl;
        }
    }
}

// Usage of the library
int main() {
    MyLib::foo();     // calls v1::foo, the default (inline) version
    MyLib::v2::foo(); // explicitly calls the updated version
    return 0;
}

This code demonstrates how backward compatibility is maintained in a library called MyLib, which defines two versions of a function named foo(). The output of the program will be:

Hello, World!
Hello, C++11!
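The usual next step in this versioning scheme is to move the inline keyword once the new version should become the default. The sketch below assumes a later release of the same MyLib in which foo() returns a std::string_view instead of printing, so the result can be checked at compile time:

```cpp
#include <string_view>

// Later release: v2 is now the default, v1 is still reachable by name.
namespace MyLib {
    namespace v1 {
        constexpr std::string_view foo() { return "Hello, World!"; }
    }
    inline namespace v2 {
        constexpr std::string_view foo() { return "Hello, C++11!"; }
    }
}

static_assert(MyLib::foo() == "Hello, C++11!");     // now resolves to v2::foo
static_assert(MyLib::v1::foo() == "Hello, World!"); // old behavior on request
```

Code that calls MyLib::foo() picks up v2 after recompilation, while code that needs the old behavior can still spell MyLib::v1::foo(). Because the name of an inline namespace is part of each member's mangled name, binaries built against the earlier release keep referring to the v1 symbols.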

New Nested Namespace Syntax

Prior to C++17, nested namespaces had to be defined like this:

namespace A {
    namespace B {
        namespace C {
            int foo() { return 5; }
        }
    }
}

With C++17, the same nested namespaces can be defined concisely using the nested namespace definition syntax:

namespace A::B::C {
    int foo() { return 5; }
}

Both of these code snippets achieve the same result: defining a function foo() in the namespace A::B::C. The nested namespace definition syntax introduced in C++17 allows for a more compact and readable way to define nested namespaces.

Nested inline namespace

The combination of the nested namespace definition syntax (introduced in C++17) and the inline namespace declaration is allowed in C++20.

The following is valid in C++20:

namespace A::B::inline C {
    int foo() { return 5; }
}

In this code, the inline keyword is applied to C within the nested namespace definition A::B::C. This declares C as an inline namespace of its enclosing namespace A::B, so foo() can be called as A::B::foo() as well as A::B::C::foo().

Note that the inline keyword can appear before any namespace name except the first one (A in this example).
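A compile-time check of both placements, with constexpr functions added so the lookups can be tested with static_assert (D, E and bar are illustrative names):

```cpp
// inline may precede any component except the first.
namespace A::B::inline C {
    constexpr int foo() { return 5; }
}

namespace A::inline D::E {
    constexpr int bar() { return 7; }
}

static_assert(A::B::foo() == 5);     // found through the inline namespace C
static_assert(A::B::C::foo() == 5);  // the fully qualified name still works
static_assert(A::E::bar() == 7);     // D is inline, so E is visible in A
// namespace inline A::B { }         // error: inline cannot precede the first name
```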

Unnamed Namespace

The unnamed namespace (or anonymous namespace) is a C++ feature that was introduced in the C++98 standard. It provides a way to declare identifiers (e.g., functions, variables, or types) that are visible only within the translation unit (i.e., the source file) in which they are defined; since C++11, such identifiers have internal linkage.

Unnamed namespaces can be declared using the namespace keyword, followed by a pair of braces, like this:

namespace {
    // Your code here
}

For example, a helper function or a constant that is needed only within a single source file can be put in an unnamed namespace to prevent it from being accessed from other parts of the program:

// File: my_file.cpp
#include "my_file.h"

namespace {
    const int someConstant = 42;

    void helperFunction() {
        // Implementation here
    }
}

void myPublicFunction() {
    helperFunction();
    // Other implementation details
}

In this example, someConstant and helperFunction are only visible within my_file.cpp and won't conflict with any other code using the same names.

Another example:

namespace my_namespace {
    namespace {
        void helperFunction() {
            // Implementation here
        }
    }

    void publicFunction() {
        helperFunction(); // This is allowed since helperFunction() is in the same parent namespace
    }
}

In this example, helperFunction() is declared within an unnamed namespace inside my_namespace. Although helperFunction() has internal linkage and is not visible outside of the translation unit, it can still be accessed by other functions within the same parent namespace (my_namespace), such as publicFunction().

Merged Namespace

If a namespace is defined multiple times, its contents are merged together. For example:

#include <iostream>

// First definition of namespace MyNamespace
namespace MyNamespace {
    int x = 1;
    void foo() {
        // Implementation of the function
    }
}

// Second definition of namespace MyNamespace, with different contents
namespace MyNamespace {
    int y = 2;
    void bar() {
        // Implementation of the function
    }
}

// Usage of the namespace contents
int main() {
    MyNamespace::foo();
    MyNamespace::bar();
    std::cout << MyNamespace::x + MyNamespace::y << std::endl;
    return 0;
}

However, if the same variable is defined more than once, a redefinition error occurs:

#include <iostream>

namespace Namespace1 {
    int x = 1;
}

namespace Namespace1 {
    int x = 2;
}

int main() {
    std::cout << Namespace1::x << std::endl;
    return 0;
}

We'll see the following compiler error:

<source>:8:9: error: redefinition of 'int Namespace1::x'
    8 |     int x = 2;
      |         ^
<source>:4:9: note: 'int Namespace1::x' previously defined here
    4 |     int x = 1;
      |         ^

Global Namespace

In C++, the global namespace is the outermost namespace that encompasses all the code in a program. When you define a variable, function, or type without explicitly placing it in a named or unnamed namespace, it becomes part of the global namespace. Its members are visible from anywhere in the program after their declaration, and those with external linkage can be shared across different translation units.

Although using the global namespace can make it easier to access identifiers without needing to specify a particular namespace, it is generally not recommended to place many identifiers in the global namespace, as it can lead to name clashes and reduced code maintainability. In large projects, putting too many identifiers in the global namespace can make it difficult to determine the purpose or origin of a particular identifier.

Instead, it's usually better to use named namespaces to organize and encapsulate your code, which helps prevent name collisions and improve code readability.

Here's an example that demonstrates the difference between global and named namespaces:

// Global namespace
int globalVariable = 10;

void globalFunction() {
    // Implementation here
}

// Named namespace
namespace my_namespace {
    int myVariable = 20;

    void myFunction() {
        // Implementation here
    }
}

int main() {
    globalFunction(); // Accessing a function in the global namespace
    my_namespace::myFunction(); // Accessing a function in a named namespace

    return 0;
}

In this example, globalVariable and globalFunction() are defined in the global namespace, while myVariable and myFunction() are defined within the named namespace my_namespace. To access members of a named namespace, qualify them with the namespace name and the scope resolution operator ::.

Scope resolution operator ::

The global namespace can be accessed explicitly by using the scope resolution operator ::. This can be helpful when an identifier in the global namespace shares the same name as an identifier in a different namespace, or it is desirable to explicitly refer to the global namespace version of an identifier.

Here's an example demonstrating the use of :: to access the global namespace:

#include <iostream>

// Global namespace
int myVariable = 10;

namespace my_namespace {
    int myVariable = 20;

    void printVariables() {
        std::cout << "Global namespace myVariable: " << ::myVariable << std::endl;
        std::cout << "my_namespace myVariable: " << myVariable << std::endl;
    }
}

int main() {
    my_namespace::printVariables();
    return 0;
}

In this example, there are two variables with the same name myVariable, one in the global namespace and another in the named namespace my_namespace. Inside the printVariables() function, ::myVariable (the scope resolution operator with nothing on its left) refers to the variable in the global namespace, while the unqualified myVariable refers to the one in my_namespace.

Compile Time Evaluation

In C++, compile-time evaluation refers to the ability to evaluate expressions and perform computations at compile-time, rather than at runtime. This can be achieved using keywords such as constexpr, consteval, and constinit.

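Keyword | Introduced in | Usage
constinit | C++20 | Asserts that a variable with static or thread storage duration is initialized with a constant expression (constant initialization); the variable itself is not const.
constexpr | C++11 | Indicates that a function can be evaluated at compile time, or that a variable's value is computed at compile time.
consteval | C++20 | Declares an immediate function: similar to constexpr, but every call must be evaluated at compile time.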

In addition to these keywords, C++ also includes several other features that enable compile-time evaluation, such as template metaprogramming and the std::integral_constant class template. These features allow for complex computations and logic to be performed at compile-time, leading to more efficient and optimized code.

Performance boost with compile time evaluation

The ability to perform compile-time evaluation is an important part of the C++ language, as it enables developers to create more efficient and optimized code. Because the C++ standard specifies precisely what may appear in a constant expression, code that relies on compile-time evaluation is portable across conforming compilers, platforms and architectures.

Compile-time evaluation can help performance in several ways:

  1. Reduce runtime overhead: When values or expressions are evaluated at compile time, the resulting code can be optimized by the compiler. This can reduce the amount of runtime overhead that would be incurred if the same calculations were performed at runtime.

  2. Catch errors before run time: undefined behavior, such as signed integer overflow or an out-of-bounds array access, is not allowed during constant evaluation, so such errors in compile-time computations are reported by the compiler before the program is even executed. This can help improve the stability and reliability of the program.

  3. Enable constant propagation: When values are known at compile time, they can be propagated throughout the code as constants. This can eliminate unnecessary memory accesses and reduce the number of instructions that need to be executed, leading to faster program execution.

  4. Allow for more aggressive optimization: By providing the compiler with information about values and expressions at compile time, the compiler can perform more aggressive optimizations, such as loop unrolling, instruction scheduling, and register allocation. These optimizations can improve program performance by reducing the number of instructions that need to be executed and by maximizing the use of hardware resources.
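Point 2 can be illustrated with a short sketch (add_ints and element_at are illustrative names): the valid calls pass static_assert, while the commented-out lines, if enabled, are rejected by the compiler because they would have undefined behavior.

```cpp
#include <climits>

constexpr int add_ints(int a, int b) { return a + b; }

constexpr int element_at(int i) {
    int values[3] = {10, 20, 30};  // local array: allowed since C++14
    return values[i];
}

static_assert(add_ints(2, 3) == 5);
static_assert(element_at(2) == 30);

// Each of these is a compile-time error, not a run-time bug:
// constexpr int overflow = add_ints(INT_MAX, 1);  // signed integer overflow
// constexpr int outside  = element_at(3);         // out-of-bounds read
```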

A real-life sample

[Figure: a NEMA-TS2 16-channel Malfunction Management Unit (MMU). Credit: Rob Klug]

A Malfunction Management Unit (MMU) is a device used in the traffic signal control industry to detect conflicts that may arise when conflicting traffic flows are given the right of way simultaneously. The permitted combinations are defined at the hardware level by a soldered board (the compatibility card), which specifies whether each pair of channels is compatible. Each channel is physically connected to a signal head in the field through a load switch, and the compatibility between the channels is relayed to the MMU through this board.

The following code illustrates an application of C++ compile-time evaluation. It is part of the open-source C++ Virtual Traffic Cabinet Framework (VTC), which is developed in modern C++20.

The code returns the start position of a given channel in O(1) time. The template functions have zero runtime overhead, because all of the computation is done at compile time. Apart from the performance benefits, the implementation is concise and generic enough for current and future MMU compatibility cards of any size.

#include <cstddef>

using std::size_t;

/*!
 * The size of channel compatibility set. For example, for Channel 1 of MMU16,
 * its compatibility set includes 1-2, 1-3, 1-4, ..., 1-16, thus the size is 15.
 * @tparam Channel - The given MMU channel.
 * @tparam MaxChannel - Max number of channels the MMU supports.
 * @return The size of the compatibility set of the given channel.
 */
template<size_t Channel, size_t MaxChannel> requires ((Channel >= 1) && (Channel <= MaxChannel))
constexpr size_t ChannelSegmentSize()
{
  return (MaxChannel - Channel);
}

/*!
 * The start position (0-based) for the given MMU channel in the fixed-size MMU channel compatibility byte array.
 * @tparam Channel - The given MMU channel.
 * @tparam MaxChannel - Max number of channels the MMU supports.
 * @return The start position (0-based) for the given MMU channel.
 * @remarks MMU channel compatibility is represented by a fixed-size byte array, for
 * MMU16, the byte array has 120 bytes. Each channel has a start position and total number of relevant
 * bytes in the stream describing the channel's compatibility.
 */
template<size_t Channel, size_t MaxChannel = 16> requires ((Channel >= 1) && (Channel <= MaxChannel))
constexpr size_t ChannelSegmentStartPos()
{
  if constexpr (Channel == 1) {
    return 0;
  } else if constexpr (Channel == 2) {
    return ChannelSegmentSize<1, MaxChannel>();
  } else {
    return ChannelSegmentSize<Channel - 1, MaxChannel>() + ChannelSegmentStartPos<Channel - 1, MaxChannel>();
  }
}

constexpr

constexpr is a C++ keyword that was introduced in C++11 to allow the evaluation of expressions at compile time. It specifies that the value of a variable or function can be computed at compile time, and therefore can be used in places where a constant expression is required.

constexpr vs. const

const only guarantees that the value of a variable cannot be changed after it is initialized, whereas constexpr guarantees that the value of a variable can be computed at compile time. Therefore, constexpr is more powerful than const because it enables the use of constant expressions in more contexts.

Here are some examples of how constexpr can be used:

constexpr int square(int x) {
    return x * x;
}

constexpr int x = 5;

// y is computed at compile time
constexpr int y = square(x); 

// z is computed at run time: w is not a constant,
// so square(w) is not a constant expression
int w = 6;
const int z = square(w);

constexpr int arr_size = 10;

// arr_size is a constant expression
int arr[arr_size]; 

constexpr char c = 'A' + 1;

// static_assert is a compile-time assertion
static_assert(c == 'B', "c should be equal to 'B'"); 
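One subtlety behind "constexpr is more powerful than const": a const variable can appear in constant expressions only if it has integral or enumeration type and a constant initializer, whereas constexpr works for any literal type. A sketch (read_input is a hypothetical stand-in for a value known only at run time):

```cpp
// const integral variable with a constant initializer: usable at compile time.
const int ci = 10;
static_assert(ci == 10);
int arr_from_const[ci];

// const floating-point variable: read-only, but NOT usable at compile time.
const double cd = 1.5;
// static_assert(cd == 1.5);        // error: cd is not usable in a constant expression

// constexpr works for any literal type, including double.
constexpr double xd = 1.5;
static_assert(xd == 1.5);

// const with a run-time initializer: merely read-only.
int read_input() { return 42; }
const int from_input = read_input();
// static_assert(from_input == 42); // error: value known only at run time
```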

constexpr function

To make a function constexpr, it must meet the following conditions:

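  1. Must have a literal return type (void is allowed only since C++14).

// Returns a literal type, like int here
constexpr int square(int x) { 
    return x * x;
}

In C++11, a constexpr function could not have a return type of void, because its body had to consist of a single return statement producing a constant expression. Since C++14, void is a literal type and a constexpr function may return void.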

  2. Must be declared with the constexpr keyword.

// Use the 'constexpr' keyword before the function definition
constexpr int factorial(int n) { 
    return (n <= 1) ? 1 : n * factorial(n - 1);
}

  3. Local variables (allowed since C++14) must be of a literal type and must not be static or thread_local (in C++20 and earlier); before C++20 they must also be initialized. They do not have to be const:

// A const local variable is fine...
constexpr int sum(int a, int b) {
    const int result = a + b; 
    return result;
}

// ...and so is a non-const one. When add() is evaluated at compile time,
// (a + b) is computed then; when it is called with run-time arguments,
// it is simply an ordinary function call.
constexpr int add(int a, int b) {
    int sum = a + b; 
    return sum;
}

  4. Since C++14, may include control structures such as if, switch, for, while, and do-while loops, provided they don't violate other constexpr constraints. static_assert, typedef, using, if constexpr (C++17), and return are also allowed.

#include <iostream>

constexpr int factorial(int n) {
    int result = 1;
    for (int i = 1; i <= n; ++i) {
        result *= i;
    }
    return result;
}

int main() {
    constexpr auto a = factorial(5);
    return 0;
}

The generated assembly code confirms that variable a is evaluated at compile time: the value 120 is stored directly and no loop is executed.

main:                                 
        push    rbp
        mov     rbp, rsp
        mov     dword ptr [rbp - 4], 0
        mov     dword ptr [rbp - 8], 120
        xor     eax, eax
        pop     rbp
        ret
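
  5. During constant evaluation, can call only other constexpr functions (a call to a non-constexpr function is allowed only on a path that is not taken at compile time).
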
constexpr int square(int x) {
    return x * x;
}

// Only call other constexpr functions
constexpr int square_sum(int a, int b) {
    return square(a) + square(b); 
}
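
  6. Produces a constant expression when called with constant-expression arguments in a context that requires one, such as the initializer of a constexpr variable; the call is then evaluated at compile time.
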
#include <iostream>

constexpr int power(int base, int exponent) {
    int result = 1;
    for (int i = 0; i < exponent; ++i) {
        result *= base;
    }
    return result;
}

int main() {
    constexpr auto b = power(2, 5);
    return 0;
}

The following assembly code confirms that no run time computation is performed when calculating power(2, 5).

main:
        push    rbp
        mov     rbp, rsp
        mov     dword ptr [rbp - 4], 0
        mov     dword ptr [rbp - 8], 32
        xor     eax, eax
        pop     rbp
        ret
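
  7. Since C++14, can modify objects whose lifetime began within the constant evaluation, such as its own parameters and local variables (but not, for example, a global variable). Here next(5) increments its copy of the parameter and yields 6:
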
constexpr int next(int x)
{
    return ++x;
}

char buffer[next(5)] = { 0 };
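Several of the rules above (mutable local variables, loops, assignments) come together in the following sketch of an iterative Fibonacci function; the names are illustrative. The same function is evaluated at compile time when its result initializes a constexpr variable, and runs as ordinary code when the argument is only known at run time:

```cpp
// Iterative Fibonacci using the C++14 relaxed constexpr rules.
constexpr long long fibonacci(int n) {
    long long a = 0, b = 1;           // mutable locals
    for (int i = 0; i < n; ++i) {     // loop
        long long next = a + b;
        a = b;                        // assignments
        b = next;
    }
    return a;
}

// Required constant: evaluated at compile time.
constexpr long long fib10 = fibonacci(10);
static_assert(fib10 == 55);

// Non-constant argument: the same function simply runs at run time.
long long fibonacci_at_run_time(int n) { return fibonacci(n); }
```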

Constructor

constexpr constructors in C++ are used to create constant expressions of user-defined types during compile-time. They are useful because they allow for more efficient code by performing computations at compile-time and enabling the usage of user-defined types in other constexpr contexts.

constexpr constructors were introduced in C++11, along with the general constexpr specifier.

Conditions (or constraints) for constexpr constructors:

  1. The class must not have any virtual base classes.
  2. The constructor body must satisfy the requirements for constexpr function bodies (in C++11 it had to be essentially empty).
  3. Every constructor used to initialize a base class or a non-static data member must itself be a constexpr constructor.
  4. Every full-expression in the constructor's member initializers must be a constant expression when the constructor is evaluated at compile time.
  5. Until C++20, every non-static data member and base class subobject must be initialized.

Here's an example of a constexpr constructor:

class Point {
public:
    constexpr Point(int x, int y) : x_(x), y_(y) {
        // Since C++14, the body of a constexpr constructor can include
        // other constructs like if statements and loops, as long as they
        // meet the constexpr requirements.
        if (x_ < 0) { x_ = 0; }
        if (y_ < 0) { y_ = 0; }
    }

    constexpr int getX() const { return x_; }
    constexpr int getY() const { return y_; }

private:
    int x_;
    int y_;
};

int main() {
    constexpr Point p1(1, 2);
    constexpr int x = p1.getX();
    constexpr int y = p1.getY();
}

Member initializer

When defining a constexpr constructor, the constructor's member initializer list must only contain constant expressions. This means that when initializing member variables or calling base class constructors, the expressions used must be evaluable at compile time. This is required to guarantee that the object can be constructed as a constant expression during compile time.

Here's an example to illustrate this requirement:

class Base {
public:
    constexpr Base(int value) : value_(value) {}

private:
    int value_;
};

class Derived : public Base {
public:
    // Both initializers are constant expressions
    constexpr Derived(int baseValue, int derivedValue) 
        : Base(baseValue), derivedValue_(derivedValue) {}

private:
    int derivedValue_;
};

int main() {
    // Constructed as a constant expression during compile-time
    constexpr Derived d(1, 2); 
}

Destructor

If a class has a constexpr constructor and is meant to be used in a constexpr context, then until C++20 its destructor must be trivial (a requirement for literal types); C++20 additionally allows constexpr destructors. A trivial destructor performs no custom actions, so the object can safely be used in a constexpr context.

A destructor is considered trivial if:

  1. It is not user-provided (i.e., it is implicitly declared, or explicitly defaulted on its first declaration).
  2. It is not virtual.
  3. All direct base classes have trivial destructors.
  4. For all non-static data members of the class that are of class type (or array thereof), each such class has a trivial destructor.

Here's an example of a class with a constexpr constructor and a trivial destructor:

class Point {
public:
    constexpr Point(int x, int y) : x_(x), y_(y) {}

    // Implicit destructor: trivial (not user-provided, performs no actions).
    // Declaring it as ~Point() = default; would keep it trivial.

    constexpr int getX() const { return x_; }
    constexpr int getY() const { return y_; }

private:
    int x_;
    int y_;
};

int main() {
    constexpr Point p(1, 2);
}

constexpr function returning void

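Since C++14, a member function of a class can be declared constexpr with a return type of void, which lets it perform a sequence of modifications during compile-time evaluation. Because such a function modifies the object, it cannot be called on a constexpr (and therefore const) object; instead, the object is typically created and modified inside another constexpr function. For example:

class MyClass {
public:
    constexpr void doSomething() {
        myData = 42; // Modify a data member during constant evaluation
    }

    constexpr int getMyData() const {
        return myData; // Read the data member
    }

private:
    int myData = 0; // Ordinary non-static data member with a default initializer
};

// Create and modify the object inside a constexpr function,
// so the whole sequence can be evaluated at compile time.
constexpr MyClass makeConfigured() {
    MyClass obj;
    obj.doSomething();
    return obj;
}

int main() {
    constexpr MyClass obj = makeConfigured(); // Evaluated at compile time
    static_assert(obj.getMyData() == 42, "Unexpected value of myData");
}

Note that doSomething() is not, and cannot be, const-qualified, because it modifies myData. In C++11, constexpr non-static member functions were implicitly const and could not return void; C++14 lifted both restrictions.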

Precision of floating-point constexpr

In C++11 and later, constexpr functions can compute floating-point expressions and return floating-point values as constant expressions.

One limitation of constexpr floating-point computations is that a constant evaluation must finish within the compiler's limits (for example, on the number of loop iterations or evaluation steps), so an algorithm that only converges in the limit needs an explicit termination condition or iteration cap. A separate restriction is that the standard library's transcendental functions, such as std::sin and std::sqrt, are not declared constexpr in C++20 (or C++23), so portable code cannot call them inside a constant expression, even though some compilers accept such calls as an extension.

The standard also does not guarantee that compile-time and run-time floating-point results agree. Because it places no restrictions on the accuracy of floating-point operations, it is unspecified whether evaluating a floating-point expression during translation yields the same result as evaluating the same expression during program execution. Likewise, the run-time floating-point environment (for example, a rounding mode set with std::fesetround) cannot influence a computation that the compiler has already carried out.

In practice, the precision of constexpr floating-point computations depends on the compiler and the target platform. Mainstream compilers generally emulate the target's IEEE 754 arithmetic during constant evaluation, so results usually match the run-time ones, but code should not rely on bit-exact agreement between a constant-evaluated and a run-time evaluation of the same expression.

std::numeric_limits

std::numeric_limits is a class template defined in the C++ standard library that provides information about the properties of arithmetic types, such as minimum and maximum representable values, number of significant digits, and whether the type is signed or unsigned.

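An abridged synopsis of the std::numeric_limits class template, declared in <limits>, is shown below. The standard's synopsis also gives each data member an initializer and lists further members such as is_iec559, is_bounded and round_style: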

template<typename T>
class numeric_limits {
public:
    static constexpr bool is_specialized;
    static constexpr T min() noexcept;
    static constexpr T max() noexcept;
    static constexpr T lowest() noexcept;
    static constexpr int digits;
    static constexpr int digits10;
    static constexpr int max_digits10;
    static constexpr bool is_signed;
    static constexpr bool is_integer;
    static constexpr bool is_exact;
    static constexpr int radix;
    static constexpr T epsilon() noexcept;
    static constexpr T round_error() noexcept;
    static constexpr int min_exponent;
    static constexpr int min_exponent10;
    static constexpr int max_exponent;
    static constexpr int max_exponent10;
    static constexpr bool has_infinity;
    static constexpr bool has_quiet_NaN;
    static constexpr bool has_signaling_NaN;
    static constexpr float_denorm_style has_denorm;
    static constexpr bool has_denorm_loss;
    static constexpr T infinity() noexcept;
    static constexpr T quiet_NaN() noexcept;
    static constexpr T signaling_NaN() noexcept;
    static constexpr T denorm_min() noexcept;
};

The std::numeric_limits class template provides a set of static member functions and constants that can be used to query the properties of the template parameter type T. These functions and constants are all constexpr, which means that they can be evaluated at compile-time and used in constant expressions.

The constexpr specifier is what makes these members useful beyond simple queries. Because a type's properties are available at compile time, they can appear in static_assert declarations, array bounds and template arguments, and inside other constexpr functions, which lets code adapt to a type without any run-time cost. For example:

#include <iostream>
#include <limits>

template<typename T>
constexpr bool is_power_of_two(T value) {
    return value != 0 && (value & (value - 1)) == 0;
}

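template<typename T>
constexpr T next_power_of_two(T value) {
    static_assert(std::numeric_limits<T>::is_integer, "Type must be an integer");
    static_assert(!std::numeric_limits<T>::is_signed, "Type must be unsigned");

    // Largest power of two representable in T, e.g. 2^31 for a 32-bit unsigned int
    constexpr T max_power = T(1) << (std::numeric_limits<T>::digits - 1);

    if (is_power_of_two(value)) {
        return value;
    }
    if (value > max_power) {
        return 0; // Design choice for this example: no representable power of two is large enough
    }
    T result = 1;
    while (result < value) { // Cannot overflow: result never exceeds max_power
        result <<= 1;
    }
    return result;
}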

int main() {
    constexpr unsigned int x = 31;
    constexpr auto y = next_power_of_two(x);
    std::cout << "The next power of two after " << x << " is " << y << '\n';
    return 0;
}

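constexpr math functions (C++23)

C++20 did not make the <cmath> functions constexpr. C++23 did so for a first group: the functions closely tied to basic arithmetic, such as std::fabs, std::floor, std::ceil, std::trunc, std::round, std::fmod, std::fmin, std::fmax and std::copysign. Transcendental functions such as std::sin, std::cos, std::exp, std::log, std::pow and std::sqrt are not constexpr in C++20 or C++23, although some compilers evaluate them at compile time as a non-standard extension.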

The main advantage of using constexpr math functions is that they enable calculations at compile time rather than at runtime. This can lead to performance improvements because the compiler can optimize the code based on the known constant values. Additionally, because the values are known at compile time, they can be used in places where a constant expression is needed, such as in array sizes and template arguments.

Here are some important points to remember about constexpr math functions:

  1. Only a subset of the math functions in <cmath> is constexpr, and only since C++23. The others are still evaluated at runtime.

  2. The arguments provided to a constexpr function must be constant expressions themselves. Otherwise, the function call will not be evaluated at compile time.

  3. constexpr math functions are subject to the same floating-point rounding and accuracy limitations as their runtime counterparts. In other words, you should be aware of potential floating-point inaccuracies when using constexpr functions in calculations.

  4. Some compilers and standard libraries do not yet fully support C++23 or all of its constexpr math functions. Check the documentation of the toolchain being used, or test the __cpp_lib_constexpr_cmath feature-test macro, to confirm support for the specific functions.

Here is a list of commonly used <cmath>-related functions. Only some of them are constexpr as of C++23 (mainly the rounding, remainder, minimum/maximum, sign and exponent-manipulation functions); the transcendental ones are not. Entries marked non-standard are POSIX or C library extensions that are not part of ISO C++.

Here's the table sorted by function name in ascending order:

Function | Description
abs | Absolute value (integer overloads are declared in <cstdlib>)
acos | Arc cosine
acosh | Inverse hyperbolic cosine
asin | Arc sine
asinh | Inverse hyperbolic sine
atan | Arc tangent
atan2 | Arc tangent of y/x, using the signs of both arguments
atanh | Inverse hyperbolic tangent
cbrt | Cube root
ceil | Ceiling function (round up)
copysign | Copy the sign of one number to another
cos | Cosine
cosh | Hyperbolic cosine
div | Integral division with quotient and remainder (declared in <cstdlib>)
drem | Non-standard; use remainder instead
erf | Error function
erfc | Complementary error function
exp | Exponential function
exp2 | Base-2 exponential function
expm1 | Exponential function minus 1
fdim | Positive difference
floor | Floor function (round down)
fma | Fused multiply-add
fmax | Maximum of two floating-point values
fmin | Minimum of two floating-point values
fmod | Floating-point remainder (modulo)
frexp | Break a floating-point number into a normalized fraction and an integral power of 2
gamma | Non-standard; use lgamma or tgamma instead
gamma_r | Non-standard; use lgamma instead
hypot | Hypotenuse
ilogb | Unbiased exponent of a floating-point number, as an int
j0 | Non-standard (POSIX) Bessel function of the first kind of order 0; standard C++17 offers std::cyl_bessel_j
j1 | Non-standard (POSIX) Bessel function of the first kind of order 1
jn | Non-standard (POSIX) Bessel function of the first kind of order n
ldexp | Multiply by an integral power of 2
lgamma | Natural logarithm of the absolute value of the gamma function
llrint | Round to a long long using the current rounding mode
llround | Round to the nearest long long, halfway cases away from zero
log | Natural logarithm
log10 | Base-10 logarithm
log1p | Natural logarithm of 1 plus the argument
log2 | Base-2 logarithm
logb | Unbiased exponent of a floating-point number, as a floating-point value
lrint | Round to a long using the current rounding mode
lround | Round to the nearest long, halfway cases away from zero
max | Maximum of two values (declared in <algorithm>; constexpr since C++14)
min | Minimum of two values (declared in <algorithm>; constexpr since C++14)
modf | Decompose a floating-point number into its integer and fractional parts
nan | Generate a quiet NaN
nearbyint | Round to an integral value in the current rounding mode
nextafter | Next representable floating-point value
nexttoward | Next representable floating-point value toward a long double
pow | Power function
remainder | IEEE remainder of floating-point division
remquo | Remainder and part of the quotient of floating-point division
rint | Round to an integral value in the current rounding mode
round | Round to the nearest integer, halfway cases away from zero
scalb | Non-standard; use scalbn or scalbln instead
scalbln | Scale a floating-point number by a power of FLT_RADIX (exponent given as a long)
scalbn | Scale a floating-point number by a power of FLT_RADIX
significand | Non-standard; get the significand of a floating-point number
sin | Sine
sinh | Hyperbolic sine
sqrt | Square root
tan | Tangent
tanh | Hyperbolic tangent
tgamma | Gamma function
trunc | Truncate toward zero
y0 | Non-standard (POSIX) Bessel function of the second kind of order 0; standard C++17 offers std::cyl_neumann
y1 | Non-standard (POSIX) Bessel function of the second kind of order 1
yn | Non-standard (POSIX) Bessel function of the second kind of order n

constexpr lambda

Since C++17, a lambda's function call operator is implicitly constexpr whenever it satisfies the requirements of a constexpr function, so calls to the lambda can be evaluated at compile time. This makes it possible to perform computations at compile time, reducing runtime overhead in certain cases, while keeping the code short and readable.

Lambda expressions are anonymous functions that can be defined and used within code. They have the following general syntax:

[capture](parameters) -> return_type { function_body }

Using lambda as constexpr in C++17:

No extra keyword is needed: the lambda can be used in constant expressions as long as its body and its captures meet the constexpr requirements. Here's an example to illustrate this:

#include <iostream>

int main() {
    constexpr auto square = [](int x) {
        return x * x;
    };

    constexpr int result = square(5);
    static_assert(result == 25, "Square of 5 should be 25");

    std::cout << "Square of 5: " << result << std::endl;
    return 0;
}

Benefits of using lambda as constexpr

  1. Compile-time computation: Using constexpr lambdas can shift computation from runtime to compile-time, potentially improving performance for computationally expensive operations.

  2. Readability and expressiveness: By using lambdas, one can write more expressive and readable code, as functions can be defined and used in-place, right where they are needed.

  3. Type inference: Lambdas can deduce the return type automatically, making the code shorter and easier to understand.

  4. Better optimization: When a call is evaluated at compile time, its result is a constant that the compiler can fold into the surrounding code.

  5. Enhanced safety: Marking a lambda explicitly constexpr, as in [](int x) constexpr { return x * x; }, asks the compiler to check it against the constexpr rules, so a body that cannot be evaluated at compile time is reported early. Such a lambda can still be called at runtime.

Runtime degrading

A constexpr lambda can degrade into a runtime lambda when it's used in a context that doesn't require a constant expression or when it doesn't meet the requirements for a constexpr function. In such cases, the lambda will be evaluated at runtime instead of compile-time.

Here are some conditions that can cause a constexpr lambda to degrade into a runtime lambda:

  1. Run-time captures or arguments: If the lambda captures variables, or is called with arguments, whose values are not constant expressions, the call cannot be evaluated at compile time. For example:
int non_const_var = 10;
auto lambda = [non_const_var](int x) {
    return x * non_const_var;
};
int result = lambda(5); // This will be evaluated at runtime
  2. Non-constexpr expressions in the lambda body: If the lambda body contains expressions that cannot be evaluated at compile time, such as calls to non-constexpr functions, its function call operator is not constexpr. For example:
#include <iostream>
#include <cmath>

constexpr auto sqrt_lambda = [](double x) {
    return std::sqrt(x); // std::sqrt is not constexpr in standard C++20 or C++23
};

int main() {
    double result = sqrt_lambda(25.0); // This will be evaluated at runtime
    std::cout << "Square root of 25: " << result << std::endl;
    return 0;
}
  3. Using the lambda in a non-constant context: Even if the lambda itself is constexpr, if it is used in a context that doesn't require a constant expression, such as with a value read at runtime, it will be evaluated at runtime. For example:
#include <iostream>

constexpr auto square = [](int x) {
    return x * x;
};

int main() {
    int input = 0;
    std::cout << "Enter an integer: ";
    std::cin >> input;

    int result = square(input); // This will be evaluated at runtime
    std::cout << "Square of " << input << ": " << result << std::endl;
    return 0;
}

In this example, although the square lambda is constexpr, it is used with a runtime input value, so it's evaluated at runtime.

When a constexpr lambda degrades into a runtime lambda, it doesn't cause any errors or warnings. It simply means that the lambda is evaluated at runtime, and the performance advantages and safety guarantees of a constexpr lambda are not achieved.

Inlining constexpr

In C++17, a constexpr static data member is implicitly inline. This means that the static data member has the same address in every translation unit that uses it, and there is no need to provide a separate definition for the data member in a source file.

The following example would typically produce a linker error before C++17:

// MyClass.h
class MyClass {
public:
    static constexpr int myConstExpr = 42;
};

// main.cpp
#include "MyClass.h"
#include <iostream>

void printAddress(const int *ptr);

int main() {
    // Taking the address of myConstExpr, this requires a definition.
    printAddress(&MyClass::myConstExpr); 
    return 0;
}

void printAddress(const int *ptr) {
    std::cout << "Address of myConstExpr: " << ptr << std::endl;
}

In this case, the address of MyClass::myConstExpr is required, so a separate definition is needed in a source file for pre-C++17:

// MyClass.cpp (pre-C++17)
#include "MyClass.h"

// Definition (without an initializer) required before C++17 to avoid linker errors;
// since C++17 this redundant redeclaration is deprecated
constexpr int MyClass::myConstExpr;

In C++17, however, the separate definition is not necessary, because a constexpr static data member is implicitly inline:

// MyClass.h (C++17)
class MyClass {
public:
    // Implicitly inline since C++17; no separate definition required
    static constexpr int myConstExpr = 42; 
};

By contrast, the following code does not produce a linker error even before C++17. Here MyClass::myConstExpr is not odr-used: std::ostream's operator<< for int takes its argument by value, so in the statement std::cout << "Value of myConstExpr: " << MyClass::myConstExpr << std::endl; the compiler simply substitutes the value 42 and never needs the member's address. With no address involved, no definition is required and there is no linker error.

// MyClass.h
class MyClass {
public:
    static constexpr int myConstExpr = 42;
};

// main.cpp
#include "MyClass.h"
#include <iostream>

int main() {
    std::cout << "Value of myConstExpr: " << MyClass::myConstExpr << std::endl;
    return 0;
}

Conditional Compilation

if constexpr and #if

C++'s if constexpr is not directly intended to replace conditional defines (e.g., #ifdef or #if). While they serve somewhat similar purposes, they have different use cases and operate at different stages of the compilation process.

#ifdef and #if are preprocessor directives in C++ that allow conditional compilation. They operate at the preprocessing stage, which occurs before the actual compilation. Conditional defines are typically used to conditionally include or exclude sections of code based on compile-time conditions or macros.

On the other hand, if constexpr, introduced in C++17, is part of the C++ language itself. Its condition must be a constant expression, which the compiler evaluates during compilation proper, after preprocessing, to choose which branch of code is used. It can appear in any function, but it is most useful in templates, where the branch that is not taken is not instantiated.

Here's an example to illustrate the difference:

#include <iostream>

#define USE_FEATURE

void doSomething() {
#ifdef USE_FEATURE
    std::cout << "Feature is enabled." << std::endl;
#else
    std::cout << "Feature is disabled." << std::endl;
#endif
}

template <bool UseFeature>
void doSomethingTemplate() {
    if constexpr (UseFeature) {
        std::cout << "Feature is enabled." << std::endl;
    } else {
        std::cout << "Feature is disabled." << std::endl;
    }
}

int main() {
    doSomething();  // Output depends on the USE_FEATURE macro.

    doSomethingTemplate<true>();  // Output depends on the template argument.
    doSomethingTemplate<false>();

    return 0;
}

In this example, doSomething() uses a conditional define to determine which section of code to compile based on the USE_FEATURE macro. On the other hand, doSomethingTemplate() is a function template that utilizes if constexpr to conditionally choose between different code branches at compile time based on the template argument.

While if constexpr can sometimes be used to achieve similar conditional behavior as conditional defines, their usage and capabilities are different. Conditional defines are more flexible and can be controlled externally via macros or command-line options, while if constexpr operates within the confines of the C++ code and allows compile-time decision making based on template arguments or constexpr conditions.

Short-circuit behavior

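As with a regular if statement, the condition of if constexpr is a single expression, and && still short-circuits when that expression is evaluated. The difference is that the whole condition must be a constant expression: every operand is instantiated, so it must be well-formed for the template arguments even when an earlier operand already decides the result, and any operand that is actually evaluated must not depend on runtime values such as function parameters.

In this example:

#include <type_traits>

template <typename T>
void foo(T value) {
    // if constexpr (std::is_integral_v<T> && (value > 0)) would not compile for
    // integral T: value is a function parameter, not a constant expression.
    if constexpr (std::is_integral_v<T>) {
        if (value > 0) {  // ordinary run-time check
            // Code specific to integral types and positive values
            // ...
            return;
        }
    }
    // Code for other cases
    // ...
}

Here the type test std::is_integral_v<T> is decided at compile time, while the value test is an ordinary run-time if nested inside the selected branch. Likewise, a condition such as std::is_class_v<T> && T::value fails to compile for T = int even though the first operand is false, because T::value must still be instantiated; the usual fix is to nest a second if constexpr inside the first branch.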

Branch elimination

In an if constexpr statement, the condition is evaluated at compile time, and the branch that is not taken (either the if or the else branch) becomes a discarded statement. Both branches must still be syntactically valid. Inside a template, the discarded branch is not instantiated, so code in it that would be invalid for the current template arguments causes no error; outside a template, a discarded branch is still fully checked.

This compile-time evaluation and branch elimination make if constexpr useful for conditional compilation and optimizing code based on compile-time conditions.

Because the discarded branch of a template is never instantiated, the compiler does no further work on it and generates no object code for it. This can reduce compile time and binary size, especially when that branch would have instantiated other templates.

Always provide else branch

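It is generally good practice to end an if constexpr chain with an else branch that handles, or explicitly rejects, every remaining case. Otherwise an unsupported type silently falls through the chain, which typically produces a confusing error far from the real cause (for example, a function with an auto return type whose return type is deduced as void).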

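#include <cmath>
#include <type_traits>

// Dependent "false": the static_assert fires only when its branch is instantiated.
// (Since C++23, static_assert(false, ...) may be written directly in such a branch.)
template<class T>
struct always_false : std::false_type {};

template<class T>
auto subtract(T a, T b) {
    if constexpr (std::is_same<T, double>::value) {
        if (std::abs(a - b) < 0.0001) {
            return 0.0;
        } else {
            return a - b;
        }
    } else if constexpr (std::is_integral<T>::value) {
        return a - b;
    } else {
        static_assert(always_false<T>::value, "Non-handled type for subtract function");
    }
}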

In this code, both double and integral types are explicitly handled. If a type is used that is neither double nor an integral type, the static_assert will trigger a compile-time error with a clear message, which is generally preferable to a more obscure error about invalid operations. This is a more defensive programming strategy that makes sure all potential types are handled.

constexpr virtual method

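C++20 allows virtual functions to be declared constexpr. The main consequence is that a virtual call can now appear in a constant expression: during constant evaluation the dynamic type of the object is known, so the call is resolved to the final overrider and evaluated at compile time. A constexpr function may override a non-constexpr one, and vice versa.

Note - outside constant evaluation, constexpr does not change how a virtual call is dispatched. A run-time call can still be devirtualized and inlined, but only because the optimizer knows the object's type, and compilers do this for non-constexpr virtual functions as well.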

Consider an example where the base class has a non-constexpr virtual method, but the derived class overrides it as constexpr:

class Base {
public:
    virtual int getValue() { return 42; }
};

class Derived : public Base {
public:
    constexpr int getValue() override { return 10; }
};

Suppose an object of the derived class is used directly, so its type is known at compile time:

Derived der = Derived();
int value = der.getValue();

With optimization enabled, the compiler can resolve der.getValue() directly to Derived::getValue, because der is an object rather than a pointer or reference, and fold the call into a constant. The resulting assembly code might resemble the following:

mov DWORD PTR [ebp-4], 10

This assembly code stores the constant value 10 directly into the variable value, without any function call.

It's important to note that the specific optimization and resulting assembly code vary with the compiler, compiler flags, and optimization level, and that they come from the optimizer rather than from the constexpr specifier. Compile-time evaluation is only guaranteed when the result is required as a constant expression, for example constexpr int value = Derived{}.getValue(); in C++20.

try-catch

C++20 allows try-catch blocks inside constexpr functions. Before C++20, a constexpr function could not contain a try block at all, so a function that needed exception handling on its run-time path could not be declared constexpr.

The relaxation is deliberately limited: a try block may now appear in a constexpr function, but evaluating a throw expression is still not permitted during constant evaluation. At compile time the catch handlers are therefore never entered; if the evaluation reaches a throw, the call is simply not a constant expression, and the compiler reports an error wherever a constant expression is required.

The motivation is to let one function serve both contexts. It can be evaluated at compile time along its non-throwing path, while at run time the try-catch handles errors as usual. This was needed, for example, to make standard library components whose implementations may contain try blocks, such as std::vector, usable in constant expressions.

Here's an example that demonstrates the usage of try-catch blocks inside a constexpr function:

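#include <stdexcept>

constexpr int divide(int a, int b) {
    try {
        if (b == 0) {
            // Integer division by zero does not throw in C++ (it is undefined
            // behavior), so the error is reported explicitly.
            throw std::invalid_argument("division by zero");
        }
        return a / b;
    } catch (const std::invalid_argument&) {
        return 0; // fallback value, reachable only at run time
    }
}

int main() {
    constexpr int result = divide(10, 2); // OK: the throw is never reached
    static_assert(result == 5, "Division failed at compile time!");

    // constexpr int bad = divide(10, 0); // error: evaluation reaches a throw
    int fallback = divide(10, 0);         // run time: exception caught, returns 0
    return fallback;
}

In the above example, divide throws when b is zero and catches its own exception, returning a fallback value of 0. This fallback only works at run time: divide(10, 2) is a constant expression, while divide(10, 0) is not, because its evaluation reaches the throw.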

It's important to note a few caveats and considerations when using try-catch blocks in constexpr functions:

  1. A throw expression cannot be evaluated during constant evaluation, so catch handlers are never entered at compile time. A call whose evaluation would throw is not a constant expression and causes a compile-time error where a constant expression is required.
  2. At run time, the same function behaves like any other: exceptions are thrown, caught by its handlers, or propagate to the caller as usual.
  3. C++20 separately allows new and delete during constant evaluation, provided every allocation is freed before the evaluation ends. malloc and the other C allocation functions still cannot be used at compile time.

Overall, allowing try-catch blocks in constexpr functions in C++20 lets code that performs run-time error handling also be used in constant expressions, as long as the compile-time evaluation stays on the non-throwing path.

Default Initialization of constexpr Objects

C++20 permits trivial default initialization in constexpr functions: a local variable of a trivially default-constructible type may be default-initialized, that is, declared without an initializer and left with an indeterminate value, provided the value is not read before something has been assigned to it.

Here is an example that demonstrates the usage of trivial default construction in a constexpr function:

struct X {
    bool val;
};

constexpr void f() {
    X x;
}

The above code is valid only since C++20. Before C++20 (for example, in C++17), every variable declared in a constexpr function had to be initialized. Here's an example of explicitly initializing the variable so that the function is also valid in C++17:

struct X {
    bool val;
};

constexpr void f() {
    X x{true}; // Explicit initialization required in C++17
}

The following example demonstrates the usage of trivial default construction in a more practical scenario:

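#include <array>
#include <cstddef>

constexpr std::array<int, 5> createArray() {
    std::array<int, 5> arr; // no initializer: allowed in a constexpr function since C++20
    for (std::size_t i = 0; i < arr.size(); ++i) {
        arr[i] = static_cast<int>(i * i);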
    }
    return arr;
}

int main() {
    constexpr std::array<int, 5> result = createArray();
    // Use the constexpr array at compile time
    static_assert(result[2] == 4, "Unexpected value at compile time!");
    return 0;
}

In this example, the constexpr function createArray creates an array of integers and assigns values to its elements in a loop. The array arr is default-initialized without an initializer, which is allowed because std::array<int, 5> is trivially default-constructible and every element is assigned before it is read. The function returns the resulting array, which can then be used at compile time.

By allowing trivial default construction for constexpr objects, C++20 simplifies the initialization process for certain types and enables more concise and efficient constexpr code. It can be particularly beneficial when working with trivial types or when initializing objects that don't require explicit initialization before use.

consteval and constinit

consteval

The consteval specifier was introduced in C++20 to declare an immediate function. Every call to a consteval function (outside another immediate function) must produce a constant expression, so the function is always evaluated at compile time.

A consteval function is subject to the same rules as a constexpr function; in C++20, for example, its return type and parameter types must be literal types, meaning they can be used within a constant expression. In addition, because every call must be evaluated at compile time, calling it with an argument that is not a constant expression is an error that the compiler reports.

Here's an example of a consteval function:

consteval int square(int x) {
    return x * x;
}

Difference between consteval and constexpr

constexpr int add(int x, int y) {
    return x + y;
}

consteval int multiply(int x, int y) {
    return x * y;
}

int main() {
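    constexpr int result1 = add(3, 4);        // Evaluates at compile-time
    int result2 = multiply(5, 6);             // Evaluates at compile-time (consteval)
    // consteval int bad = multiply(5, 6);    // error: consteval applies only to functions

    int x = 2, y = 3;
    int result3 = add(x, y);                  // Evaluates at runtime
    // int result4 = multiply(x, y);          // error: x and y are not constant expressions

    return 0;
}

In the code above, the add function is declared constexpr, allowing it to be evaluated at both compile time and runtime. The multiply function is declared consteval, so every call to it is evaluated at compile time, even when the result initializes an ordinary variable such as result2. Note that consteval can only be applied to functions; a variable that must be a compile-time constant is declared constexpr.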

constinit

The constinit specifier was introduced in C++20 for variables with static or thread storage duration. A variable marked constinit must be initialized with a constant expression, which guarantees that its initialization happens during static (constant) initialization, before any dynamic initialization runs; if the initializer is not a constant expression, the program does not compile. In short, constinit prevents such a variable from being initialized at runtime.

  • constinit cannot be combined with constexpr, because constexpr already implies constant initialization (and also makes the variable const). consteval applies only to functions, so it cannot appear on a variable at all.

  • constinit forces constant initialization of static or thread_local variables. This helps avoid the static initialization order fiasco: because the value is fixed before any dynamic initializer runs, other globals' initializers cannot observe the variable uninitialized, regardless of translation unit or link order.

  • constinit does not make the object immutable. A non-const constinit variable can be modified at runtime, and it cannot be used in constant expressions.

#include <array>

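// compute() is constexpr, so global can be constant-initialized at compile time
constexpr int compute(int v) { return v*v*v; }
constinit int global = compute(10);

// Error if uncommented: global is not const, so its value is not a constant expression
// constinit int another = global;

int main() {
    // ...but, unlike a constexpr variable, global can be changed later
    global = 100;

    // Error if uncommented: global is not a constant expression
    // std::array<int, global> arr;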
}
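
A disassembly of main (x86-64, apparently built without optimization) looks like this. It contains only the run-time assignment of 100 (0x64) to global; no code calls compute(10), because the initial value 1000 was computed by the compiler and stored with the variable:

main: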
 push   rbp
 mov    rbp,rsp
 mov    DWORD PTR [rip+0x2efc],0x64        # 404010 <global>
 mov    eax,0x0
 pop    rbp
 ret
 nop    DWORD PTR [rax+rax*1+0x0]

The following table summarizes all const specifiers (credit: Bartłomiej Filipek):

Keyword | On auto variables | On static/thread_local variables | On functions | On constant expressions
const | Yes | Yes | As const member functions | Sometimes
constexpr | Yes, or implicit (in constexpr functions) | Yes | To indicate constexpr functions | Yes
consteval | No | No | To indicate consteval functions | Yes (as a result of a function call)
constinit | No | To force constant initialization | No | No, a constinit variable is not a constexpr variable

std::is_constant_evaluated

std::is_constant_evaluated(), declared in <type_traits>, was introduced in C++20 as a standard library feature. It returns true when it is called during the evaluation of a manifestly constant-evaluated expression, such as the initializer of a constexpr variable or the condition of a static_assert, and false otherwise. This lets a constexpr function behave differently during compile-time evaluation than at runtime.

The typical motivation is to pair a simple, constexpr-friendly algorithm for compile time with a faster implementation for run time, such as a non-constexpr library function or a compiler intrinsic. Both paths should compute the same result. Note that it must be tested with a plain if: in if constexpr (std::is_constant_evaluated()), the condition itself is always constant-evaluated, so it is always true.

Here's an example that demonstrates the usage of std::is_constant_evaluated:

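#include <iostream>
#include <type_traits>  // std::is_constant_evaluated

// A plain (non-constexpr) function only ever runs at run time, so the check
// below is always false here; compilers may even warn about it.
void printEvaluationContext() {
    if (std::is_constant_evaluated()) {
        std::cout << "Constant expression evaluation" << std::endl;
    } else {
        std::cout << "Runtime execution" << std::endl;
    }
}

constexpr int doubleValue(int value) {
    // Use a plain if: in `if constexpr (std::is_constant_evaluated())` the
    // condition is itself constant-evaluated, so it would always be true.
    if (std::is_constant_evaluated()) {
        return value * 2;  // compile-time path: no I/O allowed
    } else {
        std::cout << "Runtime evaluation" << std::endl;  // run-time path may have side effects
        return value * 2;
    }
}

int main() {
    printEvaluationContext();  // prints "Runtime execution"

    constexpr int result1 = doubleValue(10);            // forced compile-time evaluation
    std::cout << "Result 1: " << result1 << std::endl;  // prints "Result 1: 20"

    int value = 20;
    int result2 = doubleValue(value);                   // run time: prints "Runtime evaluation"
    std::cout << "Result 2: " << result2 << std::endl;  // prints "Result 2: 40"

    return 0;
}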

Other C++20 Enhancements

C++20 made several further enhancements to constexpr, including the ability to change the active member of a union, support for dynamic_cast and polymorphic typeid in constant expressions, and permission for inline assembly to appear in constexpr functions (provided it is not evaluated at compile time).

Changing the active member of a union in constexpr

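Before C++20, a constant expression could not change the active member of a union by assigning to a different member. C++20 lifts this restriction. Reading a member that is not currently active is still undefined behavior, so it remains forbidden in constant expressions. Here's an example that demonstrates this: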

#include <iostream>

union MyUnion {
    int i;
    float f;
};

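constexpr int modifyUnionMember(int value) {
    MyUnion u{};   // value-initialized: i is the active member
    u.f = 1.5f;    // C++20: assignment makes f the active member
    u.i = value;   // ...and switches back to i
    // return u.f; // error: reading the inactive member is not a constant expression
    return u.i;    // reading the active member is fine
}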

int main() {
    constexpr int modifiedValue = modifyUnionMember(42);
    std::cout << "Modified value: " << modifiedValue << std::endl;
    return 0;
}

dynamic_cast and typeid within constexpr

C++20 also allows dynamic_cast and polymorphic typeid (typeid applied to an object of polymorphic class type) in constant expressions. This makes dynamic type checks and type information lookups possible during compile-time evaluation, as long as the inspected object is itself available to that evaluation. Here's an example:

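#include <iostream>
#include <typeinfo>

struct Base {
    virtual ~Base() = default;  // polymorphic; a defaulted virtual destructor is constexpr in C++20
};

struct Derived : Base {};

constexpr bool isDerived(const Base& obj) {
    return dynamic_cast<const Derived*>(&obj) != nullptr;  // dynamic_cast in a constant expression
}

constexpr const std::type_info& getTypeInfo(const Base& obj) {
    return typeid(obj);  // polymorphic typeid: yields the dynamic type
}

// A constant expression cannot keep a new-allocated object alive after the
// evaluation ends, so the objects inspected here have static storage duration.
constexpr Derived derivedObj{};
constexpr Base baseObj{};

int main() {
    constexpr bool d1 = isDerived(derivedObj);  // true, computed at compile time
    constexpr bool d2 = isDerived(baseObj);     // false
    static_assert(d1 && !d2);

    constexpr const std::type_info& typeInfo = getTypeInfo(derivedObj);

    std::cout << std::boolalpha;
    std::cout << "derivedObj is a Derived? " << d1 << std::endl;
    std::cout << "baseObj is a Derived? " << d2 << std::endl;
    // type_info::operator== only becomes constexpr in C++23, so compare at run time
    std::cout << "Dynamic type is Derived? " << (typeInfo == typeid(Derived)) << std::endl;
    std::cout << "Type info: " << typeInfo.name() << std::endl;  // name() is implementation-defined
    return 0;
}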

Inline assembly within constexpr functions

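C++20 also allows an inline assembly (asm) declaration to appear inside a constexpr function. The assembly itself still cannot run during compile-time evaluation, so the usual pattern is to guard it with std::is_constant_evaluated(): plain C++ is used in constant expressions and the assembly path at run time. Here's an example (it uses GCC/Clang extended asm syntax and assumes an x86 or x86-64 target):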

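#include <iostream>
#include <type_traits>  // std::is_constant_evaluated

constexpr int addNumbersInlineAssembly(int a, int b) {
    if (std::is_constant_evaluated()) {
        return a + b;  // compile time: the asm below may not be evaluated here
    } else {
        int result = a;
        // Run time: GCC/Clang extended asm, AT&T syntax (x86/x86-64): result += b
        asm("addl %[b], %[result]"
            : [result] "+r" (result)
            : [b] "r" (b)
            : "cc");
        return result;
    }
}

int main() {
    int x = 10;  // non-constant argument: takes the asm path
    std::cout << "Runtime sum: " << addNumbersInlineAssembly(x, 20) << std::endl;

    constexpr int sum = addNumbersInlineAssembly(10, 20);  // compile-time path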
    std::cout << "Sum: " << sum << std::