Library Support

Deprecated library support

ComponentPurposeStatus
template<class InternT, class ExternT, class StateT> class codecvt defined in header <locale>Provides a template class for converting between different character encodingsDeprecated in C++20
<codecvt> headerProvides a set of templates for character encoding conversion, including std::codecvt_utf8, std::codecvt_utf16, and std::codecvt_utf8_utf16Deprecated in C++17
std::wstring_convertProvides a higher-level interface for converting between wide character strings (std::wstring) and narrow character strings (std::string)Deprecated in C++17

New string types

String TypeDescriptionBasic DefinitionIntroduced in C++
u8stringA string of 8-bit characters encoded in UTF-8std::basic_string<char8_t>C++20
u16stringA string of 16-bit characters encoded in UTF-16std::basic_string<char16_t>C++11
u32stringA string of 32-bit characters encoded in UTF-32std::basic_string<char32_t>C++11

std::pmr::u8string

std::pmr::u8string is a variant of the std::basic_string template that represents a sequence of 8-bit characters encoded in UTF-8 format, and allows for custom memory allocation using user-defined memory resources. It is part of the C++20 Polymorphic Memory Resource library (std::pmr).

To use std::pmr::u8string, you need to include the <string> and <memory_resource> headers, and create a std::pmr::memory_resource object to use as the memory allocator. You can then create an instance of std::pmr::u8string by passing the memory allocator as a constructor argument.

Here's an example of how to use std::pmr::u8string:

#include <iostream>
#include <string>
#include <memory_resource>

int main()
{
    // create a memory pool using std::pmr::monotonic_buffer_resource
    std::pmr::monotonic_buffer_resource pool(1024);

    // create an std::pmr::u8string using the memory pool
    std::pmr::u8string str(u8"Hello, world!", &pool);

    // print the string to the console
    printf(reinterpret_cast<char*>(str.data()));

    return 0;
}

C11 way

FunctionDescription
mbrtoc16Converts a multibyte sequence to a 16-bit wide character
c16rtombConverts a 16-bit wide character to a multibyte sequence
mbrtoc32Converts a multibyte sequence to a 32-bit wide character
c32rtombConverts a 32-bit wide character to a multibyte sequence

These are C11 functions.

In the function name mbrtoc16, the "rto" stands for "read to". This function reads a multibyte character sequence and converts it to a 16-bit wide character. The "c16" part of the function name indicates that the output is a 16-bit character, while the "mb" part indicates that the input is a multibyte character sequence.

Here's an example of using the mbrtoc16 function to convert a multibyte sequence to a 16-bit wide character:

#include <stdio.h>
#include <uchar.h>
#include <locale.h>
#include <wchar.h>

int main() {
    setlocale(LC_ALL, "en_US.UTF-8");

    char mbstr[] = "Hello, world!"; // Note char8_t is not part of C language yet.
    char16_t wc16;
    mbstate_t state = { 0 };
    size_t res = mbrtoc16(&wc16, mbstr, sizeof(mbstr), &state);
    if (res == (size_t)-1 || res == (size_t)-2) {
        printf("Error: invalid multibyte sequence\n");
        return 1;
    }
    printf("The first character is: %lc\n", (wint_t)wc16);

    return 0;
}