Library Support

Deprecated library support

Component	Purpose	Status
`std::codecvt<char16_t, char, std::mbstate_t>` and `std::codecvt<char32_t, char, std::mbstate_t>` (specializations of the `std::codecvt` facet, defined in header `<locale>`)	Convert between UTF-16 or UTF-32 and UTF-8 stored in `char`; C++20 adds replacements that use `char8_t` as the external type	Deprecated in C++20
`<codecvt>` header	Provides the conversion facets `std::codecvt_utf8`, `std::codecvt_utf16`, and `std::codecvt_utf8_utf16`	Deprecated in C++17
`std::wstring_convert` (in `<locale>`)	Provides a higher-level interface that uses a conversion facet to convert between byte strings (`std::string`) and wide strings (`std::basic_string<Elem>`, which is `std::wstring` by default)	Deprecated in C++17

New string types

String Type	Description	Definition	Introduced in
`std::u8string`	A string of 8-bit UTF-8 code units	`std::basic_string<char8_t>`	C++20
`std::u16string`	A string of 16-bit code units, intended for UTF-16	`std::basic_string<char16_t>`	C++11
`std::u32string`	A string of 32-bit code units, intended for UTF-32	`std::basic_string<char32_t>`	C++11

`std::pmr::u8string`

`std::pmr::u8string` is an alias for `std::basic_string<char8_t, std::char_traits<char8_t>, std::pmr::polymorphic_allocator<char8_t>>`: a string of UTF-8 code units whose storage comes from a `std::pmr::memory_resource` chosen at run time rather than from `std::allocator`. The polymorphic memory resource library (`std::pmr`) was introduced in C++17; the `u8string` alias was added in C++20 together with `char8_t`.

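To use `std::pmr::u8string`, include the `<string>` header (which declares the `std::pmr` string aliases) and the `<memory_resource>` header, then create an object of a concrete class derived from `std::pmr::memory_resource`, such as `std::pmr::monotonic_buffer_resource`; `memory_resource` itself is abstract. Pass a pointer to that resource as the allocator argument of a `std::pmr::u8string` constructor; the pointer converts implicitly to `std::pmr::polymorphic_allocator<char8_t>`.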

Here's an example of how to use std::pmr::u8string:

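#include <iostream>
#include <memory_resource>
#include <string>

int main()
{
    // create a memory pool using std::pmr::monotonic_buffer_resource;
    // 1024 is the size of the first chunk it requests from its upstream resource
    std::pmr::monotonic_buffer_resource pool(1024);

    // create an std::pmr::u8string using the memory pool
    // (&pool converts implicitly to std::pmr::polymorphic_allocator<char8_t>);
    // on common implementations a string this short fits in the small-string
    // buffer, so the pool may never actually be asked for memory
    std::pmr::u8string str(u8"Hello, world!", &pool);

    // C++20 streams have no overload for char8_t strings, so reinterpret the
    // UTF-8 bytes as char (never pass string data as a printf format string)
    std::cout << reinterpret_cast<const char*>(str.c_str()) << '\n';

    return 0;
}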

C11 way

Function	Description
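`mbrtoc16`	Converts one multibyte character, in the encoding of the current C locale, to `char16_t` (UTF-16 code units when `__STDC_UTF_16__` is defined)
`c16rtomb`	Converts a `char16_t` value to a multibyte sequence in the current locale's encoding
`mbrtoc32`	Converts one multibyte character, in the encoding of the current C locale, to `char32_t` (UTF-32 when `__STDC_UTF_32__` is defined)
`c32rtomb`	Converts a `char32_t` value to a multibyte sequence in the current locale's encoding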

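These functions were introduced in C11 and are declared in `<uchar.h>`; C++11 provides them as `std::mbrtoc16` and so on through `<cuchar>`. Their behavior depends on the current C locale, which is set with `setlocale`.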

In the function name `mbrtoc16`, "mb" indicates that the input is a multibyte character sequence, "r" means the function is restartable (it records its progress in an `mbstate_t` object, so a character split across several calls can still be converted), "to" gives the direction of conversion, and "c16" indicates that the output is a `char16_t`. The other three names follow the same pattern.

Here's an example of using the mbrtoc16 function to convert a multibyte sequence to a 16-bit wide character:

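#include <stdio.h>
#include <uchar.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
    // mbrtoc16 decodes according to the current C locale, so select a UTF-8 one
    if (setlocale(LC_ALL, "en_US.UTF-8") == NULL) {
        fprintf(stderr, "Error: en_US.UTF-8 locale is not available\n");
        return 1;
    }

    const char mbstr[] = "Hello, world!"; // plain char: char8_t was only added to C in C23
    char16_t wc16;
    mbstate_t state = { 0 };               // conversion state must start in the initial state
    size_t res = mbrtoc16(&wc16, mbstr, sizeof mbstr, &state);
    if (res == (size_t)-1 || res == (size_t)-2) {
        // -1: invalid sequence; -2: sequence incomplete within the bytes given
        fprintf(stderr, "Error: invalid or incomplete multibyte sequence\n");
        return 1;
    }
    // res is the number of bytes consumed; print the char16_t value in hex
    printf("The first character is U+%04X (%zu byte(s) consumed)\n", (unsigned)wc16, res);

    return 0;
}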