Multibyte strings – HP Integrity NonStop J-Series User Manual
©Copyright 1996 Rogue Wave Software
Multibyte Strings
Class RWCString provides limited support for multibyte strings, which are sometimes used to represent various alphabets (see ). Because a multibyte character can consist of two or more bytes, the length of a string in bytes is always greater than or equal to the number of characters it actually contains.
If the RWCString contains multibyte characters, use the member function mbLength() to obtain the number of characters. If you know that the RWCString contains no multibyte characters, however, length() and mbLength() return the same result, and you may prefer length() because it is much faster. Here is an example using a multibyte string stored in the variable Sun:
RWCString Sun("\306\374\315\313\306\374");
cout << Sun.length(); // Prints "6"
cout << Sun.mbLength(); // Prints "3"
The string in Sun is the word for Sunday in Kanji, encoded in the EUC (Extended UNIX Code) multibyte code set. In EUC, a single character may be 1 to 4 bytes long. In this example, Sun consists of 6 bytes but only 3 characters.
In general, the second or later byte of a multibyte character may be null. This means that the length of a character string in bytes may not match the length returned by strlen(), which stops at the first null byte. Internally, RWCString makes no assumptions [3] about embedded nulls, so it can be used safely with character sets that use null bytes. Keep in mind also that although RWCString::data() always returns a null-terminated string, the string may contain earlier nulls. All of these effects are summarized in the following program:
#include <rw/cstring.h>
#include <string.h>
#include <rw/rstream.h>
int main() {
RWCString a("abc"); // 1
RWCString b("abc\0def"); // 2
RWCString c("abc\0def", 7); // 3
cout << a.length(); // Prints "3"
cout << strlen(a.data()); // Prints "3"