Localizing alphabets with rwcstring and rwwstring – HP Integrity NonStop J-Series User Manual

©Copyright 1996 Rogue Wave Software
Localizing Alphabets with RWCString and RWWString
Localizing an alphabet begins with being able to represent it. As mentioned earlier in this guide, Tools.h++ code is "8-bit clean" to accommodate extended character sets. The entire English alphabet can be described in 7 bits, leaving the eighth bit free for letters with umlauts, cedillas, and other diacritical marks, and for other special characters. Because even 8 bits are often not enough to represent all the character glyphs of various languages, Tools.h++ also allows two kinds of extensions: multibyte and wide-character encodings.
Multibyte encodings use a sequence of one or more bytes to represent a single character. (Typically the ASCII characters are still one byte long.) These encodings are compact, but they can be inconvenient for indexing and substring operations, because the nth character of a string does not necessarily begin at the nth byte. Wide-character encodings, in contrast, place each character in a 16- or 32-bit integral type called wchar_t, and represent a string as an array of wchar_t. It is usually possible to translate a string encoded in one form into the other.
Tools.h++'s two efficient string types, RWCString and RWWString, were discussed earlier in this guide. RWCString represents strings of 8-bit chars, with some support for multibyte strings. RWWString represents strings of wchar_t. Both provide access to the Standard C Library's support for local collation conventions through the member function collate() and the global function strXForm(). In addition, the library provides conversions between wide and multibyte representations. The wide- and multibyte-character encodings used are those of the host system.
But representing alphabets can be even more complex. For example, is a character upper case, lower case, or neither? In a sorted list, where do names that begin with accented letters belong? What about Cyrillic names? How are wide-character strings represented on byte streams? Standards bodies and corporate labs are addressing these issues, but the results are not yet portable. For the time being, Tools.h++ strives to make the best use of what they provide.