
Localizing Alphabets with RWCString and RWWString – HP Integrity NonStop J-Series User Manual


©Copyright 1996 Rogue Wave Software

Localizing Alphabets with RWCString and RWWString

Localizing alphabets begins with allowing them to be represented. As mentioned in Chapter 2
(Eight-bit Clean), Tools.h++ code is "8-bit clean" to accommodate the extended character set.
All of the English alphabet is described in 7 bits, leaving the eighth bit free for umlauts,
cedillas, and other diacritical marks and special characters. And because even 8 bits often
isn't enough to represent all the character glyphs of various languages, Tools.h++ also allows
two kinds of extensions: multibyte and wide-character encodings.
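As a rough illustration of what "8-bit clean" asks of code in general, the sketch below treats every byte as an unsigned value, so that characters with the eighth bit set are neither rejected nor turned into negative indexes. It is not taken from Tools.h++; the names byteHistogram and isSevenBit are hypothetical helpers invented for this example.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Count how often each byte value occurs. Casting to unsigned char keeps
// bytes above 0x7F from becoming negative array indexes on platforms where
// plain char is signed.
std::array<std::size_t, 256> byteHistogram(const std::string& s) {
    std::array<std::size_t, 256> counts{};   // zero-initialized
    for (char c : s) {
        ++counts[static_cast<unsigned char>(c)];
    }
    return counts;
}

// Report whether a string uses only the 7-bit (ASCII) range.
bool isSevenBit(const std::string& s) {
    for (char c : s) {
        if (static_cast<unsigned char>(c) > 0x7F) {
            return false;
        }
    }
    return true;
}
```

Code that indexed the table with a plain char, or that masked characters with 0x7F, would mishandle accented letters stored in the upper half of an 8-bit character set.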

Multibyte encodings use a sequence of one or more bytes to represent a single character.
(Typically the ASCII characters are still one byte long.) These encodings are compact, but may
be inconvenient for indexing and substring operations. Wide character encodings, in contrast,
place each character in a 16- or 32-bit integral type called a wchar_t, and represent a string as an
array of wchar_t. Usually it is possible to translate a string encoded in one form into the other.
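The translation between the two forms can be sketched with the Standard C Library functions mbstowcs() and wcstombs(), which use the encodings of the current C locale. This is only an illustration under that assumption, not the Tools.h++ conversion code; the helper names widen and narrow are hypothetical, and both stop at the first embedded null character.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>     // std::mbstowcs, std::wcstombs, MB_CUR_MAX
#include <stdexcept>
#include <string>
#include <vector>

// Multibyte -> wide, using the encoding of the current C locale.
std::wstring widen(const std::string& mb) {
    // A multibyte string never decodes to more wide characters than it has bytes.
    std::vector<wchar_t> buf(mb.size() + 1);
    std::size_t n = std::mbstowcs(buf.data(), mb.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("invalid multibyte sequence");
    }
    assert(n < buf.size());   // room was left for the terminator
    return std::wstring(buf.data(), n);
}

// Wide -> multibyte, using the encoding of the current C locale.
std::string narrow(const std::wstring& ws) {
    // Each wide character needs at most MB_CUR_MAX bytes.
    std::vector<char> buf(ws.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), ws.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("character not representable in this locale");
    }
    assert(n < buf.size());
    return std::string(buf.data(), n);
}
```

In the default "C" locale only the basic character set is guaranteed to convert. A program that handles other alphabets would normally call setlocale(LC_CTYPE, "") first so that the host system's encodings are used.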

The two efficient Tools.h++ string types, RWCString and RWWString, were discussed in Chapter 3.
RWCString represents strings of 8-bit chars, with some support for multibyte strings. RWWString
represents strings of wchar_t. Both provide access to Standard C Library support for local
collation conventions with the member function collate() and the global function strXForm().
In addition, the library provides conversions between wide and multibyte representations. The
wide- and multibyte-character encodings used are those of the host system.
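The Standard C Library support referred to here is, presumably, strcoll() and strxfrm(). The sketch below uses those two functions directly to show the idea: strcoll() compares two strings by the current locale's collation rules, while strxfrm() produces a key that can be compared byte by byte with the same result, which pays off when each string is compared many times during a sort. The helper names collatesBefore, collationKey, and sortByLocale are hypothetical and are not part of Tools.h++.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>     // std::strcoll, std::strxfrm
#include <string>
#include <utility>
#include <vector>

// True if a sorts before b under the current locale's collation rules.
bool collatesBefore(const std::string& a, const std::string& b) {
    return std::strcoll(a.c_str(), b.c_str()) < 0;
}

// Build a sort key: comparing two keys bytewise orders them the same way
// strcoll() orders the original strings.
std::string collationKey(const std::string& s) {
    // With a length of zero, strxfrm only reports the key length needed.
    std::size_t needed = std::strxfrm(nullptr, s.c_str(), 0);
    std::vector<char> buf(needed + 1);
    std::size_t written = std::strxfrm(buf.data(), s.c_str(), buf.size());
    assert(written == needed);
    return std::string(buf.data(), written);
}

// Sort names by locale order, transforming each string only once.
void sortByLocale(std::vector<std::string>& names) {
    std::vector<std::pair<std::string, std::string>> keyed;
    keyed.reserve(names.size());
    for (const std::string& n : names) {
        keyed.emplace_back(collationKey(n), n);
    }
    std::sort(keyed.begin(), keyed.end());   // compares keys first
    for (std::size_t i = 0; i < names.size(); ++i) {
        names[i] = keyed[i].second;
    }
}
```

Until the program calls setlocale(LC_COLLATE, ""), the "C" locale is in effect and the order is plain byte order, so upper-case ASCII letters sort before lower-case ones. Under a host locale for a particular language, the same code places accented letters wherever that locale's conventions dictate.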

But representation of alphabets can be even more complex. For example, is a character upper
case, lower case, or neither? In a sorted list, where do you put the names that begin with
accented letters? What about Cyrillic names? How are wide-character strings represented on
byte streams? Standards bodies and corporate labs are addressing these issues, but the results
are not yet portable. For the time being, Tools.h++ strives to make best use of what they
provide.
