
Appendix E, Character sets used in e-mails – Xylem CHATTER RTU and E-mail User Manual



USERS MANUAL

CHATTER™ DATA LOGGER


Appendix E

Character sets used in e-mails

Country

Character set

Text

Latin1

ISO-8859-1

Latin1 covers most West European languages, such as French (fr), Spanish
(es), Catalan (ca), Basque (eu), Portuguese (pt), Italian (it), Albanian (sq),
Rhaeto-Romanic (rm), Dutch (nl), German (de), Danish (da), Swedish (sv),
Norwegian (no), Finnish (fi), Faroese (fo), Icelandic (is), Irish (ga), Scottish
Gaelic (gd) and English (en). It incidentally also covers Afrikaans (af) and
Swahili (sw), and thus in effect the entire American continent, Australia and
much of Africa. The most notable exceptions are Zulu (zu) and other Bantu
languages that use Latin Extended-B letters, Arabic in North Africa, and
Guarani (gn), which needs G, E, I, U and Y with a tilde. The lack of the
Dutch IJ and French OE ligatures and of „German“ quotation marks is
considered tolerable. The lack of the new Euro currency symbol U+20AC,
which resembles a C overstruck with =, has opened the discussion of a
new Latin0.
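The gaps listed above can be checked mechanically. The following is a minimal sketch, assuming Python's built-in "iso-8859-1" codec as the reference for Latin1; the helper name fits and the sample strings are illustrative and do not come from the manual.

```python
# Sketch: test which sample strings can be encoded in ISO-8859-1 (Latin1).
# The helper name and the samples are assumptions made for illustration.

def fits(text: str, charset: str) -> bool:
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

samples = {
    "French accents": "d\u00e9j\u00e0 vu",      # e acute, a grave
    "German sharp s": "Stra\u00dfe",             # U+00DF
    "French OE ligature": "c\u0153ur",           # U+0153
    "German quotation marks": "\u201eso\u201c",  # U+201E and U+201C
    "Euro sign": "\u20ac",                       # U+20AC
}

latin1_ok = {label: fits(text, "iso-8859-1") for label, text in samples.items()}

for label, ok in latin1_ok.items():
    print(f"{label}: {'fits' if ok else 'does not fit'} in ISO-8859-1")
```

A logger or mail gateway could run a check of this kind before choosing the charset label for an outgoing message.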

Latin2

ISO-8859-2

Latin2 covers the languages of Central and Eastern Europe: Czech (cs),
Hungarian (hu), Polish (pl), Romanian (ro), Croatian (hr), Slovak (sk), Slovenian
(sl) and Sorbian. For Romanian, the S and T should carry a comma below
rather than the cedilla used in Turkish: the U+015F LATIN SMALL LETTER S
WITH CEDILLA at =BA ought to be read as U+0219 LATIN SMALL LETTER S
WITH COMMA BELOW, and likewise for capital S and for T.
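The Romanian re-reading can be sketched in a few lines. This example assumes that the =BA notation denotes the byte 0xBA as written in quoted-printable e-mail bodies; the function name decode_romanian and the translation table are hypothetical and not part of the manual.

```python
import quopri

# Assumed mapping: cedilla forms found in ISO-8859-2 -> comma-below forms.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015e": "\u0218",  # capital S with cedilla -> capital S with comma below
    "\u015f": "\u0219",  # small s with cedilla   -> small s with comma below
    "\u0162": "\u021a",  # capital T with cedilla -> capital T with comma below
    "\u0163": "\u021b",  # small t with cedilla   -> small t with comma below
})

def decode_romanian(qp_bytes: bytes) -> str:
    """Decode quoted-printable ISO-8859-2 bytes and apply the Romanian reading."""
    raw = quopri.decodestring(qp_bytes)   # b"=BA" becomes b"\xba"
    text = raw.decode("iso-8859-2")       # 0xBA becomes U+015F
    return text.translate(CEDILLA_TO_COMMA)

decoded = decode_romanian(b"=BA")
print(f"U+{ord(decoded):04X}")
```

The translation should only be applied to text known to be Romanian, since Turkish text legitimately uses the cedilla forms.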

Latin3

ISO-8859-3

Latin3 is popular with authors of Esperanto (eo) and Maltese (mt), and it
covered Turkish before the introduction of Latin5 in 1988.

Latin4

ISO-8859-4

Latin4 introduced letters for Estonian (et), the Baltic languages Latvian (lv,
Lettish) and Lithuanian (lt), Greenlandic (kl) and Lappish. Note that Latvian
requires the cedilla of the U+0123 LATIN SMALL LETTER G WITH CEDILLA at
=BB to be drawn on top of the letter. Latin4 was followed by Latin6.

Cyrillic

ISO-8859-5

With these Cyrillic letters you can type Bulgarian (bg), Byelorussian (be),
Macedonian (mk), Russian (ru), Serbian (sr) and pre-1990 (no ghe with upturn)
Ukrainian (uk). The ordering is based on the (incompatibly) revised GOST
19768 of 1987, with the Russian letters, except for ё, sorted in Russian
alphabetical order (ABVGDE).
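The ordering claim can be illustrated with Python's built-in "iso-8859-5" codec. This is a small sketch rather than a statement about how the logger handles Cyrillic; all variable names are illustrative.

```python
# Sketch: byte order in ISO-8859-5 follows the Russian alphabet, except for ё.

# Six consecutive bytes starting at 0xB0 decode to the first six capitals.
first_six = bytes(range(0xB0, 0xB6)).decode("iso-8859-5")
print(first_six)

# The 32 lowercase Russian letters without ё, in alphabetical order.
lower = "абвгдежзийклмнопрстуфхцчшщъыьэюя"
codes = list(lower.encode("iso-8859-5"))
in_alphabet_order = codes == sorted(codes)

# ё sits outside that run, after the last regular letter.
yo_code = "ё".encode("iso-8859-5")[0]
print(in_alphabet_order, hex(yo_code))
```

One practical consequence is that a plain byte-wise sort of ISO-8859-5 text places words containing ё after all others.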

Arabic

ISO-8859-6

This is the Arabic alphabet. Unfortunately it is only the basic alphabet for
the Arabic (ar) language and contains neither the four extra letters for
Persian (fa) nor the eight extra letters for Pakistani Urdu (ur). This fixed font
is not well suited for text display. Each Arabic letter occurs in up to four (2²)
presentation forms: initial, medial, final or separate (isolated). To make
Arabic text legible you'll need a display engine that analyses the context
and combines the appropriate glyphs, on top of a handler for the reverse
writing direction shared with Hebrew. The rendering algorithm is described
in the Unicode book and I have implemented it in my arabjoin Perl script.
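The relationship between a stored letter and its presentation forms can be shown with the Unicode compatibility code points for one letter. This sketch only illustrates the four forms and the limits of ISO-8859-6; it is not the joining algorithm and not the arabjoin script. The letter BEH and the helper name fits_iso_8859_6 are choices made for this example.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, the abstract letter stored in the text

# Unicode compatibility code points for the four shapes of BEH.
FORMS = {
    "isolated": "\ufe8f",
    "final": "\ufe90",
    "initial": "\ufe91",
    "medial": "\ufe92",
}

def fits_iso_8859_6(text: str) -> bool:
    """Return True if text can be encoded in ISO-8859-6."""
    try:
        text.encode("iso-8859-6")
        return True
    except UnicodeEncodeError:
        return False

# ISO-8859-6 has a single byte for BEH and none for its four shapes.
beh_byte = BEH.encode("iso-8859-6")

# Compatibility normalization folds each shape back to the base letter.
folded = {name: unicodedata.normalize("NFKC", glyph) for name, glyph in FORMS.items()}

# PEH (U+067E), one of the extra Persian letters, has no ISO-8859-6 byte.
peh_fits = fits_iso_8859_6("\u067e")

for name, glyph in FORMS.items():
    print(f"{name}: U+{ord(glyph):04X} -> U+{ord(folded[name]):04X}")
```

Choosing among the four shapes for each letter is the job of the display engine; the e-mail itself only carries the base letters.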

Greek

ISO-8859-7

This is (modern monotonic) Greek (el) to me. ISO-8859-7 was formerly known
as ELOT-928 or ECMA-118:1986.

Hebrew

ISO-8859-8

And this is the Hebrew script used by Hebrew (iw) and Yiddish (ji). Like Arabic
it is written from right to left, so get your dusty old bidirectional typewriters
out of the closet! A Bidirectional Algorithm Reference Implementation is
promised for publication as Unicode Technical Report #9 in the near future.
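A bidirectional display handler starts from the direction class of each character. As a minimal sketch, Python's unicodedata module exposes those classes; the sample characters below are arbitrary choices, and the code does not implement the bidirectional algorithm itself.

```python
import unicodedata

# Byte 0xE0 in ISO-8859-8 is the Hebrew letter alef.
alef = b"\xe0".decode("iso-8859-8")

# Direction classes: "L" left-to-right, "R" right-to-left,
# "AL" right-to-left Arabic letter.
classes = {
    "Latin A": unicodedata.bidirectional("A"),
    "Hebrew alef": unicodedata.bidirectional(alef),
    "Arabic beh": unicodedata.bidirectional("\u0628"),
}

for label, bidi_class in classes.items():
    print(f"{label}: {bidi_class}")
```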

Latin5

ISO-8859-9

Latin5 replaces the rarely needed Icelandic letters ðýþ in Latin1 with the
Turkish ones.
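The substitution is easy to see by decoding the same bytes under both character sets. This is a small sketch using Python's built-in "iso-8859-1" and "iso-8859-9" codecs; the variable names are illustrative.

```python
# Sketch: the six byte values where Latin5 differs from Latin1.
raw = bytes([0xD0, 0xDD, 0xDE, 0xF0, 0xFD, 0xFE])

as_latin1 = raw.decode("iso-8859-1")  # Icelandic eth, y acute, thorn
as_latin5 = raw.decode("iso-8859-9")  # Turkish g breve, dotted/dotless i, s cedilla

# Scan all 256 byte values for positions that decode differently.
differing = [
    b for b in range(256)
    if bytes([b]).decode("iso-8859-1") != bytes([b]).decode("iso-8859-9")
]

print(as_latin1)
print(as_latin5)
print([hex(b) for b in differing])
```

An e-mail labelled with the wrong one of these two character sets therefore stays readable except for these six letters.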

Latin6

ISO-8859-10

Introduced in 1992, Latin6 rearranged the Latin4 characters, dropped some
symbols and the Latvian ŗ, added the last missing Inuit (Greenlandic
Eskimo) and non-Skolt Sami (Lappish) letters and reintroduced the Icelandic
ðýþ to cover the entire Nordic area. Skolt Sami still needs a few more accents.
Note that RFC 1345 and GNU recode contain errors and use a preliminary
and different latin6.