HTML Character Sets

November 10, 2010 by: aBisona

To display an HTML page correctly, the browser must know which character set (character encoding) to use.

The character set of the early World Wide Web was ASCII. ASCII defines 128 characters: the digits 0-9, the uppercase and lowercase English letters, and some punctuation and control characters.
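
As a rough illustration of that limit, the following Python sketch checks whether a string stays inside the 128-character ASCII range. The helper name `is_ascii` is made up for this example and is not part of any standard.

```python
def is_ascii(text: str) -> bool:
    """Return True if every character has a code point in the ASCII range 0-127."""
    return all(ord(ch) < 128 for ch in text)


print(is_ascii("Hello, World! 0123456789"))  # digits, letters, punctuation
print(is_ascii("caf\u00e9"))                 # contains U+00E9, which ASCII lacks

# Encoding to ASCII raises an error for anything outside the 128 characters.
try:
    "caf\u00e9".encode("ascii")
except UnicodeEncodeError as err:
    print("not encodable as ASCII, offending index:", err.start)
```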

Because many languages use characters that are not part of ASCII, browsers have traditionally assumed ISO-8859-1 (Latin-1) when a page does not declare a character set.

If a web page uses a character set other than ISO-8859-1, it should be declared in a <meta> tag inside the page's <head>.
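
To make the <meta> declaration concrete, here is a minimal Python sketch. It assumes a hypothetical page that declares ISO-8859-2, and uses a made-up helper, `declared_charset`, that looks for the declaration in the first bytes of the page. Real browsers follow a more elaborate detection procedure, so treat this only as an approximation.

```python
import re

# Hypothetical page that declares a character set other than ISO-8859-1.
PAGE = """<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-2">
<title>Przyk\u0142ad</title>
</head>
<body><p>Za\u017c\u00f3\u0142\u0107 g\u0119\u015bl\u0105 ja\u017a\u0144</p></body>
</html>"""


def declared_charset(html_bytes: bytes, default: str = "ISO-8859-1") -> str:
    """Find charset=... in the first 1024 bytes; fall back to the default."""
    # The declaration itself is plain ASCII, so a lossy ASCII decode is enough here.
    head = html_bytes[:1024].decode("ascii", errors="replace")
    match = re.search(r'charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', head, re.IGNORECASE)
    return match.group(1) if match else default


raw = PAGE.encode("iso-8859-2")   # the bytes as a server might send them
charset = declared_charset(raw)   # read the declaration from the <meta> tag
text = raw.decode(charset)        # decode the page with the declared charset

print(charset)
print(text == PAGE)
```

Decoding the same bytes as ISO-8859-1 would not fail, but the Polish letters would come out as the wrong characters, which is why the declaration matters.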


ISO Character Sets

The International Organization for Standardization (ISO) defines the standard character sets for different alphabets and languages.

Some of the character sets used around the world are listed below:

| Character set | Description | Covers |
|---|---|---|
| ISO-8859-1 | Latin alphabet part 1 | North America, Western Europe, Latin America, the Caribbean, Canada, Africa |
| ISO-8859-2 | Latin alphabet part 2 | Eastern Europe |
| ISO-8859-3 | Latin alphabet part 3 | Southern Europe (Maltese), Esperanto, miscellaneous others |
| ISO-8859-4 | Latin alphabet part 4 | Scandinavia/Baltics (and others not in ISO-8859-1) |
| ISO-8859-5 | Latin/Cyrillic part 5 | Languages that use a Cyrillic alphabet, such as Bulgarian, Belarusian, Russian and Macedonian |
| ISO-8859-6 | Latin/Arabic part 6 | Languages that use the Arabic alphabet |
| ISO-8859-7 | Latin/Greek part 7 | The modern Greek language, as well as mathematical symbols derived from Greek |
| ISO-8859-8 | Latin/Hebrew part 8 | Languages that use the Hebrew alphabet |
| ISO-8859-9 | Latin 5 part 9 | The Turkish language. Same as ISO-8859-1 except that Turkish characters replace the Icelandic ones |
| ISO-8859-10 | Latin 6 | The Nordic languages, including Lappish (Sami) and Eskimo (Inuit) languages |
| ISO-8859-15 | Latin 9 (aka Latin 0) | Similar to ISO-8859-1, but replaces some less common symbols with the euro sign and some other missing characters |
| ISO-2022-JP | Latin/Japanese part 1 | The Japanese language |
| ISO-2022-JP-2 | Latin/Japanese part 2 | The Japanese language |
| ISO-2022-KR | Latin/Korean part 1 | The Korean language |
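
The practical consequence of having many ISO-8859 parts is that one byte value can stand for different characters depending on the declared character set. The Python sketch below decodes the byte 0xE6 (chosen arbitrarily for illustration) under four of the listed sets, using Python's built-in codecs, and prints the resulting Unicode character names.

```python
import unicodedata

BYTE = bytes([0xE6])
CHARSETS = ("iso-8859-1", "iso-8859-2", "iso-8859-5", "iso-8859-7")

# Decode the same single byte under each character set.
decoded = {cs: BYTE.decode(cs) for cs in CHARSETS}
for cs, ch in decoded.items():
    print(f"{cs}: U+{ord(ch):04X} {unicodedata.name(ch)}")

# ISO-8859-15 differs from ISO-8859-1 at byte 0xA4, where it places the euro sign.
euro_byte = bytes([0xA4])
print("iso-8859-1 :", unicodedata.name(euro_byte.decode("iso-8859-1")))
print("iso-8859-15:", unicodedata.name(euro_byte.decode("iso-8859-15")))
```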


The Unicode Standard

Because the character sets listed above are limited in size and are not compatible with each other in multilingual environments, the Unicode Consortium developed the Unicode Standard.

The Unicode Standard aims to cover the characters, punctuation marks, and symbols of all the world's writing systems.

Unicode enables processing, storage and interchange of text data no matter what the platform, no matter what the program, no matter what the language.


The Unicode Consortium

The Unicode Consortium develops the Unicode Standard. Its goal is to replace the existing character sets with its standard Unicode Transformation Formats (UTF).

The Unicode Standard has become a success and is implemented in XML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, etc. The Unicode standard is also supported in many operating systems and all modern browsers.

The Unicode Consortium cooperates with the leading standards development organizations, like ISO, W3C, and ECMA.

Unicode can be implemented by different character encodings. The most commonly used encodings are UTF-8 and UTF-16:

| Character set | Description |
|---|---|
| UTF-8 | A character in UTF-8 can be from 1 to 4 bytes long. UTF-8 can represent any character in the Unicode Standard. UTF-8 is backwards compatible with ASCII. UTF-8 is the preferred encoding for e-mail and web pages. |
| UTF-16 | 16-bit Unicode Transformation Format is a variable-length character encoding for Unicode, capable of encoding the entire Unicode repertoire using 2 or 4 bytes per character. UTF-16 is used in major operating systems and environments, like Microsoft Windows 2000/XP/2003/Vista/CE and the Java and .NET byte code environments. |
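
The variable lengths described above can be observed directly. This Python sketch encodes four sample characters (picked for illustration, one per UTF-8 length) and reports how many bytes each takes in UTF-8 and UTF-16. The helper `encoded_lengths` is a name invented for this example.

```python
# U+0041 (A), U+00E9 (e with acute), U+20AC (euro sign), U+1F600 (an emoji)
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]


def encoded_lengths(ch: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count) for a single character."""
    # "utf-16-be" is used so that no byte order mark is added to the count.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


for ch in SAMPLES:
    utf8_len, utf16_len = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} byte(s), UTF-16 {utf16_len} byte(s)")

# ASCII text is byte-for-byte identical in UTF-8 (backwards compatibility).
print("Hello".encode("utf-8") == "Hello".encode("ascii"))
```

Characters outside the Basic Multilingual Plane, such as the last sample, take 4 bytes in both encodings; in UTF-16 they are stored as a surrogate pair.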

Tip: The first 256 Unicode code points correspond to the 256 characters of ISO-8859-1.

Tip: All HTML 4 processors already support UTF-8, and all XHTML and XML processors support UTF-8 and UTF-16!
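
The first tip can be checked with a short Python sketch, assuming Python's built-in `iso-8859-1` codec as the reference mapping. Note that the correspondence is between code points, not bytes: in UTF-8, only the first 128 of those characters keep their single-byte form.

```python
# Every possible byte value, decoded as ISO-8859-1.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("iso-8859-1")

# Each byte value N maps to the Unicode code point U+00NN.
code_points = [ord(ch) for ch in as_text]
print(code_points == list(range(256)))

# The byte form differs in UTF-8 above 127: U+00E9 is one byte in
# ISO-8859-1 but two bytes in UTF-8.
latin1_bytes = "\u00e9".encode("iso-8859-1")
utf8_bytes = "\u00e9".encode("utf-8")
print(latin1_bytes.hex(), utf8_bytes.hex())
```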
