Some Notes on the UTF-8 Tag
First of all, as promised: the pages on this site (except the charts) should display correctly under
Netscape if the following line is added to the head section of the HTML:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
Now, here's why I'm not adding this tag to these sheets myself...
Internet Explorer 5.0 treats UTF-8 as a character set, but it has no provision for
assigning a specific font to UTF-8. (IE 4.x did have such a provision.)
The best way to view these pages is with Internet Explorer 5.0 with the encoding set to
"User Defined". I'm sorry that this is so, as I do not wish to devise "browser-specific"
web pages.
When the [ charset="UTF-8" ] tag is used, Internet Explorer 5.0 behaves unpredictably,
ignoring both the user's preferred font settings and any font-face tags specified by the
web page designer.
This means that an IE 5.0 user cannot compare the performance of two different
Unicode fonts on a multilingual page without getting a misleading picture of
the fonts' character repertoires.
To illustrate:
The picture above was made using a sample of HTML containing text in various scripts, with
Internet Explorer set to " View - Encoding - User Defined ".
Under " Tools - Internet Options - Fonts - User Defined ", the Code2000 font was selected
to make the left half of the picture, and the Bitstream Cyberbit font was selected to make the right half.
I had guessed that, if the UTF-8 tag had been used, switching to Cyberbit would
require the following:
" Tools - Internet Options - Fonts - Latin " select Bitstream Cyberbit...
" Tools - Internet Options - Fonts - Cyrillic " select Bitstream Cyberbit...
" Tools - Internet Options - Fonts - Armenian " select Bitstream Cyberbit...
" Tools - Internet Options - Fonts - Hebrew " select Bitstream Cyberbit...
" Tools - Internet Options - Fonts - Thai " select Bitstream Cyberbit...
" Tools - Internet Options - Fonts - Japanese " select Bitstream Cyberbit...
" Tools - Internet Options - Fonts - Chinese " select Bitstream Cyberbit...
...that is, manually changing the font for each script, which seemed like an
excessive amount of work.
Of course, when I tested this theory, it didn't pan out: just changing the Latin-based
setting to Bitstream Cyberbit seemed to do the trick. After that one change, IE 5.0
displayed Cyrillic, Thai, etc. in the Cyberbit typeface, even though my system has a different
font specified for Cyrillic and Thai.
Using script/lang tags in the HTML doesn't seem to make any difference.
Unfortunately, without the charset=UTF-8 tag, these pages are not viewable on some
other platforms and browsers.
Netscape, for example, seems to require the UTF-8 tag even when decimal numeric
character references (NCRs) are used.
NCRs are one method of expressing Unicode; UTF-8 is an encoding scheme for Unicode.
An NCR such as &#1046; (the Cyrillic letter Ж) is written entirely in ASCII characters, so
it seems odd that a UTF-8 tag would be required for a document that does not contain any
UTF-8 material.
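The distinction between NCRs and UTF-8 can be shown in a few lines of Python. This is a minimal sketch: `to_decimal_ncrs` is a hypothetical helper written for this illustration, and the sample string is the Cyrillic line used in the tests further down this page (spelled with escapes so the code points are unambiguous). The NCR version is pure ASCII, while the UTF-8 version is a sequence of multi-byte codes; both decode to the same characters.

```python
import html


def to_decimal_ncrs(text: str) -> str:
    """Hypothetical helper: replace each non-ASCII character with a decimal NCR."""
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in text)


# ЖИРѦӤ, the Cyrillic sample line used on this page.
sample = "\u0416\u0418\u0420\u0466\u04e4"

ncr_text = to_decimal_ncrs(sample)
print(ncr_text)  # &#1046;&#1048;&#1056;&#1126;&#1252;

# The NCR form is plain ASCII: no UTF-8 bytes are needed to send it.
assert ncr_text.isascii()

# UTF-8 is a different thing: a byte encoding of the same code points
# (two bytes per character for this Cyrillic range).
utf8_bytes = sample.encode("utf-8")
print(len(sample), len(utf8_bytes))  # 5 10

# Both representations lead back to the same characters.
assert html.unescape(ncr_text) == utf8_bytes.decode("utf-8") == sample
```

Nothing in the NCR text requires a UTF-8 declaration to be read correctly, which is why the Netscape requirement described above is surprising.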
Some of the pages (linked on the scriptlinks page) are encoded in UTF-8 and use the
UTF-8 tag in the HTML header. These sheets should display properly...
Here's a test.
The samples below are the same string of characters, each set with a font-face tag for one
of the larger Unicode-based TrueType fonts (TTFs).
For IE 5.x users: the UTF-8 tag is present in the header of this page, so this test will not
display correctly.
(There is a faction pushing for the removal of the font-face tag from HTML. Hopefully this
won't happen, as the tag is the most direct way for a web page author to specify a font. I don't
mean to suggest that web developers should always make font-specific pages, but
there remain many cases where specific fonts are absolutely required. Perhaps
this is properly a decision to be made by individual web authors rather than by a committee.)
As soon as IE 5.0 finds the UTF-8 tag, it automatically switches the user's
preset encoding preference to UTF-8. The user must then switch it back to "Latin based",
"Chinese - Traditional", "User Defined", or whatever else was selected before.
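The override described above can be pictured with a toy model. This Python sketch is not IE 5.0's actual logic: `CharsetSniffer` and `effective_encoding` are hypothetical names, and the rule they encode (a declared charset silently replaces the user's chosen encoding) is only an assumption drawn from the behavior reported here. The standard-library `html.parser` module is used to find the META tag.

```python
from html.parser import HTMLParser


class CharsetSniffer(HTMLParser):
    """Hypothetical: records the charset named in a META Content-Type tag."""

    def __init__(self) -> None:
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "meta" or self.charset is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("http-equiv", "").lower() != "content-type":
            return
        # CONTENT looks like "text/html; charset=UTF-8".
        for part in attr_map.get("content", "").split(";"):
            key, _, value = part.strip().partition("=")
            if key.lower() == "charset" and value:
                self.charset = value.strip().lower()


def effective_encoding(page: str, user_choice: str) -> str:
    """Toy model: a declared charset overrides whatever the user selected."""
    sniffer = CharsetSniffer()
    sniffer.feed(page)
    sniffer.close()
    return sniffer.charset or user_choice


page = ('<html><head><META HTTP-EQUIV="Content-Type" '
        'CONTENT="text/html; charset=UTF-8"></head></html>')
print(effective_encoding(page, "user-defined"))  # utf-8
print(effective_encoding("<html><head></head></html>", "user-defined"))  # user-defined
```

In this model the user's preference only survives on pages with no charset declaration, which matches the complaint above: every page carrying the tag resets the choice.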
For the following to display properly, IE 5.0 users must select [ View -
Encoding - User Defined ]. Even this only works briefly: as soon as the IE 5.0 screen is refreshed,
or the browser reads another page with the UTF-8 tag, it once again automatically switches the
user's preferred encoding scheme to UTF-8. (Note: if the target page is actually encoded in UTF-8,
changing the IE 5.0 encoding to User Defined will result in gibberish.)
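The gibberish mentioned in the note is easy to reproduce outside the browser. This sketch assumes, for illustration only, that "User Defined" behaves like a single-byte Western code page; Python's `cp1252` codec stands in for it, which is an assumption rather than a description of what IE 5.0 does internally. The point it shows: the bytes are correct UTF-8, but each byte is read as a separate character.

```python
# ЖИР, the first three letters of this page's Cyrillic sample line.
sample = "\u0416\u0418\u0420"

# A page actually encoded in UTF-8 stores two bytes per Cyrillic letter.
raw = sample.encode("utf-8")
print(raw)  # b'\xd0\x96\xd0\x98\xd0\xa0'

# Reading those bytes with a single-byte code page (assumed stand-in for
# "User Defined") turns each byte into its own, unrelated character.
garbled = raw.decode("cp1252")
print(len(sample), len(garbled))  # 3 6

# Nothing was lost: re-encoding the gibberish and decoding it as UTF-8
# recovers the original text, so only the interpretation was wrong.
assert garbled.encode("cp1252").decode("utf-8") == sample
```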
So, because of IE 5.0's erratic behavior with the UTF-8 tag, I generally do not use
the tag in web pages unless the page is actually encoded in UTF-8.
font face = MingLiu
CHINESE: 凗凘凙凚
ŊŒŮǕʃẼ
ЖИРѦӤ
ԱԲԳԴԵ
א ב ג ד ה
ฒณดตถ
たちつてと
font face = Code2000
CHINESE: 凗凘凙凚
ŊŒŮǕʃẼ
ЖИРѦӤ
ԱԲԳԴԵ
א ב ג ד ה
ฒณดตถ
たちつてと
font face = Bitstream Cyberbit
CHINESE: 凗凘凙凚
ŊŒŮǕʃẼ
ЖИРѦӤ
ԱԲԳԴԵ
א ב ג ד ה
ฒณดตถ
たちつてと
font face = GulimChe
CHINESE: 凗凘凙凚
ŊŒŮǕʃẼ
ЖИРѦӤ
ԱԲԳԴԵ
א ב ג ד ה
ฒณดตถ
たちつてと
font face = MS Hei
CHINESE: 凗凘凙凚
ŊŒŮǕʃẼ
ЖИРѦӤ
ԱԲԳԴԵ
א ב ג ד ה
ฒณดตถ
たちつてと
font face = MS Song
CHINESE: 凗凘凙凚
ŊŒŮǕʃẼ
ЖИРѦӤ
ԱԲԳԴԵ
א ב ג ד ה
ฒณดตถ
たちつてと
font face = Tahoma
CHINESE: 凗凘凙凚
ŊŒŮǕʃẼ
ЖИРѦӤ
ԱԲԳԴԵ
א ב ג ד ה
ฒณดตถ
たちつてと
font face = Lucida Sans Unicode
CHINESE: 凗凘凙凚
ŊŒŮǕʃẼ
ЖИРѦӤ
ԱԲԳԴԵ
א ב ג ד ה
ฒณดตถ
たちつてと
font face = Times New Roman
CHINESE: 凗凘凙凚
ŊŒŮǕʃẼ
ЖИРѦӤ
ԱԲԳԴԵ
א ב ג ד ה
ฒณดตถ
たちつてと
font face = My Default Font
CHINESE: 凗凘凙凚
ŊŒŮǕʃẼ
ЖИРѦӤ
ԱԲԳԴԵ
א ב ג ד ה
ฒณดตถ
たちつてと
On my system, because of the UTF-8 tag, the Chinese glyphs appear identical
for Code2000, Bitstream Cyberbit, MS Song, Tahoma, Lucida Sans Unicode, and
Times New Roman.
The text that is supposed to be set in Lucida Sans Unicode actually uses
Code2000 for Armenian, MS Song for Chinese, Tahoma for Thai, and Code2000 for
Hiragana, and it mixes typefaces for the Latin line even though Lucida Sans Unicode has
all but one of those Latin characters in its repertoire.
Bitstream Cyberbit has an attractive Chinese character set, yet my system is displaying
the Chinese portion of the supposed Cyberbit text using MS Song.
Because of the UTF-8 tag, the IE 5.0 display for each of the specified fonts above is
similarly incorrect.
I am grateful to Jaap Pranger for alerting me to several errors
in the HTML of this page.