test unicode


jdh
May 21st, 2007, 03:12 PM
giờ bị cặn (Vietnamese test string, roughly "now there is sediment")

jdh
May 21st, 2007, 03:40 PM
Experimental results (determined using the browser's [FF] "View Source"):

The meta tag in the HTML header generated by vBulletin reads as follows:

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">

The Unicode characters (the ones with diacritical marks) actually appear in the raw HTML of the test message as decimal numeric character references of the form &#nnnn;
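A minimal Python sketch of how such &#nnnn; references can arise and be resolved, assuming (not confirmed by the forum software) that the server encodes the post for an ISO-8859-1 page and escapes every character that charset cannot hold. Python's built-in `xmlcharrefreplace` error handler performs that kind of escaping, and `html.unescape` turns the references back into characters; the function names `to_latin1_with_refs` and `from_refs` are hypothetical.

```python
import html
import unicodedata


def to_latin1_with_refs(text: str) -> bytes:
    """Encode text as ISO-8859-1, turning unrepresentable characters into &#nnnn; references."""
    # Normalize to precomposed (NFC) form so each Vietnamese vowel is a single code point.
    text = unicodedata.normalize("NFC", text)
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


def from_refs(raw: bytes) -> str:
    """Decode ISO-8859-1 bytes and resolve &#nnnn; references back into characters."""
    return html.unescape(raw.decode("iso-8859-1"))


if __name__ == "__main__":
    original = "gi\u1edd b\u1ecb c\u1eb7n"  # "giờ bị cặn" written with explicit code points
    escaped = to_latin1_with_refs(original)
    print(escaped)  # b'gi&#7901; b&#7883; c&#7863;n'
    assert from_refs(escaped) == original
```

Characters that do exist in ISO-8859-1 (such as à, U+00E0) would be written as a single raw byte rather than as a reference.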

Choosing UTF-8 encoding (instead of ISO-8859-1) in the browser [FF] did not affect the rendering. However, the particular Unicode characters in this example are not a wide sample of either Vietnamese vowels or of Unicode characters in general. The fact that the rendering was the same under both encodings seems to indicate that ISO-8859-1 and UTF-8 share a common subset of codes (a reasonable design criterion, no doubt). That shared subset is 7-bit ASCII, and since the &#nnnn; references consist only of ASCII characters, the browser resolves them to the same Unicode characters whichever of the two encodings is selected.
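A short Python sketch of why switching the browser between the two encodings changed nothing here: page bytes that carry the characters as references are pure ASCII, which ISO-8859-1 and UTF-8 decode identically, whereas a raw non-ASCII byte sequence is read differently by the two. The byte string below is a hypothetical stand-in for the page source, not a copy of it.

```python
import html

# ASCII-only bytes: the Vietnamese characters are carried as &#nnnn; references.
page_bytes = b"gi&#7901; b&#7883; c&#7863;n"

# Every byte is below 0x80, so both charsets decode it to the same text,
# and the references then resolve to the same Unicode characters.
as_latin1 = html.unescape(page_bytes.decode("iso-8859-1"))
as_utf8 = html.unescape(page_bytes.decode("utf-8"))
assert as_latin1 == as_utf8

# Outside the ASCII range the two encodings diverge: the two UTF-8 bytes
# for "é" read as ISO-8859-1 become two unrelated characters, "Ã©".
raw = "\u00e9".encode("utf-8")  # b'\xc3\xa9'
mojibake = raw.decode("iso-8859-1")
assert mojibake == "\u00c3\u00a9"
```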

BTW, my testing was done on Windows 98 (whose Unicode support is rather lacking), and I do not have Microsoft's Arial Unicode MS TrueType font installed. That font requires a license from Microsoft (e.g. via Word 2000, which I have, but I haven't bothered to find the install CD to load the font yet).

The Character Map application in Windows 98 doesn't seem to handle these characters at all (as expected). If I get curious enough, I might try this on NT 4.0 SP6a.

DH

jdh
May 21st, 2007, 03:45 PM
giờ bị cặn

giữ ("keep")

dự định ("intend")

Trying some harder vowels.

DH

jdh
May 21st, 2007, 03:53 PM
If any sysop wants to move or erase this thread, no problem. However, leaving it alone would be ok with me too, in case anybody is interested in the encoding and rendering of international alphabets, etc.

DH

Judy G. Russell
May 21st, 2007, 09:27 PM
Quote:
If any sysop wants to move or erase this thread, no problem. However, leaving it alone would be ok with me too, in case anybody is interested in the encoding and rendering of international alphabets, etc.

I suspect that what you see depends on a lot of factors -- the fonts on your computer, the software being used by the forum, and the like -- above and beyond the simple encoding.

jdh
May 21st, 2007, 11:51 PM
Quote:
I suspect that what you see depends on a lot of factors -- the fonts on your computer, the software being used by the forum, and the like -- above and beyond the simple encoding.

It appears that the Vietnamese characters (a relatively small subset of Unicode's 60,000+ characters) were already included in version 2.76 of the TrueType fonts that came out with Windows 98, at least in standard Windows fonts such as Arial and Times New Roman. I verified that my current files for the standard fonts are version 2.76 and that their file dates appear to match the release date of Windows 98 SE.
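To see which code points a font has to cover for text like this, a small Python sketch can list each non-ASCII character with its official Unicode name using the standard `unicodedata` module. The helper name `non_ascii_code_points` is hypothetical; it also flags whether a character falls in the Latin Extended Additional block (U+1E00 to U+1EFF), where many precomposed Vietnamese vowels live, while đ sits in Latin Extended-A.

```python
import unicodedata

# Unicode block "Latin Extended Additional": U+1E00 through U+1EFF.
LATIN_EXT_ADDITIONAL = range(0x1E00, 0x1F00)


def non_ascii_code_points(text: str):
    """Yield (code point label, in Latin Extended Additional?, character name) for each non-ASCII character."""
    # NFC keeps precomposed vowels as single code points, which is what a font's glyph table is keyed on here.
    for ch in unicodedata.normalize("NFC", text):
        if ord(ch) > 0x7F:
            yield f"U+{ord(ch):04X}", ord(ch) in LATIN_EXT_ADDITIONAL, unicodedata.name(ch)


for label, extended, name in non_ascii_code_points("d\u1ef1 \u0111\u1ecbnh"):  # "dự định"
    print(label, extended, name)
```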
It also appears that the default encoding vBulletin declares in the meta tag in the page header is UTF-8 (a Unicode encoding), which is probably not necessary for most vBulletin customers but is nevertheless a good default. (This is irrelevant to us, but I have no idea whether vBulletin allows any other option for this meta tag in the header of all the pages.)
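If a page is served as UTF-8, characters like these can travel as raw multi-byte sequences instead of &#nnnn; references. A minimal Python comparison using the test string from this thread shows that each of these Vietnamese vowels takes three bytes in UTF-8, and that the reference form is longer overall; it says nothing about how vBulletin itself stores or serves posts.

```python
text = "gi\u1edd b\u1ecb c\u1eb7n"  # "giờ bị cặn"

# Raw UTF-8: each of the three Vietnamese vowels becomes a 3-byte sequence.
utf8_bytes = text.encode("utf-8")

# ISO-8859-1 cannot represent them, so they are escaped as decimal references.
latin1_bytes = text.encode("iso-8859-1", errors="xmlcharrefreplace")

print(utf8_bytes)  # b'gi\xe1\xbb\x9d b\xe1\xbb\x8b c\xe1\xba\xb7n'
print(len(utf8_bytes), len(latin1_bytes))  # 16 28
```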

The full Unicode font files from Microsoft and other vendors are huge (over 10 MB per font), and I don't have any of those big files in the font folder on this computer. With those huge TrueType fonts you can probably display Unicode characters for many more non-European languages (such as those of South Asia, East Asia, etc.).

From what I've been seeing (and not just in Vietnamese) UNICODE is becoming much more popular around the world. People don't have to install as much specialized software on their computers and they can work in multiple languages in a single document without jumping thru hoops.

DH

Judy G. Russell
May 22nd, 2007, 08:52 AM
Quote:
From what I've been seeing (and not just in Vietnamese) UNICODE is becoming much more popular around the world. People don't have to install as much specialized software on their computers and they can work in multiple languages in a single document without jumping thru hoops.

Well, without jumping through as many hoops. You still have to have support for the subset of Unicode that you want.