Canonical Names of CharSets Chart
January, 2006
Ever get your character set tags and labels mixed up when switching between syntaxes in a web or XML production system? Refer to the handy chart on this page to keep yourself organized and your users happy. What Perl knows by one name, Java may recognize by another, and XML (or HTML) may use yet another name for the same encoding.
| Java | Perl | XML | HTML |
|---|---|---|---|
| ISO8859_1 | iso-8859-1 | ISO-8859-1 | ISO-8859-1 |
| ASCII | ascii | US-ASCII | US-ASCII |
| UTF8 | utf8 | UTF-8 | UTF-8 |
| UTF-16 | UTF-16 | UTF-16 | UTF-16 |
| EUC_JP | euc-jp | EUC-JP | EUC-JP |
| SJIS | shiftjis | Shift_JIS | Shift_JIS |
| GBK | cp936 | GBK | GBK |
| EUC_KR | euc-kr | EUC-KR | EUC-KR |
| Big5 | big5-eten | Big5 | Big5 |
| Big5_HKSCS | big5-hkscs | Big5-HKSCS | Big5-HKSCS |
| EUC_CN | euc-cn | GB2312 | GB2312 |
References:
HTML and XML: IANA MIME types list
Java: Java Encodings Doc
Perl: easily found using Encode::resolve_alias($alias);
Perl internationalization requires Perl 5.6.x, but 5.8 (the first release to ship the Encode module as part of the core distribution) is highly recommended. Perl automatically translates a recognized encoding alias to its canonical name: requesting the encoding "sjis" in Perl will automatically use "shiftjis". It is very flexible. Java, on the other hand, is less flexible, so specifying the exact target encoding name is recommended. HTML and XML do not care about upper and lowercase letters in charset names, but the spelling shown here, including its hyphens and underscores, is still required.
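In Java, the java.nio.charset.Charset class plays a role roughly like Encode::resolve_alias. The sketch below is a minimal illustration, assuming a standard JDK; the class CharsetNames and its canonical helper are hypothetical names. Charset.forName matches a name against the canonical names and aliases the runtime registers, ignoring case, and Charset.name() returns the canonical name. For the charsets in this chart, that canonical name is the same as the XML/HTML column, so SJIS resolves to Shift_JIS. A name the runtime does not register makes forName throw, which is why exact names matter in Java.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

public class CharsetNames {

    /** Returns the java.nio canonical name for a Java charset name or alias, or null if unknown. */
    public static String canonical(String javaName) {
        try {
            // forName matches canonical names and registered aliases, ignoring case
            return Charset.forName(javaName).name();
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // Either the name contains illegal characters or this JVM has no such charset
            return null;
        }
    }

    public static void main(String[] args) {
        // The Java column of the chart
        String[] javaNames = {
            "ISO8859_1", "ASCII", "UTF8", "UTF-16", "EUC_JP", "SJIS",
            "GBK", "EUC_KR", "Big5", "Big5_HKSCS", "EUC_CN"
        };
        for (String name : javaNames) {
            System.out.println(name + " -> " + canonical(name));
        }
    }
}
```

On a JDK that includes the extended (CJK) charsets, the loop should print lines such as `SJIS -> Shift_JIS` and `EUC_CN -> GB2312`. A trimmed runtime image without those charsets prints `null` for them instead, which is an easy way to spot a name your deployment cannot handle.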
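Use the HTML and XML names shown here in your page's charset declaration. In HTML that is the META HTTP-EQUIV tag, for example `<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">`; in XML it is the encoding attribute of the XML declaration, for example `<?xml version="1.0" encoding="Shift_JIS"?>`.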
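A declaration only helps if the bytes actually sent use the charset it names. The following is a minimal sketch, assuming pages are assembled as Java strings; the PageWriter class and its methods are hypothetical. It encodes the page with one Charset object and takes the declared name from that same object, so the META tag and the real encoding cannot drift apart. It also builds the XML form of the declaration. The body text is assumed to be HTML-escaped already, and String.getBytes silently replaces any character the charset cannot represent with that charset's replacement byte (usually `?`).

```java
import java.nio.charset.Charset;

public class PageWriter {

    /** Builds a minimal HTML page whose META tag names the charset used to encode it. */
    public static byte[] renderPage(String javaEncoding, String bodyText) {
        Charset cs = Charset.forName(javaEncoding); // accepts Java aliases such as "SJIS"
        String declared = cs.name();                // canonical name, e.g. "Shift_JIS"
        String html = "<html><head>\n"
            + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset="
            + declared + "\">\n"
            + "</head><body>" + bodyText + "</body></html>\n"; // bodyText assumed pre-escaped
        // Encode with the same Charset object that supplied the declared name
        return html.getBytes(cs);
    }

    /** XML equivalent: the encoding attribute of the XML declaration. */
    public static String xmlDeclaration(String javaEncoding) {
        return "<?xml version=\"1.0\" encoding=\""
            + Charset.forName(javaEncoding).name() + "\"?>";
    }

    public static void main(String[] args) {
        byte[] page = renderPage("ISO8859_1", "caf\u00e9");
        System.out.println(new String(page, Charset.forName("ISO-8859-1")));
        System.out.println(xmlDeclaration("UTF8")); // <?xml version="1.0" encoding="UTF-8"?>
    }
}
```

Deriving the declared name from the Charset object, rather than keeping a second hand-written string, is the design choice that keeps the Java column and the HTML/XML column in step.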