GB2312 Character Set

GB: An abbreviation of Guojia Biaozhun, or Guo Biao, meaning "national standard" in Chinese.

GB2312: A coded character set established by the government of the People's Republic of China in 1980.

Main features of GB2312:

  • It contains 7445 characters, including 6763 Hanzi and 682 non-Hanzi characters.
  • It covers simplified Chinese characters only. Traditional Chinese characters are covered by the Big5 character set.
  • It is used mainly in mainland China and Singapore.

GB2312 arranges characters into a matrix of 94 rows and 94 columns based on the following rules:

Rows    # of Chars  Characters
01      94          Special symbols
02      72          Paragraph numbers
03      94          Latin characters
04      83          Hiragana characters
05      86          Katakana characters
06      48          Greek characters
07      66          Cyrillic characters
08      63          Pinyin accented vowels and zhuyin symbols
09      76          Box and table drawing symbols
16-55   3755        Hanzi level 1, ordered by pinyin
56-87   3008        Hanzi level 2, ordered by radical, then stroke
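The row layout above can be expressed as a small lookup table. The minimal Python sketch below is illustrative only: `GB2312_BLOCKS` and `describe_row` are hypothetical names, not part of any library. It also checks that the per-row counts add up to the 682 non-Hanzi and 6763 Hanzi characters stated at the top.

```python
# Hypothetical lookup table built from the row layout above:
# (first_row, last_row, number_of_chars, description)
GB2312_BLOCKS = [
    (1, 1, 94, "Special symbols"),
    (2, 2, 72, "Paragraph numbers"),
    (3, 3, 94, "Latin characters"),
    (4, 4, 83, "Hiragana characters"),
    (5, 5, 86, "Katakana characters"),
    (6, 6, 48, "Greek characters"),
    (7, 7, 66, "Cyrillic characters"),
    (8, 8, 63, "Pinyin accented vowels and zhuyin symbols"),
    (9, 9, 76, "Box and table drawing symbols"),
    (16, 55, 3755, "Hanzi level 1, ordered by pinyin"),
    (56, 87, 3008, "Hanzi level 2, ordered by radical, then stroke"),
]


def describe_row(row: int) -> str:
    """Return the block description for a row number (1-94)."""
    if not 1 <= row <= 94:
        raise ValueError(f"row must be in 1..94, got {row}")
    for first, last, _count, description in GB2312_BLOCKS:
        if first <= row <= last:
            return description
    # Rows 10-15 and 88-94 do not appear in the table above.
    return "Unassigned"


# Rows below 16 hold symbols; rows 16 and up hold Hanzi.
non_hanzi = sum(count for first, _, count, _ in GB2312_BLOCKS if first < 16)
hanzi = sum(count for first, _, count, _ in GB2312_BLOCKS if first >= 16)
print(non_hanzi, hanzi, non_hanzi + hanzi)  # 682 6763 7445
```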


This book provides a list of all GB2312 characters with their row and column numbers.

Row 01: Regular Symbols

R.C.  Char  GB    Uni.  UTF-8     R.C.  Char  GB    Uni.  UTF-8
0101  (sp)  A1A1  3000  E38080    0102  、    A1A2  3001  E38081
0103  。    A1A3  3002  E38082    0104  ·     A1A4  00B7  C2B7
0105  ˉ     A1A5  02C9  CB89      0106  ˇ     A1A6  02C7  CB87
0107  ¨     A1A7  00A8  C2A8      0108  〃    A1A8  3003  E38083
0109  々    A1A9  3005  E38085    0110  —     A1AA  2014  E28094

(sp) = ideographic space (U+3000), which prints as a blank.
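Each column of the table can be reproduced from the character itself. This sketch assumes Python's built-in `gb2312` codec; `row_entry` is a hypothetical helper. A few punctuation marks map to different Unicode code points in different GB2312 conversion tables, so for those the output may not match every published list.

```python
def row_entry(ch: str) -> tuple[str, str, str, str]:
    """Return (R.C., GB, Uni., UTF-8) for one character, as in the table."""
    gb = ch.encode("gb2312")  # two bytes, each in 0xA1-0xFE
    # Subtracting 0xA0 (= 0x80 + 32) from each byte gives row and column.
    row, col = gb[0] - 0xA0, gb[1] - 0xA0
    return (
        f"{row:02d}{col:02d}",
        gb.hex().upper(),
        f"{ord(ch):04X}",
        ch.encode("utf-8").hex().upper(),
    )


for ch in "\u3000、。":
    print(*row_entry(ch))
# 0101 A1A1 3000 E38080
# 0102 A1A2 3001 E38081
# 0103 A1A3 3002 E38082
```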

[Source: http://www.herongyang.com/gb2312/symbol.html]


GB2312 Codes

GB2312 assigns a 2-byte native code to each character.

The first byte, called the high byte, contains the row number plus 32; the second byte, called the low byte, contains the column number plus 32.

For example, if a character is located at row 16 and column 1, its high byte is 16 + 32 = 48 (0x30) and its low byte is 1 + 32 = 33 (0x21). Put together, its native code is 0x3021.
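The native-code arithmetic is two additions and a byte shift. A minimal Python sketch, with `native_code` as a hypothetical helper name:

```python
def native_code(row: int, col: int) -> int:
    """Hypothetical helper: GB2312 native code for a (row, column) position.

    High byte = row + 32, low byte = column + 32.
    """
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must both be in 1..94")
    high = row + 32
    low = col + 32
    return (high << 8) | low


print(hex(native_code(16, 1)))  # 0x3021
# Every row/column value 1..94 lands in the byte range 0x21..0x7E.
print(hex(1 + 32), hex(94 + 32))  # 0x21 0x7e
```

Since rows and columns run from 1 to 94, native bytes always fall in 0x21-0x7E.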

I guess the reason for adding 32 to both the row and column numbers is to keep the byte values out of the low range (0x00-0x1F), which many computer systems reserve for control characters.

However, the native byte values (0x21-0x7E) still collide with printable ASCII codes. To resolve this problem, 128 (0x80) is added to both bytes of the native code. For example, the character at row 16 and column 1 has the native code 0x3021, so its modified code is 0x3021 + 0x8080 = 0xB0A1.
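Adding 0x8080 to the native code adds 128 to each byte separately, because a native low byte is at most 0x7E and never carries into the high byte. The sketch below (hypothetical helpers `gb_code` and `row_col`) converts in both directions, and the last line cross-checks the result against Python's built-in `gb2312` codec.

```python
def gb_code(row: int, col: int) -> int:
    """Hypothetical helper: row/column -> GB2312 code with 0x80 added to both bytes."""
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must both be in 1..94")
    native = ((row + 32) << 8) | (col + 32)
    return native + 0x8080  # adds 128 to the high byte and to the low byte


def row_col(code: int) -> tuple[int, int]:
    """Inverse: GB2312 code -> (row, column), subtracting 0x80 + 32 = 0xA0 per byte."""
    high, low = code >> 8, code & 0xFF
    if not (0xA1 <= high <= 0xFE and 0xA1 <= low <= 0xFE):
        raise ValueError(f"not a two-byte GB2312 code: {code:#06x}")
    return high - 0xA0, low - 0xA0


print(hex(gb_code(16, 1)))  # 0xb0a1
print(row_col(0xB0A1))      # (16, 1)
# Cross-check with Python's standard "gb2312" codec.
print(gb_code(16, 1).to_bytes(2, "big").decode("gb2312"))  # 啊
```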

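These modified codes were adopted as the GB2312 standard codes (in practice this byte form is usually called EUC-CN). Because both bytes are always 0xA1 or higher, while ASCII codes never exceed 0x7F, they can be safely mixed with ASCII codes.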
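Because every byte of a GB2312 code is at least 0xA1 while ASCII stays below 0x80, a decoder can tell the two apart by looking at a single byte. A minimal sketch of that scan, using a hypothetical `split_mixed` function that only labels each unit; a real decoder would also turn the codes into characters.

```python
def split_mixed(data: bytes) -> list[tuple[str, int]]:
    """Split bytes that mix ASCII with two-byte GB2312 codes.

    Bytes below 0x80 are single ASCII characters; a byte in 0xA1-0xFE starts
    a two-byte GB2312 code whose second byte is also in 0xA1-0xFE.
    """
    out: list[tuple[str, int]] = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(("ascii", b))
            i += 1
        elif 0xA1 <= b <= 0xFE and i + 1 < len(data) and 0xA1 <= data[i + 1] <= 0xFE:
            out.append(("gb2312", (b << 8) | data[i + 1]))
            i += 2
        else:
            raise ValueError(f"invalid byte {b:#04x} at offset {i}")
    return out


mixed = b"A" + bytes([0xB0, 0xA1]) + b"1"
print([(kind, hex(code)) for kind, code in split_mixed(mixed)])
# [('ascii', '0x41'), ('gb2312', '0xb0a1'), ('ascii', '0x31')]
```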

This book provides a list of all GB2312 characters and their codes.

:: This is how to convert a row/column position into a GB2312 code, not how to convert ASCII to GB2312!

 

GB2312 vs. Unicode

The GB2312 character set is a subset of the Unicode character set. This means that every character defined in GB2312 is also defined in Unicode. However, GB2312 codes and Unicode codes are unrelated. For example, the GB2312 character with code 0xB0A1 has the Unicode code 0x554A. There is no mathematical formula to convert a GB2312 code to the Unicode code of the same character; a lookup table is needed. This book provides a complete map of all GB2312 codes and their corresponding Unicode codes. The corresponding UTF-8 (Unicode Transformation Format, 8-bit) encodings are also listed in the map.
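Since no formula exists, conversion is a table lookup. The sketch below relies on the mapping table built into Python's standard `gb2312` codec instead of shipping its own; `gb2312_to_unicode` is a hypothetical helper name. Adjacent GB2312 codes land on unrelated Unicode code points.

```python
def gb2312_to_unicode(code: int) -> tuple[str, str, str]:
    """Map a two-byte GB2312 code to (character, Unicode code point, UTF-8 hex).

    There is no arithmetic formula, so this uses the lookup table inside
    Python's standard "gb2312" codec.
    """
    ch = code.to_bytes(2, "big").decode("gb2312")
    return ch, f"U+{ord(ch):04X}", ch.encode("utf-8").hex().upper()


print(gb2312_to_unicode(0xB0A1))  # ('啊', 'U+554A', 'E5958A')
print(gb2312_to_unicode(0xB0A2))  # ('阿', 'U+963F', 'E998BF')
```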

 

[Source: http://www.herongyang.com/gb2312/overview.html]

Posted by 구차니