Résultats pour manual

[Source: http://advancedsearch.motorola.com/socialsearch/query?q=manual&qp...]



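Like a shot to the head, an idea suddenly came to me:
'Aha! If I read a French phone manual, maybe I can learn how French text input works!'

The premise was right, but..
Oh my god!
Crushed (OTL *thud*) because I didn't meet the necessary condition: I don't know any French at all.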

Anyway, looking at the manual for the V150 phone..

What on earth is it saying?!!!! OTL


[Source: http://www.motorola.com/Hellomoto/...]
Posted by 구차니
#include <stddef.h>

/* Fallback table for special_char2gsm(): pairs of
   (ISO-8859-15 byte, replacement byte in the GSM 7-bit default alphabet),
   terminated by a 0, 0 pair. unsigned char keeps values >= 0x80 exact. */
const unsigned char iso_8859_15_chars[] =
{
      0x60, 0x27, // GRAVE ACCENT --> APOSTROPHE
      0xA0, 0x20, // NO-BREAK SPACE --> SPACE
      0xA2, 0x63, // CENT SIGN --> c
      0xA6, 0x53, // LATIN CAPITAL LETTER S WITH CARON --> S
      0xA8, 0x73, // LATIN SMALL LETTER S WITH CARON --> s
      0xA9, 0x43, // COPYRIGHT SIGN --> C
      0xAA, 0x61, // FEMININE ORDINAL INDICATOR --> a
      0xAB, 0x3C, // LEFT-POINTING DOUBLE ANGLE QUOTATION MARK --> <
      0xAC, 0x2D, // NOT SIGN --> -
      0xAD, 0x2D, // SOFT HYPHEN --> -
      0xAE, 0x52, // REGISTERED SIGN --> R
      0xAF, 0x2D, // MACRON --> -
      0xB0, 0x6F, // DEGREE SIGN --> o
      0xB1, 0x2B, // PLUS-MINUS SIGN --> +
      0xB2, 0x32, // SUPERSCRIPT TWO --> 2
      0xB3, 0x33, // SUPERSCRIPT THREE --> 3
      0xB4, 0x5A, // LATIN CAPITAL LETTER Z WITH CARON --> Z
      0xB5, 0x75, // MICRO SIGN --> u
      0xB6, 0x49, // PILCROW SIGN --> I
      0xB7, 0x2E, // MIDDLE DOT --> .
      0xB8, 0x7A, // LATIN SMALL LETTER Z WITH CARON --> z
      0xB9, 0x31, // SUPERSCRIPT ONE --> 1
      0xBA, 0x6F, // MASCULINE ORDINAL INDICATOR --> o
      0xBB, 0x3E, // RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK --> >
      0xBC, 0x4F, // LATIN CAPITAL LIGATURE OE --> O
      0xBD, 0x6F, // LATIN SMALL LIGATURE OE --> o
      0xBE, 0x59, // LATIN CAPITAL LETTER Y WITH DIAERESIS --> Y
      0xC0, 0x41, // LATIN CAPITAL LETTER A WITH GRAVE --> A
      0xC1, 0x41, // LATIN CAPITAL LETTER A WITH ACUTE --> A
      0xC2, 0x41, // LATIN CAPITAL LETTER A WITH CIRCUMFLEX --> A
      0xC3, 0x41, // LATIN CAPITAL LETTER A WITH TILDE --> A
      0xC7, 0x09, // LATIN CAPITAL LETTER C WITH CEDILLA --> 0x09 (GSM: LATIN CAPITAL LETTER C WITH CEDILLA)
      0xC8, 0x45, // LATIN CAPITAL LETTER E WITH GRAVE --> E
      0xCA, 0x45, // LATIN CAPITAL LETTER E WITH CIRCUMFLEX --> E
      0xCB, 0x45, // LATIN CAPITAL LETTER E WITH DIAERESIS --> E
      0xCC, 0x49, // LATIN CAPITAL LETTER I WITH GRAVE --> I
      0xCD, 0x49, // LATIN CAPITAL LETTER I WITH ACUTE --> I
      0xCE, 0x49, // LATIN CAPITAL LETTER I WITH CIRCUMFLEX --> I
      0xCF, 0x49, // LATIN CAPITAL LETTER I WITH DIAERESIS --> I
      0xD0, 0x44, // LATIN CAPITAL LETTER ETH --> D
      0xD2, 0x4F, // LATIN CAPITAL LETTER O WITH GRAVE --> O
      0xD3, 0x4F, // LATIN CAPITAL LETTER O WITH ACUTE --> O
      0xD4, 0x4F, // LATIN CAPITAL LETTER O WITH CIRCUMFLEX --> O
      0xD5, 0x4F, // LATIN CAPITAL LETTER O WITH TILDE --> O
      0xD7, 0x78, // MULTIPLICATION SIGN --> x
      0xD9, 0x55, // LATIN CAPITAL LETTER U WITH GRAVE --> U
      0xDA, 0x55, // LATIN CAPITAL LETTER U WITH ACUTE --> U
      0xDB, 0x55, // LATIN CAPITAL LETTER U WITH CIRCUMFLEX --> U
      0xDD, 0x59, // LATIN CAPITAL LETTER Y WITH ACUTE --> Y
      0xDE, 0x62, // LATIN CAPITAL LETTER THORN --> b
      0xE1, 0x61, // LATIN SMALL LETTER A WITH ACUTE --> a
      0xE2, 0x61, // LATIN SMALL LETTER A WITH CIRCUMFLEX --> a
      0xE3, 0x61, // LATIN SMALL LETTER A WITH TILDE --> a
      0xE7, 0x09, // LATIN SMALL LETTER C WITH CEDILLA --> 0x09 (GSM: LATIN CAPITAL LETTER C WITH CEDILLA)
      0xEA, 0x65, // LATIN SMALL LETTER E WITH CIRCUMFLEX --> e
      0xEB, 0x65, // LATIN SMALL LETTER E WITH DIAERESIS --> e
      0xED, 0x69, // LATIN SMALL LETTER I WITH ACUTE --> i
      0xEE, 0x69, // LATIN SMALL LETTER I WITH CIRCUMFLEX --> i
      0xEF, 0x69, // LATIN SMALL LETTER I WITH DIAERESIS --> i
      0xF0, 0x6F, // LATIN SMALL LETTER ETH --> o
      0xF3, 0x6F, // LATIN SMALL LETTER O WITH ACUTE --> o
      0xF4, 0x6F, // LATIN SMALL LETTER O WITH CIRCUMFLEX --> o
      0xF5, 0x6F, // LATIN SMALL LETTER O WITH TILDE --> o
      0xF7, 0x2F, // DIVISION SIGN --> / (SOLIDUS)
      0xFA, 0x75, // LATIN SMALL LETTER U WITH ACUTE --> u
      0xFB, 0x75, // LATIN SMALL LETTER U WITH CIRCUMFLEX --> u
      0xFD, 0x79, // LATIN SMALL LETTER Y WITH ACUTE --> y
      0xFE, 0x62, // LATIN SMALL LETTER THORN --> b
      0xFF, 0x79, // LATIN SMALL LETTER Y WITH DIAERESIS --> y

      0,    0     // end of table
};

/* Look up ch in iso_8859_15_chars. If found, store the replacement byte in
   *newch (when newch is not NULL) and return 1; otherwise return 0. */
int special_char2gsm(char ch, char *newch)
{
  const unsigned char key = (unsigned char)ch;  /* compare as 0..255 */
  size_t i;

  for (i = 0; iso_8859_15_chars[i] != 0; i += 2)
  {
    if (iso_8859_15_chars[i] == key)
    {
      if (newch)
        *newch = (char)iso_8859_15_chars[i + 1];
      return 1;
    }
  }

  return 0;
}
[Source: http://smstools.sourcearchive.com/documentation/3.1/charset_8c-source.html]

ISO 8859-15 added the Euro sign and other rationalisations to ISO 8859-1.

[Link: http://en.wikipedia.org/wiki/ISO/IEC_8859-15]
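Since ISO 8859-15 only replaces eight positions of ISO 8859-1 (the Euro sign at 0xA4 plus Š, š, Ž, ž, Œ, œ and Ÿ, several of which show up in the smstools table above), decoding it to Unicode needs just a small exception list on top of the Latin-1 rule "byte value = code point". A minimal sketch; the function name iso8859_15_to_unicode is my own, not from smstools or any library:

```c
#include <stdint.h>

/* Map one ISO-8859-15 byte to its Unicode code point.
   Identical to ISO-8859-1 (byte value == code point) except at the eight
   positions below; the ISO-8859-1 meaning is given in parentheses. */
uint16_t iso8859_15_to_unicode(unsigned char c)
{
    switch (c) {
    case 0xA4: return 0x20AC; /* EURO SIGN          (8859-1: CURRENCY SIGN) */
    case 0xA6: return 0x0160; /* S WITH CARON       (8859-1: BROKEN BAR)    */
    case 0xA8: return 0x0161; /* s WITH CARON       (8859-1: DIAERESIS)     */
    case 0xB4: return 0x017D; /* Z WITH CARON       (8859-1: ACUTE ACCENT)  */
    case 0xB8: return 0x017E; /* z WITH CARON       (8859-1: CEDILLA)       */
    case 0xBC: return 0x0152; /* CAPITAL LIGATURE OE (8859-1: 1/4)          */
    case 0xBD: return 0x0153; /* SMALL LIGATURE oe  (8859-1: 1/2)           */
    case 0xBE: return 0x0178; /* Y WITH DIAERESIS   (8859-1: 3/4)           */
    default:   return c;      /* everything else: same as Latin-1 */
    }
}
```

For example, byte 0xA4 decodes to U+20AC (€), while 0xC0 stays U+00C0 (À).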


Unicode Character 'LATIN CAPITAL LETTER A WITH GRAVE' (U+00C0)

Encodings
HTML Entity (decimal): &#192;
HTML Entity (hex): &#xc0;
HTML Entity (named): &Agrave;
How to type in Microsoft Windows: Alt +00C0, or Alt 0192
UTF-8 (hex): 0xC3 0x80 (c380)
UTF-8 (binary): 11000011:10000000
UTF-16 (hex): 0x00C0 (00c0)
UTF-16 (decimal): 192
UTF-32 (hex): 0x000000C0 (00c0)
UTF-32 (decimal): 192
C/C++/Java source code: "\u00C0"
Python source code: u"\u00C0"


[Link: http://www.fileformat.info/info/unicode/char/00c0/index.htm]
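The UTF-8 bytes listed above (0xC3 0x80 for U+00C0, binary 11000011:10000000) come from splitting the code point's bits into a 110xxxxx lead byte and a 10xxxxxx continuation byte. A minimal encoder for code points up to U+FFFF, as a sketch: utf8_encode_bmp is a hypothetical name, and for brevity it neither rejects surrogates (U+D800..U+DFFF) nor handles code points above U+FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode a BMP code point (U+0000..U+FFFF) as UTF-8.
   Returns the number of bytes written to out (1-3). */
size_t utf8_encode_bmp(uint16_t cp, unsigned char out[3])
{
    if (cp < 0x80) {                                     /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                                    /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));         /* 1110xxxx */
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F)); /* 10xxxxxx */
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));        /* 10xxxxxx */
    return 3;
}
```

With this, U+00C0 gives the two bytes C3 80 shown above, and the Euro sign U+20AC (new in ISO 8859-15) needs three bytes, E2 82 AC.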


So the name of the character-combining scheme that drove me crazy throughout my business trip was..
ISO 8859-15..

Damn -ㅁ-! I've seen 8859 until I was sick of it, but is -15 something new?!?!?

Dead key

A dead key or key combination does not generate a character when struck, but modifies the character generated by the key struck immediately after. On some systems, there is no indication to the user that a dead key has been struck, but in some text-entry systems the diacritical mark is displayed along with an indication that the system is waiting for another keystroke: either the base character to be marked, an additional diacritical mark, or space to produce the diacritical mark in isolation.

Many languages use the Latin alphabet and have diacritically-marked letters for which unique keys do not exist on all keyboards. For example, on some keyboard layouts, the acute accent key is a dead key; in this case, striking acute accent then a results in á. Acute accent followed by space results in an acute accent in isolate form.

Most modern keyboards conform to the ISO 9995 layout. This layout was first defined by the user group at AFNOR in 1984, working under the direction of Alain Souloumiac [1]. Based on this work, a well-known ergonomics expert wrote a report (Yves Neuville, Le clavier bureautique et informatique, Cedic-Natan 1985), which was adopted at the ISO Berlin meeting in 1985 and became the reference for keyboard layouts.

In Mac OS X, many keyboard layouts employ dead keys. The U.S. Extended layout uses dead keys extensively (reached with option and option-shift), allowing a large inventory of characters to be typed easily. The U.S. layout has the following smaller selection of dead keys (all reached with option alone):

  • option-e (á, é, í, ó, ú)
  • option-` (à, è, ì, ò, ù)
  • option-u (ä, ë, ï, ö, ü, ÿ)
  • option-i (â, ê, î, ô, û)
  • option-n (ã, õ, ñ)
  • option-c (ç)

The user simply types the base character after striking the dead key. For example, the key-strokes option-e and e result in the character é. In Mac OS X, pressing one of these key combinations creates the accent and highlights it, then the final character appears when the key for the base character is pressed. Some diacritically-marked Latin letters, of course, such as ŵ (used in Welsh), cannot be typed with the U.S. layout. That layout, which predates Unicode, provides access only to characters found in the legacy Mac Roman character set and does not support other diacritics, such as ˇ (caron), that are not commonly found in Western European languages (but which are commonly used in many Eastern European languages). However, the Mac OS X U.S. Extended keyboard layout, which was released after Unicode support became common, does provide access to many more diacritics.

The X Window System (used by most Unix-like operating systems, including most Linux distributions) supports a Compose key. This dead key allows access to a wide range of extra characters by interpreting the keystrokes that follow it. Some keyboards have a key labelled "Compose", but any key can be configured to serve this function.

In AmigaOS, dead keys were called "deaf keys" and were generated by pressing the ALT key (e.g. ALT-F followed by "a" results in "á"; ALT-G followed by "e" results in "è"; etc.). AmigaOS was the first operating system to officially use an internationally approved standard, ANSI ISO 8859-1, for all its internal code page operations and keyboard layout.

[Link: http://en.wikipedia.org/wiki/Keyboard_layout#Dead_key]


Dead keys are commonly used to generate accented letters, because that way one does not need one key for each possible combination of letter and accent, but only one dead key for each accent in addition to the usual letter keys.

For example, if a keyboard has a dead key `, the French character e accent grave (è) can be generated by pressing first `, then e. Usually pressing a dead key followed by space produces the character denoted by the dead key; e.g. ¨space results in “¨”.

On a typewriter this works with any letter by construction, so you could place an accent on a q, for example; with Unicode combining characters this might look like q́. Computers, on the other hand, often do not work this way: ´ followed by q results in ´q.

In Microsoft Word, using the Control key with a key that usually resembles the diacritic (e.g. ^ for a circumflex) acts as a dead key. Many non-English keyboard layouts have dead keys directly on the keyboard. The US-International keyboard layout, available on Windows and the X Window System, places dead keys directly on similar-looking punctuation marks.

Old computer systems such as the MSX often had a special labeled “dead key”, which in combination with the Ctrl and Shift keys could add the accents ´, `, ˆ and ¨ to vowels that were typed subsequently.

[Link: http://en.wikipedia.org/wiki/Dead_key]
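The dead-key behaviour described above (a pending accent that modifies the next key, space producing the bare accent, and an unknown pair coming out as two separate characters, like ´q) can be sketched as a tiny state machine. This is an illustration only, not how any particular OS implements it: the function name dead_key_feed, the small combination table, and the choice of ISO-8859-1 output bytes are all assumptions.

```c
#include <stddef.h>

/* Hypothetical (dead key, base letter) -> composed character table. */
struct dead_combo {
    char dead;            /* accent key that was pressed first */
    char base;            /* letter typed next */
    unsigned char result; /* composed character, as an ISO-8859-1 byte */
};

static const struct dead_combo combos[] = {
    { '`',  'a', 0xE0 },  /* à */
    { '`',  'e', 0xE8 },  /* è */
    { '\'', 'a', 0xE1 },  /* á */
    { '\'', 'e', 0xE9 },  /* é */
    { '^',  'e', 0xEA },  /* ê */
    { '"',  'u', 0xFC },  /* ü */
};

/* Feed one keystroke into the dead-key state machine.
   pending: the dead key waiting for a base letter (0 if none).
   is_dead: nonzero if `key` is configured as a dead key.
   Writes 0-2 bytes to out and returns how many were written. */
int dead_key_feed(char *pending, char key, int is_dead, unsigned char out[2])
{
    if (*pending == 0) {
        if (is_dead) {          /* nothing output yet: wait for the next key */
            *pending = key;
            return 0;
        }
        out[0] = (unsigned char)key;
        return 1;
    }

    char dead = *pending;
    *pending = 0;

    if (key == ' ') {           /* dead key + space: the accent on its own */
        out[0] = (unsigned char)dead;
        return 1;
    }
    for (size_t i = 0; i < sizeof combos / sizeof combos[0]; i++) {
        if (combos[i].dead == dead && combos[i].base == key) {
            out[0] = combos[i].result;
            return 1;
        }
    }
    /* No combination (e.g. ' then q, or a second dead key):
       emit both characters, as many computers do. */
    out[0] = (unsigned char)dead;
    out[1] = (unsigned char)key;
    return 2;
}
```

Typing ` (dead) then e yields the single byte 0xE8 (è); ' (dead) then q yields the two characters 'q, matching the "´q results in ´q" behaviour.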


Some common compose combinations
(press Compose, then the first key, then the second key)

First  Second  Result
'      a       á
'      A       Á
"      a       ä
"      A       Ä
`      a       à
`      A       À
~      a       ã
~      A       Ã
^      a       â
^      A       Â
o      a       å
o      A       Å
(Other vowels support most of the combinations above.)
s      s       ß
,      c       ç
,      C       Ç
O      R       ®
O      C       ©
<      <       «
>      >       »
.      .       ·
x      x       ×
-      :       ÷
^      0       °
^      1       ¹
^      2       ²
^      3       ³
s      o/0     §
1      2       ½
1      4       ¼
3      4       ¾
/      o       ø
/      O       Ø
-      d       ð
-      D       Ð
~      n       ñ
t      h       þ
T      H       Þ
a      e       æ
A      E       Æ
!      !       ¡
?      ?       ¿
-      L       £
=      E       €
=      Y       ¥
|      c       ¢
o      x       ¤
/      /       \

[Link: http://en.wikipedia.org/wiki/Compose_key]
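At its core, a Compose key is a lookup from a short key sequence to an output character. A minimal sketch using a few rows of the table above; the struct, the function name compose_lookup, and the choice of returning UTF-8 strings (so the Euro sign fits even though it is outside Latin-1) are assumptions for illustration, not necessarily how X11 stores its Compose tables.

```c
#include <stddef.h>

/* Compose + first + second -> UTF-8 encoded result (a few sample rows). */
struct compose_seq {
    char first;
    char second;
    const char *utf8;
};

static const struct compose_seq seqs[] = {
    { '\'', 'a', "\xC3\xA1" },     /* á U+00E1 */
    { '"',  'a', "\xC3\xA4" },     /* ä U+00E4 */
    { ',',  'c', "\xC3\xA7" },     /* ç U+00E7 */
    { 's',  's', "\xC3\x9F" },     /* ß U+00DF */
    { 'O',  'C', "\xC2\xA9" },     /* © U+00A9 */
    { '1',  '2', "\xC2\xBD" },     /* ½ U+00BD */
    { '=',  'E', "\xE2\x82\xAC" }, /* € U+20AC */
};

/* Return the UTF-8 string for Compose + first + second,
   or NULL if the sequence is not in the table. */
const char *compose_lookup(char first, char second)
{
    for (size_t i = 0; i < sizeof seqs / sizeof seqs[0]; i++)
        if (seqs[i].first == first && seqs[i].second == second)
            return seqs[i].utf8;
    return NULL;
}
```

Unlike a single dead key, the sequence is only resolved after both keys arrive, so an input handler would buffer the first key until the second one is pressed.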


AZERTY

The AZERTY layout is used in France, Belgium and some neighbouring countries. It differs from the QWERTY layout thus:

  • A and Q are swapped,
  • Z and W are swapped,
  • M is moved to the right of L (where colon/semicolon is on a US keyboard),
  • The digits 0 to 9 are on the same keys, but Shift must be pressed to type them. The unshifted positions are used for accented characters.
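The letter moves listed above can be expressed as a remapping from a physical key (named by its US QWERTY label) to what an AZERTY layout produces at that position. This is a deliberately partial sketch: qwerty_pos_to_azerty is a hypothetical helper that covers only the differences listed above, in lowercase; the number row and the other keys that move are not modelled and come back unchanged, which does not match a real AZERTY keyboard.

```c
/* Character printed on a US QWERTY key -> lowercase letter produced by the
   same physical key on AZERTY. Only A/Q, Z/W and M are modelled. */
char qwerty_pos_to_azerty(char us)
{
    switch (us) {
    case 'a': return 'q';   /* A and Q are swapped */
    case 'q': return 'a';
    case 'z': return 'w';   /* Z and W are swapped */
    case 'w': return 'z';
    case ';': return 'm';   /* M sits right of L, where US has ;/: */
    default:  return us;    /* everything else: not modelled */
    }
}
```

So a QWERTY user typing "qwa" on an AZERTY-mapped machine would see "azq", which is the classic surprise when the OS layout and the keycaps disagree.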

The French and Belgian AZERTY keyboards also have special characters used in the French language, such as ç, à, é and è, and other characters such as &, ", ' and §, all located under the numbers.

Some French people use the Canadian Multilingual Standard keyboard. The Portuguese (Portugal) keyboard layout may also be preferred, as it provides all French accents (aigu, grave, tréma, tilde, circumflex, cedilla, and also the quotation marks «»), and its dead-key option for all the accent keys allows easy input of all the combinations in French and most other languages (áàäãâéèëêíìïîóòöõôúùüû). Ç, however, has a separate key.

[Image: French keyboard layout]

Canadian French


This keyboard layout is commonly used in Canada by French-speaking Canadians. It is the most popular layout for laptops and stand-alone keyboards targeting French speakers. Although not as versatile as the Canadian Multilingual Standard keyboard, it can be used to type all accented French characters, and of course English as well. It remains popular mainly because it closely resembles the basic US keyboard commonly used by English-speaking Canadians. As a general rule, Canadians do not use the French (France) keyboard layout.

[Link: http://en.wikipedia.org/wiki/Keyboard_layout]
 

I quietly borrowed the most legible one from the links below -ㅁ-

At a glance.. it seems to be listed in alphabetical order..

[Source: one of the posts at http://www.symbian-freak.com/forum/]
sndpeek :
real-time audio visualization

sndpeek is just what it sounds (and looks) like:

  • real-time 3D animated display/playback
  • can use mic-input or wav/aiff/snd/raw/mat file (with playback)
  • time-domain waveform
  • FFT magnitude spectrum
  • 3D waterfall plot
  • lissajous! (interchannel correlation)
  • rotatable and scalable display
  • freeze frame! (for didactic purposes)
  • real-time spectral feature extraction (centroid, rms, flux, rolloff)
  • available on MacOS X, Linux, and Windows under GPL
  • part of the sndtools distribution.
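To give an idea of what some of the spectral features in the list above measure, here is a sketch of three of them computed from FFT magnitude spectra. These are common textbook definitions (centroid in bin units, rolloff at a chosen fraction of the total magnitude, flux as the squared difference between consecutive frames); sndpeek's own code may normalise or weight them differently, and the function names are my own.

```c
#include <stddef.h>

/* Spectral centroid, in bins: sum(k * |X[k]|) / sum(|X[k]|).
   Returns 0 for an all-zero spectrum. */
double spectral_centroid(const double *mag, size_t n)
{
    double num = 0.0, den = 0.0;
    for (size_t k = 0; k < n; k++) {
        num += (double)k * mag[k];
        den += mag[k];
    }
    return den > 0.0 ? num / den : 0.0;
}

/* Spectral rolloff: smallest bin R such that the magnitudes of bins 0..R
   add up to at least `fraction` (e.g. 0.85) of the total. */
size_t spectral_rolloff(const double *mag, size_t n, double fraction)
{
    double total = 0.0, acc = 0.0;
    for (size_t k = 0; k < n; k++)
        total += mag[k];
    for (size_t k = 0; k < n; k++) {
        acc += mag[k];
        if (acc >= fraction * total)
            return k;
    }
    return n ? n - 1 : 0;
}

/* Spectral flux: sum of squared differences between two consecutive
   magnitude spectra of the same length. */
double spectral_flux(const double *prev, const double *cur, size_t n)
{
    double flux = 0.0;
    for (size_t k = 0; k < n; k++) {
        double d = cur[k] - prev[k];
        flux += d * d;
    }
    return flux;
}
```

Multiplying a bin index by sample_rate / fft_size converts the centroid or rolloff from bins to Hz.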


(Screenshot: the spectrum of a cough OTL)

It's OpenGL-based and available on several platforms (Win/Linux/Mac).
sndpeek is apparently part of rt_lpc. Anyway, if I dig into this one, maybe the real-time transformation I want will become possible...

Excerpt from the README:

supported platforms:
  - MacOS X (CoreAudio)
  - Linux (ALSA/OSS/Jack)
  - Windows/Cygwin (DirectSound)

The license is GPL (the GPL text is included with the source files).

[Official: http://soundlab.cs.princeton.edu/software/sndpeek/]
[Parent: http://soundlab.cs.princeton.edu/software/rt_lpc/]

Windows Multimedia
MCI Reference

This section lists the MCI functions, structures, messages, macros, commands, and command strings, which are documented under Multimedia Reference. These elements are grouped as follows.


Notifications

MM_MCINOTIFY
MM_MCISIGNAL


Retrieving Information

mciGetCreatorTask
mciGetDeviceID
mciGetDeviceIDFromElementID
mciGetErrorString


Sending Commands

mciExecute
mciSendCommand
mciSendString


Time Formats

MCI_HMS_HOUR
MCI_HMS_MINUTE
MCI_HMS_SECOND
MCI_MAKE_HMS
MCI_MAKE_MSF
MCI_MAKE_TMSF
MCI_MSF_FRAME
MCI_MSF_MINUTE
MCI_MSF_SECOND
MCI_TMSF_FRAME
MCI_TMSF_MINUTE
MCI_TMSF_SECOND
MCI_TMSF_TRACK


Yield Procedures

mciGetYieldProc
mciSetYieldProc


Configuring a Device

break
configure
escape
index
MCI_BREAK
MCI_BREAK_PARMS
MCI_CONFIGURE
MCI_DGV_SET_PARMS
MCI_DGV_SETAUDIO_PARMS
MCI_DGV_SETVIDEO_PARMS
MCI_ESCAPE
MCI_INDEX
MCI_SEQ_SET_PARMS
MCI_SET
MCI_SET_PARMS
MCI_SETAUDIO
MCI_SETTIMECODE
MCI_SETTUNER
MCI_SETVIDEO
MCI_SPIN
MCI_VCR_SET_PARMS
MCI_VCR_SETAUDIO_PARMS
MCI_VCR_SETTUNER_PARMS
MCI_VCR_SETVIDEO_PARMS
MCI_VD_ESCAPE_PARMS
MCI_WAVE_SET_PARMS
set
setaudio
settimecode
settuner
setvideo
spin


Controlling Playback

freeze
load
MCI_DGV_FREEZE_PARMS
MCI_DGV_LOAD_PARMS
MCI_DGV_PAUSE_PARMS
MCI_DGV_PLAY_PARMS
MCI_DGV_RESUME_PARMS
MCI_DGV_STOP_PARMS
MCI_FREEZE
MCI_LOAD
MCI_LOAD_PARMS
MCI_OVLY_LOAD_PARMS
MCI_PAUSE
MCI_PLAY
MCI_PLAY_PARMS
MCI_RESUME
MCI_STOP
MCI_UNFREEZE
MCI_VCR_PLAY_PARMS
MCI_VD_PLAY_PARMS
pause
play
resume
stop
unfreeze


Controlling the Position

cue
mark
MCI_CUE
MCI_DGV_CUE_PARMS
MCI_DGV_SIGNAL_PARMS
MCI_DGV_STEP_PARMS
MCI_MARK
MCI_SEEK
MCI_SEEK_PARMS
MCI_SIGNAL
MCI_STEP
MCI_VCR_CUE_PARMS
MCI_VCR_SEEK_PARMS
MCI_VCR_STEP_PARMS
MCI_VD_STEP_PARMS
seek
signal
step


Editing

copy
cut
delete
MCI_COPY
MCI_CUT
MCI_DELETE
MCI_DGV_COPY_PARMS
MCI_DGV_CUT_PARMS
MCI_DGV_DELETE_PARMS
MCI_DGV_PASTE_PARMS
MCI_PASTE
MCI_UNDO
MCI_WAVE_DELETE_PARMS
paste
undo


Miscellaneous

MCI_GENERIC_PARMS


Opening and Closing

close
MCI_CLOSE
MCI_DGV_OPEN_PARMS
MCI_OPEN
MCI_OPEN_PARMS
MCI_OVLY_OPEN_PARMS
MCI_WAVE_OPEN_PARMS
open


Realizing a Palette

MCI_REALIZE
realize


Repainting a Frame

MCI_DGV_UPDATE_PARMS
MCI_UPDATE
update


Retrieving Information

capability
info
list
MCI_DGV_INFO_PARMS
MCI_DGV_LIST_PARMS
MCI_DGV_STATUS_PARMS
MCI_GETDEVCAPS
MCI_GETDEVCAPS_PARMS
MCI_INFO
MCI_INFO_PARMS
MCI_LIST
MCI_STATUS
MCI_STATUS_PARMS
MCI_SYSINFO
MCI_SYSINFO_PARMS
MCI_VCR_LIST_PARMS
MCI_VCR_STATUS_PARMS
status
sysinfo


Saving

MCI_DGV_RECORD_PARMS
MCI_DGV_SAVE_PARMS
MCI_OVLY_SAVE_PARMS
MCI_RECORD
MCI_RECORD_PARMS
MCI_SAVE
MCI_SAVE_PARMS
MCI_VCR_RECORD_PARMS
record
save


Video Control

capture
MCI_CAPTURE
MCI_DGV_MONITOR_PARMS
MCI_DGV_QUALITY_PARMS
MCI_DGV_RESERVE_PARMS
MCI_DGV_RESTORE_PARMS
MCI_MONITOR
MCI_QUALITY
MCI_RESERVE
MCI_RESTORE
monitor
quality
reserve
restore


Window or Display Rectangles

MCI_DGV_PUT_PARMS
MCI_DGV_RECT_PARMS
MCI_DGV_WINDOW_PARMS
MCI_OVLY_RECT_PARMS
MCI_OVLY_WINDOW_PARMS
MCI_PUT
MCI_WHERE
MCI_WINDOW
put
where
window



[Link: http://msdn.microsoft.com/en-us/library/ms710984(VS.85).aspx]
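The Time Formats macros in the reference above pack a position into a single 32-bit value, one field per byte. The following portable sketch mimics MCI_MAKE_HMS / MCI_HMS_* and MCI_MAKE_TMSF / MCI_TMSF_* so the layout can be tried outside Windows. The function names are hypothetical, and the byte order (first field in the lowest byte) is my reading of mmsystem.h; on Windows, use the real macros from <mmsystem.h> instead.

```c
#include <stdint.h>

/* HMS: hours in byte 0, minutes in byte 1, seconds in byte 2. */
uint32_t make_hms(uint8_t h, uint8_t m, uint8_t s)
{
    return (uint32_t)h | ((uint32_t)m << 8) | ((uint32_t)s << 16);
}

uint8_t hms_hour(uint32_t hms)   { return (uint8_t)(hms & 0xFF); }
uint8_t hms_minute(uint32_t hms) { return (uint8_t)((hms >> 8) & 0xFF); }
uint8_t hms_second(uint32_t hms) { return (uint8_t)((hms >> 16) & 0xFF); }

/* TMSF (track/minute/second/frame), used for CD audio positions:
   track in byte 0, then minutes, seconds, frames. */
uint32_t make_tmsf(uint8_t t, uint8_t m, uint8_t s, uint8_t f)
{
    return (uint32_t)t | ((uint32_t)m << 8)
         | ((uint32_t)s << 16) | ((uint32_t)f << 24);
}

uint8_t tmsf_track(uint32_t v)  { return (uint8_t)(v & 0xFF); }
uint8_t tmsf_minute(uint32_t v) { return (uint8_t)((v >> 8) & 0xFF); }
uint8_t tmsf_second(uint32_t v) { return (uint8_t)((v >> 16) & 0xFF); }
uint8_t tmsf_frame(uint32_t v)  { return (uint8_t)((v >> 24) & 0xFF); }
```

As I understand it, MCI reports positions in this packed form once the device's time format has been switched to hms or tmsf with the set command (MCI_SET); verify against the MSDN pages for the specific device type.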

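A thought I have now and then...
I can't tell whether it's a lack of naming sense or a wish to stick to principle, but..

How would '타이어' (Thai + '-어', the Korean suffix for a language, but also the Korean word for "tire") look if it weren't sitting in a list of languages?

Language of Thai?
or
Tire?

'Taeguk' (태국) and Thailand name the same country, but at some point 'Thai' (타이) started being used instead of 'Taeguk',
and when it comes to the language, '타이어' just makes me think of Michelin.


Anyway, even Wikipedia redirects '태국어' (Taegugeo) to '타이어'.
It's not exactly wrong... but it still feels kind of off...

[타이어 (Thai language): http://ko.wikipedia.org/wiki/%ED%83%80%EC%9D%B4%EC%96%B4_(%EC%96%B8%EC%96%B4)]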


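In GB2312, bytes 0-127 are ASCII and are used as-is, one byte each.

So when text is encoded in GB2312, interpret it as:

 0x00-0x7F      ASCII (1 byte)
 0xA1 and up    Simplified Chinese (2 bytes)

In the GB2312 table, the block that mirrors ASCII (Row 03) starts at:

 R.C.   Char  GB    Uni.  UTF-8
 0301   ！    A3A1  FF01  EFBC81

However, the ASCII characters we actually use start at 0x0021, so to get the Unicode value of the GB2312 counterpart of an ASCII character, you simply add 0xFEE0 (0x0021 + 0xFEE0 = 0xFF01).

Side note: if you select the characters in Row 03, you can see that '!' has a different width:

! <- ASCII
！ <- GB2312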
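For the printable ASCII range 0x21-0x7E, the full-width counterpart sits in GB2312 row 03 at column c - 0x20, so its GB2312 code is 0xA380 + c, and its Unicode value is 0xFEE0 + c (0x21 + 0xFEE0 = 0xFF01). A sketch with ascii_to_fullwidth as a hypothetical name; see the caveat in the comment about two positions that, as far as I know, do not follow the simple offset.

```c
#include <stdint.h>

/* For printable ASCII c (0x21..0x7E), compute the GB2312 code of the
   full-width character in row 03 and its Unicode value by plain offsets.
   Returns 0 (and leaves *gb / *uni untouched) outside that range.
   Caveat, to check against a real mapping table: common GB2312-to-Unicode
   tables map 0xA3A4 to U+FFE5 (FULLWIDTH YEN SIGN) and 0xA3FE to U+FFE3
   (FULLWIDTH MACRON), so '$' and '~' may not match the offset below. */
int ascii_to_fullwidth(unsigned char c, uint16_t *gb, uint32_t *uni)
{
    if (c < 0x21 || c > 0x7E)
        return 0;
    *gb  = (uint16_t)(0xA380 + c); /* high byte 0xA3 = row 3 + 0xA0; low byte c + 0x80 */
    *uni = 0xFEE0u + c;            /* U+FF01..U+FF5E, Halfwidth and Fullwidth Forms */
    return 1;
}
```

Space (0x20) is excluded on purpose: its full-width counterpart is the ideographic space at GB2312 0xA1A1 (U+3000), in row 01.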




GB2312 Character Set

GB: An abbreviation of Guojia Biaozhun, or Guo Biao, meaning "national standard" in Chinese.

GB2312: A coded character set established by the government of People's Republic of China in 1980.

Main features of GB2312:

  • It contains 7445 characters, including 6763 Hanzi and 682 non-Hanzi characters.
  • It is for simplified Chinese characters only. Traditional Chinese characters are covered by the Big5 character set.
  • It is used mainly in mainland China and Singapore.

GB2312 arranges characters into a matrix of 94 rows and 94 columns based on the following rules:

Rows    # of Chars  Characters
01      94          Special symbols
02      72          Paragraph numbers
03      94          Latin characters
04      83          Hiragana characters
05      86          Katakana characters
06      48          Greek characters
07      66          Cyrillic characters
08      63          Pinyin accented vowels and zhuyin symbols
09      76          Box and table drawing symbols
16-55   3755        Hanzi level 1, ordered by pinyin
56-87   3008        Hanzi level 2, ordered by radical, then stroke


This book provides a list of all characters in GB2312 with their row and column numbers.

Row 01: Regular Symbols

R.C.   Char  GB    Uni.  UTF-8
0101         A1A1  3000  E38080
0102   、    A1A2  3001  E38081
0103   。    A1A3  3002  E38082
0104   ·     A1A4  00B7  C2B7
0105   ˉ     A1A5  02C9  CB89
0106   ˇ     A1A6  02C7  CB87
0107   ¨     A1A7  00A8  C2A8
0108   〃    A1A8  3003  E38083
0109   々    A1A9  3005  E38085
0110   —     A1AA  2014  E28094

[Source: http://www.herongyang.com/gb2312/symbol.html]


GB2312 Codes

GB2312 assigns a 2-byte native code for each character.

The first byte is called the high byte, containing the row number plus 32;

the second byte is called the low byte, containing the column number plus 32.

For example, if a character is located at row 16 and column 1, its high byte will be 16 + 32 = 48 (0x30), and its low byte will be 1 + 32 = 33 (0x21). Put together, its native code is 0x3021.

I guess the reason for adding 32 to both the row and column numbers is to keep the byte values out of the low range, which many computer systems reserve for control characters.

However, the byte values of GB2312 native codes still collide with ASCII codes. To resolve this, 128 is added to both bytes of the native code. For example, a character at row 16, column 1 has the native code 0x3021, and its modified code is 0xB0A1 (128 = 0x80 per byte, so 0x3021 + 0x8080 = 0xB0A1).

These modified codes are adopted as the GB2312 standard codes, which can be safely mixed together with ASCII codes.

This book provides a list of all GB2312 characters and their codes.

:: This is how to convert a row/column position to a GB2312 code, NOT how to convert ASCII to GB2312!
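The row/column arithmetic quoted above (add 32 to each to get the native code, then add 0x80 to each byte) and its inverse, as a small C sketch; the function names are hypothetical, and row and column are assumed to be in 1..94.

```c
#include <stdint.h>

/* Row/column (1..94 each) -> GB2312 native code: each byte is value + 32. */
uint16_t gb2312_native(int row, int col)
{
    return (uint16_t)(((row + 0x20) << 8) | (col + 0x20));
}

/* Row/column -> GB2312 standard code: native code + 0x8080,
   i.e. each byte is the row or column plus 0xA0. */
uint16_t gb2312_code(int row, int col)
{
    return (uint16_t)(gb2312_native(row, col) + 0x8080);
}

/* Inverse: split a GB2312 standard code back into row/column.
   Returns 0 if either byte is outside 0xA1..0xFE. */
int gb2312_rowcol(uint16_t code, int *row, int *col)
{
    int hi = code >> 8, lo = code & 0xFF;
    if (hi < 0xA1 || hi > 0xFE || lo < 0xA1 || lo > 0xFE)
        return 0;
    *row = hi - 0xA0;
    *col = lo - 0xA0;
    return 1;
}
```

Row 16, column 1 gives native 0x3021 and standard code 0xB0A1, as in the text; row 3, column 1 gives 0xA3A1, the full-width '！'.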

 

GB2312 vs. Unicode

The GB2312 character set is a subset of the Unicode character set. This means that every character defined in GB2312 is also defined in Unicode. However, GB2312 codes and Unicode codes are totally unrelated. For example, the GB2312 character with code value 0xB0A1 has the Unicode code value 0x554A. There is no mathematical formula to convert a GB2312 code to the Unicode code of the same character. This book provides a complete map of all GB2312 codes and their corresponding Unicode codes. The corresponding UTF-8 (Unicode Transformation Format - 8-bit) codes are also listed in the map.

 

[Source: http://www.herongyang.com/gb2312/overview.html]
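Because there is no formula, a GB2312-to-Unicode converter is in practice a table lookup. A sketch using the standard bsearch over a table sorted by GB2312 code; only the handful of mappings quoted in these posts are included (a complete table would be generated from a published mapping file), and gb2312_to_unicode is a hypothetical name.

```c
#include <stdint.h>
#include <stdlib.h>

/* GB2312 code -> Unicode code point, sorted by gb for bsearch. */
struct gb_uni {
    uint16_t gb;
    uint16_t uni;
};

static const struct gb_uni gb_table[] = {
    { 0xA1A1, 0x3000 }, /* ideographic space */
    { 0xA1A2, 0x3001 }, /* 、 */
    { 0xA1A3, 0x3002 }, /* 。 */
    { 0xA1AA, 0x2014 }, /* — */
    { 0xA3A1, 0xFF01 }, /* ！ */
    { 0xB0A1, 0x554A }, /* 啊 */
};

/* bsearch comparator: key is a uint16_t GB code, elem a table entry. */
static int cmp_gb(const void *key, const void *elem)
{
    uint16_t k = *(const uint16_t *)key;
    uint16_t g = ((const struct gb_uni *)elem)->gb;
    return (k > g) - (k < g);
}

/* Return the Unicode code point for a GB2312 code, or 0 if not in the table. */
uint32_t gb2312_to_unicode(uint16_t gb)
{
    const struct gb_uni *hit = bsearch(&gb, gb_table,
                                       sizeof gb_table / sizeof gb_table[0],
                                       sizeof gb_table[0], cmp_gb);
    return hit ? hit->uni : 0;
}
```

A binary search keeps lookups at about 13 comparisons even for all 7445 entries; a direct 94x94 array indexed by row/column would be the faster (if larger) alternative.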
