The difference between a character set (charset) and a character encoding
Published: September 23, 2007, 9:24 pm
Tags: develop, linux, encoding

From this post: http://www.thescripts.com/forum/thread214891.html

It's important to distinguish between character sets (or charsets) and character encodings. They are two different things. A charset is a map that defines which numeric value represents a particular character. A character encoding defines how those numeric values are serialized into a stream of bytes. For example, Unicode can be encoded as UTF-8, which is space efficient for mostly-ASCII text and is byte-compatible with ASCII (Unicode's first 256 code points also match ISO-8859-1, although characters above 0x7F take two bytes in UTF-8). Or it could be encoded as UCS-4 little-endian (UCS-4LE), which is not space efficient but can make heavy text processing easier because every character occupies a fixed four bytes.

Here's a nice link about programming with extended charsets, although it is a little UTF-8/*nix centric: http://www.cl.cam.ac.uk/~mgk25/unicode.html
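
To make the distinction concrete, here is a small Python sketch. It is not from the quoted post; the sample string and variable names are just assumptions for illustration. It shows the charset level (characters mapped to numbers) separately from the encoding level (those numbers turned into bytes), using Python's built-in `utf-8` and `utf-32-le` codecs, the latter standing in for UCS-4LE.

```python
# Illustrative sketch only; the sample text and names are assumptions.
text = "Aé€"

# Charset level: each character maps to a number (a Unicode code point).
code_points = [ord(ch) for ch in text]

# Encoding level: the same code points serialized to bytes in two ways.
utf8_bytes = text.encode("utf-8")        # variable width, 1 to 4 bytes each
utf32le_bytes = text.encode("utf-32-le")  # fixed width, 4 bytes each

print([f"U+{cp:04X}" for cp in code_points])
print(utf8_bytes.hex(), len(utf8_bytes))
print(utf32le_bytes.hex(), len(utf32le_bytes))

# ASCII characters keep the same single byte in UTF-8.
assert "A".encode("utf-8") == "A".encode("ascii")

# "é" has the same number (0xE9) in Unicode and ISO-8859-1,
# but its UTF-8 serialization is two bytes, not one.
assert "é".encode("latin-1") == b"\xe9"
assert "é".encode("utf-8") == b"\xc3\xa9"
```

The code points never change between the two `encode` calls; only the byte layout does, which is the whole difference between a charset and an encoding.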