MBCS vs UNICODE


2006-10-02 11:17:27 PM
cppbuilder62
Hi,
If I understood correctly, MBCS is also used for East Asian languages such as
Chinese, Japanese and so on, as Unicode is.
So which one is the better format?
I see a lot of applications developed with Unicode.
What are the advantages of Unicode over MBCS (apart from the 2-byte character
encoding)?
Until now I have only written applications in SBCS, so ANSI.
Thanks a lot,
Alain
 
 

Re: MBCS vs UNICODE

--== Alain ==-- < XXXX@XXXXX.COM > wrote:
Quote
What are the advantages of Unicode over MBCS (apart from the 2-byte character
encoding)?

With Unicode in a fixed-width form (UCS-2, or UTF-16 if you ignore surrogate
pairs), a given number of characters takes a predictable amount of memory.
The usual advantage of Unicode is (simplifying a little) that algorithms
can be nice and simple. Take, for example, something that wants a
substring of the 104th to the 110th character, inclusive. With Unicode,
you just index to character 104, and copy it and the next 6 characters.
With MBCS, you have to go to the start of the string, and process each
character sequence until you get to the 104th character. Only then can
you copy it and the next 6 characters.
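As a rough illustration of the substring point above, here is a minimal C++11 sketch. It assumes UTF-8 as the example multibyte encoding and treats each 16-bit unit as one character in the fixed-width case (so it ignores surrogate pairs). The function names are made up for this example and are not from any library.

```cpp
#include <cstddef>
#include <string>

// Fixed-width case: one char16_t is assumed to be one character,
// so the substring is found by direct indexing.
std::u16string substr_fixed(const std::u16string& s,
                            std::size_t first, std::size_t count) {
    if (first >= s.size()) return std::u16string();
    return s.substr(first, count);
}

// In UTF-8, bytes of the form 10xxxxxx continue a character;
// any other byte starts a new one.
inline bool is_lead_byte(unsigned char b) { return (b & 0xC0) != 0x80; }

// Advance n characters from byte offset 'start' (which must sit on a
// character boundary) and return the resulting byte offset.
std::size_t advance_chars(const std::string& s, std::size_t start, std::size_t n) {
    std::size_t i = start;
    while (i < s.size() && n > 0) {
        ++i;                                             // step over the lead byte
        while (i < s.size() && !is_lead_byte(s[i])) ++i; // and its continuation bytes
        --n;
    }
    return i;
}

// Variable-width case: every character before 'first' has to be scanned.
std::string substr_utf8(const std::string& s,
                        std::size_t first, std::size_t count) {
    const std::size_t begin = advance_chars(s, 0, first);
    const std::size_t end = advance_chars(s, begin, count);
    return s.substr(begin, end - begin);
}
```

With zero-based indices, "the 104th to the 110th character, inclusive" would be a call with first = 103 and count = 7. The fixed-width version does constant work to find the start, while the UTF-8 version has to walk over the first 103 characters.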
Unicode will (usually) take up more memory than MBCS. On the other hand,
as far as disc is concerned, you can convert Unicode to (usually) UTF-8
pretty easily as you load and save, so long as you're not loading and
saving records (in which case, that substring problem occurs again).
You don't have to use UTF-8, but of the MBCS formats, it seems to me to
make the most sense.
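To give an idea of how small the conversion on save can be, here is a hedged C++11 sketch of turning UTF-16 into UTF-8. The names append_utf8 and utf16_to_utf8 are hypothetical, and replacing an unpaired surrogate with U+FFFD is an assumption of this sketch rather than something the post prescribes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append one Unicode code point to 'out' as 1 to 4 UTF-8 bytes.
void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert UTF-16 text to UTF-8, combining surrogate pairs on the way.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            const std::uint32_t low = in[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;  // the low surrogate has been consumed
            }
        }
        // Assumption: an unpaired surrogate becomes U+FFFD.
        if (cp >= 0xD800 && cp <= 0xDFFF) cp = 0xFFFD;
        append_utf8(out, cp);
    }
    return out;
}
```

Loading is the same idea in reverse: decode each UTF-8 sequence back to a code point and emit one or two 16-bit units.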
I personally work exclusively with UTF-16 in memory (pretending it's
actually full Unicode - there is a difference, namely surrogate pairs for
characters outside the Basic Multilingual Plane, but it's infrequent), and
serialise to disc as UTF-8, which is more compact for Western scripts,
usually readable with ancient tools, and unambiguous (unlike serialising
UTF-16, which has byte order issues).
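The byte order issue can be shown with a small C++11 sketch. It assumes UTF-16 is written out two bytes per 16-bit unit, and utf16_to_bytes is a hypothetical helper written for this example.

```cpp
#include <string>

// Write each 16-bit code unit as two bytes, most significant byte first
// when big_endian is true, least significant byte first otherwise.
std::string utf16_to_bytes(const std::u16string& s, bool big_endian) {
    std::string out;
    out.reserve(s.size() * 2);
    for (char16_t unit : s) {
        const char hi = static_cast<char>((unit >> 8) & 0xFF);
        const char lo = static_cast<char>(unit & 0xFF);
        if (big_endian) { out += hi; out += lo; }
        else            { out += lo; out += hi; }
    }
    return out;
}
```

The letter 'A' (U+0041) written little-endian gives the bytes 41 00, which are exactly the bytes of U+4100 written big-endian. A reader therefore has to be told the byte order, or work it out from a byte order mark. UTF-8 is defined as a sequence of bytes, so the question does not come up.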
If you were dealing mostly with CJK (Chinese/Japanese/Korean), then you
might wish to use a different MBCS - UTF-8 is a little Western-centric,
and its conversion algorithm assumes mostly ASCII with a few extended
characters. If you have few ASCII characters, then a different algorithm
can do better - SCSU (the Standard Compression Scheme for Unicode), for example.
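To put numbers on the "Western-centric" remark, here is a small C++11 sketch that compares storage sizes. It only compares UTF-8 with UTF-16, not any other scheme, and it assumes the text contains no surrogate pairs. Both function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Bytes needed to store the text as UTF-16: two per 16-bit unit.
std::size_t utf16_size(const std::u16string& s) {
    return s.size() * 2;
}

// Bytes needed to store the same text as UTF-8, assuming every unit is a
// complete character (no surrogate pairs): 1 byte below U+0080,
// 2 bytes below U+0800, otherwise 3 bytes.
std::size_t utf8_size_bmp(const std::u16string& s) {
    std::size_t n = 0;
    for (char16_t u : s) {
        n += (u < 0x80) ? 1 : (u < 0x800) ? 2 : 3;
    }
    return n;
}
```

For "hello" this gives 5 bytes as UTF-8 against 10 as UTF-16. For the three characters U+65E5 U+672C U+8A9E it gives 9 bytes as UTF-8 against 6 as UTF-16, since most CJK characters need three bytes each in UTF-8.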
Alan Bellingham