
Swedish characters


2003-10-08 10:33:32 AM
cppbuilder62
How can I read and print Swedish characters like å, ä and ö
correctly?
I read from a text file, and when I compile with gcc it works
all right, but when I try with bcc32 it doesn't...
 
 

Re:Swedish characters

"Taras Kentrschynskyj" < XXXX@XXXXX.COM >writes:
Quote
How can I read and print Swedish characters like å, ä and ö
correctly?
I read from a text file, and when I compile with gcc it works
all right, but when I try with bcc32 it doesn't...
The first thing I'd try to find out is whether the difference occurs in the
reading or the writing part.
 

Re:Swedish characters

Here are some tests:
*****************************
string s1 = "åäö", s2;
cin>>s2; // input: åäö
cout << s1 << s2 << endl;
output (gcc):
output (bcc):
*******************************
char fn[10] = "char.txt";
ifstream in;
ofstream out(fn);
string s2;
cin>>s2; // input: ?
out << s2;
out.close();
in.open(fn);
in>>s2;
cout << s2 << endl;
output (gcc): ? file content: ?
output (bcc): ? file content: ?
*****************************
char fn[10] = "char.txt"; // file content: ?
ifstream in(fn);
string s2;
in>>s2;
cout << s2 << endl;
output (gcc): ?
output (bcc): ?
 


Re:Swedish characters

"Taras Kentrschynskyj" < XXXX@XXXXX.COM >writes:
Quote
string s1 = "åäö", s2;
cin>>s2; // input: åäö
cout << s1 << s2 << endl;

output (gcc):
output (bcc):
So the string read from standard input is written correctly to standard
output. It seems that reading and writing work correctly.
What seems not to work correctly is the treatment of string literals in
the program (such as s1).
The next thing I'd look at is how these characters are encoded. I.e. I'd
convert each char to int and write the resulting value to standard output.
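A minimal sketch of such a check, in case it helps. The name byte_values is made up for illustration. It goes through unsigned char first, so the values come out as 0..255 and can be compared directly with a code page table:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns the numeric value of every byte in s,
// separated by spaces. Casting through unsigned char avoids negative
// values on compilers where plain char is signed.
std::string byte_values(const std::string& s)
{
    std::ostringstream out;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (i != 0)
            out << ' ';
        out << static_cast<int>(static_cast<unsigned char>(s[i]));
    }
    return out.str();
}
```

Calling it once on a string read from cin and once on a string literal should show whether the two arrive in the program with the same byte values.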
 

Re:Swedish characters

Here's the result:
cin>>s; // input: åäö
cout << s[0] << s[1] << s[2] << endl;
cout << (int)s[0] << (int)s[1] << (int)s[2] << endl;
cout << 'å' << 'ä' << 'ö' << endl;
cout << (int)'å' << (int)'ä' << (int)'ö' << endl;
output(gcc):
åäö
-27-28-10
åäö
-27-28-10
output(bcc):
åäö
-122-124-108
Õõ÷
-27-28-10
 

Re:Swedish characters

"Taras Kentrschynskyj" < XXXX@XXXXX.COM >writes:
Quote
Here's the result:

cin>>s; // input: åäö
cout << s[0] << s[1] << s[2] << endl;
cout << (int)s[0] << (int)s[1] << (int)s[2] << endl;
cout << 'å' << 'ä' << 'ö' << endl;
cout << (int)'å' << (int)'ä' << (int)'ö' << endl;


output(gcc):
åäö
-27-28-10
åäö
-27-28-10
The Latin-1¹ codes of the three characters å, ä and ö are 229, 228 and 246
respectively. When these codes are converted to (signed) 8-bit char, the
result is -27 -28 -10. Programs created by your gcc installation seem to
represent characters read from standard input in that encoding; your text
editor seems to use this encoding as well.
Quote
output(bcc):
åäö
-122-124-108
Õõ÷
-27-28-10
The Codepage 850² codes of the three characters are 134, 132 and 148
respectively. When these codes are converted to (signed) 8-bit char, the
result is -122 -124 -108.
The codes 229, 228 and 246 encode the characters Õ, õ and ÷ respectively
in Codepage 850. When these codes are converted to (signed) 8-bit char,
the result is -27 -28 -10.
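The arithmetic behind those negative numbers can be sketched as follows. The function name to_signed8 is made up for illustration, and the sketch assumes that plain char is a signed 8-bit type in two's complement, which seems to be the case for both of your compilers:

```cpp
#include <cassert>

// Hypothetical helper: maps a character code in 0..255 to the value
// shown by (int)c when c is a signed 8-bit char.
// Codes from 128 upwards wrap around to code - 256.
int to_signed8(int code)
{
    assert(code >= 0 && code <= 255);
    return code >= 128 ? code - 256 : code;
}
```

For instance, 229 - 256 = -27 (Latin-1 å) and 134 - 256 = -122 (Codepage 850 å).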
You are working with three tools:
- gcc
- bcc
- a text editor
The text editor and the program generated by gcc seem to use Latin-1, while
the program generated by bcc uses Codepage 850. Since the character literals
are Latin-1 encoded, they aren't correctly treated.
As an easy test for this reasoning, you could edit the source file using a
hex editor. Change the byte representing the character literal ä from
0xE4 to 0x84. The program compiled from this modified source should then
write the character literal ä as ä.
¹ www.utoronto.ca/webdocs/HTMLdocs/NewHTML/iso_table.html
² www.kostis.net/charsets/cp850.htm
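Instead of patching bytes in the source file, the same substitution could be done at run time. This is only a sketch under the assumptions above (literals are Latin-1, output goes to a Codepage 850 console). Both function names are made up for illustration, and only the three lower-case letters discussed here are translated; every other byte is passed through unchanged:

```cpp
#include <string>

// Hypothetical helper: translates å, ä and ö from their Latin-1 code
// to their Codepage 850 code. All other bytes are returned as they are.
unsigned char latin1_to_cp850_char(unsigned char c)
{
    switch (c) {
        case 0xE5: return 0x86; // å: 229 -> 134
        case 0xE4: return 0x84; // ä: 228 -> 132
        case 0xF6: return 0x94; // ö: 246 -> 148
        default:   return c;
    }
}

// Hypothetical helper: applies the per-character translation to a copy of s.
std::string latin1_to_cp850_string(const std::string& s)
{
    std::string result(s);
    for (std::string::size_type i = 0; i < result.size(); ++i) {
        unsigned char translated =
            latin1_to_cp850_char(static_cast<unsigned char>(result[i]));
        result[i] = static_cast<char>(translated);
    }
    return result;
}
```

With this, something like cout << latin1_to_cp850_string(s1) would send the Codepage 850 bytes to the console, while strings read from cin would have to be left untranslated.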
 

Re:Swedish characters

XXXX@XXXXX.COM (Thomas Maeder [TeamB]) wrote:
Quote
The text editor and the program generated by gcc seem to use Latin-1, while
the program generated by bcc uses Codepage 850. Since the character literals
are Latin-1 encoded, they aren't correctly treated.
OK, thank you. But can you force bcc to use another character set than cp850? I've tried the setlocale function, without success, though...
 

Re:Swedish characters

"Taras Kentrschynskyj" < XXXX@XXXXX.COM >writes:
Quote
XXXX@XXXXX.COM (Thomas Maeder [TeamB]) wrote:
>The text editor and the program generated by gcc seem to use Latin-1, while
>the program generated by bcc uses Codepage 850. Since the character literals
>are Latin-1 encoded, they aren't correctly treated.

OK, thank you. But can you force bcc to use another character set than
cp850? I've tried the setlocale function, without success, though...
AFAIK, the locale-related functionality of Standard C and Standard C++ only
affects how characters read from files are treated. Since the program generated
by Borland C++ treats these correctly, I don't think that changing the locale
settings will help you.
I don't know if you can change the encoding Borland C++ assumes character
literals to have. Maybe somebody else can help here?