convert utf-8 to unicode


2004-02-12 12:00:06 AM
jbuilder13
When I connect a Java client to a .NET server via a socket, I do not seem
to get the encoding right. The Swedish letters åäö appear as question marks
(?) when they reach my Java client. It does not matter whether I send the string
as Unicode, ASCII or UTF-8 from the .NET server.
Does anyone have a solution for this problem?
/Tony
 
 

Re:convert utf-8 to unicode

"ombak" < XXXX@XXXXX.COM >wrote in message
Quote
When I connect a Java client to a .NET server via a socket, I do not seem
to get the encoding right. The Swedish letters åäö appear as question marks
(?) when they reach my Java client. It does not matter whether I send the string
as Unicode, ASCII or UTF-8 from the .NET server.

Does anyone have a solution for this problem?

I feel your pain. I have just been struggling with a similar problem with
Hebrew yet again.
I haven't used sockets yet but the general solution is the same.
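First, a socket only ever delivers bytes; both ends have to agree on the
encoding. If the .NET server really sends UTF-8, each of your Swedish letters
arrives as two bytes, the first of which is 0xC3 (å is 0xC3 0xA5).
If you know the code page and you have ended up with the raw, undecoded bytes
in a String (one byte per char), you can convert it in one line.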
String codePage = "windows-1252";
String unicodeValue = new String(utfData.getBytes(),codePage);
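To see which bytes actually travel when the sender uses UTF-8, a small sketch like the following can help. The class and method names (Utf8BytesDemo, utf8Hex) are made up for illustration, and the letters are written as Unicode escapes so the source file's own encoding cannot interfere.

```java
import java.nio.charset.StandardCharsets;

public class Utf8BytesDemo {
    // Returns the UTF-8 bytes of s as space-separated hex pairs.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u00E5 \u00E4 \u00F6 are the Swedish letters a-ring, a-umlaut, o-umlaut.
        System.out.println(utf8Hex("\u00E5\u00E4\u00F6")); // prints C3 A5 C3 A4 C3 B6
    }
}
```

Each letter takes two bytes, so a receiver that decodes them one byte at a time will never show the right character.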
In my case, I used Cp1255. This works fine on Windows but there is a
problem. My default locale is English USA because otherwise, JBuilder
editing becomes very strange.
When my code is tried on a machine that has Hebrew as the default the
conversion doesn't work and all I get is question marks just like you.
The problem lies in getBytes(): called with no argument, it encodes with the
platform default charset, so the bytes you get depend on the machine's locale.
So, you need to get the bytes yourself, something like:
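// Keep only the low byte of each char. For chars below 0x100 this gives the
// same result as value.getBytes("ISO-8859-1"), whatever the default locale.
byte[] stringAsBytes = new byte[value.length()];
for (int u = 0; u < value.length(); u++)
{
    char test = value.charAt(u);
    stringAsBytes[u] = (byte) (test & 0x00FF);
}
// Throws UnsupportedEncodingException if codePage is not available.
retValue = new String(stringAsBytes, codePage);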
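The byte-extraction loop can also be written with explicit charsets on both sides, which takes the default locale out of the picture entirely. This is only a sketch: RedecodeDemo and redecode are hypothetical names, and it assumes every char in the input holds one raw byte (0 to 255).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RedecodeDemo {
    // Reinterprets a String whose chars each hold one raw byte
    // as text in the given charset.
    static String redecode(String value, String codePage) {
        // ISO-8859-1 maps chars 0x00-0xFF to the identical byte values.
        byte[] raw = value.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, Charset.forName(codePage));
    }

    public static void main(String[] args) {
        // 0xC3 0xA5 is the UTF-8 encoding of a-ring (U+00E5).
        String fixed = redecode("\u00C3\u00A5", "UTF-8");
        System.out.println(fixed.equals("\u00E5")); // prints true
    }
}
```

The same call should work with a code page such as Cp1255, provided the JRE ships that charset.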
Still with me? Good.
Warning. You may need the international JRE rather than the standard one.
You can find out what charsets are supported with the following code.
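// Needs java.nio.charset.Charset and java.util.Map.
// Keys are the canonical charset names, values the Charset objects.
Map<String, Charset> map = Charset.availableCharsets();
for (Map.Entry<String, Charset> entry : map.entrySet()) {
    // Print each charset name together with its aliases
    System.out.println(entry.getKey() + " " + entry.getValue().aliases());
}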
If the code page you want is not there, then you will have to do the
conversion yourself.
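If you only care about one particular code page, there is no need to walk the whole list; Charset.isSupported answers the question directly. A minimal sketch, where CharsetCheck and hasCharset are names invented for this example:

```java
import java.nio.charset.Charset;

public class CharsetCheck {
    // True if this JRE can handle the named charset.
    // Malformed names are treated as "not supported" instead of throwing.
    static boolean hasCharset(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // IllegalCharsetNameException is a subclass of this
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {"UTF-8", "windows-1252", "windows-1255"};
        for (String name : names) {
            System.out.println(name + ": " + hasCharset(name));
        }
    }
}
```

UTF-8 and ISO-8859-1 are guaranteed on every Java platform; the Windows code pages depend on which JRE you have installed.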
Now, in your case it should be much easier. Try reading
javaalmanac.com/egs/java.nio.charset/ConvertChar.html
Also:
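// Use the charset the server actually writes; ISO-8859-1 is just my example.
private Charset charset = Charset.forName("ISO-8859-1");
private SocketChannel channel;
// ... create a buffer and the socket and read, then
ByteBuffer buffer = ByteBuffer.allocate(1024);
while ((channel.read(buffer)) != -1)
{
    buffer.flip();   // switch the buffer from filling to draining
    System.out.println(charset.decode(buffer));
    buffer.clear();  // make room for the next read
}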
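With a plain java.net.Socket the same idea comes down to wrapping the input stream in a reader with an explicit charset. The sketch below assumes the .NET side writes UTF-8 and terminates each message with a newline; both are assumptions, as are the names SocketTextReader and readLineUtf8. A ByteArrayInputStream stands in for the socket so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class SocketTextReader {
    // Reads one line from in, decoding the bytes as UTF-8.
    // With a real connection, in would be socket.getInputStream().
    // The reader buffers ahead, so create it once per stream in real code.
    static String readLineUtf8(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8));
        return reader.readLine();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the wire: UTF-8 bytes of the three Swedish letters, then newline.
        byte[] wire = {(byte) 0xC3, (byte) 0xA5, (byte) 0xC3, (byte) 0xA4,
                       (byte) 0xC3, (byte) 0xB6, '\n'};
        String line = readLineUtf8(new ByteArrayInputStream(wire));
        System.out.println(line.equals("\u00E5\u00E4\u00F6")); // prints true
    }
}
```

Unlike decoding each buffer chunk separately, the reader keeps state between reads, so a two-byte letter split across two network packets still decodes correctly.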
HTH
Test on a machine with Swedish as the default and again with English as
default.
ExpatEgghead