Re: Base64 conversion for unicode


2005-07-11 04:52:00 PM
delphi217
Surely a widestring is just a stream of bytes to be encoded, and then
decoded back to a widestring?
/Matthew Jones/
 
 

Re: Base64 conversion for unicode

Why would it fail? It is supposed to be a byte stream. I am being awkward, as
I've not been aware of Unicode issues in email (where I have mainly met
base64). I just cannot see why a Unicode string is any different from an ANSI
string or a binary file.
/Matthew Jones/
 

Re: Base64 conversion for unicode

The encoding of a Unicode WideString (to base64) differs from the encoding
of an ASCII string: if you take a base64-encoded ASCII string and try to
decode it as Unicode, it will fail.
The encoding/decoding is not very complicated; I just don't want to spend
time writing it, so I am looking for a solution someone else has made.
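
To make the failure concrete, here is a minimal sketch in Python (chosen only because it is easy to run; nothing in it is Delphi-specific). It assumes the "Unicode" side means UTF-16LE, which is what a WideString holds in memory. The base64 step itself succeeds; what breaks is reading the decoded bytes with the wrong character encoding.

```python
import base64

# Sender encodes the ASCII bytes of the text.
encoded = base64.b64encode("Hello".encode("ascii"))

# Receiver base64-decodes without trouble: the same 5 bytes come back.
raw = base64.b64decode(encoded)
assert raw == b"Hello"

# Interpreting those bytes as UTF-16LE is what goes wrong:
# 5 bytes is not a whole number of 2-byte code units.
try:
    raw.decode("utf-16-le")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print("decoded as UTF-16LE:", decoded_ok)
```

With an even number of ASCII bytes the decode would not raise, but it would yield unrelated characters, so both sides still have to agree on the byte layout.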
"Matthew Jones" <XXXX@XXXXX.COM> writes:
> Surely a widestring is just a stream of bytes to be encoded, and then
> decoded back to a widestring?
>
> /Matthew Jones/
 

Re: Base64 conversion for unicode

On Mon, 11 Jul 2005 17:01:12 +0200, "Adi A" <XXXX@XXXXX.COM> writes:
> The encoding of a Unicode WideString (to base64) differs from the encoding
> of an ASCII string: if you take a base64-encoded ASCII string and try to
> decode it as Unicode, it will fail.

But this has nothing to do with the base64 encoding. Base64 encoding
encodes a group of 3 BYTES into a group of 4 characters. Thus there is
no special version for a Unicode WideString.
Regards from Germany
Franz-Leo
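
The 3-bytes-to-4-characters rule can be checked with a short sketch, written here in Python for convenience. The `base64` module is part of the standard library; the helper name `b64_len` is made up for this illustration. The point is that the output length depends only on the number of input bytes, never on what those bytes mean.

```python
import base64

def b64_len(byte_count: int) -> int:
    """Length of padded base64 output for byte_count input bytes."""
    # Every group of up to 3 input bytes becomes 4 output characters.
    return 4 * ((byte_count + 2) // 3)

# Compare the formula with the real encoder for a range of input sizes.
for n in range(0, 12):
    data = bytes(range(n))
    assert len(base64.b64encode(data)) == b64_len(n)
```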
 

Re: Base64 conversion for unicode

Since I don't have an example of Unicode base64 in Delphi, I can only supply
one in C#:
Convert.ToBase64String(Encoding.ASCII.GetBytes(sourcestring))
and
Convert.ToBase64String(Encoding.Unicode.GetBytes(sourcestring))
Those two lines produce completely different base64 results, both in content
and in length (the Unicode one is roughly twice as long).
My guess is that the difference is in the bytes used: "Hello" in ASCII
takes 5 bytes (therefore 8 characters in base64), while in Unicode (UTF-16)
it takes 10 bytes (therefore 16 characters in base64).
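
The same comparison can be reproduced outside .NET. The sketch below uses Python and assumes that "utf-16-le" matches what Encoding.Unicode.GetBytes produces (UTF-16, little-endian, no byte-order mark).

```python
import base64

source = "Hello"
ascii_bytes = source.encode("ascii")        # 5 bytes
utf16_bytes = source.encode("utf-16-le")    # 10 bytes, no byte-order mark

ascii_b64 = base64.b64encode(ascii_bytes).decode("ascii")
utf16_b64 = base64.b64encode(utf16_bytes).decode("ascii")

print(len(ascii_bytes), len(ascii_b64), ascii_b64)   # 5 8 SGVsbG8=
print(len(utf16_bytes), len(utf16_b64), utf16_b64)   # 10 16 SABlAGwAbABvAA==
```

The encoder is identical in both calls; only the bytes handed to it differ.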
"Franz-Leo Chomse" <XXXX@XXXXX.COM> writes:
> On Mon, 11 Jul 2005 17:01:12 +0200, "Adi A" <XXXX@XXXXX.COM> writes:
>
> > The encoding of a Unicode WideString (to base64) differs from the
> > encoding of an ASCII string: if you take a base64-encoded ASCII string
> > and try to decode it as Unicode, it will fail.
>
> But this has nothing to do with the base64 encoding. Base64 encoding
> encodes a group of 3 BYTES into a group of 4 characters. Thus there is
> no special version for a Unicode WideString.
>
> Regards from Germany
>
> Franz-Leo

 

Re: Base64 conversion for unicode

See my reply to Franz-Leo Chomse in this thread.
"Matthew Jones" <XXXX@XXXXX.COM> writes:
> Why would it fail? It is supposed to be a byte stream. I am being awkward, as
> I've not been aware of Unicode issues in email (where I have mainly met
> base64). I just cannot see why a Unicode string is any different from an ANSI
> string or a binary file.
>
> /Matthew Jones/
 

Re: Base64 conversion for unicode

Matthew Jones writes:
> Why would it fail? It is supposed to be a byte stream. I am being
> awkward as I have not been aware of Unicode issues in email (where I've
> mainly met base64). I just cannot see why a Unicode string is any
> different from an ANSI string or a binary file.
>
> /Matthew Jones/

I think it would depend on the function declaration. AFAIK, Base64 is
an encoding method that makes it possible to use binary data in
text-oriented areas. But I can imagine a Base64Encode function which takes
a string as its input buffer, because Delphi strings handle binary data
quite well.
If the declaration is something like:
function Base64Encode( const input : string ) : string;
it will go wrong with a WideString, because the WideString will be
converted to an AnsiString before it is passed to the function.
However, if the declaration is something like:
function Base64Encode( const input; inputsize : Integer ) : string;
and you call it with
MyOutput := Base64Encode( MyWideString[1], Length( MyWideString ) * 2 );
it should work.
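
The difference between the two declarations can be sketched in Python. This is only an analogy, under the assumption that the ANSI code page is Windows-1252; the sample text is made up, and Python's "?" replacement stands in for whatever substitute character the real Windows conversion would pick.

```python
import base64

text = "\u03a9mega"  # hypothetical sample; U+03A9 (omega) is not in Windows-1252

# Roughly what an implicit WideString -> AnsiString conversion does:
# characters with no code page equivalent are replaced (here with "?").
ansi_bytes = text.encode("cp1252", errors="replace")
lossy = base64.b64decode(base64.b64encode(ansi_bytes)).decode("cp1252")

# Passing the raw UTF-16LE bytes instead keeps every character.
wide_bytes = text.encode("utf-16-le")
exact = base64.b64decode(base64.b64encode(wide_bytes)).decode("utf-16-le")

print(lossy == text)  # False: the omega was lost before base64 ever ran
print(exact == text)  # True
```

Base64 round-trips both inputs faithfully; the data loss happens in the string conversion that precedes it.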
Diederik
BTW, is this for some kind of predefined communication? If not, why not
use UTF-8 instead of Base64? It is much more efficient, and most of the
time human-readable.
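BTW2, if the encode function uses a string for input, you can solve it
by calling it this way (untested):
function MyWideBase64Encode( const input : WideString ) : string;
var
  buffer : string; // assumes string = AnsiString (before Delphi 2009)
  byteCount : Integer;
begin
  // Copy the raw UTF-16 bytes (2 per WideChar) into a byte-per-char string.
  byteCount := Length( input ) * SizeOf( WideChar );
  SetLength( buffer, byteCount );
  if byteCount > 0 then // input[1] is not valid for an empty string
    Move( input[1], buffer[1], byteCount );
  result := Base64Encode( buffer );
end;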
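
The thread only shows the encoding direction. As a rough sketch of the full round trip, here is the same idea in Python, assuming UTF-16LE as the byte layout (which is what a WideString holds). The function names are hypothetical and simply mirror MyWideBase64Encode.

```python
import base64

def wide_base64_encode(text: str) -> str:
    """Base64 of the text's raw UTF-16LE bytes (2 bytes per code unit)."""
    return base64.b64encode(text.encode("utf-16-le")).decode("ascii")

def wide_base64_decode(encoded: str) -> str:
    """Inverse: base64-decode, then read the bytes as UTF-16LE."""
    return base64.b64decode(encoded).decode("utf-16-le")

sample = "Gr\u00fc\u00dfe, \u4e16\u754c"  # made-up text with non-ASCII characters
assert wide_base64_decode(wide_base64_encode(sample)) == sample
assert wide_base64_encode("") == ""  # empty input needs no special case here
```

In Delphi the decoding side would be the mirror image of the function above: base64-decode into a byte buffer, then Move the bytes into a WideString of half that length.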
 

Re: Base64 conversion for unicode

Adi A writes:
> Since I don't have an example of Unicode base64 in Delphi, I can only supply
> one in C#:
>
> Convert.ToBase64String(Encoding.ASCII.GetBytes(sourcestring))
> and
> Convert.ToBase64String(Encoding.Unicode.GetBytes(sourcestring))
>
> Those two lines produce completely different base64 results, both in content
> and in length (the Unicode one is roughly twice as long).
> My guess is that the difference is in the bytes used: "Hello" in ASCII
> takes 5 bytes (therefore 8 characters in base64), while in Unicode (UTF-16)
> it takes 10 bytes (therefore 16 characters in base64).

That's correct, and it confirms what Franz-Leo Chomse wrote.
What you have to do is *move* the raw contents of the WideString into
an AnsiString:
SetLength(lStr, 2*Length(lWStr));
Move(Pointer(lWStr)^, Pointer(lStr)^, Length(lStr));
Result := ToBase64String(lStr);
--
Henrick Hellström
www.streamsec.com
 

Re: Base64 conversion for unicode

You nailed the problem, and your function (MyWideBase64Encode) solved it.
Thank you very much
"Diederik van Bochove" <XXXX@XXXXX.COM> writes:
> Matthew Jones writes:
>
> > Why would it fail? It is supposed to be a byte stream. I am being
> > awkward as I have not been aware of Unicode issues in email (where I've
> > mainly met base64). I just cannot see why a Unicode string is any
> > different from an ANSI string or a binary file.
> >
> > /Matthew Jones/
>
> I think it would depend on the function declaration. AFAIK, Base64 is
> an encoding method that makes it possible to use binary data in
> text-oriented areas. But I can imagine a Base64Encode function which takes
> a string as its input buffer, because Delphi strings handle binary data
> quite well.
>
> If the declaration is something like:
>
> function Base64Encode( const input : string ) : string;
>
> it will go wrong with a WideString, because the WideString will be
> converted to an AnsiString before it is passed to the function.
>
> However, if the declaration is something like:
>
> function Base64Encode( const input; inputsize : Integer ) : string;
>
> and you call it with
> MyOutput := Base64Encode( MyWideString[1], Length( MyWideString ) * 2 );
> it should work.
>
> Diederik
>
> BTW, is this for some kind of predefined communication? If not, why not
> use UTF-8 instead of Base64? It is much more efficient, and most of the
> time human-readable.
>
> BTW2, if the encode function uses a string for input, you can solve it
> by calling it this way (untested):
>
> function MyWideBase64Encode( const input : WideString ) : string;
> var buffer : string;
> begin
>   SetLength( buffer, Length( input ) * 2 );
>   Move( input[1], buffer[1], Length( input ) * 2 );
>   result := Base64Encode( buffer );
> end;