
Re: Borland Classic Products


2005-10-06 01:49:47 AM
kylix2
Marco van de Voort wrote:
Quote
On 2005-10-04, Thomas Miller < XXXX@XXXXX.COM >wrote:


>What about GTK#2? What is it doing?


GTK#2 ? What's that? Do you mean GTK# v2 or GTK number 2? (GTK# is C# .NET
GTK)

Anyway, GTK is an engine under the LCL/VCL/CLX layer, just like GDI or QT
is. (and in the future Carbon). Win32(GDI) specific stuff of course won't work
unless emulated.
Sorry, GTK v2. I know it has Unicode support built into the framework.
What does it use? I would want to stay compatible with it so that whatever
work we do on Delphi would move easily to Lazarus.
I am even getting heavily involved with Chrome and hoping that they will
want to get involved. A common class library across Delphi, Lazarus,
and Chrome would be awesome for the OP community.
--
Thomas Miller
Wash DC Delphi SIG Chairperson
Delphi Client/Server Certified Developer
BSS Accounting & Distribution Software
BSS Enterprise Accounting FrameWork
www.bss-software.com
www.cpcug.org/user/delphi/index.html
https://sourceforge.net/projects/uopl/
sourceforge.net/projects/dbexpressplus
 
 

Re:Re: Borland Classic Products

On 2005-10-05, Thomas Miller < XXXX@XXXXX.COM >wrote:
Quote
Marco van de Voort wrote:

Sorry, GTK v2. I know it has Unicode support built into the framework.
What does it use?
Everything on *nix uses UTF8.
 

Re:Re: Borland Classic Products

Quote
I am even getting heavily involved with Chrome and hoping that they will
want to get involved. A common class library across Delphi, Lazarus,
and Chrome would be awesome for the OP community.
AFAIK Chrome starts string indexing at 0 while Delphi starts at 1... Sooo... :-)
best regards
Thomas
 


Re:Re: Borland Classic Products

And? I don't see this as a problem at all. The idea is for all of them
to have a similar interface.
dk_sz wrote:
Quote
>I am even getting heavily involved with Chrome and hoping that they will
>want to get involved. A common class library across Delphi, Lazarus,
>and Chrome would be awesome for the OP community.


AFAIK Chrome starts string indexing at 0 while Delphi starts at 1... Sooo... :-)

best regards
Thomas


--
Thomas Miller
 

Re:Re: Borland Classic Products

Florian Klaempfl < XXXX@XXXXX.COM >wrote:
Quote
Why use WideString, i.e. UCS-2 encoding? Use UTF-8 and you basically only
have to change some string comparisons and search routines. When WideStrings
were invented, UTF-8 wasn't available, but I think today UTF-8 is the
way to go: for a lot of languages UTF-8 requires less space than UCS-2. And
if you want to parse WideStrings today, you also need complicated
routines because of the surrogate pairs, which weren't present when
WideStrings were invented, IIRC.
Have you ever written any code dealing with UTF-8? It's fine when dealing
with whole strings, but quickly becomes a mess when you have to slice and
dice them.
In UTF-8, a single grapheme (user-visible character) takes between 1 and 6
bytes. That's a *lot* more complicated than dealing with surrogate pairs,
which can be ignored most of the time, and require only one if/else branch
the rest of the time. Surrogate pairs did exist when WideString was
"invented". They just didn't have any characters defined. So at that time,
UCS-2 was indeed the best space/time trade-off.
Today, the way to go would be UTF-32/UCS-4 for in-memory strings (4 bytes
per character, no exceptions), and UTF-8 for storage. Though UTF-32 uses 4
times the amount of memory in the worst case, and only two thirds in the
best (but very rare) case, it is extremely simple and therefore extremely
fast to process. You can't beat ADD EAX, 4 when walking through a string.
For storage, the overhead of "compressing" the UTF-32 into UTF-8 is
generally worthwhile.
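
To make the size trade-off concrete, here is a minimal Python sketch (Python is my choice for illustration, not something used in the thread) that measures how many bytes a single code point needs in each encoding. The sample characters and the helper name `encoded_sizes` are assumptions made for this example. Note that the 6-byte upper bound belongs to the original UTF-8 definition; RFC 3629 limits a sequence to 4 bytes, which is what Python's codec produces.

```python
# Sample code points: U+0041, U+00E9, U+20AC, U+1D11E (outside the BMP).
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001d11e"]


def encoded_sizes(ch):
    """Return (utf8_bytes, utf16_bytes, utf32_bytes) for one code point."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),  # "-le" avoids counting a byte-order mark
        len(ch.encode("utf-32-le")),
    )


if __name__ == "__main__":
    for ch in SAMPLES:
        print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

The UTF-32 column is always 4, which is the fixed stride the "ADD EAX, 4" remark relies on; the UTF-8 column varies from 1 to 4, and the UTF-16 column is 2 except for the last sample, which needs a surrogate pair.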
Kind regards,
Jan Goyvaerts
--
The world's most powerful grep tool just got better.
www.powergrep.com/
 

Re:Re: Borland Classic Products

Marco van de Voort < XXXX@XXXXX.COM >wrote:
Quote
Everything on *nix uses UTF8.
libc uses UCS-4.
wchar_t is 4 bytes.
Files are typically saved as UTF-8.
Kind regards,
Jan Goyvaerts
 

Re:Re: Borland Classic Products

Thomas Miller < XXXX@XXXXX.COM >wrote:
Quote
And? I don't see this as a problem at all. The idea is for all of them
to have a similar interface.

>AFAIK Chrome starts string indexing at 0 while Delphi starts at 1.
This is a major issue that will lead to many off-by-one bugs when trying to
code for both platforms.
Suppose you wanted to implement a function that works like the good old
"Copy" function. What does Copy(S, 1, MaxInt) do? In Delphi it will return
the whole string, no matter what.
I've never tried Chrome, so I don't know how nice a language it is, or not.
But from what I've read, there are so many ways in which it is different
from Delphi (Borland Pascal) that any common library would be full of
IFDEFs.
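
The off-by-one risk can be illustrated with a small Python sketch. Both functions are hypothetical stand-ins written for this example: `copy_one_based` mimics a Delphi-style Copy with 1-based indexing, `copy_zero_based` is the same operation for a language that indexes from 0, and `MAX_INT` stands in for MaxInt. How out-of-range indexes are clamped is an assumption of the sketch, not a statement about either compiler.

```python
import sys

MAX_INT = sys.maxsize  # stand-in for Delphi's MaxInt (assumption)


def copy_one_based(s, index, count):
    """Copy `count` characters starting at 1-based position `index`."""
    if count <= 0 or index > len(s):
        return ""
    start = max(index, 1) - 1  # convert to Python's 0-based offset
    return s[start:start + count]


def copy_zero_based(s, index, count):
    """Copy `count` characters starting at 0-based position `index`."""
    if count <= 0 or index >= len(s):
        return ""
    start = max(index, 0)
    return s[start:start + count]


if __name__ == "__main__":
    print(copy_one_based("Delphi", 1, MAX_INT))
    print(copy_zero_based("Delphi", 1, MAX_INT))
```

The same call, with start index 1 and the maximum count, returns the whole string under 1-based indexing but silently drops the first character under 0-based indexing. Shared library code would have to pick one convention and translate at the boundary.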
Kind regards,
Jan Goyvaerts
 

Re:Re: Borland Classic Products

"Rudy Velthuis [TeamB]" < XXXX@XXXXX.COM >wrote:
Quote
>3. Fix all code that expects a character to equal one byte.

3 is by far the biggest challenge, BTW.
Absolutely. Though it does help a lot if your new code will be using UCS-2
(ignoring surrogates) or UCS-4. While a character is then 2 or 4 bytes,
it's still constant. Going to UTF-8 or UTF-16 (respecting surrogates) is
far more involved.
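
As a rough illustration of what "respecting surrogates" involves, here is a Python sketch that walks UTF-16 code units and combines surrogate pairs. The function name `utf16_code_points` is hypothetical, and the sketch assumes little-endian input; an unpaired surrogate is simply passed through rather than reported as an error.

```python
import struct


def utf16_code_points(data):
    """Decode little-endian UTF-16 bytes into a list of code points."""
    n = len(data) // 2
    units = struct.unpack(f"<{n}H", data[:n * 2])  # ignore a trailing odd byte
    result = []
    i = 0
    while i < n:
        unit = units[i]
        is_pair = (
            0xD800 <= unit <= 0xDBFF          # high surrogate
            and i + 1 < n
            and 0xDC00 <= units[i + 1] <= 0xDFFF  # followed by a low surrogate
        )
        if is_pair:
            low = units[i + 1]
            result.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # BMP character, or an unpaired surrogate passed through as-is
            result.append(unit)
            i += 1
    return result


if __name__ == "__main__":
    sample = "A\U0001d11eB".encode("utf-16-le")
    print([hex(cp) for cp in utf16_code_points(sample)])
```

Code that treats the data as UCS-2 would skip the pair check and advance by one unit every time, which is constant-width but reports the character outside the BMP as two separate values.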
Kind regards,
Jan Goyvaerts
 

Re:Re: Borland Classic Products

On 2005-10-13, Jan Goyvaerts < XXXX@XXXXX.COM >wrote:
Quote
per character, no exceptions), and UTF-8 for storage. Though UTF-32 uses 4
times the amount of memory in the worst case, and only two thirds in the
best (but very rare) case, it is extremely simple and therefore extremely
fast to process. You can't beat ADD EAX, 4 when walking through a string.

For storage, the overhead of "compressing" the UTF-32 into UTF-8 is
generally worthwhile.
Hmm, even the literals could be UTF8. In the first copy-on-write (ref count
-1 to 1) they could be converted.
 

Re:Re: Borland Classic Products

On 2005-10-13, Jan Goyvaerts < XXXX@XXXXX.COM >wrote:
Quote
Marco van de Voort < XXXX@XXXXX.COM >wrote:

>Everything on *nix uses UTF8.

libc uses UCS-4.
libintl seems to use UCS-4, yes. I wonder why I got the idea that it was
UTF-8. Or has it changed over the last few years?
 

Re:Re: Borland Classic Products

Jan Goyvaerts wrote:
Quote
I've never tried Chrome, so I don't know how nice a language it is, or not.
But from what I've read, there are so many ways in which it is different
from Delphi (Borland Pascal) that any common library would be full of
IFDEFs.
For what it's worth... there's a review of Chrome on DevSource.com
right now.
www.devsource.com/article2/0,1895,1869522,00.asp
 

Re:Re: Borland Classic Products

Jan Goyvaerts wrote:
Quote
Florian Klaempfl < XXXX@XXXXX.COM >wrote:


>Why use WideString, i.e. UCS-2 encoding? Use UTF-8 and you basically only
>have to change some string comparisons and search routines. When WideStrings
>were invented, UTF-8 wasn't available, but I think today UTF-8 is the
>way to go: for a lot of languages UTF-8 requires less space than UCS-2. And
>if you want to parse WideStrings today, you also need complicated
>routines because of the surrogate pairs, which weren't present when
>WideStrings were invented, IIRC.


Have you ever written any code dealing with UTF-8? It's fine when dealing
with whole strings, but quickly becomes a mess when you have to slice and
dice them.

In UTF-8, a single grapheme (user-visible character) takes between 1 and 6
bytes. That's a *lot* more complicated than dealing with surrogate pairs,
which can be ignored most of the time, and require only one if/else branch
the rest of the time. Surrogate pairs did exist when WideString was
"invented". They just didn't have any characters defined. So at that time,
UCS-2 was indeed the best space/time trade-off.

Today, the way to go would be UTF-32/UCS-4 for in-memory strings (4 bytes
per character, no exceptions), and UTF-8 for storage. Though UTF-32 uses 4
times the amount of memory in the worst case, and only two thirds in the
best (but very rare) case, it is extremely simple and therefore extremely
fast to process. You can't beat ADD EAX, 4 when walking through a string.

For storage, the overhead of "compressing" the UTF-32 into UTF-8 is
generally worthwhile.

Kind regards,
Jan Goyvaerts

This is why we are stuck in a holding pattern. I don't know any of this
stuff. I am still looking for someone, or a couple of you to help with
this. I love all this discussion, I am learning a ton!!!
--
Thomas Miller
Chrome Portal Project Manager
Wash DC Delphi SIG Chairperson
Delphi Client/Server Certified Developer
BSS Accounting & Distribution Software
BSS Enterprise Accounting FrameWork
www.bss-software.com
www.cpcug.org/user/delphi/index.html
sourceforge.net/projects/chromeportal/
sourceforge.net/projects/uopl/
sourceforge.net/projects/dbexpressplus
 

Re:Re: Borland Classic Products

Quote
>
>Have you ever written any code dealing with UTF-8? It's fine when
>dealing with whole strings, but quickly becomes a mess when you have
>to slice and dice them.
>
>In UTF-8, a single grapheme (user-visible character) takes between 1
>and 6 bytes. That's a *lot* more complicated than dealing with
>surrogate pairs, which can be ignored most of the time, and require
>only one if/else branch the rest of the time. Surrogate pairs did
>exist when WideString was "invented". They just didn't have any
>characters defined. So at that time, UCS-2 was indeed the best
>space/time trade-off.
>
>Today, the way to go would be UTF-32/UCS-4 for in-memory strings (4
>bytes per character, no exceptions), and UTF-8 for storage. Though
>UTF-32 uses 4 times the amount of memory in the worst case, and only
>two thirds in the best (but very rare) case, it is extremely simple
>and therefore extremely fast to process. You can't beat ADD EAX, 4
>when walking through a string.
>
>For storage, the overhead of "compressing" the UTF-32 into UTF-8 is
>generally worthwhile.
>
>Kind regards,
>Jan Goyvaerts
>
Actually... UTF-8 takes at most 4 bytes per code point. It's still a PITA to slice tho.
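
For anyone wondering why slicing is painful, here is a minimal Python sketch that takes a substring by code point position directly from UTF-8 bytes. The helper names `utf8_sequence_length` and `utf8_slice` are made up for this example, and the sketch assumes the input is already valid UTF-8 (it does not validate continuation bytes).

```python
def utf8_sequence_length(lead_byte):
    """Number of bytes in the UTF-8 sequence that starts with lead_byte."""
    if lead_byte < 0x80:
        return 1
    if 0xC0 <= lead_byte < 0xE0:
        return 2
    if 0xE0 <= lead_byte < 0xF0:
        return 3
    if 0xF0 <= lead_byte < 0xF8:
        return 4
    raise ValueError("not a UTF-8 lead byte: 0x%02X" % lead_byte)


def utf8_slice(data, start, count):
    """Return the bytes of `count` code points from 0-based code point `start`."""
    pos = 0
    for _ in range(start):  # skip ahead one whole sequence at a time
        if pos >= len(data):
            return b""
        pos += utf8_sequence_length(data[pos])
    begin = pos
    for _ in range(count):  # then collect the requested sequences
        if pos >= len(data):
            break
        pos += utf8_sequence_length(data[pos])
    return data[begin:pos]


if __name__ == "__main__":
    text = "a\u00e9\u20ac\U0001d11eb".encode("utf-8")
    print(utf8_slice(text, 1, 2).decode("utf-8"))
```

Every slice has to scan from the start of the string to find the byte offset, so the cost grows with the position, whereas a fixed-width encoding can jump straight to `start * width`.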