Board index » delphi » Algorythms

Algorythms


2005-09-26 10:49:50 PM
delphi139
Anything can be done once you know the scheme, right. Well, I was wondering
if anyone can point me to such things. I need a link (or more) to some
"algorythm repositories" to help with what I am doing. Basically I am looking
for theoretical guidelines for fast text (rtf, html, xml, ...) parsing and
some other similar stuff. I have been doing everything with Pos, PosEx, Copy
and Delete, but I am looking for even more efficient ways of parsing text,
something well thought through and full-proof.
Also I'd like to see a link to some article, tutorial, anything that
could put me on the right track for writing an active document editor -
something like the RichEdit control or MSHTML (mixed images, controls and
text). I thought of making a component for every f.ex. html element and then
just putting them together (side by side) in the final control, but there's
a problem because of multiline text and images mixed together. So an idea on
doing that can help me alot, I am all out of ideas and have no source for
reference.
Thanks in advance
Marko
 
 

Re:Algorythms

Marko Binic writes:
Quote
but I am looking for even more efficient ways of parsing text
As always you need to carefully benchmark what you're doing before
deciding that it needs to be more efficient. it is quite possible that
the algorithm you're using is just as good or better than something
else out there.
--
-Mike (TeamB)
 

Re:Algorythms

Quote
As always you need to carefully benchmark what you're doing before
deciding that it needs to be more efficient. it is quite possible that
the algorithm you're using is just as good or better than something
else out there.
Maybe, but I know there are lots of algorithms for f.ex. searching through
text and stuff. I know how to do lots of things with Delphi, but it seems
it's never good enough for me (either slow or sloppy code, so much Pos() and
PosEx() that I just hate myself when I look at it a week later). So, I was
searching for something time-tested and already used, so I could compare it
to my own routines and maybe combine some things.
BTW Do you happen to know which is the best way (fast, memory efficient) to
store data: dynamic arrays, collections or TStrinList. The data I want to
store is usually numbers or text, not objects.
Also, which is the best method of comparing strings, I know IF-s are slower
than CASE, but I don't know how to use CASE with strings, I suppose
typecasting with cardinal(string) isn't the best way :lol:
Thanks
Marko
 

Re:Algorythms

Marko Binic writes:
Quote
BTW Do you happen to know which is the best way (fast, memory
efficient) to store data: dynamic arrays, collections or TStrinList.
The data I want to store is usually numbers or text, not objects.
Also, which is the best method of comparing strings, I know IF-s are
slower than CASE, but I don't know how to use CASE with strings, I
suppose typecasting with cardinal(string) isn't the best way :lol:
This will depend on how much data you have. If it will easily fit in
memory then I like a tList or tStringList.
As far as comparing strings just using a string expression is the
easiest way. You can also use AnsiCompareStr to be sure that it works
in other locales.
Once again, I wouldn't worry too much about performance until you've
run a benchmark.
--
-Mike (TeamB)
 

Re:Algorythms

Have a look at:
FastStrings Library
www.droopyeyes.com/default.asp
ADQStrings
delphi.icm.edu.pl/ftp/d40free/ADQStrings.zip
These links are string manipulation libraries, mainly in assembly and should
improve a lot what you are doing.
I hope they can help you.
Jomar
 

Re:Algorythms

Quote
This will depend on how much data you have. If it will easily fit in
memory then I like a tList or tStringList.
Well, while parsing an HTML page one can get well over a 1000 attribute
values, either as numbers or strings. My computer's memory (1GB) can
certainly handle it, but what about ancient computers, like P2 with 32-64MB
(yes, some people here still use those) ?
Quote
As far as comparing strings just using a string expression is the
easiest way. You can also use AnsiCompareStr to be sure that it works
in other locales.
Yeah, that is a way, but not an efficient one, most of the time I have to
check multiple values, so the code can get really unclean. A case statement
can make it alot more readable, PHP allows using CASE for strings, while
Delphi just says "ordinal type required" :(
BTW can AnsiCompare/Replace/Contains/.../Text be used for unicode strings?
Quote
Once again, I wouldn't worry too much about performance until you've
run a benchmark.
Yes, but benchmarking is useless if you don't have other results to compare
with...
Marko
 

Re:Algorythms

Thanks Jomar, I just downloaded them. I can take a closer look now :)
Marko
 

Re:Algorythms

Sorry, I couldn't resist...
Quote
Yeah, that is a way, but not an efficient one, most of the time I have to
check multiple values, so the code can get really unclean. A case statement
can make it alot more readable, PHP allows using CASE for strings, while
Delphi just says "ordinal type required" :(
PHP doesn't really have strings or integers. Every variable can take any
type. While this has some advantages, I am really glad Delphi isn't PHP.
If you have a lot of strings to compare, don't use if / then /else at
all. Use a hash table and store an object for each string, calling a
virtual method defined there. However comparing strings (either with the
= operator or AnsiCompareString) isn't that slow, because it returns
false as soon as the first character differing is found, which quite
often is the first one.
Jens
--
Jens Gruschel
www.pegtop.net
 

Re:Algorythms

Quote
Sorry, I couldn't resist...
Ah Jens, you know your input is always more than welcome and more than
useful :)))
Quote
PHP doesn't really have strings or integers. Every variable can take any
type. While this has some advantages, I am really glad Delphi isn't PHP.
Yeah, me too ;) But a thing like that could really be useful...
Quote
If you have a lot of strings to compare, don't use if / then /else at all.
Use a hash table and store an object for each string, calling a virtual
method defined there. However comparing strings (either with the =
operator or AnsiCompareString) isn't that slow, because it returns false
as soon as the first character differing is found, which quite often is
the first one.
Hash table??? I wouldn't want to sound stupid, but I have no idea what that
is, could you please explain or provide a link :(
Well comparing isn't the biggest issue here, here's a real example:
In a DTD I have to parse something like:
id ID #IMPLIED
class CDATA #IMPLIED
and I do it like:
s := DTD.Text;
i := 1;
while s[i] in [' ', #13, #10] do
Inc(i);
p := PosEx(s, ' ', i);
attname := Copy(s, i, p-i);
...
So, by using a similar parsing function I do it OK, but when you have to
parse more than 1000 lines of such strings you start worrying about the time
waste, especially on slow machines. I thought I would find something about fast
text parsing which would give me an idea on doing it more efficiently
(regarding both speed and code readability, cause after some time I won't
know what I was trying to do with it).
Thanks
Marko
 

Re:Algorythms

Quote
Hash table??? I wouldn't want to sound stupid, but I have no idea what that
is, could you please explain or provide a link :(
Google should return plenty of links. The idea is to create a hash key
for your string and use this key to store and retrieve the item directly
(a hash key is a number you calculate for a given string or some other
data using the single characters). I have written such a class, too, you
can find it in my component collection (PegtopHashes.pas, do you have
the link?), however it might not be the best implementation. Hash tables
are even faster than sorted lists / binary search, but I don't think
it's worth it if you only have 2 strings like in your example. But if
you have many keywords you can use a hash table to store some
information for each keyword and retrieve this information very fast
each time you find such a keyword. Knowing about hash tables isn't wrong
(too bad there isn't a hash table included to the VCL except IIRC as a
helper class for ini files, but there are more than enough 3rd party
Delphi implementations around).
Quote
In a DTD I have to parse something like:

id ID #IMPLIED
class CDATA #IMPLIED

and I do it like:

s := DTD.Text;
i := 1;
while s[i] in [' ', #13, #10] do
Inc(i);
p := PosEx(s, ' ', i);
attname := Copy(s, i, p-i);
...
I don't think that is too bad, although I sometimes prefer good old state
machines (for example for a simple XML parser I wrote).
Jens
--
Jens Gruschel
www.pegtop.net
 

Re:Algorythms

"Marko Binic" <XXXX@XXXXX.COM>writes
Quote

Hash table??? I wouldn't want to sound stupid, but I have no idea what
that is, could you please explain or provide a link :(
Hi Marko,
There's a hash table class in the Turbopower SysTools library on Sourceforge
(sourceforge.net/projects/tpsystools/). I have never used it, so I
don't know how it compares to others in terms of code quality and
performance, but it might be worth looking at as a place to start.
HTH,
Van Swofford
Tybee Jet Corp.
 

Re:Algorythms

Quote
I've written such a class, too
I just read en.wikipedia.org/wiki/Hash_table, which is quite
interesting, and noticed how much better my implementation can be made
(starting which the hash function, which is pretty slow compared with
other ones).
Jens
--
Jens Gruschel
www.pegtop.net
 

Re:Algorythms

Seems like need a parser/tokenizer.
If you have a predefined set of tokens, you can create a hash function that
returns a unique integer hash for each token, so that you can use it in a
case statement. Synedit highlighters use such method.
If you have arbitrary set of tokens, you build a hash table - every added
string gets its hash value and ID, so when you search the ID for a string
value, it won't search sequentially entire table, but a small subset using
calculated hash. An example is in inifiles.pas, but increase the bucket size
as needed.
"Marko Binic" <XXXX@XXXXX.COM>writes
Quote
>Sorry, I couldn't resist...

Ah Jens, you know your input is always more than welcome and more than
useful :)))

>PHP doesn't really have strings or integers. Every variable can take any
>type. While this has some advantages, I am really glad Delphi isn't PHP.

Yeah, me too ;) But a thing like that could really be useful...

>If you have a lot of strings to compare, don't use if / then /else at
>all. Use a hash table and store an object for each string, calling a
>virtual method defined there. However comparing strings (either with the
>= operator or AnsiCompareString) isn't that slow, because it returns
>false as soon as the first character differing is found, which quite
>often is the first one.

Hash table??? I wouldn't want to sound stupid, but I have no idea what
that is, could you please explain or provide a link :(

Well comparing isn't the biggest issue here, here's a real example:

In a DTD I have to parse something like:

id ID #IMPLIED
class CDATA #IMPLIED

and I do it like:

s := DTD.Text;
i := 1;
while s[i] in [' ', #13, #10] do
Inc(i);
p := PosEx(s, ' ', i);
attname := Copy(s, i, p-i);
...

So, by using a similar parsing function I do it OK, but when you have to
parse more than 1000 lines of such strings you start worrying about the
time waste, especially on slow machines. I thought I would find something
about fast text parsing which would give me an idea on doing it more
efficiently (regarding both speed and code readability, cause after some
time I won't know what I was trying to do with it).

Thanks

Marko

 

Re:Algorythms

OK, thanks for clearing that up. I guess I will have some reading to do :)
Why is when you know how to do something there's always a better but way
harder way to do it? I thought Delphi wrapped all of those things and made
them easy, I mean - look at how easy COM is with Delphi :D
Thank you all!
Marko
 

Re:Algorythms

Quote
Why is when you know how to do something there's always a better but way
harder way to do it?
I don't think that is true. But for simple things you don't ask in this
newsgroup, do you?
Quote
I thought Delphi wrapped all of those things and made
them easy, I mean - look at how easy COM is with Delphi :D
Actually I didn't use COM much. Too easy ;-) Much harder to answer is
why Delphi seems to be the only serious language shipping without a hash
table.
Jens
--
Jens Gruschel
www.pegtop.net