Board index » delphi » Sorting a file of records

Sorting a file of records

I am working on a database management program, and want to be able to
sort records in the data file on a multi-key sort.  At present, I am
implementing a quicksort algorithm, which can sort 1000 records of 83
bytes in just over a minute.  Unfortunately, the database may run up to a
million records, and the sort time of over an hour is prohibitive.

The application is to run on a base-level 286 (640K RAM), and the sort
needs to be as fast as possible.  Can anyone make any suggestions as to
how I should go about it?

Cheers,

JB

--
-----------------------------------------------------------------------------
John Breen                             jbr...@echidna.cowan.edu.au
Somewhere in Mandurah :)
-----------------------------------------------------------------------------

 

Re:Sorting a file of records


Quote
In article <jbreen.823575...@bluering.cowan.edu.au>, jbr...@echidna.cowan.edu.au (John Breen) writes:
> I am working on a database management program, and want to be able to
> sort records in the data file on a multi-key sort.  At present, I am
> implementing a quicksort algorithm, which can sort 1000 records of 83
> bytes in just over a minute.  Unfortunately, the database may run up to a
> million records, and the sort time of over an hour is prohibitive.

> The application is to run on a base-level 286 (640K RAM), and the sort
> needs to be as fast as possible.  Can anyone make any suggestions as to
> how I should go about it?

> Cheers,

> JB

> --
> -----------------------------------------------------------------------------
> John Breen                             jbr...@echidna.cowan.edu.au
> Somewhere in Mandurah :)
> -----------------------------------------------------------------------------

You will find that even a minute or 2 is tooooo long for most users -
particularily if they are sitting at a terminal waiting for a response.
Therefor, I recomend that you don't sort them at all!!  The best way to get away
with this is to store them in sorted order.

There are a couple ways to do this.
1. Use a database engine for your database manager.  All the database engines
(Oracle, informix, sybase, DBase, Rbase, Paradox, etc...) will allow you to
define a multi-part key, then just retrieve the data sorted by that key.  The DB
engine will feed the records to you in sorted order.

If that is not an option then...
2. Do it like DBase does it and keep a seperate "index" file for your data file.
It has 2 fields in it, the Key (or index) and the record number of the record in
the data file.  Sequentially read thru the Key file and retrieve the record
number called for in the data file.  The only trick to this is creating the key
file and keeping it sorted.  If the data isn't modified very much you can do
this in a batch type of mode.

If that sounds like a bunch of hooie then try this...
3.  Add 2 new fields to the data file - NEXT_RECORD and PREVIOUS_RECORD. and
treat the file has a hugh double linked list.  When adding records to the file,
append the record to the end of the list and then figure out where it belongs in
the list and update your "pointers".  (This I have seen done in a real time
system)

Of course there is also the COTS solution...
4.  Similar to 1 but less expensive.  Get a product like B-Trieve that will
allow you to create indexed files and let the data file keep it in sorted
order.  When you need a record, retrieve it via the key you want it sorted by.

And a wild idea that I don't know if it can happen
5.  Is there a DOS command out there that will sort the data?  Can you call it
from your program?

And another wild one..
6.  Is windows running?? if so could you get Access or Lotus or Excell or ...
etc... to sort if for you?  Course on a 286 that might be pushing it a bit....

Hope these help..
Paul

Re:Sorting a file of records


Quote
jbr...@echidna.cowan.edu.au (John Breen) wrote:
>I am working on a database management program, and want to be able to
>sort records in the data file on a multi-key sort.  At present, I am
>implementing a quicksort algorithm, which can sort 1000 records of 83
>bytes in just over a minute.  Unfortunately, the database may run up to a
>million records, and the sort time of over an hour is prohibitive.

Quicksort is has an optimal average case running time.  BUT, if the data
is already sorted (or pretty near in order) it runs _really_ slow.  Beware
of where you take the comaparison key from.  Take it from the middle of
the portion to be sorted to ensure that sorting an already ordered database
isn't too slow.   I guess you know all this as you've already got QS
implemented.  

If you know the type of data you're sorting on is all of a similar type
(e.g., 7 digit numbers), then a bucket sort (aka radix sort) works extremely
well, with a complexity directly proportional to the list size.  This is
faster as it relies on the data having some kind of similar structure.
If you want the algorithm, mail me, though it's probably easier for both of
us if you find it in a book - saves me the typing and you'll get a better
understanding of what's going on!

Quote
>The application is to run on a base-level 286 (640K RAM), and the sort
>needs to be as fast as possible.  Can anyone make any suggestions as to
>how I should go about it?

Arrrrrrggggh.  286, 640K and a million records to sort in less than 1 hour?
You don't need an effecient sorting algorithm, you need a magician!

Best of luck,

ChrisR:

Re:Sorting a file of records


 JB> I am working on a database management program, and want to be able to
 JB> sort records in the data file on a multi-key sort.  At present, I am
 JB> implementing a quicksort algorithm, which can sort 1000 records of 83
 JB> bytes in just over a minute.  Unfortunately, the database may run up to
 JB> a million records, and the sort time of over an hour is prohibitive.

 JB> The application is to run on a base-level 286 (640K RAM), and the sort
 JB> needs to be as fast as possible.  Can anyone make any suggestions as to
 JB> how I should go about it?

You are probably going to need a disk merge-sorting method.  It
will take a couple of merge phases.  1 million records at 83
bytes each is going to take 83 megabytes.  The merge sort would
take an additional 83 megabytes for work space.  Do you have this
much space on the hard drive?

It can probably be done in less than an hour on the 286 (for 1
million records), if you have the required disk space.  Probably
only a few minutes.

I could wrte it for you, but it would be involved enough that
I'd have to get paid.
-----
For your quick sort - are you having to use the disk?  In
memory, on a 286, Quicksort should take < 1 second to do 1000
records.  (Like you're going it will take about 35 hours to sort
1 million records).

Jud McCranie  jud.mccra...@swsbbs.com
 * Silver Xpress V4.02B03P SW20178

Other Threads