Board index » delphi » Sorting records in a File

Sorting records in a File

Quote
David Koons <cn3...@abaco.coastalnet.com> wrote:
>I've tried several times to write an efficient record sorter/packer
>to no avail...  This is what I'm looking for:

>Say I have a record:
>Type
>MyRec = Record
>  IntInfo  : Integer;
>  ByteInfo : Byte;
>  WordInfo : Word;
>  Deleted  : Boolean;
>End;
>Var
>  MyFile : File of MyRec;
>  MyInfo : MyRec;

>Now, I have a file of say hundreds of records and some are marked for
>deletion by toggling MyInfo.Deleted to true...  I need a routine that
>will purge out the deleted records efficiently...  I can and have done
>it, but not efficiently, I used the bubblesort method but that's too
>slow...

>Some code for sorting the records by MyInfo.IntInfo would be nice too.
>Again, I have done it, but not efficiently...

>Can anyone help??

First, if you want to have speed, you have to read/write big blocks of
data from/to the file. So have to use an internal buffer of, say, 32K
which holds a lot of your records... Btw: If your program works without
any errors, setting the Range-, Stack- and Overflow Checking turned off
will give you another speed boost.

Your 'delete' routine should look something like this:

{ Warning: untested code !!}

CONST BufSize = 8192;    { Number of records in Buffer...}

TYPE MyRec = Record
               IntInfo  : Integer;
               ByteInfo : Byte;
               WordInfo : Word;
               Deleted  : Boolean;
             End;
     FileStr = String [14];

{$I-}  { Set I/O Checking Off in TP }

procedure Delete (FileName:FileStr);   { Use special string-type to gain speed

Quote
}

  VAR F1,F2:File;                      { because of less stack overhead
Quote
}

      LoadBuf,SaveBuf:Array [0..BufSize] of MyRec;
      i,j,Result,Max:Word;
  BEGIN
    Assign (F1,FileName);
    Reset (F1,1);
    Assign (F2,"TEMP.TMP");
    ReWrite (F2,1);
    REPEAT
      BlockRead (F1,LoadBuf,SizeOf(LoadBuf),Result);
      Max:=Result/SizeOf(MyRec);  
      j:=0;
      for i:=0 to Max do
        if NOT (LoadBuf[i].Deleted)
          BEGIN
            SaveBuf[j]:=LoadBuf[i];
            Inc (j);
          END;
    WriteBlock (F2,SaveBuf,j*SizeOf(MyRec));
    UNTIL (Result<SizeOf(LoadBuf));
    Close (F1);
    Close (F2);
END;

Modify this code to your needs... Hope this helps...

Aubert Dupont

 

Re:Sorting records in a File


Quote
>Now, I have a file of say hundreds of records and some are marked for
>deletion by toggling MyInfo.Deleted to true...  I need a routine that
>will purge out the deleted records efficiently...  I can and have done
>it, but not efficiently, I used the bubblesort method but that's too
>slow...

     It seems to me that if all you want to do is purge the records
marked as "delete", one efficient way to do it is to simply copy the
file, only don't write out deleted records.  Something like

   while not eof (infile) do begin
   get (infile);
   if not infile^.deleted
    then begin
     outfile^ := infile^;
     put (outfile) end
   end;

     In this particular case, sorting does not seem to help in any way.
(You, of course, will want to do the proper opening/closing/renaming
of the input and output files).  Note that the above code is Standard
Pascal, so it should work with all compilers and systems.

Bob Schor
Pascal Enthusiast

Re:Sorting records in a File


Quote
>      It seems to me that if all you want to do is purge the records
> marked as "delete", one efficient way to do it is to simply copy the
> file, only don't write out deleted records.  Something like

>    while not eof (infile) do begin
>    get (infile);
>    if not infile^.deleted
>     then begin
>      outfile^ := infile^;
>      put (outfile) end
>    end;

I appreciate the help, but that's pretty much how I've done it in the
past...  I know there's faster and more efficient code out there
though...  IE: moving undeleted records "up" in the file to fill in
deleted spots and trunicating the file...  This would save from having
to make a temporary file and renaming it....

I've seen some messages about it before, but not anything that I
could understand...<G>

        Again, thanks,
                     Dave

Re:Sorting records in a File


Quote
On Sat, 14 Oct 1995, David Koons wrote:
> >> I've tried several times to write an efficient record sorter/packer
> >> to no avail...  This is what I'm looking for:

> >> Some code for sorting the records by MyInfo.IntInfo would be nice too.
> >> Again, I have done it, but not efficiently...

> >Post me your routines and I'll try to help you ...

> Took a while for me to answer, had to find time to sit and write some sample
> code...  Here's how I purge deleted records:

> .....

> That's it in a nutshell, of course I always add the IOResult checks and
> FileExist functions in my actual programs...  I've heard of faster ways to do
> this without all the disk activity by moving undeleted records from the end of
> the data file to overwrite a deleted record towards the beginning of the data
> file, then just trunacating the data file when done...  This would be much more
> efficient, faster, and would not require makeing a TEMP file or accessing the
> disk as much...

That's the best way I find. You can store all the contents of the file in
memory (only not-to-purge records) and then rewrite over the original file.
But the other method (making a temp{*word*203}file in the hard disk and then
rename it) gives you more security (if your system hangs during the
expunging operation you don't loose the original data). I think so, but
may be I'm wrong ...

Quote
> As for the sorting methods I use, I basically write the contents of the record
> file to an array (much faster than file I/O operations)...  Then I loop
> through the array comparing each entry with the next entry, if the second
> one is larger, they get swapped and a boolean variable gets toggled to
> true...  With the start of each pass through the array, the boolean variable
> is set to false, it only goes true if values are swapped...  I have to
> continue through the loop until no values are swapped, that means they are
> all in ascending order...

> I know there's faster sorting methods than bubblesort, but I haven't been able
> to write it...  I've read about using Linked Lists to do this, but I really
> don't understand enough about using them...

Here you are an example of sorting records from a file of registers.
First, the program creates a file of registers with numeric and
alphanumeric fields. When the file is created, reads the contents of
fields of one of the numeric fields (for example). For the sorting uses
the ShellSort procedure, that sorts the array containing the numeric
data (using a temp{*word*203}array of pointers), returning an array with the
values of positions of the records in the file. Not linked lists, but the
sorting process is very fast, and you can sort up to 32000 records (aprox.).

My English may be not very good, but hope you understand almost all ...
... if any comment, please e-mail me.

Regards,
Aitor.

    ********  Hau ez da Espainia ... Euskal Herria Askatu !!!  *******
    ***  This is not Spain ... Freedom for the Basque Country !!!  ***
    *****************  epago...@lgdx02.lg.ehu.es  ********************    

Here it's the code ... hope this helps you !!

{------------------- cut here ----------------- cut here ----------------}

Program ShellSortExample;

Uses
  Crt;

Const
  MaxNumCustomers = 3200; { Number of records ... up to 32000 (aprox.) }

Type
  RCustomer  = Record
               Name    : String[20];
               Address : String[100];
               Country : String[30];
               TelNum  : LongInt;
               CP      : Word;
             End;
  TOrder    = Array[1..MaxNumCustomers] of Word;

Var
  FCustomers  : File of RCustomer;
  d1        : RCustomer;
  i         : Word;
  Order     : TOrder;

Procedure CreateFCustomers;
Var
  F : RCustomer;
Begin
  WriteLn('Creating ',MaxNumCustomers,' registers file ...');
  ReWrite(FCustomers);
  Randomize;
  for i:=1 to MaxNumCustomers do
    begin
      Seek(FCustomers,FileSize(FCustomers));
      F.Name    := 'Aitor Gomez Gonzalez';
      F.Address := 'Aurrekoetxea 1, 2. dcha.';
      F.Country := 'Euskal Herria';
      F.TelNum  := Random(255)*Random(255);
      F.CP      := Random(255)*Random(255)*Random(10);
      Write(FCustomers,F);
    end;
  WriteLn(MaxNumCustomers,' registers file created');
End;  { CreaFichCustomeres }

Procedure ShellSort(Var A:TOrder);
Type
  TOrderPtr = ^TOrder;
Var
  B     : TOrderPtr;  { Temp{*word*203}table }
  Hecho : Boolean;
  Salto : Word;
  j,k   : Word;
  l     : Word;
Begin
  New(B);
  For i:=1 to MaxNumCustomers do
    B^[i]:=i;
  Salto:=MaxNumCustomers;
  While Salto>1 do
    Begin
      Salto:=Salto div 2;
      Repeat
        Hecho:=True;
        For j:=1 to MaxNumCustomers-Salto do
          Begin
            i:=j+Salto;
            if A[j]>A[i] then
              Begin
                l:=B^[j];
                k:=A[j];
                B^[j]:=B^[i];
                A[j]:=A[i];
                B^[i]:=l;
                A[i]:=k;
                Hecho:=False;
              End;
          End;
      Until Hecho;
    End;
  { When sorted values are stored in 'B' array, we store'em in 'A' again and }
  { dispose 'B' from memory. Using temp{*word*203}pointers array 'B' we can sort   }
  { double number of records ...                                             }
  For i:=1 to MaxNumCustomers do
    A[i]:=B^[i];
  Dispose(B);
End;  { ShellShort }

begin
  Assign(FCustomers,'Customer.Dat');
  ClrScr;
  { Creation of 'Customer.Dat' }
  {$I-}
  Reset(FCustomers);
  Close(FCustomers);
  {$I+}
  If (IOResult=0)
    then
      Begin
        WriteLn('File ''Customer.Dat'' exist ... Overwrite? [Y/N] ');
        If UpCase(ReadKey)='Y' then
          CreateFCustomers
            Else Halt(1);
      End
        Else
          CreateFCustomers;
  { Open 'Customer.Dat' for reading }
  Reset(FCustomers);
  { Reading of 'CP' fields from all the 'RCustomer' registers in the file }
  { and storing them in the table 'Order'                                 }
  i:=0;
  Seek(FCustomers,0);
  WriteLn('Reading ''CP'' fields from file ''Customer.Dat'' ...');
  For i:=1 to MaxNumCustomers do
    Begin
      Read(FCustomers,d1);
      Order[i]:=d1.CP;
    End;
  WriteLn('Sorting ',MaxNumCustomers,' registers ...');
  ShellSort(Order); { <-------- Call to the sorting procedure !!! }
  WriteLn(MaxNumCustomers,' registers sorted');
{ Uncomment the following to display the sorted values }
{  For i:=1 to MaxNumCustomers do
    Begin
      Seek(FCustomers,Order[i]-1);
      Read(FCustomers,d1);
      WriteLn('Register n. ',Order[i]:5,' : ',d1.CP:5);
    End;
  Close(FCustomers);}
Repeat Until KeyPressed;
end.

{------------------- cut here -------------- cut here ------------------}

Re:Sorting records in a File


 . . . .
Quote
>I appreciate the help, but that's pretty much how I've done it in the
>past...  I know there's faster and more efficient code out there
>though...  IE: moving undeleted records "up" in the file to fill in
>deleted spots and trunicating the file...  This would save from having
>to make a temporary file and renaming it....

 . . . .
Obviously I didn't see this thread from the beginning, but there is one
point I believe you should consider in favour of writing a new file from
the old rather than updating an existing file in place: What happens if
something fails during the process? The answer: Make sure the old file
is unchanged until you have created a new one, then delete the old
backup file (if one exists), rename the old file to become a new backup,
and finally rename the new file to become the new operative file.

regards Sven

Re:Sorting records in a File


Quote
In article <45j2sm$...@usenet.srv.cis.pitt.edu> Yourn...@somwhere.COM (Your Name) writes:
>>Now, I have a file of say hundreds of records and some are marked for
>>deletion by toggling MyInfo.Deleted to true...  I need a routine that
>>will purge out the deleted records efficiently...  I can and have done
>>it, but not efficiently, I used the bubblesort method but that's too
>>slow...
>     It seems to me that if all you want to do is purge the records
>marked as "delete", one efficient way to do it is to simply copy the
>file, only don't write out deleted records.

Easy, but not efficient.  Assuming TurboPascal on a PC or equivalent
capability:Read the whole file as a typed file of record, noting the numbers
of those to be deleted, and counting. Then seek the last non-deleted one and
write it over the first to be deleted, repeating till done.  Then truncate the
file.  I'VE NOT DONE IT but AFAICS it should work.

You may be able to raise the efficiency by having your record size a multiple
or submultiple of 512, the sector size.
--
John Stockton : mailto:J...@dclf.npl.co.uk from off-site.  MIME.  WP.
 National Physical Laboratory, Teddington, Middlesex, TW11 0LW, UK
  Direct Phone +44 181-943 6087, Nearby Fax +44 181-943 7138  
   Postings out, Email in/out are fast.  Offshore news takes 0..10+
    days to arrive; please mail me a copy of non-UK followups!  
     Regret system puts unzoned (UK civil) time on messages.

Re:Sorting records in a File


Quote
On Thu, 26 Oct 1995, Dr John Stockton, NPL UK wrote:
> In article <45j2sm$...@usenet.srv.cis.pitt.edu> Yourn...@somwhere.COM (Your Name) writes:
> >>Now, I have a file of say hundreds of records and some are marked for
> >>deletion by toggling MyInfo.Deleted to true...  I need a routine that
> >>will purge out the deleted records efficiently...  I can and have done
> >>it, but not efficiently, I used the bubblesort method but that's too
> >>slow...
> >     It seems to me that if all you want to do is purge the records
> >marked as "delete", one efficient way to do it is to simply copy the
> >file, only don't write out deleted records.

> Easy, but not efficient.  Assuming TurboPascal on a PC or equivalent
> capability:Read the whole file as a typed file of record, noting the numbers
> of those to be deleted, and counting. Then seek the last non-deleted one and
> write it over the first to be deleted, repeating till done.  Then truncate the
> file.  I'VE NOT DONE IT but AFAICS it should work.

It depends where do you see the 'efficiency'.  Tell me, what happens if
your system hangs during the 'overwriting' operation ??

Regards,
Aitor.

    ********  Hau ez da Espainia ... Euskal Herria Askatu !!!  *******
    ***  This is not Spain ... Freedom for the Basque Country !!!  ***
    *****************  epago...@lgdx02.lg.ehu.es  ********************    

Other Threads