
Large Text File Import

I have a large text file to import every month from my hard drive to the
InterBase database on our WinNT 3.51 server. The file is about 6,000
records. It is not delimited; it is fixed-width, but each record takes up
three lines. I have been using ReadLn into a temp var, then checking the
first char to see which fields are on that line, then using
Copy(String,StartPos,NumberToCopy) in a while loop to get it into my
fields. This takes forever. Anyone know a better way?
                        Thanks in Advance
                                Jon

 

Re:Large Text File Import



You can probably get an easy speed improvement by using
the SetTextBuf procedure (see Delphi help).  It defaults to only 128
bytes; try 16K or so and see if it helps.
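
A minimal sketch of the SetTextBuf suggestion, assuming the same
C:\TEST.TXT file name used further down; the 16K buffer size is just
the starting point suggested above, not a tuned value. The buffer is
a global so that it stays valid for as long as the file is open.

```pascal
program TextBufDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils;
var
  F: TextFile;
  Buf: array[0..16383] of Byte;  // 16K buffer; must outlive the open file
  Line: string;
  Count: Integer;
begin
  AssignFile(F, 'C:\TEST.TXT');     // assumed file name
  SetTextBuf(F, Buf, SizeOf(Buf));  // call after AssignFile, before any I/O
  Reset(F);
  try
    Count := 0;
    while not Eof(F) do
    begin
      ReadLn(F, Line);
      // parse Line with Copy(...) here, as in the original loop
      Inc(Count);
    end;
    WriteLn(Count, ' lines read');
  finally
    CloseFile(F);
  end;
end.
```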

A few months ago I wrote code to import a huge (30M or so) text
database; I was worried about speed and so experimented with
binary block read.  It worked but code was rather messy as my
lines were of varying length and I had to manually detect the
end of line.  I ended up going back to readln.

However if in your case you can *guarantee* that the line lengths
are absolutely predictable then you can probably get a hefty
performance boost by declaring the file as untyped (not text):
  type
   Rec = record
                  // line 1
                  str1:array[0..4] of char;
                  dummy1:char;
                  str2:array[0..9] of char;
                  str3:array[0..1] of char;
                  crlf1:Word;  // possibly only char here if CR not CRLF
                  // line 2
                   ...
                  crlf2:Word;
                  // line 3
                   ...
                  str10:array[0..7] of char;
                  str11:array[0..3] of char;
                  crlf3:Word;
               end;
var R:Rec;
     F:File;
     NumRead:integer;
     Error:Boolean;
...
  AssignFile(F,'C:\TEST.TXT');
  Reset(F,1);  // record size of 1 byte, so BlockRead counts in bytes
  Error := False;
  while not Eof(F) do begin
    BlockRead(F, R, SizeOf(R), NumRead);
    if NumRead <> sizeof(R) then begin  Error := True; break; end;
    // CR (#13 = $0D) followed by LF (#10 = $0A) reads as the
    // little-endian Word $0A0D
    if (R.crlf1 <> $0A0D) or (R.crlf2 <> $0A0D) // customise to suit
      then begin Error := True; break; end;
    // extract characters from record here
  end; // while

(This is OTTOMH.)

Tip: I prefer exceptions to goto/break, but exceptions are
somewhat slow.  If performance is a *big problem* then
avoid exceptions in the critical areas.
Also, turn off I/O error checking as it slows things down.
Use {$I-} only across your while loop (and check IOResult
yourself there) so the rest of your program still benefits
from automatic exceptions.  Eg, Reset will still raise an
exception if the file doesn't exist.
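
A rough sketch of scoping {$I-} to just the read loop. The 512-byte
buffer and the Failed flag are made up for illustration; only the
placement of the directives and the IOResult check matter here.

```pascal
program IOCheckDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils;  // with SysUtils in use, I/O errors under {$I+} raise EInOutError
var
  F: File;
  Buf: array[0..511] of Byte;  // illustrative buffer size
  NumRead: Integer;
  Failed: Boolean;
begin
  AssignFile(F, 'C:\TEST.TXT');
  Reset(F, 1);  // still under {$I+}: a missing file raises an exception
  Failed := False;
  {$I-}  // no automatic I/O checks from here on
  while not Eof(F) do
  begin
    BlockRead(F, Buf, SizeOf(Buf), NumRead);
    // With checking off, errors must be picked up by hand via IOResult
    if (IOResult <> 0) or (NumRead = 0) then
    begin
      Failed := True;
      Break;
    end;
    // process NumRead bytes of Buf here
  end;
  {$I+}  // automatic checks back on for the rest of the program
  CloseFile(F);
  if Failed then
    WriteLn('Read error');
end.
```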

How do you extract strings etc from record above?
The important thing to see is that the record
character arrays are neither PChar (not zero-terminated)
nor ShortString (no length byte).  

One way to get the text would be to start at the end and
work backwards, applying a zero terminator as you go.
Eg, var FieldXX:string;
      R.crlf3 := 0;
      Field11 := PChar(@R.str11);
      R.str11[0] := #0;
      Field10 := PChar(@R.str10);
      R.str10[0] := #0;
      ...

A better way would be to use
    SetString(Field11,R.str11,sizeof(R.str11));
    SetString(Field10,R.str10,sizeof(R.str10));
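
Here is a small self-contained sketch of the SetString approach. The
two-field TRec layout and the sample data are invented for the demo,
and AnsiChar is used on the assumption that each character in the
file is one byte (on newer compilers plain char may be wider).

```pascal
program SetStringDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils;
type
  // Hypothetical layout: a 5-char code followed by a 10-char name
  TRec = packed record
    Code: array[0..4] of AnsiChar;
    Name: array[0..9] of AnsiChar;
  end;
const
  Sample: AnsiString = 'AB123Smith     ';  // 15 bytes, same size as TRec
var
  R: TRec;
  Code, Name: AnsiString;
begin
  // Stand-in for BlockRead: copy the raw bytes into the record
  Move(Sample[1], R, SizeOf(R));
  // Copy exactly SizeOf(field) characters; no terminator is needed
  SetString(Code, PAnsiChar(@R.Code), SizeOf(R.Code));
  SetString(Name, PAnsiChar(@R.Name), SizeOf(R.Name));
  // Fixed-width fields are usually space padded, so trim before use
  WriteLn('[', Code, '] [', Trim(string(Name)), ']');
end.
```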

You may need to use other approaches here
depending on how you're inserting stuff into your
database.

Also, you'll probably want to drop your table indexes
during import and recreate them once all the data
is in place.  You can get *big* performance improvement
here, as you can imagine...

A word of warning: keep a close eye on project option
"Aligned Record Fields" ($A compiler option).  You may
need to explicitly turn this off across the record declaration
(or declare it as a packed record); otherwise the compiler may
pad fields out to 32-bit boundaries, which you definitely
don't want here.
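
To see the effect of alignment, a quick sketch along these lines can
help. The three record types are invented for the comparison; the
exact size of the aligned one depends on compiler version and
settings, so check the printed values on your own setup.

```pascal
program AlignDemo;
{$APPTYPE CONSOLE}
type
  {$A+}
  TAligned = record      // compiler is free to pad after Flag
    Flag: AnsiChar;
    Value: LongInt;
  end;
  {$A-}
  TUnaligned = record    // alignment switched off for this declaration
    Flag: AnsiChar;
    Value: LongInt;
  end;
  {$A+}
  TPacked = packed record  // packed overrides $A: 1 + 4 = 5 bytes
    Flag: AnsiChar;
    Value: LongInt;
  end;
begin
  WriteLn('aligned:   ', SizeOf(TAligned));
  WriteLn('unaligned: ', SizeOf(TUnaligned));
  WriteLn('packed:    ', SizeOf(TPacked));
end.
```

For a record that is read straight off disk with BlockRead, SizeOf
needs to equal the exact number of bytes per record in the file,
line terminators included.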

HTH.

_________________________________
Grant Walker, g...@enternet.com.au

Re:Large Text File Import



I suspect the problem is with your database access calls, not with
the ReadLn calls on the text file or the Copy calls on the string.
Without looking at your code I can't tell, but I would comment out the
db portion of the code and see how long it takes just to read the text
file... I suspect not much.
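
One way to run that experiment is a throwaway program like the sketch
below, which does only the ReadLn and Copy work and times it. The
file name and the Copy offsets are placeholders, and GetTickCount
from the Windows unit is only a coarse timer.

```pascal
program TimeReadDemo;
{$APPTYPE CONSOLE}
uses
  Windows;
var
  F: TextFile;
  Line, Field: string;
  Lines: Integer;
  Start: DWORD;
begin
  AssignFile(F, 'C:\TEST.TXT');  // placeholder file name
  Reset(F);
  Lines := 0;
  Start := GetTickCount;
  while not Eof(F) do
  begin
    ReadLn(F, Line);
    Field := Copy(Line, 2, 10);  // placeholder offsets; no db calls at all
    Inc(Lines);
  end;
  CloseFile(F);
  WriteLn(Lines, ' lines in ', GetTickCount - Start, ' ms');
end.
```

If this finishes quickly, the time is going into the database side
rather than the file parsing.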

If I am right, and the problem is with the db code, make sure you
disconnect any unneeded data-aware components while the batch process is
taking place.
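
A related technique, not quite the same as disconnecting components,
is to call TDataSet.DisableControls around the batch so attached
data-aware controls stop refreshing on every record. A minimal
sketch, where the unit name and RunImport are made up for
illustration:

```pascal
unit ImportBatch;

interface

uses
  DB;  // may be Data.DB on newer Delphi versions

// Hypothetical entry point: DataSet is whatever table or query
// the import writes to.
procedure RunImport(DataSet: TDataSet);

implementation

procedure RunImport(DataSet: TDataSet);
begin
  DataSet.DisableControls;  // attached controls stop updating per record
  try
    // hypothetical: Append and Post each imported record here
  finally
    DataSet.EnableControls;  // always restore, even if the import fails
  end;
end;

end.
```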

6000 records should not take long; something is wrong with your code.

Regards,

Robert

Re:Large Text File Import



You need to create a structure mirroring the data on the drive...

type
        rectype=record
                field1:array[1..8] of char;
                field2:array[1..2] of char;
                ...
        end;

Var
        recordfile: file of rectype;
        rec:rectype;
begin
        assignfile(recordfile,'inputfile.txt');
        reset(recordfile);
        // eof needs the file variable; a bare eof tests standard input
        while not eof(recordfile) do begin
                read(recordfile,rec);
                DoStuffToIt;
        end;
        closefile(recordfile);
end;

--
-----------------------------------------
Software Services - Making Windows Scream
http://www.invsn.com/softserv/
bry...@thevision.net
-----------------------------------------
