Board index » delphi » Read/write to very large text files???

Read/write to very large text files???

Hi to all,

It is very cumbersome to replace a line of text in a very large textfile
using readln and writeln.
Is there a better way?
If so, then show a short sample please.

Thanks for any replies

Gery

www.zipworld.com.au/~gprdata

Have a great day

 

Re:Read/write to very large text files???


A couple of questions:

1. Define very large?  Are we talking 5mb, 50mb, or 500mb?

2. Is the file in a particular format such that replacing a line will not
change its overall length?

Quote
> Have a great day

Have a great day to you too!

Re:Read/write to very large text files???


In article <3b61fc57.4127...@news.zipworld.com.au>,
gprdataNoS...@zipworld.com.au says...

Quote

>Hi to all,

>It is very cumbersome to replace a line of text in a very large textfile
>using readln and writeln.
>Is there a better way?
>If so, then show a short sample please.

man sed
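For readers on a system with GNU sed, a one-liner of that sort might look like this (the file name data.txt and the sample lines are made up for illustration; -i edits the file in place):

```shell
# make a tiny sample file (hypothetical data)
printf 'AAP    Telcos Jun09  20.7\nABC    Buildr Dez09  2.4\n' > data.txt

# replace the whole line whose ticker field is AAP (GNU sed, in-place)
sed -i 's/^AAP .*/AAP    Telcos Sep09  21.3/' data.txt

cat data.txt
```

Note that on non-GNU seds, -i needs a backup-suffix argument (e.g. -i '').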

Re:Read/write to very large text files???


On Sat, 28 Jul 2001 00:11:12 GMT, "Michael P. Schneider"

Quote
<mich...@sbpaaimt.{*word*104}magician.net> wrote:
>A couple of questions:

Hi Michael,

here are the answers:

Quote

>1. Define very large?  Are we talking 5mb, 50mb, or 500mb?

This file contains data of companies like P/E, Yield, Company value etc.
It is now too large to fit into a Memo, and could grow to 200 kb or about
4000 lines of this:

AAP    Telcos Jun09  20.7   6.4    36.6   0      +1     0      eMr-Ok
ABC    Buildr Dez09  2.4    75     -3.1   0      +40    0            
ABG    Devel  Jun09  -11    42     6.7    6.2    -5     100    aAp-No

(Sorry about the fixed-pitch font; it is to show how the file format is laid out:
5 characters + 2 spaces, and then 9 times 6 characters + 1 space.)

Quote

>2. Is the file in a particular format such that replacing a line will not
>change its overall length?

The format as I would use it is:
string[5] for the "ticker" (share code), to find the matching ticker.
The rest is a string, often of uneven length as you can see above.

What I wish to do is to replace a line if the ticker is already in the
file, or add it if it is not (if possible in alphabetical order).
Thanks for any replies

Gery

www.zipworld.com.au/~gprdata

Have a great day

Re:Read/write to very large text files???


Well, I certainly would not use ReadLn and WriteLn for even a very
small file.

What is the point of going to the disk hundreds of times when you
need only do it once?

Working sensibly with a pretty standard PC you can 'wolf' in 1mb in
under a second.

My preferred choice is using THandleStream - or rather descendants of
TStream with heavily buffered input and output.

It might be an idea if you gave us just a little more information
about *exactly* what you want to do.  

Is this a one time update or do you need to make hundreds of changes
to the Text file ?

Does it *need* to be a Text file ?

On Fri, 27 Jul 2001 23:47:11 GMT, gprdataNoS...@zipworld.com.au (Gery

Quote
Rohrig) wrote:
>Hi to all,

>It is very cumbersome to replace a line of text in a very large textfile
>using readln and writeln.
>Is there a better way?
>If so, then show a short sample please.

>Thanks for any replies

>Gery

>www.zipworld.com.au/~gprdata

>Have a great day

Re:Read/write to very large text files???


Quote
> It is very cumbersome to replace a line of text in a very large textfile
> using readln and writeln.
> Is there a better way?

Let's say you want to replace the "n-th" line:

var
  ts: TStringList;
  n: Integer;
begin
  ts := TStringList.Create;
  ts.LoadFromFile('MyFile.txt');
  n := 234; { or any line number, 0-based }
  ts[n] := 'your new line';
  ts.SaveToFile('MyFile.txt'); { this will replace the first version of the file }
  ts.Free;
end;

Easy and fast.
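Since the original question was about replacing by ticker rather than by line number, a rough, untested sketch along the same lines might look like this (the procedure name UpdateTicker is made up; it assumes the file is already in alphabetical order and the ticker occupies the first five characters of each line; needs Classes in the uses clause):

```pascal
procedure UpdateTicker(const FileName, NewLine: string);
var
  ts: TStringList;
  i: Integer;
  Ticker: string;
begin
  Ticker := Copy(NewLine, 1, 5);  { first five characters hold the ticker }
  ts := TStringList.Create;
  try
    ts.LoadFromFile(FileName);
    { walk forward while tickers sort before ours }
    i := 0;
    while (i < ts.Count) and (Copy(ts[i], 1, 5) < Ticker) do
      Inc(i);
    if (i < ts.Count) and (Copy(ts[i], 1, 5) = Ticker) then
      ts[i] := NewLine            { replace the existing entry }
    else
      ts.Insert(i, NewLine);      { insert, keeping alphabetical order }
    ts.SaveToFile(FileName);
  finally
    ts.Free;
  end;
end;
```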

Re:Read/write to very large text files???


Right - what you need is three files - two will do - but three is
better.

You also need a descendant of TStream that will wolf in large chunks
from disk - by large I mean 100 kb

1) Read the text file in large chunks
2) Scan for #13 #10 to find each line end
3) Write the line start position and length into the second file
    call this second file the 'Index File'

You now have random access to the text file for reading.

Now you decide you want to change a line - fine - what you do is
actually write the line to the third file - and in the Index file you
simply update the position and the length - but put in the negative of
the length - so it knows to go to the third file to get the data.

You now have a text file that is random read write access - it has
'rollback' if you want to discard the changes - and if you want to
'Flush' the file - ie: turn it back into a straight text file - all
you do is read each line from your Text File Manager and write it out
to a new file.

You then delete the first three files and simply rename the new fourth
file.

On a standard PC the first 'index' of the system would take about 1
second for about a megabyte - on my *very* slow network drive 2mb
takes about 4 seconds.

You can easily speed this up by making it do the initial setup in the
background - say off a timer, or - though I don't fancy it - from its
own thread.

BTW - 200 kb is trivial.
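A minimal, untested sketch of steps 1-3 above (all names are made up; it assumes CRLF line ends and a file under 2 GB so Longint positions suffice; needs Classes in the uses clause):

```pascal
{ Write a (start, length) Longint pair per line into the index file }
procedure BuildIndex(const TextName, IndexName: string);
var
  Src, Idx: TFileStream;
  Buf: array[0..102399] of Char;  { 100 kb chunks }
  BytesRead, i: Integer;
  ChunkPos, LineStart, Len: Longint;
begin
  Src := TFileStream.Create(TextName, fmOpenRead);
  Idx := TFileStream.Create(IndexName, fmCreate);
  try
    ChunkPos := 0;
    LineStart := 0;
    repeat
      BytesRead := Src.Read(Buf, SizeOf(Buf));
      for i := 0 to BytesRead - 1 do
        if Buf[i] = #10 then      { #10 ends each #13#10 pair }
        begin
          Len := ChunkPos + i + 1 - LineStart;
          Idx.Write(LineStart, SizeOf(LineStart));
          Idx.Write(Len, SizeOf(Len));
          LineStart := ChunkPos + i + 1;
        end;
      Inc(ChunkPos, BytesRead);
    until BytesRead = 0;
    { note: a final line without a trailing CRLF is not indexed here }
  finally
    Idx.Free;
    Src.Free;
  end;
end;
```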

On Sat, 28 Jul 2001 14:25:30 GMT, gprdataNoS...@zipworld.com.au (Gery

Quote
Rohrig) wrote:
>On Sat, 28 Jul 2001 00:11:12 GMT, "Michael P. Schneider"
><mich...@sbpaaimt.{*word*104}magician.net> wrote:

>>A couple of questions:

>Hi Michael,

>here are the answers:

>>1. Define very large?  Are we talking 5mb, 50mb, or 500mb?
>This file contains data of companies like P/E, Yield, Company value etc.
>It is now too large to fit into a Memo, and could grow to 200 kb or about
>4000 lines of this:

>AAP    Telcos Jun09  20.7   6.4    36.6   0      +1     0      eMr-Ok
>ABC    Buildr Dez09  2.4    75     -3.1   0      +40    0            
>ABG    Devel  Jun09  -11    42     6.7    6.2    -5     100    aAp-No

>(Sorry about the fixed-pitch font; it is to show how the file format is laid out:
>5 characters + 2 spaces, and then 9 times 6 characters + 1 space.)

>>2. Is the file in a particular format such that replacing a line will not
>>change its overall length?

>The format as I would use it is:
>string[5] for the "ticker" (share code), to find the matching ticker.
>The rest is a string, often of uneven length as you can see above.

>What I wish to do is to replace a line if the ticker is already in the
>file, or add it if it is not (if possible in alphabetical order).
>Thanks for any replies

>Gery

>www.zipworld.com.au/~gprdata

>Have a great day

Re:Read/write to very large text files???


Quote
J French wrote:

> 1) Read the text file in large chunks
> 2) Scan for #13 #10 to find each line end
> 3) Write the line start position and length into the second file
>     call this second file the 'Index File'
> ...
> On a standard PC the first 'index' of the system would take about 1
> second for about a megabyte - on my *very* slow network drive 2mb
> takes about 4 seconds.

I remember the time when we had DOS and those good old AT-machines:)
These tricks were very important to master that time.

But current 1000+ MHz machines have their CPU internal cache
bigger than the 200 kB disk file here, fast hard disks, and Windows
intelligently doing the disk caching. Against all this, the algorithm
above looks like tenfold overkill for the task involved. <g>

You mostly only need a simple StringList. And because of Windows buffering,
the ReadLn/WriteLn operations are quite as fast as TStreams.
Write a couple of simple functions, and you can easily find, edit
and write back any row of the data.

I made a simple speed test:
1. Read in to StringList three text files, sizes 180 kB, 233 kB and
   306 kB, totalling 712 kB.
2. Within StringList, scan through each of them trying to find a string starting
   with string 'AAP  '
3. Finally, write the Stringlist back to disk, to create exact copy of the
   original files.

This all, with all those three files, took only 0.2 seconds on my
1000 MHz AMD machine.

I'll include my ugly, yet working test code below.

========================================================
Function GetElapsedTime(StartT, EndT:TDateTime):String;
{This function needed only for reporting test results}
var
  Hour, Min, Sec, MSec: Word;
begin
  DecodeTime(EndT-StartT, Hour, Min, Sec, MSec);
  Result:='   ' +Format('%.2d',[Min])+':' +Format('%.2d',[Sec]) +':' +Format('%.3d',[MSec]);
end;

procedure TForm1.Button1Click(Sender: TObject);
var
  aList:TStringList;
  i,j:integer;
  InFile,OutFile:TextFile;
  S:String;
  StartTime,EndTime:TDateTime;
  Found:Boolean;
begin
  aList := TStringList.Create;
  StartTime :=Now; {Start time}
  for j:=1 to 3 do
  begin
   {1. Test Read in text files TEST1.TXT, TEST2.TXT and TEST3.TXT}
    AssignFile(InFile,'C:\TEST' +IntToStr(j) +'.TXT');
    Reset(InFile);
    aList.Clear;
    while not Eof(InFile)
    do
    begin
      ReadLn(InFile,S);
      aList.Add(S);
    end;
    CloseFile(InFile);

    {2. Test Scan through the Stringlist, trying to find a string starting with 'AAP  '}
    Found := False;
    i:=0;
    while not Found and (i < aList.Count) do
    if Copy(aList.Strings[i],1,5) ='AAP  '
    then Found:=True
    else inc(i);
    if Found
    then Label2.Caption:='Found on line: ' + IntToStr(i);
    Application.ProcessMessages;

    {3. Finally, Test write OUT1.TXT, OUT2.TXT and OUT3.TXT files out}
    AssignFile(OutFile,'C:\OUT' +IntToStr(j) +'.TXT');
    Rewrite(OutFile);
    for i:=0 to aList.Count-1
    do
    begin
      WriteLn(OutFile,aList.Strings[i]);
    end;
    CloseFile(OutFile);
  end;
  aList.Free;
  EndTime:=Now; {End time}
  Label1.Caption:= GetElapsedTime(StartTime, EndTime);{Show elapsed time for the whole action}
end;

===============================================================

I'm glad if someone can find an error, which would mean that
the 0.2 sec test result would also be wrong.
Raw search through the text data is surprisingly fast with the current
machines.

Markku Nevalainen

Re:Read/write to very large text files???


Quote
J French wrote:

> 1) Read the text file in large chunks
> 2) Scan for #13 #10 to find each line end
> 3) Write the line start position and length into the second file
>     call this second file the 'Index File'
> ...
> On a standard PC the first 'index' of the system would take about 1
> second for about a megabyte - on my *very* slow network drive 2mb
> takes about 4 seconds.

I remember the time when we had DOS and those good old AT-machines:)
These tricks were very important to master that time.

But current 1000+ MHz machines have their CPU internal cache
bigger than the 200 kB disk file here, fast hard disks, and Windows
intelligently doing the disk caching. Against all this, the algorithm
above looks like tenfold overkill for the task involved. <g>

You mostly only need a simple StringList. And because of Windows buffering,
even the ReadLn/WriteLn operations are quite as fast as TStreams.
TStringList even has simple built-in methods LoadFromFile and
SaveToFile, which seem to have quite equal speed with ReadLn/WriteLn.

Then you only need to write a couple of simple functions, and you'll
easily find, edit and write back any data with the StringList.

I made a simple speed test:
1. Read in to StringList three text files, sizes 180 kB, 233 kB and
   306 kB, totalling 712 kB.
2. Within StringList, scan through each of them trying to find a string starting
   with string 'AAP  '
3. Finally, write the Stringlist back to disk, to create exact copy of the
   original files.

This all, with all those three files, took less than 0.2 seconds on my
AMD-1000 Mhz machine.

I'll include my test code below.

========================================================
procedure TForm1.Button1Click(Sender: TObject);
var
  aList:TStringList;
  i,j:integer;
  StartTime,EndTime:TDateTime;
  Found:Boolean;
begin
  aList := TStringList.Create;
  StartTime :=Now; {Start time}
  for j:=1 to 3 do
  begin
   {1. Test Read in text files TEST1.TXT, TEST2.TXT and TEST3.TXT}
    aList.Clear;
    aList.LoadFromFile('C:\TEST' +IntToStr(j) +'.TXT');
   {2. Test Scan through the Stringlist, trying to find a string starting with 'AAP  '}
    Found := False;
    i:=0;
    while not Found and (i < aList.Count) do
    if Copy(aList.Strings[i],1,5) ='AAP  '
    then Found:=True
    else inc(i);
    if Found
    then Label2.Caption:='Found on line: ' + IntToStr(i);
    Application.ProcessMessages;
   {3. Finally, Test write OUT1.TXT, OUT2.TXT and OUT3.TXT files out}
    aList.SaveToFile('C:\OUT' +IntToStr(j) +'.TXT');
  end;
  aList.Free;
  EndTime:=Now; {End time}
  Label1.Caption:= GetElapsedTime(StartTime, EndTime);{Show elapsed time for the whole action}
end;

===============================================================

I'm glad if someone can find an error here. The good 0.2 sec test result
on a 712 kB text file was slightly surprising to me too.

Markku Nevalainen

Re:Read/write to very large text files???


In article <3B634407.2...@iki.fi>, Markku says...

    i:=0;
    while not Found and (i < aList.Count) do
    if Copy(aList.Strings[i],1,5) ='AAP  '
    then Found:=True
    else inc(i);
    if Found
    then Label2.Caption:='Found on line: ' + IntToStr(i);
    Application.ProcessMessages;

if you are going to become a programmer one day, you need to learn
how to indent the code you write so it will be easy to read. The above
code is just plain ugly.

Re:Read/write to very large text files???


Quote
steve@nospam wrote:

> if you are going to become a programmer one day, you need to learn
> how to indent the code you write so it will be easy to read. The above
> code is just plain ugly.

Sorry, I did not know someone was intending to publish it in some larger
tip collection or something...

My hacks usually are ugly. Even this small one had one totally
needless line left in, Application.ProcessMessages, but hey, it
works and I can live with it.
After all, there were only 8 lines of code in that chunk. You can't write
totally non-understandable code with so limited a line count :)

Markku Nevalainen


Re:Read/write to very large text files???


Quote
> if you are going to become a programmer one day, you need to learn
> how to indent the code you write so it will be easy to read. The above
> code is just plain ugly.

 Do a search on google.groups using Markku's name and you will see
that not only is he an excellent programmer, but he has, through the
years, devoted a lot of his own spare time to helping people in this
newsgroup.

 Many of us crowd our polished code over onto the left margin when
posting in a newsgroup in order to avoid line-wrap.

=========================================
Brad Blanchard
Website :  http://www.braser.com

Re:Read/write to very large text files???


On Sun, 29 Jul 2001 02:00:23 +0200, Markku Nevalainen <m...@iki.fi>
wrote:

Quote

>I remember the time when we had DOS and those good old AT-machines:)
>These tricks were very important to master that time.

>But current 1000+ MHz machines have their CPU internal cache
>bigger than the 200 kB disk file here, fast hard disks, and Windows
>intelligently doing the disk caching. Against all this, the algorithm
>above looks like tenfold overkill for the task involved. <g>

Yes - at first sight a TStringList looks the simplest hack - but I too
come from the old days - the days when 64kb code and data space was a
luxury.

I learnt several major lessons :-
    1) never make assumptions about the size of the data
    2) Disk space grows faster (arithmetically) than RAM
    3) never use RAM when you can use disk
    4) if you can use 'rollback' go for it
    5) write re-usable code - it is an investment
    6) do not do 'large' operations when small ones will do

The TStringList is a wonderful creature - I discovered its uses back
in 1989 from working on stuff for printing.

If the guy used my method and did *not* flush the file then his
startup time would be infinitesimal - just checking the dates of 3
files and opening them.

Although it looks complex - used properly it is actually simpler than
a TStringList - to do an update you simply do :

       VStringList[LineNo].Text := MyNewText

On the other hand - maybe he should use the TStringList and in a few
years' time wonder how to fix his system when the 200 mb files have
grown to 1 gb and users (who always do this - having loaded on piles of
ill-written, memory-hungry software) are moaning to him because his
system is going slowly and they can hear the disks grunting as they
thrash, swapping virtual memory pages.

Re:Read/write to very large text files???


On Sun, 29 Jul 2001 12:39:18 +0200, GB Blanchard <gbbNOS...@ctv.es>
wrote:

Quote
>> if you are going to become a programmer one day, you need to learn
>> how to indent the code you write so it will be easy to read. The above
>> code is just plain ugly.

> Do a search on google.groups using Markku's name and you will see
>that not only is he an excellent programmer, but he has, through the
>years, devoted a lot of his own spare time to helping people in this
>newsgroup.

Hmm - well I did not criticize his code layout - but I do not like his
advice to the original poster

- in fact I thought it was of the 'dig a pit, put stakes in the
bottom, line it with barbed wire and then dive into it' variety

Quote

> Many of us crowd our polished code over onto the left margin when
>posting in a newsgroup in order to avoid line-wrap.

>=========================================
>Brad Blanchard
>Website :  http://www.braser.com
