Handling large text files

Every time you go to the disk you make the system do a vast amount of
work.

You need to do large buffered reads - say 100 KB at a time - then pull
each line out at the position of each #13#10 pair (or lone #10 on
Unix).
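
Something like this untested sketch (ProcessLines is just my own name
for it - I'm assuming a plain TFileStream from Classes):

// Untested sketch - needs Classes in the uses clause.
// Reads 100 KB chunks and splits on #10, stripping a preceding #13,
// so it copes with both #13#10 and bare #10 terminators.
procedure ProcessLines(const FileName: string);
const
  BufSize = 100 * 1024;
var
  FS: TFileStream;
  Buf: array[0..BufSize - 1] of AnsiChar;
  Carry, Line: AnsiString;
  BytesRead, I, Start: Integer;
begin
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Carry := '';
    repeat
      BytesRead := FS.Read(Buf, BufSize);
      Start := 0;
      for I := 0 to BytesRead - 1 do
        if Buf[I] = #10 then
        begin
          SetString(Line, PAnsiChar(@Buf[Start]), I - Start);
          Line := Carry + Line;
          Carry := '';
          // strip the #13 of a #13#10 pair
          if (Line <> '') and (Line[Length(Line)] = #13) then
            SetLength(Line, Length(Line) - 1);
          // ... do something with Line here ...
          Start := I + 1;
        end;
      // hold back the unterminated tail for the next chunk
      if BytesRead > Start then
      begin
        SetString(Line, PAnsiChar(@Buf[Start]), BytesRead - Start);
        Carry := Carry + Line;
      end;
    until BytesRead = 0;
    // Carry now holds the final line if the file didn't end in #13#10
  finally
    FS.Free;
  end;
end;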

The *only* way of determining the number of lines in a text file is to
count the #13#10 terminators - and watch out: the last line might not
end with one.
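
For example (again untested - CountLines is my own illustrative name):

// Untested sketch - needs Classes in the uses clause.
// Counts #10s in 100 KB chunks; an unterminated last line still counts.
function CountLines(const FileName: string): Integer;
const
  BufSize = 100 * 1024;
var
  FS: TFileStream;
  Buf: array[0..BufSize - 1] of AnsiChar;
  BytesRead, I: Integer;
  LastChar: AnsiChar;
begin
  Result := 0;
  LastChar := #10;  // so an empty file counts as zero lines
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    repeat
      BytesRead := FS.Read(Buf, BufSize);
      for I := 0 to BytesRead - 1 do
        if Buf[I] = #10 then
          Inc(Result);
      if BytesRead > 0 then
        LastChar := Buf[BytesRead - 1];
    until BytesRead = 0;
    if LastChar <> #10 then
      Inc(Result);  // last line had no terminator
  finally
    FS.Free;
  end;
end;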

I have a Delphi DLL for reading a text file, counting the lines, and
returning each line on request (like an array) at
   www.iss.u-net.com/etxtmgr.htm

The demo driver is in VB - but you should be able to figure that out;
if nothing else it will give you an idea of what speeds *can* be
obtained.
It is built around a buffered read/write pseudo-stream class.

BTW - watch it when creating millions of small files - just don't do
it on a FAT system. Each time you create a new file, the system has
to scan the ENTIRE directory to check that the name does not already
exist. NTFS is better - it has indexed directory trees.

If I were you I would create *one* large file with an index into each
'pseudo small' file, and write a method of extracting each small file
on demand.  Yup - I mean write your own pseudo directory.
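
Something like this untested sketch - the record layout and names are
purely illustrative:

// Untested sketch - needs Classes in the uses clause.
// One big data file plus an index mapping each 'small file' name
// to its offset and length.
type
  TIndexEntry = packed record
    Name: string[63];  // pseudo file name
    Offset: Int64;     // start of its bytes in the big data file
    Size: Integer;     // length in bytes
  end;

// Pull one 'small file' straight out of the big file.
function ExtractSmallFile(Data: TFileStream;
  const Entry: TIndexEntry): AnsiString;
begin
  SetLength(Result, Entry.Size);
  Data.Position := Entry.Offset;  // Int64 Position needs Delphi 6+
  if Entry.Size > 0 then
    Data.ReadBuffer(Result[1], Entry.Size);
end;

Keep the whole index in memory, sorted by name so you can
binary-search it, and each extraction costs one seek plus one read -
no directory scan at all.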

Remember *any* disk access is a major task - you lose your App down a
hole into the OS - and the work it does down there is massive.

On Thu, 26 Apr 2001 21:44:02 GMT, "Johann Joubert"
<lastlegion...@hotmail.com> wrote:
>I have to do some maintenance on large text files by reading each line and
>then writing it to millions of other smaller text files (no larger than
>32K).  However, the large files are each around 1.4GB in size, EACH.  Will
>AssignFile and Reset and Readln work?  Or will it slow the system down?  It
>will run on a big ass server...hehe.

>Also, when the file is so incredibly large, what is the fastest way to
>determine the number of lines in the file?  Any help would be appreciated.

>Johann Joubert