Is it a text file or not.

Hi,

I'm trying to search through a directory containing some log files
and some data files. I need to pick out the latest TEXT file from the list.

The filenames are randomly generated.
I can get the date of the files, that's easy enough, but
what's a quick and easy way of testing for being TEXT?

Thanks.

 

Re:Is it a text file or not.


Quote
Jeremy Twiggs wrote in message <01bda9b2$2958cc00$130210ac@jeremy>...
>I'm trying to search through a directory containing some log files
>and some data files. I need to pick out the latest TEXT file from the list.

>The filenames are randomly generated.
>I can get the date of the files, that's easy enough, but
>what's a quick and easy way of testing for being TEXT?

Usually you can tell by convention that a file is a text file by its
extension (e.g., .TXT, .DAT, etc.),  but there's really no guarantee that
such a file is ASCII text.

To test whether a file is all ASCII, you need to look at every byte.  One
way is to use something like BlockRead to read the file in binary mode, and
then make sure all bytes are in [$20..$7E] ($7F is DEL, which is a control
character).  In addition, you normally need to allow certain control
characters as text, e.g., CR ($0D), LF ($0A), and perhaps tabs ($09) in an
ASCII file.  Is an ESC character ($1B) OK?  Is a formfeed, FF, ($0C) OK?
Some will say yes, others might say no.
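
A minimal sketch of the byte-by-byte check described above, assuming a console program built with Delphi or Free Pascal. The names IsAllAscii and AllowedControls are made up for illustration, and the choice to accept only tab, LF, and CR as control characters is an assumption you may want to change. The printable range used here is $20..$7E, which leaves out DEL ($7F).

```pascal
program StrictAsciiCheck;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  // Control bytes accepted as text in this sketch: tab, LF, CR.
  // Add $0C (FF) or $1B (ESC) if your definition of "text" allows them.
  AllowedControls = [$09, $0A, $0D];

// Returns True only if every byte in the file is printable ASCII
// ($20..$7E) or one of AllowedControls. An empty file counts as text.
function IsAllAscii(const FileName: string): Boolean;
var
  F: file;
  Buf: array[0..4095] of Byte;
  BytesRead, I: Integer;
  SavedMode: Integer;
begin
  Result := True;
  SavedMode := FileMode;
  FileMode := 0;            // open read-only
  try
    AssignFile(F, FileName);
    Reset(F, 1);            // untyped file, record size of one byte
  finally
    FileMode := SavedMode;
  end;
  try
    repeat
      BlockRead(F, Buf, SizeOf(Buf), BytesRead);
      for I := 0 to BytesRead - 1 do
        if not ((Buf[I] in [$20..$7E]) or (Buf[I] in AllowedControls)) then
        begin
          Result := False;  // first offending byte settles it
          Exit;
        end;
    until BytesRead = 0;
  finally
    CloseFile(F);
  end;
end;

begin
  if ParamCount <> 1 then
    Writeln('Usage: StrictAsciiCheck <filename>')
  else if IsAllAscii(ParamStr(1)) then
    Writeln('all ASCII text')
  else
    Writeln('contains non-text bytes');
end.
```

The scan stops at the first byte outside the allowed set, so binary files are usually rejected after the first block, while genuine text files are read to the end.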

Years ago I was quite confused by a file that was half ASCII and half
EBCDIC.  I wrote a utility that showed me the distribution of all characters
in the file.  It's been very useful in finding characters in a text file
that you sometimes can't "see" with editors, but really are in the file.
This Turbo Pascal utility, CharCnt, is item B-2 on the "Other Projects" page
in my Computer Lab.

efg
_________________________________________
efg's Computer Lab:  http://infomaster.net/external/efg

Earl F. Glynn                 E-Mail:  EarlGl...@att.net
MedTech Research Corporation, Lenexa, KS  USA

Re:Is it a text file or not.


Are you saying that you want to know about files with .txt extensions, or do
you want to know about all ASCII text files?

The .txt extension part is simple: just use FindFirst and FindNext, and if
you're doing 32-bit stuff, make sure you call FindClose.
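
As a rough sketch of the FindFirst / FindNext / FindClose loop mentioned above, the program below picks the most recently modified file matching a mask. NewestMatch is a hypothetical helper name, and comparing the raw TSearchRec.Time values is an assumption that suits the packed DOS timestamps of classic Delphi; newer compilers may flag that field as deprecated.

```pascal
program NewestTxt;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Returns the name of the most recently modified file in Dir that
// matches Mask (for example '*.txt'), or '' if nothing matches.
function NewestMatch(const Dir, Mask: string): string;
var
  SR: TSearchRec;
  BestTime: Int64;
begin
  Result := '';
  BestTime := 0;
  if FindFirst(IncludeTrailingPathDelimiter(Dir) + Mask, faAnyFile, SR) = 0 then
  try
    repeat
      // Skip directories; keep the entry with the largest timestamp.
      if (SR.Attr and faDirectory) = 0 then
        if (Result = '') or (SR.Time > BestTime) then
        begin
          BestTime := SR.Time;
          Result := SR.Name;
        end;
    until FindNext(SR) <> 0;
  finally
    FindClose(SR);  // release the search handle once FindFirst succeeded
  end;
end;

begin
  if ParamCount <> 1 then
    Writeln('Usage: NewestTxt <directory>')
  else
    Writeln('Newest .txt file: ', NewestMatch(ParamStr(1), '*.txt'));
end.
```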

If your question is the second part, good luck, because as I see it the first
thing you'll have to do is define just what a text file is and then parse
the files to see if they fit the mold.

Sherril Blackmon

Re:Is it a text file or not.


Quote
In article <01bda9b2$2958cc00$130210ac@jeremy>, "Jeremy Twiggs"
<jeremy@hotel"SPAMthanksbutdoyoumindifidont"scene.co.uk> writes:
>I can get the date of the files, that's easy enough, but
>what's a quick and easy way of testing for being TEXT?

You can't really. What you can do is test the first so many bytes, and if 95%
of them are in the set of char [#9, #10, #12, #13, #32..'~'] then it's most
likely a text file, because it consists almost entirely of the characters
space to tilde, tab, LF, FF, and CR.
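
The heuristic above could be sketched roughly as follows. The 4096-byte sample size is an assumption (the post only says "the first so many bytes"), LooksLikeText is a hypothetical name, and TFileStream is simply one convenient way to read the sample.

```pascal
program TextHeuristic;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  SampleSize = 4096;  // how many leading bytes to inspect (assumed figure)
  TextChars = [#9, #10, #12, #13, #32..'~'];

// Returns True if at least 95% of the first SampleSize bytes are in
// TextChars. An empty file is treated as text.
function LooksLikeText(const FileName: string): Boolean;
var
  FS: TFileStream;
  Buf: array[0..SampleSize - 1] of AnsiChar;
  N, I, Good: Integer;
begin
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyNone);
  try
    N := FS.Read(Buf, SizeOf(Buf));  // N may be less than SampleSize
  finally
    FS.Free;
  end;
  Good := 0;
  for I := 0 to N - 1 do
    if Buf[I] in TextChars then
      Inc(Good);
  // Integer form of Good / N >= 0.95, avoiding floating point.
  Result := (N = 0) or (Good * 100 >= N * 95);
end;

begin
  if ParamCount <> 1 then
    Writeln('Usage: TextHeuristic <filename>')
  else if LooksLikeText(ParamStr(1)) then
    Writeln('probably text')
  else
    Writeln('probably not text');
end.
```

Because only the start of the file is sampled, this is quick even on large files, at the cost of being fooled by a binary file that happens to begin with a long text header.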

There was some considerable discussion some 4 months ago about this.

Alan Lloyd
alangll...@aol.com

Re:Is it a text file or not.


Hey Jeremy,

    You've already gotten some fine suggestions.  However, you may be able
to use a different approach depending on your circumstances.  I would
propose the following, though it does count on some basic constants.

1) Although the filenames are randomly generated, are there any guaranteed
strings you could search for within the text?  Meaning, is the text file
formatted (its contents) in such a way that it has predictable headers,
formatting codes, or the like?

2) Are there any distinctive patterns at all in the filenames of the
randomly generated text files that would make them stand out from all the
non-text files in the directory?  In other words, are the text files a jumble
of random characters while all other files use standard naming conventions
such as '.DAT' or '.EXE', etc.?

If you can answer yes to the first scenario, then reading the text file into
a TStrings object using the LoadFromFile method and doing a Pos search of
the Text property of the TStrings object for a particular string constant
would be the easiest method.
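
A small sketch of that first approach, assuming the marker string is known in advance. Here it is taken from the command line, since the thread does not say what the real headers look like, and FileContainsMarker is a hypothetical name.

```pascal
program MarkerSearch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Loads the whole file into a TStringList (a concrete TStrings) and
// reports whether Marker occurs anywhere in its Text property.
// The match is case-sensitive.
function FileContainsMarker(const FileName, Marker: string): Boolean;
var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    Lines.LoadFromFile(FileName);
    Result := Pos(Marker, Lines.Text) > 0;
  finally
    Lines.Free;
  end;
end;

begin
  if ParamCount <> 2 then
    Writeln('Usage: MarkerSearch <filename> <marker>')
  else if FileContainsMarker(ParamStr(1), ParamStr(2)) then
    Writeln('marker found')
  else
    Writeln('marker not found');
end.
```

Note that this reads the entire file into memory, and depending on the RTL version a binary file with embedded #0 bytes may be loaded only partially, which in this use simply means the marker is not found.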

If you can answer yes to the second one, then you could compile a list of
all text files by excluding those files with normal extensions, assuming the
non-standardly named files to be the randomly generated text files.  Use the
FindFirst and FindNext functions to get the filenames from the directory.

The second method would render the fastest results by far, if it's possible.
If not, however, the first method is much easier and will code more simply
than scrutinizing the text on a character basis.  It is not necessarily
faster in the grand scheme of things, but where you can't save speed in
processing you might as well save time in programming :^)

Hope this gives you some ideas.

- Delphi Newbie

Re:Is it a text file or not.


Thanks to everyone for the help.

Delphi Newbie <nob...@nothing.com> wrote in article
<6o36ld$...@examiner.concentric.net>...

Quote
> Hey Jeremy,

>     You've already gotten some fine suggestions.  However, you may be able
> to use a different approach depending on your circumstances.  I would
> propose the following, though it does count on some basic constants.

> 1) Although the filenames are randomly generated, are there any guaranteed
> strings you could search for within the text?  Meaning, is the text file
> formatted (its contents) in such a way that it has predictable headers,
> formatting codes, or the like?

Yes, there are, so that's a good place to start. Thanks.

Quote
> 2) Are there any distinctive patterns at all in the filenames of the
> randomly generated text files that would make them stand out from all the
> non-text files in the directory?  In other words, are the text files a
> jumble of random characters while all other files use standard naming
> conventions such as '.DAT' or '.EXE', etc.?

The file extensions are in the range 000 to FFF (i.e. they use hexadecimal
values). They appear to increment sequentially, but I believe the filename
portion is related to a backup job (the files are created by Seagate Backup
Exec), so if the job is changed the file extension sequence starts again. I
also believe the numbers are recycled so that any gaps in the sequence are
re-used, and that the sequence only goes for a few tens of values.
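
For illustration only, a filter for the 000 to FFF extensions described above might look like the sketch below. HasHexExtension is a hypothetical name, the sample filenames are invented, and whether lowercase hex digits should be accepted is an assumption.

```pascal
program HexExtFilter;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// True if the file's extension is exactly three hexadecimal digits,
// i.e. in the range .000 to .FFF (lowercase a..f is accepted too).
function HasHexExtension(const FileName: string): Boolean;
var
  Ext: string;
  I: Integer;
begin
  Ext := ExtractFileExt(FileName);  // includes the leading dot, e.g. '.00A'
  Result := Length(Ext) = 4;
  if Result then
    for I := 2 to 4 do
      if not (Ext[I] in ['0'..'9', 'A'..'F', 'a'..'f']) then
      begin
        Result := False;
        Break;
      end;
end;

begin
  // Invented sample names, used only to exercise the function.
  Writeln(HasHexExtension('JOB12345.00A'));  // TRUE
  Writeln(HasHexExtension('JOB12345.DAT'));  // FALSE: 'T' is not a hex digit
  Writeln(HasHexExtension('README'));        // FALSE: no extension
end.
```

A check like this only narrows the candidates by name; it says nothing about whether a given file actually contains text.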

Quote
> If you can answer yes to the first scenario, then reading the text file
> into a TStrings object using the LoadFromFile method and doing a Pos
> search of the Text property of the TStrings object for a particular string
> constant would be the easiest method.

> If you can answer yes to the second one, then you could compile a list of
> all text files by excluding those files with normal extensions, assuming
> the non-standardly named files to be the randomly generated text files.
> Use the FindFirst and FindNext functions to get the filenames from the
> directory.

> The second method would render the fastest results by far, if it's
> possible.  If not, however, the first method is much easier and will code
> more simply than scrutinizing the text on a character basis.  It is not
> necessarily faster in the grand scheme of things, but where you can't save
> speed in processing you might as well save time in programming :^)

> Hope this gives you some ideas.

Some good ideas, thanks, which I very much intend to incorporate in the
next release.
