Parsing a file


2005-09-12 10:12:20 PM
cppbuilder44
What is the quickest way to parse a file in the format of
Line1
Line2
Line3
Line4
Line5
Line6
Line7
Line8
Line9
Line10
Line11
Line12
I have many lines, a few hundred in fact. I need to parse the file and
reformat it like this:
Line1{tab}Line2{tab}Line3{tab}Line4
Line5{tab}Line6{tab}Line7{tab}Line8
Line9{tab}Line10{tab}Line11{tab}Line12
then export it to a file. Can anyone offer some assistance, please?
Barry
 
 

Re:Parsing a file

"Barry" < XXXX@XXXXX.COM >writes:
Quote
What is the quickest way to parse a file in the format of

Line1
Line2
Line3
Line4
Well, quickest in the sense of "programming time required", or
quickest in terms of runtime performance? For rapid development, I
would suggest using a loop over calls to std::getline(), which reads
each line into a std::string. std::getline takes an input stream, a
string object, and a delimiter. By default the delimiter is a
newline, but you can change that to a tab if necessary.
Later, if that proves too slow, then you can worry about optimizing
it. So make an interface to your parser that does not expose the
implementation, and you can change that later if needed.
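
A minimal sketch of that suggestion, assuming the input arrives through any std::istream. The names read_lines and split_fields are hypothetical helpers made up for illustration; only std::getline and the standard containers are real library features. Callers depend only on the two signatures, so the bodies could be replaced later if they prove too slow.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read every line of a stream into a vector.
std::vector<std::string> read_lines(std::istream& in)
{
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line))   // default delimiter is '\n'
    {
        // Strip a trailing '\r', which shows up when a CRLF file is
        // read without newline translation.
        if (!line.empty() && line[line.size() - 1] == '\r')
            line.erase(line.size() - 1);
        lines.push_back(line);
    }
    return lines;
}

// Hypothetical helper: split one line into fields using the
// three-argument form of std::getline (tab delimiter by default).
std::vector<std::string> split_fields(const std::string& text, char delim = '\t')
{
    std::vector<std::string> fields;
    std::istringstream in(text);
    std::string field;
    while (std::getline(in, field, delim))
        fields.push_back(field);
    return fields;
}
```

To read from disk, pass a std::ifstream opened on the input file to read_lines.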
Quote
I have many lines, a few hundred in fact. I need to parse the file and
reformat it like this:

Line1{tab}Line2{tab}Line3{tab}Line4
Line5{tab}Line6{tab}Line7{tab}Line8
Line9{tab}Line10{tab}Line11{tab}Line12

then export it to a file. Can anyone offer some assistance, please?
Writing it should be straightforward. For each line, write it, then
write a tab (or a newline after every fourth line), then loop. (Unless,
of course, your lines actually contain tabs, which certainly would be a
design problem!)
--
Chris (TeamB);
 

Re:Parsing a file

"Chris Uzdavinis (TeamB)" < XXXX@XXXXX.COM >wrote in message
Quote
Well, quickest in the sense of "programming time required", or
quickest in terms of runtime performance? For rapid development, I
would suggest using a loop over calls to std::getline(), which reads
each line into a std::string. std::getline takes an input stream, a
string object, and a delimiter. By default the delimiter is a
newline, but you can change that to a tab if necessary.
Not to contradict, but would this not maybe be simpler, provided that the
OP is using only the VCL and that his file contains quoted spaces?
#include <memory>
int i = 0;
std::auto_ptr<TStringList> InFileData( new TStringList() );
std::auto_ptr<TStringList> LineData( new TStringList() );
std::auto_ptr<TStringList> OutFileData( new TStringList() );
InFileData->LoadFromFile( "filename.txt" );
while( i < InFileData->Count )
{
    LineData->CommaText = InFileData->Strings[i];
    OutFileData->AddStrings( LineData.get() );
    OutFileData->Add( "" );
    ++i;
}
OutFileData->SaveToFile( "filename_parsed.txt" );
Jonathan
 

Re:Parsing a file

"Jonathan Benedicto" < XXXX@XXXXX.COM >wrote in message
Quote
the OP is VCL only
What makes you think that? The OP did not say one way or another what
environment is being used.
Quote
his file contains quoted spaces
Not according to the sample that the OP actually showed.
Quote
std::auto_ptr<TStringList>InFileData( new TStringList() );
std::auto_ptr<TStringList>LineData( new TStringList() );
std::auto_ptr<TStringList>OutFileData( new TStringList() );
<snip>
Your code does not do what the OP asked for. Assuming the source file
contains blocks of data that are always 4 lines of text followed by a blank
line, a more accurate approach would be something like the following:
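#include <memory>
std::auto_ptr<TStringList> InFileData( new TStringList() );
std::auto_ptr<TFileStream> OutFileData( new TFileStream("filename_parsed.txt", fmCreate) );
InFileData->LoadFromFile( "filename.txt" );
// i is 1-based so that every fifth line (the blank separator) gives i % 5 == 0
for(int i = 1; i <= InFileData->Count; ++i)
{
    if( (i % 5) == 0 )
    {
        // blank separator line: end the current output row
        OutFileData->Write("\r\n", 2);
    }
    else
    {
        // write a tab between fields, but not before the first field of a row
        if( (i % 5) != 1 )
            OutFileData->Write("\t", 1);
        AnsiString Line = InFileData->Strings[i-1];
        OutFileData->Write(Line.c_str(), Line.Length());
    }
}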
You could then optimize the reading of the source file to read a line at a
time from the file so that the entire file is not loaded into memory at one
time.
Gambit
 

Re:Parsing a file

"Jonathan Benedicto" < XXXX@XXXXX.COM >writes:
Quote
"Chris Uzdavinis (TeamB)" < XXXX@XXXXX.COM >wrote in message
news: XXXX@XXXXX.COM ...
>Well, quickest in the sense of "programming time required", or
>quickest in terms of runtime performance? For rapid development, I
>would suggest using a loop over calls to std::getline(), which reads
>each line into a std::string. std::getline takes an input stream, a
>string object, and a delimiter. By default the delimiter is a
>newline, but you can change that to a tab if necessary.

Not to contradict, but would this not maybe be simpler, provided that the
OP is using only the VCL and that his file contains quoted spaces?
Well, I don't mind being challenged, especially if I'm wrong.
However, the VCL TStringList assumes "lines" are newline delimited, I
think, and I am not sure if they can be made to use tabs instead.
It's not a "bad" solution to use the VCL if you're already using the
VCL, but when code doesn't explicitly depend on the VCL, I prefer
approaches that do not extend the dependency. That is, I like to make
the smallest possible amount of my application depend on any
non-standard features, and then tightly segregate that code to its own
corner of my application. That makes it easier to replace, and the
heart of the application is still standard and thus portable.
Quote
#include <memory>

int i = 0;
std::auto_ptr<TStringList> InFileData( new TStringList() );
std::auto_ptr<TStringList> LineData( new TStringList() );
std::auto_ptr<TStringList> OutFileData( new TStringList() );

InFileData->LoadFromFile( "filename.txt" );

while( i < InFileData->Count )
{
    LineData->CommaText = InFileData->Strings[i];
    OutFileData->AddStrings( LineData.get() );
    OutFileData->Add( "" );
    ++i;
}

OutFileData->SaveToFile( "filename_parsed.txt" );
If this works, then there doesn't appear to be a great reason not to
use it. Provided, again, that you don't mind locking yourself further
into nonstandard code.
--
Chris (TeamB);
 

Re:Parsing a file

"Remy Lebeau (TeamB)" < XXXX@XXXXX.COM >wrote in message
Quote
What makes you think that? The OP did not say one way or another what
environment is being used.
I meant: if he is using the VCL. Sorry for the confusion.
Quote
Not according to the sample that the OP actually showed.
[snip]
Quote
Your code does not do what the OP asked for. Assuming the source file
contains blocks of data that are always 4 lines of text followed by a blank
line, a more accurate approach would be something like the following:
I'm sorry, I misunderstood the formats that he wanted.
Jonathan
 

Re:Parsing a file

"Chris Uzdavinis (TeamB)" < XXXX@XXXXX.COM >wrote in message
Quote
If this works, then there doesn't appear to be a great reason not to
use it. Provided, again, that you don't mind locking yourself further
into nonstandard code.
It doesn't. :-( Mr Lebeau pointed out that I had misunderstood the format
the OP wanted.
Jonathan