
finding a text string in an array of bytes (binary array)

I am converting some binary files from one format to another.  The input
file is a file of binary records of the type:

type x = record
   header: array[1..256] of byte;
   data: array[1..10000] of single;
end

The header is a binary array of 4 byte integers and 2 byte integers but also
includes at a certain byte position a comma separated string of information
that is 62 bytes long in which there should be some differences in that
string of data from record to record and I have to parse that each time.
Well, it seems that the hardware did not do a good job of writing that string
to the header.  It's where it's supposed to be most of the time but not
always.  It may move around a little or even sometimes the first of the
string is missed and only partial data is written to the header.  It happens
just often enough to crash the program when translating information that is
expected in fields in that csv string.  If the data location didn't move
around a little then I could just test a value and do what's appropriate but
I want to find it and parse it if it moves around and is a complete string.

As a result, I need to search for the head (first 5 characters never
change) of the wanted string, beginning at a given byte position, allowing
for the index position to move 10-20 bytes or so and retrieve the index into
the array for the head of the string.  If it's not found within some
reasonable range, return -1.  I hope my explanation is not too confusing.

I can't make things work reliably.  I need something like the PosEx function
that will find a substring in a byte array.  I'm having difficulty in using
and mixing pointer types for this purpose being a recent convert from C to
Delphi.

Any help?

Thanks
Bob

 

Re:finding a text string in an array of bytes (binary array)


If efficiency is a real concern then a Boyer-Moore search would probably be
best. A less efficient means would be

var pc, pe : PAnsiChar;

pc := @varX.header;
pe := pc + SizeOf (varX.header) - 5;  // last position where 5 chars still fit
while (pc <= pe) and not CompareMem (pc, PAnsiChar ('XXXXX'), 5) do
    inc (pc);
// pc > pe afterwards means 'XXXXX' was not found (CompareMem is in SysUtils)

One note: have you taken into account the fact that Delphi may add padding
in records to improve access times? If not, look up the word Packed in the
help.
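
For the "return the index or -1" form asked for in the original post, the same linear search could be wrapped in a function. This is only a sketch: FindHead, StartPos and MaxShift are made-up names, and it assumes the header is available as an array of bytes.

```pascal
// Hypothetical helper, not from the thread: returns the 0-based index of
// Head within Buf, trying only positions StartPos..StartPos+MaxShift.
// Returns -1 if Head is not found in that window.
function FindHead(const Buf: array of Byte; const Head: AnsiString;
  StartPos, MaxShift: Integer): Integer;
var
  i, j, Last: Integer;
begin
  Result := -1;
  if (Length(Head) = 0) or (StartPos < 0) then
    Exit;
  // Last index at which Head still fits entirely inside Buf
  Last := Length(Buf) - Length(Head);
  if Last > StartPos + MaxShift then
    Last := StartPos + MaxShift;
  for i := StartPos to Last do
  begin
    j := 0;
    // Relies on short-circuit evaluation (the Delphi default, {$B-})
    while (j < Length(Head)) and (Buf[i + j] = Ord(Head[j + 1])) do
      Inc(j);
    if j = Length(Head) then
    begin
      Result := i;
      Exit;
    end;
  end;
end;
```

A call might look like Idx := FindHead(varX.header, 'XXXXX', StartPos, 20). Open array parameters are always 0-based inside the function, so with a header declared as array[1..256] add 1 to the result to get the position in the original array.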

Re:finding a text string in an array of bytes (binary array)


Bruce's point about 'packed' is almost certainly correct

I had nightmares over that one.

I suggest that you map out the Record as it is *supposed* to be
defined

When it comes to the Comma Separated string, define it as an array of
Char, that way you can convert it into a String with negligible
overhead - example below.

Strings are much easier to work with
- personally I would stay away from TStringList.CommaText

type TRec = packed record
   header: array[1..256] of Char;
   data: array[1..10000] of single;
end;
{$R *.DFM}

procedure TForm1.Button1Click(Sender: TObject);
Var
  S :String;
  Rec :TRec;
begin
  FillChar( Rec.Header, SizeOf(Rec.Header), Ord('x'));
  S := Rec.Header ;
  ShowMessage( S ) ;

end;
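
The post does not say why to avoid TStringList.CommaText. One likely reason (an assumption here) is that CommaText also treats unquoted spaces as delimiters and gives double quotes a special meaning, which can split a field unexpectedly. A rough alternative that splits on commas only might look like this; CsvField is a made-up name, not an RTL routine.

```pascal
// Hypothetical helper: returns the N-th (0-based) comma-separated field of S.
// Only ',' is treated as a delimiter; spaces and quotes are kept as data.
// Returns '' if S has fewer than N + 1 fields.
function CsvField(const S: string; N: Integer): string;
var
  i, Start, Cur: Integer;
begin
  Result := '';
  if N < 0 then
    Exit;
  Start := 1;
  Cur := 0;
  // Run one position past the end so the last field is closed as well
  for i := 1 to Length(S) + 1 do
    if (i > Length(S)) or (S[i] = ',') then
    begin
      if Cur = N then
      begin
        Result := Copy(S, Start, i - Start);
        Exit;
      end;
      Inc(Cur);
      Start := i + 1;
    end;
end;
```

Note that a missing field and an empty field both come back as '', so count the commas first if that difference matters.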

On Tue, 1 Oct 2002 18:43:35 -0500, "Bob Ballard" wrote:

>I am converting some binary files from one format to another.  The input
>file is a file of binary records of the type:

<snip>

Re:finding a text string in an array of bytes (binary array)


Does Delphi add padding in the case of  BlockRead(x bytes) and
BlockWrite(those same x bytes)?  I don't think I have to worry about padding
and packed arrays since I'm peeking and poking things into specific
locations in binary byte buffers.  I define pointers of the right type to
specific locations in the byte array buffer.

I'm blockreading 256 bytes (the header) into an array [1..64] of integer.
With that array, I'm retrieving some integer values with a normal
int:=array[] type statement.  The text string is supposed to be at position
array[44].  Right now I'm

charptr:= Pchar(@array[44]);
string:= copy(charptr,1,62) // to copy 62 characters out to the string
TStringList.CommaText:= string // to parse the string into parts
then TStringList[4] and [6] are tested for valid values and, if valid,  I
continue doing what I have to do with the other strings in the list.
(converting text DDMM.mmmm strings to decimal degrees, converting
undelimited hhmmss.ss strings to TDateTime, converting undelimited mmddyy
string to TDateTime).  This all works when the string is in the right
location, otherwise I just ignore the string transcription and value poking.
Then...

I poke all the good header values and converted string stuff if any and
time/date parts into a new 240 byte binary header

blockwrite that 240 byte header
after that I BlockRead the data part and blockwrite the data part (no
translation needed)

Then go get the next record.

Right now, this all works and I just give bogus header info if the string is
not found at array[44] (19 of the 204 records in the sample data file).
At least 5-6 of those records have valid data but the string is further down
in the record, some 10-15 bytes, and I need to search for it beginning at
@array[44] and retrieve that string if available.
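
One thing to watch in the copy(charptr,1,62) step above: converting a PChar to a string stops at the first #0 byte, so a zero inside the 62 bytes would silently shorten the result. A sketch of a copy that takes exactly the requested number of bytes, assuming the header is held in a byte array (BytesToStr is a made-up name):

```pascal
// Hypothetical helper: copies Count bytes starting at 0-based Index out of
// Buf into an AnsiString, including any #0 bytes.
// Returns '' if the requested range does not fit inside Buf.
function BytesToStr(const Buf: array of Byte; Index, Count: Integer): AnsiString;
begin
  Result := '';
  if (Index < 0) or (Count <= 0) or (Index + Count > Length(Buf)) then
    Exit;
  SetLength(Result, Count);
  Move(Buf[Index], Result[1], Count);
end;
```

Checking that the result is 62 characters long, and that it starts with the expected 5-character head, gives a cheap test for the partially written strings before any field parsing is attempted.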

The responses here have provided pointers to making this work.

Thanks
Bob

"J French" wrote in message:
> Bruce's point about 'packed' is almost certainly correct

<snip>

Re:finding a text string in an array of bytes (binary array)


"Bob Ballard" wrote in message:

> Does Delphi add padding in the case of  BlockRead(x bytes) and
> BlockWrite(those same x bytes)?  I don't think I have to worry about
> padding and packed arrays since I'm peeking and poking things into specific
> locations in binary byte buffers.  I define pointers of the right type to
> specific locations in the byte array buffer.

BlockRead and BlockWrite have nothing to do with any padding that Delphi
might add to array and record structures. AFAIK Delphi doesn't add any
padding to arrays themselves, but it certainly does add padding, unless told
otherwise, to record structures. Your original post showed a Record
declaration; if you are in fact using this declaration then padding may well
be an issue.

Re:finding a text string in an array of bytes (binary array)


On Thu, 3 Oct 2002 01:19:09 -0400, "Bruce Roberts" wrote:

<snip>

>BlockRead and BlockWrite have nothing to do with any padding that Delphi
>might add to array and record structures. AFAIK Delphi doesn't add any
>padding to arrays themselves, but it certainly does add padding, unless told
>otherwise, to record structures. Your original post showed a Record
>declaration, if you are in fact using this declaration then padding may well
>be an issue.

Understated - it is damn near certain that 'padding' is responsible
for your problems.

BTW 'padding' is 4 byte alignment of variables
One easy way to show it is :  
     ShowMessage( IntToStr( SizeOf( MyRec ) ) ) ;
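
To make the SizeOf check concrete, here is a small sketch. TPlainRec and TPackedRec are invented for illustration; THeaderRec copies the layout from the original post. It assumes the compiler's default alignment settings.

```pascal
type
  // Hypothetical record: with default alignment the compiler normally
  // inserts padding after Flag so that Value starts on a 4-byte boundary.
  TPlainRec = record
    Flag: Byte;
    Value: Integer;
  end;

  // Same fields, packed: no padding, so 1 + 4 = 5 bytes.
  TPackedRec = packed record
    Flag: Byte;
    Value: Integer;
  end;

  // Layout from the original post: 256 bytes, then 10000 4-byte singles.
  THeaderRec = record
    header: array[1..256] of Byte;
    data: array[1..10000] of Single;
  end;

// Prints the three sizes so they can be compared with the file layout.
procedure ShowSizes;
begin
  WriteLn('plain:  ', SizeOf(TPlainRec));
  WriteLn('packed: ', SizeOf(TPackedRec));
  WriteLn('header: ', SizeOf(THeaderRec));
end;
```

For the record exactly as posted, the data field already starts at a multiple of 4 bytes, so SizeOf should come out as 256 + 40000 = 40256 with or without packed. Padding becomes a concern once the header is mapped out as individual 2-byte and 4-byte fields, which is where declaring the record packed matters.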
