Re: How to handle gz files fetched with IdHTTP?


2008-06-05 01:17:45 PM
delphi110
Bo Berglund writes:
Quote
On Wed, 04 Jun 2008 23:18:23 +0200, Stig Johansen <XXXX@XXXXX.COM>
writes:

>Bo Berglund writes:
>>Well, that might be good to know if one is to go deep into the
>>compression mystery, but all I want to do is to use an IdHTTP
>>component to retrieve the name.xml.gz files and have them converted to
>>xml so I can then process the contents...
>
>There is not much mystery in there.
>The tGZipFirstHeader definition is the _fixed_ part of a gzip file as
>defined in the previously mentioned RFC.
>So I read the fixed part, and afterwards the variable part, and _then_ the
>compressed data block into "compressedstr" (a string in my example).
>It should be the same whether the data comes from a file or IdHTTP. (I use
>Synapse, so I don't know IdHTTP.)

But I have a single downloaded file on my disk so there are no
separate parts to "load". I just want to subject the *file* to the
decompression and all I get is a "data error".
I also googled a discussion about just this where they talked about
setting some values in the constructor of the object like this:
I am not clear enough, I can see.
If I rewind my memory, we had this strange browser (IE) which only supported
deflate, and not gzip.
On the other hand we had the other browsers, which supported gzip, but not
deflate.
I needed to support both, so that's where my journey started.
I discovered that deflate was 'just' the compressed data without the
headers.
And gzip was just the compressed/deflated data with a header and a checksum.
So you can look at gzip roughly as:
gzip := header + compressed_data + trailer.
So when you do this (from your other post):
  Res := IdHTTP1.Get('tv.swedb.se/xmltv/channels.xml.gz');
your Res contains the above construction.
What I am talking about is basically extracting the compressed_data part
from the file/string/stream.
Quote
TZDecompressionStream.Create(TFileStream.Create(gzfile,
fmOpenRead),12+32);
It looks like some hardcoded sizes of the header data.
The problem is that the header can contain variable parts, so you can't be
sure where exactly the compressed_part starts in the file/string/stream.
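
A minimal sketch of how the start of the compressed part could be located, assuming the header layout from RFC 1952 (10 fixed bytes, then optional extra field, file name, comment and header CRC, selected by bits in the FLG byte). The names GzipDataOffset and SkipZeroTerminated are made up for this illustration and are not part of ZLibEx or Indy; the 2-byte XLEN read assumes a little-endian machine such as x86.

```delphi
program GzipOffsetDemo;
{$APPTYPE CONSOLE}

uses
  Classes, SysUtils;

const
  // Bits of the FLG byte as defined in RFC 1952.
  GZ_FHCRC    = $02;
  GZ_FEXTRA   = $04;
  GZ_FNAME    = $08;
  GZ_FCOMMENT = $10;

// Hypothetical helper: skips a zero-terminated field (file name or comment).
procedure SkipZeroTerminated(Stream: TStream);
var
  B: Byte;
begin
  repeat
    Stream.ReadBuffer(B, 1);
  until B = 0;
end;

// Hypothetical helper: returns the offset at which the raw deflate data
// begins. Raises an exception if the stream is not a gzip/deflate stream.
function GzipDataOffset(Stream: TStream): Int64;
var
  Fixed: array[0..9] of Byte; // ID1 ID2 CM FLG MTIME(4) XFL OS
  ExtraLen: Word;
begin
  Stream.Position := 0;
  Stream.ReadBuffer(Fixed, SizeOf(Fixed));
  if (Fixed[0] <> 31) or (Fixed[1] <> 139) or (Fixed[2] <> 8) then
    raise Exception.Create('Not a gzip (deflate) stream');
  if (Fixed[3] and GZ_FEXTRA) <> 0 then
  begin
    // XLEN is stored little-endian; read directly on the assumption
    // that the host is little-endian as well.
    Stream.ReadBuffer(ExtraLen, SizeOf(ExtraLen));
    Stream.Position := Stream.Position + ExtraLen;
  end;
  if (Fixed[3] and GZ_FNAME) <> 0 then
    SkipZeroTerminated(Stream);
  if (Fixed[3] and GZ_FCOMMENT) <> 0 then
    SkipZeroTerminated(Stream);
  if (Fixed[3] and GZ_FHCRC) <> 0 then
    Stream.Position := Stream.Position + 2;
  Result := Stream.Position;
end;

var
  F: TFileStream;
  Offset: Int64;
begin
  if ParamCount < 1 then
  begin
    WriteLn('Usage: GzipOffsetDemo <file.gz>');
    Halt(1);
  end;
  F := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  try
    Offset := GzipDataOffset(F);
    WriteLn('Deflate data starts at byte ', Offset);
    // The last 8 bytes of the file are the trailer (CRC32 + ISIZE).
    WriteLn('Deflate data length: ', F.Size - 8 - Offset);
  finally
    F.Free;
  end;
end.
```

The bytes between that offset and the 8-byte trailer are the raw deflate data, which is what a routine like ZReceiveFromBrowser would expect as input.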
Quote

ZReceiveFromBrowser does not exist in any of my two downloaded copies
of ZlibEx.pas...

Thinking about it, it may very well be my own addition to implement
decompression of deflate, but here it is:
.....
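function ZReceiveFromBrowser(const s: string): string;
var
  outBuf: Pointer;
  outBytes: Integer;
begin
  // Decompress the deflate data in s into a buffer allocated by ZDecompress2.
  ZDecompress2(Pointer(s), Length(s), outBuf, outBytes, 0);
  try
    // Copy the decompressed bytes into the result string.
    SetLength(Result, outBytes);
    Move(Pointer(outBuf)^, Pointer(Result)^, outBytes);
  finally
    // Release the buffer even if the copy raises an exception.
    FreeMem(outBuf);
  end;
end;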
.....
Quote
>My things are from about 2002, but i think i had a hard time back then,
>figuring things out.

Seems like the same is true 6 years later...
Yes, but I downloaded this from your other post:
<tv.swedb.se/xmltv/channels.xml.gz>
And it decompresses fine with my programs as mentioned, so the data is
correct.
This is still on Linux, which uses the native zlib library.
--
Best regards
Stig Johansen
 
 

Re: How to handle gz files fetched with IdHTTP?

On Wed, 4 Jun 2008 18:12:37 -0700, "Remy Lebeau (TeamB)"
<XXXX@XXXXX.COM> writes:
Quote

"Bo Berglund" <XXXX@XXXXX.COM> writes
news:XXXX@XXXXX.COM...

>The actual URL for my test file (there are others from the same source)
>is:
>tv.swedb.se/xmltv/channels.xml.gz

Even though the URL looks like a .gz file, the server is actually delivering
a plain .xml file that is gzip-compressed dynamically as it is being
downloaded. So TIdHTTP in Indy 10 can decompress the file automatically for
you. Simply assign a TIdCompressorZLib component to the TIdHTTP.Compressor
property before calling TIdHTTP.Get().
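
A minimal sketch of the approach described above, assuming Indy 10.2.3 or later, where the component lives in the IdCompressorZLib unit. The "http://" scheme, the output file name channels.xml and the program name are assumptions made for the illustration. Whether the saved data is decompressed depends on the server actually sending the response with gzip content encoding, as described above.

```delphi
program FetchGzDemo;
{$APPTYPE CONSOLE}

uses
  Classes, SysUtils, IdHTTP, IdCompressorZLib;

var
  Http: TIdHTTP;
  Xml: TFileStream;
begin
  Http := TIdHTTP.Create(nil);
  try
    // The compressor is owned by Http, so it is freed together with it.
    Http.Compressor := TIdCompressorZLib.Create(Http);
    // Output file name is an assumption for this example.
    Xml := TFileStream.Create('channels.xml', fmCreate);
    try
      // With a compressor assigned, TIdHTTP decodes a gzip-encoded
      // response body while writing it to the stream.
      Http.Get('http://tv.swedb.se/xmltv/channels.xml.gz', Xml);
    finally
      Xml.Free;
    end;
  finally
    Http.Free;
  end;
end.
```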

This sounds like *exactly* what I want to do, but then in BDS2006 I
got into trouble immediately...
1) There is no component named TIdCompressorZLib
2) I found one named TIdCompressorZLibEx so I put this on my form
3) Now I get a bunch of errors:
[Pascal Error] IdZLib.pas(1224): E1026 File not found: 'adler32.obj'
[Pascal Error] IdZLib.pas(1225): E1026 File not found: 'compress.obj'
[Pascal Error] IdZLib.pas(1226): E1026 File not found: 'crc32.obj'
[Pascal Error] IdZLib.pas(1227): E1026 File not found: 'deflate.obj'
[Pascal Error] IdZLib.pas(1228): E1026 File not found: 'infback.obj'
[Pascal Error] IdZLib.pas(1229): E1026 File not found: 'inffast.obj'
[Pascal Error] IdZLib.pas(1230): E1026 File not found: 'inflate.obj'
[Pascal Error] IdZLib.pas(1231): E1026 File not found: 'inftrees.obj'
[Pascal Error] IdZLib.pas(1232): E1026 File not found: 'trees.obj'
[Pascal Error] IdZLib.pas(1233): E1026 File not found: 'uncompr.obj'
[Pascal Error] IdZLib.pas(1234): E1026 File not found: 'zutil.obj'
These files are mentioned in IdZlib in comments looking like this:
{$L adler32.obj}
{$L compress.obj}
{$L crc32.obj}
{$L deflate.obj}
{$L infback.obj}
{$L inffast.obj}
{$L inflate.obj}
{$L inftrees.obj}
{$L trees.obj}
{$L uncompr.obj}
{$L zutil.obj}
I made a search from the top of my BDS2006 installation for
adler32.obj but it is not there..
So what do I do now? My Indy is what came with BDS2006 (10.1.5).
/BoB
 

Re: How to handle gz files fetched with IdHTTP?

On Wed, 4 Jun 2008 18:14:03 -0700, "Remy Lebeau (TeamB)"
<XXXX@XXXXX.COM> writes:
Quote

"Bo Berglund" <XXXX@XXXXX.COM> writes
news:XXXX@XXXXX.COM...

Yes, you will have to do that if you do not allow Indy to decompress the
data for you automatically.

>This is done with Delphi7 where I have only Indy 9.0.10 installed...

You will have to upgrade to Indy 10.2.3 if you want to take advantage of
TIdHTTP's automated decompressing feature.

Then we have the problem of upgrading Indy in BDS2006 again (there are
other threads about this). I have 10.1.5 as delivered with BDS2006....
Exactly how is this upgrade done?
/BoB
 

Re: How to handle gz files fetched with IdHTTP?

Bo Berglund writes:
Quote
These files are mentioned in IdZlib in comments looking like this:

{$L adler32.obj}
.....
Actually it is a directive to link in the mentioned file(s).
Quote
I made a search from the top of my BDS2006 installation for
adler32.obj but it is not there..

So what do I do now? My Indy is what came with BDS2006 (10.1.5).
I know there are various places where you can download the set of .obj
files.
I have tried to Google, but I cannot remember what to search for.
I don't remember where I got them from; perhaps it was on my Delphi 7 CD.
--
Best regards
Stig Johansen
 

Re: How to handle gz files fetched with IdHTTP?

On Thu, 05 Jun 2008 07:17:45 +0200, Stig Johansen <XXXX@XXXXX.COM>
writes:
Quote
I am not clear enough, I can see.
If I rewind my memory, we had this strange browser (IE) which only supported
deflate, and not gzip.
On the other hand we had the other browsers, which supported gzip, but not
deflate.

I needed to support both, so that's where my journey started.

I discovered that deflate was 'just' the compressed data without the
headers.

And gzip was just the compressed/deflated data with a header and a checksum.

So you can look at gzip roughly as:
gzip := header + compressed_data + trailer.

So when you do this (from your other post):
  Res := IdHTTP1.Get('tv.swedb.se/xmltv/channels.xml.gz');
your Res contains the above construction.

What I am talking about is basically extracting the compressed_data part
from the file/string/stream.

>TZDecompressionStream.Create(TFileStream.Create(gzfile,
>fmOpenRead),12+32);
Again:
However there is no overloaded constructor that accepts a second
parameter in my versions of ZLibEx.pas (I have downloaded both that
are available, 1.1.4 and 1.2.3).
So how can I enter the 12+32 parameter????
Quote
It looks like some hardcoded sizes of the header data.
The problem is that the header can contain variable parts, so you can't be
sure where exactly the compressed_part starts in the file/string/stream.

>
>ZReceiveFromBrowser does not exist in any of my two downloaded copies
>of ZlibEx.pas...
>

Thinking about it, it may very well be my own addition to implement
decompression of deflate, but here it is:
.....
function ZReceiveFromBrowser(const s: string): string;
var
  outBuf: Pointer;
  outBytes: Integer;
begin
  ZDecompress2(Pointer(s), Length(s), outBuf, outBytes, 0);
  SetLength(Result, outBytes);
  Move(Pointer(outBuf)^, Pointer(Result)^, outBytes);
  FreeMem(outBuf);
end;
.....

>>My things are from about 2002, but i think i had a hard time back then,
>>figuring things out.
>
>Seems like the same is true 6 years later...

Yes, but I downloaded this from your other post:
<tv.swedb.se/xmltv/channels.xml.gz>

And it decompresses fine with my programs as mentioned, so the data is
correct.
Could you post the *complete* code you used to decompress successfully
the channels.xml.gz file, please?
And since you are on Linux, how can you use Delphi then????
--
Bo Berglund
 

Re: How to handle gz files fetched with IdHTTP?

On Wed, 4 Jun 2008 18:14:03 -0700, "Remy Lebeau (TeamB)"
<XXXX@XXXXX.COM> writes:
Quote
>This is done with Delphi7 where I have only Indy 9.0.10 installed...

You will have to upgrade to Indy 10.2.3 if you want to take advantage of
TIdHTTP's automated decompressing feature.

Remy,
thank you very much for your help! Really appreciated! :-)
I have now installed Indy 10 (snapshot) in Delphi7 and tested the
IdHTTP.GET after assigning the compressor to the component.
As you claimed it worked directly!
Now I only have to figure out the best way to switch between Indy 9
and Indy 10 on my Delphi 7 installation so I can maintain the old code.
Might go to using a virtual machine for development and then I could
probably use multiple dev machines depending on if I am doing
maintenance or new dev. We'll see.
Thanks again, now I can concentrate on the main job of downloading all
the data files and extracting the TV EPG for my system. :-)
/BoB
 

Re: How to handle gz files fetched with IdHTTP?

"Stig Johansen" <XXXX@XXXXX.COM> writes
Quote
I needed to support both, so thats where my jurney started.
TIdHTTP in Indy 10 supports both.
Gambit
 

Re: How to handle gz files fetched with IdHTTP?

"Bo Berglund" <XXXX@XXXXX.COM> writes
Quote
Now I get a bunch of errors:
You should only be getting those errors if you are trying to re-compile
Indy. Otherwise, the .obj files are already compiled into the Indy binaries
that Borland shipped.
Quote
My Indy is what came with BDS2006 (10.1.5).
You are using an outdated Indy 10 version. There have been many changes to
Indy 10 since the 10.1.5 release. The current version is now 10.2.3.
Gambit
 

Re: How to handle gz files fetched with IdHTTP?

Bo Berglund writes:
Quote
On Thu, 05 Jun 2008 07:17:45 +0200, Stig Johansen <XXXX@XXXXX.COM>
writes:
Could you post the *complete* code you used to decompress successfully
the channels.xml.gz file, please?
<danish>
Phew, Bo, it is Constitution Day (Grundlovsdag) over here on Zealand: rather
a lot of beer, white wine and red wine, the little grey cells and the hair
roots are pointing the wrong way, and so on, but:
</danish>
I can see you have gotten Indy to work, but it *is* more or less the
complete code I have posted.
As mentioned, this piece of code decompresses a file, and not a buffer from
an HTTP request, but here is a larger extract of the code:
I := FileRead(fileHandle, GZipFirstHeader, SIZEOF(GZipFirstHeader));
IF (GZipFirstHeader.ID1 = 31) AND (GZipFirstHeader.ID2 = 139) THEN
BEGIN
  IsGzip := TRUE;
  ModifiedTime := GZipFirstHeader.MTIME / 86400 {$IFDEF LINUX} + 25567
......
IF IsGzip AND NOT ReturnZip THEN BEGIN
  PosStart := I;
  IF (GZipFirstHeader.FLG AND FNAME) <> 0 THEN BEGIN
    I := FileRead(fileHandle, FileNameBuffer, 255);
    INC(PosStart, StrLen(FileNameBuffer) + 1); // +1 for the 0 termination
    FileSeek(woprfileHandle, PosStart, 0);
  END;
  // Fs is the file size; the 8 bytes are the gzip trailer (CRC32 + ISIZE).
  I := Fs - 8 - PosStart;
  SetLength(compressedstr, I);
  I := FileRead(fileHandle, compressedstr[1], I);
  compressedstr := ZReceiveFromBrowser(compressedstr);
......
Result := QueueBuffer(PChar(compressedstr), Length(compressedstr), FALSE)
......
QueueBuffer is a routine to Queue data back to the browser, and
ZReceiveFromBrowser is the previously posted code added to ZlibEX.pas.
Quote
And since you are on Linux, how can you use Delphi then????
Kylix, which was bundled with my version of D7.
Develop, test, debug with D7, (re)compile/deploy on Linux.
I have made various attempts to check out Linux. Back in 2002, it worked on
Linux as well, but compared to Windows the speed was more or less the same,
so there was no benefit from Linux.
Then somewhere around 2004-2005, there was too much inconsistency (aka
missing compatibility) in Linux, so I didn't even bother to try again.
So now it is, let's call it, the third attempt to evaluate the maturity of
Linux (server side only).
The key issue here (to me) is NPTL threads, which seem to be (very) much
faster than Windows.
I have built a little server from an old PII 200 MHz with 96 MB RAM, and I
can easily handle 300+ simultaneous requests on that little beast.
OK, not _entirely_ true, since I have raised MaxConnections from the
default 32 to 128 in the WebBroker part.
That issue is on my internal 'to do list', since there should be no such low
limit, but I have to investigate the impact on memory usage etc.
--
Best regards
Stig Johansen