
Re: Fastcode MM memory usage


2005-06-08 07:07:09 AM
delphi76
Hi Pierre,
I did have one memory leak that was causing some of the trouble. After
I fixed that the memory was freed properly at the end, so the usage
got close to what the RTL was using (2-3MB overhead). The leak itself
was only 60 80-byte objects, so I was pretty surprised that it looked
like it was using 50MB.
Anyway, I am still seeing a problem, but from what I can tell it
doesn't show up in the B&V. The "MTicks" count is time taken and "Mem"
is the number of bytes used, right? So lower would be better in both
cases? I have two application runs, one of which had both FastMM4 and
BucketMem taking more memory than the RTL at peak (55MB vs 69MB) and a
second one that took longer using FastMM4 than the RTL or BucketMem
(0.80 seconds vs. 1.50 seconds). If I run the log using the B&V, FastMM4
takes less time in both cases and has similar or better mem scores.
Is there anything else I can do to help you check this remotely? The
program itself is Beyond Compare, so I can not share the source, but I
might be able to put together something using run-time packages if it
would help.
Regards,
Craig
Pierre le Riche writes:
Quote
Hi Craig,


>The replay log for the exact situation I described is 232MB, and 7-zip got
>it down to 20MB. Is that small enough or should I generate a smaller one?


Have you run the replay inside the Fastcode MM B&V? Do you get the same
result?

Regards,
Pierre


 
 

Re: Fastcode MM memory usage

Quote
Is there anything else I can do to help you check this remotely? The
program itself is Beyond Compare, so I can not share the source, but I might
be able to put together something using run-time packages if it would
help.
Off topic, but thank you for Beyond Compare; it has saved me lots of time!
DD
 

Re: Fastcode MM memory usage

Hi Leonel,
Quote
Not the IDE specifically, but "if DebugHook <>0 then" is a pretty good
way to check if it is running under the debugger.
Thanks. I will look into it.
Is there any failsafe way that you know of to detect whether Delphi is
currently running on the PC, whether it is debugging the current process or
not?
Regards,
Pierre
 

Re: Fastcode MM memory usage

Quote
Good suggestion. I have implemented this as you suggested by extending the
options as follows:
Can we download this update somewhere ?
Pierre
--
Pierre Y.
levosgien.net - cerbermail.com/?7dwZGWwOB0
(Click the link above to contact me privately)
English captain: "You fight for money, we fight
for honour!"
Robert Surcouf: "You are right, sir; each of us fights
for what he lacks."
 

Re: Fastcode MM memory usage

Hi Craig,
Quote
I did have one memory leak that was causing some of the trouble. After I
fixed that the memory was freed properly at the end, so the usage got
close to what the RTL was using (2-3MB overhead). The leak itself was
only 60 80-byte objects, so I was pretty surprised that it looked like it
was using 50MB.
Most of the newer MMs grab memory in 1MB+ chunks from the operating system,
and if there is a leak inside a given 1MB+ chunk then that chunk can never
be freed. Those 60 leaks were probably evenly spread across numerous 1MB
chunks. However, the space around the leak can still be used, so FastMM
shouldn't run out of address space any faster than the RTL MM if the
application leaks memory. As long as the *peak* memory usage is comparable
to that of the RTL MM then I am not worried.
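The arithmetic behind this can be sketched with a toy model. This is only an illustration, not FastMM's actual allocator: it assumes fixed 1 MB chunks, that a chunk holding any leaked block can never be released, and that leaks land in chunks at random. The 70-chunk figure is a made-up parameter.

```python
import random

CHUNK_SIZE = 1 << 20   # 1 MB, the chunk size mentioned above (assumed fixed here)
LEAK_COUNT = 60        # number of leaked objects Craig reported
LEAK_SIZE = 80         # bytes per leaked object


def pinned_bytes(total_chunks, leak_count=LEAK_COUNT, seed=0):
    """Bytes of address space held by chunks that contain at least one leak."""
    rng = random.Random(seed)
    # Each leak lands in one chunk; a chunk with any leak cannot be freed.
    pinned_chunks = {rng.randrange(total_chunks) for _ in range(leak_count)}
    return len(pinned_chunks) * CHUNK_SIZE


leaked = LEAK_COUNT * LEAK_SIZE
worst_case = LEAK_COUNT * CHUNK_SIZE

print(f"bytes actually leaked: {leaked}")
print(f"worst case pinned: {worst_case // CHUNK_SIZE} MB")
# 70 chunks is a hypothetical peak size chosen only for illustration.
print(f"simulated pinned: {pinned_bytes(70) // CHUNK_SIZE} MB")
```

Under these assumptions 4,800 leaked bytes can hold up to 60 MB of chunks in place, which is the same order of magnitude as the 50MB Craig observed.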
Quote
Anyway, I am still seeing a problem, but from what I can tell it doesn't
show up in the B&V. The "MTicks" count is time taken and "Mem" is the
number of bytes used, right? So lower would be better in both
Correct, lower is better.
Quote
cases? I have two application runs, one of which had both FastMM4 and
BucketMem taking more memory than the RTL at peak (55MB vs 69MB) and a
It's very difficult to measure the memory usage using task manager. The
usage numbers given by the B&V replay tool are the actual address space used
by the process and is IMO the best measure. You should also remember that
not all memory is allocated through the Delphi memory manager. If you're
using COM, ADO or third party DLLs there's a good chance that your program
is allocating memory using the Windows Heap as well. The usage recording
tool doesn't record these allocations, so playing the replay inside the B&V
tool will exclude all these other factors and give you a clear indication of
the memory consumption of the memory manager you're using.
Quote
second one that took longer using FastMM4 than the RTL or BucketMem (0.80
seconds vs. 1.50 seconds). If I run the log using the B&V FastMM4 takes
less time in both cases and has similar or better mem scores.
I don't know how to respond to this other than to say it shouldn't be,
unless I have got some {*word*193} bug somewhere that I don't know about. If
BucketMM turned out slightly faster in some tests I'd believe it, but
not the RTL MM - that thing is just too darn slow! Is this benchmark
repeatable, with FastMM4 taking longer every time? Please make sure that
there are no other influences (like disk accesses, etc.) affecting the
benchmark, and if you could have a longer run it would also be more accurate
(0.8 seconds is a bit short). Are you using the FastMove library? If you
are, please remember to disable the "UseFasterMoveRoutines" define in the
FastMM4 source.
Quote
Is there anything else I can do to help you check this remotely? The
program itself is Beyond Compare, so I can not share the source, but I might
be able to put together something using run-time packages if it would
help.
If you could send me a shorter replay file that would be great. This may
give me some insight as to what is going on.
Regards,
Pierre
 

Re: Fastcode MM memory usage

Hi Pierre,
Quote
Can we download this update somewhere ?
There are a few other small changes I want to make before I post it. Should
be up on SourceForge later today.
Regards,
Pierre
 

Re: Fastcode MM memory usage

Pierre le Riche writes:
Quote
Is there any failsafe way that you know of to detect whether Delphi
is currently running on the PC, whether it is debugging the current
process or not?
Failsafe, I don't think so. The best one I know is the approach you
were trying, but it can be fooled or fail, as you can see.
--
Leonel
 

Re: Fastcode MM memory usage

Hi Pierre,
Quote
Can we download this update somewhere ?
4.01 uploaded to sourceforge. Fixes memory leak checking not working under
Delphi 5 and adds the feature you suggested.
Regards,
Pierre
 

Re: Fastcode MM memory usage

Pierre le Riche writes:
Quote
Hi Pierre,

>Can we download this update somewhere ?

4.01 uploaded to sourceforge. Fixes memory leak checking not working under
Delphi 5 and adds the feature you suggested.
Thank you very much, I will test it as soon as I am able to
recompile my app using D2005. (Tons of components to reinstall...
{*word*30}ing stupid job ;-)))
Pierre
 

Re: Fastcode MM memory usage

Hi Pierre,
I've put together a couple of packages that you can use for testing.
They're on our website at:
www.scootersoftware.com/BeyComp_MMTest.7z (3.12MB)
www.scootersoftware.com/BeyComp_MMUsage.7z (16.2MB)
Here's what's in the first package :
1. Copies of Beyond Compare compiled with the RTL memory manager,
FastMM3, FastMM4, BucketMem_ASM, and MMUsageLogger. I did disable the
leak detection in FastMM4 because Indy has an intentional leak, but I
ran leak tests in AQTime to make sure there wasn't anything else.
2. Copies of some of the source from the Jedi Code Library so you can
repeat my tests (v1.22 and v1.95).
3. A script and batch file that will compare the JCL directories and
generate a HTML report of the output. Just use "run.bat" to run it, and
change the exe name in it to switch between exes.
4. Copies of BC's CHM files, which demonstrate the most repeatable case
I have found where the RTL is noticeably faster than FastMM4.
The second package has the MMUsage file from the report generation I
mentioned above. It expands to 175MB. You should be able to generate a
similar one using the first package, so this is mostly if you want to
compare them. In case it matters, my system is a dual Xeon 2.4 GHz
running Windows 2000.
Quote
As long as the *peak* memory usage is comparable
to that of the RTL MM then I am not worried.
The peak memory usage is *not* comparable. On my system, if I run the
report generation from the package above the RTL uses 45MB peak and
FastMM4 uses 55MB peak. If I use our own source code the difference is
55MB to 70MB.
Quote
It's very difficult to measure the memory usage using task manager. The
usage numbers given by the B&V replay tool are the actual address space used
by the process and is IMO the best measure.
I'm having trouble following here. The memory usage reported in perfmon
and the task manager is what's actually allocated, right? So it is a
different value than that reported by the B&V, but it seems like it
would be just as valid. We don't really have a lot of trouble with
fragmentation and address space creep, so I am more concerned about
limiting the overall system impact by keeping as little memory allocated
as possible when we aren't using it. It seems like I would want to use the
numbers from the task manager in that case, wouldn't I?
Quote
You should also remember that
not all memory is allocated through the Delphi memory manager. If you're
using COM, ADO or third party DLLs there's a good chance that your program
is allocating memory using the Windows Heap as well.
Are widestrings covered by this? The vast majority of the memory is
used by records and some classes, but it does use a bunch of short-lived
widestrings in the output. Other than that, no, we aren't using
anything other than Delphi code compiled into the application.
Quote
I don't know how to respond to this other than to say it shouldn't be,
unless I have got some {*word*193} bug somewhere that I don't know about. If
BucketMM turned out slightly faster in some tests I'd believe it, but
not the RTL MM - that thing is just too darn slow! Is this benchmark
repeatable, with FastMM4 taking longer every time?
Yes, it is repeatable. I have included the CHM files that demonstrate the
problem. Just drag either BC2.chm+BC2_2.chm or BC2_2.chm+BC2_3.chm onto
one of the application icons and BC will load up the comparison. When
it's done loading the elapsed time is displayed in the status bar. The
first set of files takes 1-3 seconds to compare and the second set takes
8-11 seconds.
Quote
Please make sure that
there are no other influences (like disk accesses, etc.) affecting the
benchmark, and if you could have a longer run it would also be more accurate
(0.8 seconds is a bit short).
The comparison will access the disk, so the timings vary more than I
would like, but the fact remains that the slowest run using the RTL is
faster than the fastest run using FastMM4. Using the BC2.chm+BC2_2.chm
set, the RTL completes the comparison in ~1.2 seconds and FastMM4
completes it in a little over 2 seconds. The second set of files didn't
scale as much as I had hoped they would, but they do demonstrate a larger
difference over 8 seconds. If you want a longer run just concatenate
multiple copies of BC2_2.chm and BC2_3.chm to make bigger files.
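The "slowest run of one beats the fastest run of the other" argument is a simple non-overlap check. A minimal sketch follows; the function name is hypothetical, and the timings are invented placeholders near the ~1.2 second and 2 second figures above, not measured data.

```python
def clearly_faster(a_times, b_times):
    """True if every run of A beat every run of B, i.e. the ranges do not overlap."""
    return max(a_times) < min(b_times)


# Placeholder timings in seconds, NOT real measurements.
rtl_runs = [1.15, 1.20, 1.31]
fastmm4_runs = [2.02, 2.10, 2.25]

print(clearly_faster(rtl_runs, fastmm4_runs))   # RTL vs FastMM4
print(clearly_faster(fastmm4_runs, rtl_runs))   # the reverse comparison
```

When the ranges do not overlap, run-to-run noise such as disk access is unlikely to be the whole explanation, although a handful of runs is still a small sample.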
Quote
Are you using the FastMove library?
No, it didn't have a significant effect the last time I tested it.
BTW, one thing that occurred to me is that I don't know how the replay
represents threads interacting. The sample with the higher memory usage
is mostly a single thread, but the case causing the slowdown would have
at least three threads interacting, and as I said I do have a dual-CPU
system.
I hope that helps,
Regards,
Craig
 

Re: Fastcode MM memory usage

Quote
>It's very difficult to measure the memory usage using task manager. The
>usage numbers given by the B&V replay tool are the actual address space used
>by the process and is IMO the best measure.

I'm having trouble following here. The memory usage reported in perfmon
and the task manager is what's actually allocated, right?
For PerfMon it depends entirely on what you've chosen to monitor. There
is a *long* list of values. The column "Mem Usage" in Task Manager is
called "Working Set" in PerfMon.
Working Set is roughly the amount of RAM currently in use by the
process. For a computer that has plenty of free RAM, this value is
typically very stable. For a computer that does not have enough free
RAM, this value will be constantly changing as pages are first swapped
through the disk cache and then through the hard drive.
In addition, certain events (like minimizing the application) cause the
operating system to "trim the working set". Basically, the operating
system tosses memory pages that haven't been touched in a while.
Quote
So it is a
different value than that reported by the B&V, but it seems like it
would be just as valid.
You have a point. Smaller working set = good thing. A memory manager
that produces a smaller working set, either by design or by luck, places
a smaller burden on the operating system.
Quote
We don't really have a lot of trouble with
fragmentation and address space creep, so I am more concerned about
limiting the overall system impact by keeping as little memory allocated
as possible when we aren't using it. It seems like I would want to use the
numbers from the task manager in that case, wouldn't I?
That makes sense. Just bear in mind that the number is ephemeral. Even
the tiniest change (like showing a dialog box) can dramatically affect
the working set.
- Brian
 

Re: Fastcode MM memory usage

Quote
>You should also remember that
>not all memory is allocated through the Delphi memory manager. If you're
>using COM, ADO or third party DLLs there's a good chance that your program
>is allocating memory using the Windows Heap as well.

Are widestrings covered by this?
WideStrings are handled by the COM memory manager.
- Brian
 

Re: Fastcode MM memory usage

Hi Craig,
Quote
I've put together a couple of packages that you can use for testing.
They're on our website at:
I've downloaded the EXEs from your webpage and did the compare between the
.chm files. I also used the usage recorder EXE to record a replay of the
compare and played it back in the B&V tool.
I saw some very strange things. For one, the usage recorder EXE is
consistently faster than the RTL MM. This just doesn't make sense - the
usage recorder is an extra layer between the application and the RTL MM with
significant extra processing and memory overhead.
Secondly, inside the replay system of the B&V tool FastMM4 is significantly
faster than all the other MMs with the .chm file compare replay. If I run
the exe you uploaded, however, it seems slower. I looked at the allocation
patterns in the replay file and nothing seems out of the ordinary: many
small blocks, a few larger ones and some reallocations - nothing unexpected.
What is even more baffling for me is that even FastMM3 appears slow in the
compare. FastMM3 and BucketMM are almost identical in internal workings, so
I cannot explain this discrepancy.
Clearly something is wrong, but I cannot put my finger on it. The fact that
the usage recorder that piggy-backs on top of the RTL MM is faster than the
RTL MM itself points to some weird things happening here.
If you want me to investigate this further you will have to give me something
that is compilable so I can try various MM settings and run it under a
profiler to try and find the bottleneck. It goes without saying that
anything you send me will be treated confidentially and I will delete it off
my machine when I am done.
Regards,
Pierre
 

Re: Fastcode MM memory usage

Hi Craig,
Replay results of the .chm file compare:
FastMM4: Time = 65, Memory Usage = 29183
BucketMM: Time = 81, Memory Usage = 30247
RTL: Time = 197, Memory Usage = 27135
I don't understand how the replay system can paint one picture and the
measurements report something so radically different.
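To put the replay figures on a common footing, here is a small sketch that expresses each result relative to the RTL. The numbers are the ones listed above; their units are not stated in the post, so only the ratios are meaningful, and the helper name is made up for this example.

```python
# (time, memory usage) per memory manager, copied from the replay results above.
replay = {
    "FastMM4": (65, 29183),
    "BucketMM": (81, 30247),
    "RTL": (197, 27135),
}


def relative_to(baseline, results):
    """Map each name to (speedup over baseline, memory as a multiple of baseline)."""
    base_time, base_mem = results[baseline]
    return {
        name: (base_time / time, mem / base_mem)
        for name, (time, mem) in results.items()
    }


ratios = relative_to("RTL", replay)
for name, (speedup, mem_ratio) in ratios.items():
    print(f"{name}: {speedup:.2f}x RTL speed, {mem_ratio:.3f}x RTL memory")
```

In the replay, FastMM4 comes out roughly three times faster than the RTL for about 7.5% more memory, which is the opposite of what the stand-alone EXE timings suggest.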
I would really like to get to the bottom of this. The RTL MM is really
horribly slow with multi-threaded access, so the fact that your application
is multi-threaded and you're running a dual CPU and it STILL appears to
perform very well just goes against common sense. I'd have expected all
replacement memory managers to give you a significant boost over the RTL MM,
but that doesn't seem to be the case.
I've also tested on a single CPU non-hyperthreaded system and I get the same
results, so I am confident it is neither thread-related nor multi-CPU related.
Regards,
Pierre
 

Re: Fastcode MM memory usage

Hi Pierre
It looks to me like compiling with optimization on/off, Range checking
on/off etc. could be the problem.
I have no doubt that RTLMM is the slowest.
Regards
Dennis