
all BCB6 apps: memory corruption in hyperthreading / dualCPU machines


2005-10-25 09:03:13 PM
cppbuilder17
Hi all,
I've observed memory corruption in all BCB-built
multithreaded applications on hyperthreading / dual CPU machines. I have an
application that runs perfectly fine on Banias / Celeron but fails on P4 HT.
If I disable HT, it passes. The same application also fails on dual CPU
machines (servers).
I came to this conclusion through some simple tests:
declare a static global variable and assign a value to it;
at the end of the test application, read it back to check that it still
holds that value.
Moving the static global variable around in the source affects the value it
ends up holding. In some places it keeps its value, while in others it gets
corrupted as soon as the second thread starts execution.
 
 

Re:all BCB6 apps: memory corruption in hyperthreading / dualCPU machines

Zach Saw wrote:
Quote
Hi all,

I've observed that some memory corruption occurs for all BCB built
multithreaded applications on hyperthreading / dual CPU machines. I have an
application that runs perfectly fine on Banias / Celeron but fails on P4 HT.
If I disable HT, it passes. The same application also fails on dualCPU
machines (servers).

I come to this conclusion through some simple tests:

declare a static global variable and assign a value to it.
at the end of the test application, read it back to determine its
correctness.

Moving around the static global variable does affect the end value it holds.
In some places, it remains the same value, while others will have it
corrupted as soon as the second thread starts execution.
Are you serializing access to the global variable, with a
CriticalSection or similar? If not, and multiple threads may write to
the variable, I would expect it to become corrupted.
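A minimal sketch of what serializing access to a shared global looks like, written in standard C++11 rather than BCB6: std::mutex and std::lock_guard stand in for a Win32 CRITICAL_SECTION or a VCL TCriticalSection, and the names g_shared_total, add_to_shared_total and run_serialized_updates are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Shared global updated by several threads. The std::mutex plays the role
// of a CRITICAL_SECTION / TCriticalSection (assumption: standard C++11).
static int g_shared_total = 0;
static std::mutex g_shared_total_lock;

// Adds `amount` to the shared global `times` times, holding the lock for
// each read-modify-write so no update can be lost.
void add_to_shared_total(int amount, int times)
{
    for (int i = 0; i < times; ++i) {
        std::lock_guard<std::mutex> guard(g_shared_total_lock);
        g_shared_total += amount;
    }
}

// Runs `thread_count` workers concurrently and returns the final total.
int run_serialized_updates(int thread_count, int times)
{
    g_shared_total = 0;  // reset before any worker starts
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t)
        workers.emplace_back(add_to_shared_total, 1, times);
    for (auto& w : workers)
        w.join();
    std::lock_guard<std::mutex> guard(g_shared_total_lock);
    return g_shared_total;
}
```

With the lock in place, 8 threads doing 10000 increments each always end at 80000; remove the lock_guard inside the loop and the total can come up short on a multi-CPU machine.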
Tom
 

Re:all BCB6 apps: memory corruption in hyperthreading / dualCPU machines

I'm not writing to the global variable except in the GUI (TForm c'tor)...
I use it as a litmus test for memory corruption... the threads aren't
writing to or reading from that static global...
 


Re:all BCB6 apps: memory corruption in hyperthreading / dualCPU machines

Zach Saw wrote:
Quote
I'm not writing to the global variable except in the GUI (TForm c'tor)...

I use it as a litmus test for memory corruption... the threads aren't
writing or reading from that static global...
What's your test case? How are you accessing the variable to read its
value to know it's corrupted? If you are using AnsiString, are you
ensuring that instances aren't shared between threads by calling
Unique() inside a critical section? Do you have any static local
variables (which aren't thread safe)?
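AnsiString copies share one reference-counted buffer until one of them is modified, and Unique() forces an instance onto its own buffer. The sketch below mimics that with a hypothetical CowString class built on std::shared_ptr; it is not the VCL implementation, only an illustration of taking a private copy inside a lock before handing a string to another thread. make_private_copy is likewise a hypothetical helper.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical copy-on-write string, loosely modelled on the way a
// reference-counted AnsiString shares one buffer between copies.
class CowString {
public:
    explicit CowString(const std::string& s)
        : buf_(std::make_shared<std::string>(s)) {}

    // Copies share the same buffer (only the reference count changes).
    CowString(const CowString&) = default;
    CowString& operator=(const CowString&) = default;

    // Analogue of AnsiString::Unique(): if any other instance still refers
    // to the current buffer, switch this instance to a private copy.
    void unique()
    {
        if (buf_.use_count() > 1)
            buf_ = std::make_shared<std::string>(*buf_);
    }

    bool shares_buffer_with(const CowString& other) const
    {
        return buf_ == other.buf_;
    }

    const std::string& str() const { return *buf_; }

private:
    std::shared_ptr<std::string> buf_;
};

// Makes a private copy of `source` for another thread. The lock keeps the
// copy-then-unique step from overlapping with other threads using `source`.
CowString make_private_copy(const CowString& source, std::mutex& source_lock)
{
    std::lock_guard<std::mutex> guard(source_lock);
    CowString copy = source;  // shares the buffer at this point
    copy.unique();            // now owns an independent buffer
    return copy;
}
```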
Tom
 

Re:all BCB6 apps: memory corruption in hyperthreading / dualCPU machines

The test case is simple... it's a TDataSet descendant that simply returns 5
records (5 hardcoded records).
With that, I create 8 threads and run DataSet->Open() and DataSet->Close()
over and over again.
I have a static global -- static int __global_x;
In my TForm constructor (TForm is an empty form):
__global_x = 0xA5A5A5A5;
Button1Click:
create and run the 8 threads.
And no, the 8 threads don't interact with one another.
The failure is only observable on HT / dual CPU machines, and it is 100%
reproducible. Enabling CodeGuard ends in an "EOutOfMemory" exception.
On a single CPU machine, it never fails.
Running the app on a single CPU machine -- Memproof reports NO ACCESS OVERRUNS.
Running the SAME app on a multi-CPU machine -- Memproof reports over 4000
ACCESS OVERRUNS!
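A rough standard-C++ outline of the test described above, assuming each thread owns its own dataset: FakeDataSet is a hypothetical stand-in for the TDataSet descendant, and std::thread replaces whatever TThread setup the real app uses. The sentinel is declared std::uint32_t here because 0xA5A5A5A5 does not fit in a 32-bit signed int. This is a skeleton for a repro project, not a reproduction of the VCL behaviour; on a correct implementation it should always report success.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <thread>
#include <vector>

// Litmus value written once by the main thread and never touched by the
// workers (the role of the static global set in the TForm constructor).
static std::uint32_t g_sentinel = 0;

// Hypothetical stand-in for the TDataSet descendant: five hardcoded records.
class FakeDataSet {
public:
    void open()
    {
        rows_.clear();
        for (int i = 0; i < 5; ++i)
            rows_.push_back("record " + std::to_string(i));
    }
    void close() { rows_.clear(); }
    std::size_t record_count() const { return rows_.size(); }

private:
    std::vector<std::string> rows_;
};

// Each worker owns its own dataset, so the threads share nothing except
// their own result slot.
void worker(int iterations, int* ok)
{
    FakeDataSet ds;
    bool good = true;
    for (int i = 0; i < iterations; ++i) {
        ds.open();
        if (ds.record_count() != 5)
            good = false;
        ds.close();
    }
    *ok = good ? 1 : 0;
}

// Returns true if every worker saw 5 records and the sentinel survived.
bool run_stress_test(int thread_count, int iterations)
{
    g_sentinel = 0xA5A5A5A5u;                   // set before any worker starts
    std::vector<int> results(thread_count, 0);  // one slot per worker
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t)
        workers.emplace_back(worker, iterations, &results[t]);
    for (auto& w : workers)
        w.join();
    for (int ok : results)
        if (!ok)
            return false;
    return g_sentinel == 0xA5A5A5A5u;           // the litmus check
}
```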
 

Re:all BCB6 apps: memory corruption in hyperthreading / dualCPU machines

"Zach Saw" < XXXX@XXXXX.COM >wrote:
Quote
I have a static global -- static int __global_x;
Are you aware that the system is allowed to do what it feels like with
that variable?
Identifiers with double underscores in them are reserved for the
implementation's own magic names. If you happen to manage to duplicate
one of the magic names, you will get strange results. Very strange
results. Possibly as strange as you are reporting.
In general - never define anything with a leading underscore, or
containing a double underscore.
(The first part has a few exceptions, but the simpler the rule, the
better.)
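To make the naming rule concrete, here is a sketch of giving the litmus variable internal linkage without a reserved name. g_sentinel_value, set_sentinel and get_sentinel are hypothetical; the commented-out lines show forms the C++ standard reserves for the implementation.

```cpp
#include <cassert>

// Reserved everywhere: names containing "__", and names starting with "_"
// followed by an uppercase letter. Reserved at global scope: names starting
// with "_".
// static int __global_x;  // don't: contains a double underscore
// static int _Global_x;   // don't: underscore followed by uppercase
// static int _global_x;   // don't at global scope

namespace {
// Internal linkage (same effect as `static`) with an ordinary name.
unsigned int g_sentinel_value = 0;
}

void set_sentinel(unsigned int v) { g_sentinel_value = v; }
unsigned int get_sentinel() { return g_sentinel_value; }
```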
Alan Bellingham
--
ACCU Conference 2006 - 19-22 April, Randolph Hotel, Oxford, UK
 

Re:all BCB6 apps: memory corruption in hyperthreading / dualCPU machines

Alan,
Actually the global var I declared was somevar, but I thought it looked like
so-me-var... so I changed it here to read __global_x :)
So that's got nothing to do with the failures I'm seeing...
 

Re:all BCB6 apps: memory corruption in hyperthreading / dualCPU machines

"Zach Saw" < XXXX@XXXXX.COM >wrote:
Quote
Actually the global var I declared was somevar, but i thought it looked like
so-me-var... so I changed it over here to read __global_x :)

so that's got nothing to do with the failures I'm seeing...
Ah, just a thought.
Unfortunately, I can think of too many different possibilities that
would tip up only on a multi-processor multi-threaded program. They're
all to do with not protecting a shared memory location with a mutex.
Whether it's your code or a third party's is anybody's guess.
Alan Bellingham
 

Re:all BCB6 apps: memory corruption in hyperthreading / dualCPU machines

Zach Saw wrote:
Quote
the test case is simple... it's a TDataSet descendant that simply returns 5
records (hardcoded 5 records).

with that, I create 8 threads, and run DataSet->Open() and DataSet->Close()
over and over again.
Are you calling those methods in each of the 8 threads? Or what are the
threads doing? Are you calling Open and Close on the same dataset? Or
does each thread have its own TDataSet object? Obviously, calling Open
or Close on the same object from multiple threads without serializing
the calls could lead to the problem you are seeing.
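If the threads did share one dataset, serializing each Open or Close call on its own would not be enough, because another thread could Close the dataset between this thread's Open and its reads. This hypothetical SharedDataSet (standard C++, not a TDataSet) sketches the alternative of holding one lock across the whole open/use/close sequence; run_shared is likewise a hypothetical driver.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

class SharedDataSet {
public:
    // Holds the lock for the whole Open/use/Close sequence, so no other
    // thread can change the dataset's state while `use` is reading it.
    template <typename Fn>
    void with_open(Fn use)
    {
        std::lock_guard<std::mutex> guard(lock_);
        active_ = true;                    // stands in for Open()
        ++open_count_;
        use(active_ ? record_count_ : 0);  // caller reads the records
        active_ = false;                   // stands in for Close()
    }

    int open_count()
    {
        std::lock_guard<std::mutex> guard(lock_);
        return open_count_;
    }

private:
    std::mutex lock_;
    bool active_ = false;
    int open_count_ = 0;
    const int record_count_ = 5;
};

// Several threads share one dataset; every access goes through with_open.
int run_shared(int thread_count, int iterations)
{
    SharedDataSet ds;
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t)
        workers.emplace_back([&ds, iterations] {
            for (int i = 0; i < iterations; ++i)
                ds.with_open([](int records) { assert(records == 5); });
        });
    for (auto& w : workers)
        w.join();
    return ds.open_count();
}
```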
Quote
I have a static global -- static int __global_x;

in my TForm constructor (TForm is an empty form),

__global_x = 0xA5A5A5A5;

Button1Click:

create and run the 8 threads.

And no, the 8 threads don't interact with one another.
How about via the DataSet object?
Quote
The failure is only observable on HT / dualCPU machines -- 100%
reproducable. Enabling CodeGuard will end up in "EOutOfMemory" exception.

on a single CPU machine, it never fails.

Running the app on single CPU machine -- Memproof reports NO ACCESS OVERRUN.

Running the SAME app on multiCPU machine -- Memproof reports over 4000
ACCESS OVERRUN!
Those symptoms could just as likely be down to incorrect multithreaded
code as a compiler bug. A lot of multithreading bugs only show up with
multi CPU machines, and some only on Itanium and Power PC multi CPU
machines.
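The classic example of a bug that hides on one CPU is a lost update on an unprotected counter. This sketch shows the corrected version with std::atomic, a C++11 facility BCB6 does not have (on Win32, InterlockedIncrement plays the same role); the plain-int failure mode is only described in the comments, because its outcome is nondeterministic and cannot be asserted. count_concurrently is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// With a plain `int`, `++counter` is a load, an add and a store. Two CPUs
// (or two hyperthreads) can interleave those steps and lose increments.
// On a single CPU the race needs a context switch in the middle of the
// increment, so it may almost never show up there.
// std::atomic makes each increment indivisible.
std::atomic<long> g_counter{0};

long count_concurrently(int thread_count, int increments)
{
    g_counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t)
        workers.emplace_back([increments] {
            for (int i = 0; i < increments; ++i)
                g_counter.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& w : workers)
        w.join();  // join() makes all increments visible here
    return g_counter.load();
}
```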
Tom
 

Re:all BCB6 apps: memory corruption in hyperthreading / dualCPU machines

Zach Saw wrote:
Quote
Moving around the static global variable does affect the end value it holds.
In some places, it remains the same value, while others will have it
corrupted as soon as the second thread starts execution.


Create a data breakpoint for write access to the variable. When
something writes to that location, the debugger will
stop, so you can see where the access occurs. I suspect this is a
byte-wide access, but it could be a word or a dword.
hth,
.a
 

Re:all BCB6 apps: memory corruption in hyperthreading / dualCPU machines

Quote

How about via the DataSet object?
As I've said, they're independent.
Quote
Those symptoms could just as likely be down to incorrect multithreaded
code as a compiler bug. A lot of multithreading bugs only show up with
multi CPU machines, and some only on Itanium and Power PC multi CPU
machines.
Well, Memproof shows no access overruns at all when the same piece of code
is run on a single CPU machine.
4000 on a multi-CPU machine? This doesn't look like a compiler bug. I've
found 2 compiler bugs in one month already, but frankly, this isn't one of
them. I bet this has something to do with the VCL.
 

Re:all BCB6 apps: memory corruption in hyperthreading / dualCPU machines

My guess would be the VCL.
Memproof shows access overruns almost everywhere I use AnsiString. But I
can't do without AnsiString since it's a TDataSet.
Great. Now I'll just tell my clients -- sorry, your server will continue to
lock up and there's nothing you can do about it, unless of course you pluck
out one of the CPUs!
Great job, Borland.
 

Re:all BCB6 apps: memory corruption in hyperthreading / dualCPU machines

Zach Saw wrote:
Quote
Memproof shows access overrun in almost everywhere I have AnsiString.
But I can't do without AnsiString since it's a TDataSet.
Are you sharing your AnsiStrings across threads? If you are using separate
AnsiStrings for every thread, and never copying, sharing or accessing them from
other threads (including the main thread), then no problems should occur.
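A sketch of that thread-confinement rule, with std::string standing in for AnsiString: each worker builds its strings locally and publishes one result into its own slot, which the main thread reads only after join(). The function name build_per_thread is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Each worker's strings are invisible to every other thread; the only
// hand-off is the worker's own result slot, read after join().
std::vector<std::string> build_per_thread(int thread_count, int rows)
{
    std::vector<std::string> results(thread_count);  // sized before threads start
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&results, t, rows] {
            std::string local;  // private to this thread
            for (int r = 0; r < rows; ++r)
                local += "row" + std::to_string(r) + ";";
            results[t] = std::move(local);  // writes only its own slot
        });
    }
    for (auto& w : workers)
        w.join();  // after join(), the workers' writes are visible here
    return results;
}
```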
And can you post a simple project that does this?
Jonathan
 

Re:all BCB6 apps: memory corruption in hyperthreading / dualCPU machines

Quote

Are you sharing you AnsiStrings across threads ? If you are using separate
AnsiStrings for every thread, and never copying/sharing/accessing them
from other threads including the main thread, then no problems should
occur.
Nope. Like I've said before, the threads are completely unaware of one another.
Quote

And, can you post a simple project that will do this ?

I'll try to get rid of the unnecessary stuff first... as of now, there are a
few things that link to our internal library. I'll try to write a simple app
that showcases the same corruption first... I'll get it posted within 24 hours
(hopefully) so everyone can take a look.
Let me test out a few hypotheses first. I've got AnsiString in mind, so I'll
put it through a multithreading test first and see if it exhibits the same
memory corruption.
 

Re:all BCB6 apps: memory corruption in hyperthreading / dualCPU machines

Zach Saw wrote:
Quote
nope. like i've said before, the threads are completely unaware of
the rest.
It should be impossible for threading to cause problems with AnsiStrings
as long as the strings are never accessed by different threads.
Quote
I'll try to get rid of the unnecessary stuff first... as of now,
there are a few things that links to our internal library. I'll try
to do a simple app that showcases the same corruption first... will
get it posted in 24 hours (hopefully) so everyone can take a look.
Good, thank you.
Jonathan