corruption causes (technical explanations)

Superb, Andrew.

And it might well be that there's something differentially important in
your comment about "a slight delay."  Some programs might open a table
and =immediately= start hitting it, whereas other programs might open
the tables and not start trying to hit them for a second or more.  They
might also leave the table connections open for a long time.

For instance, Delphi programs and Paradox programs might exhibit very
different usage patterns.

It could be argued that some applications intrinsically present the
server with a scenario where it might juggle back-and-forth between
opportunistic and non-opportunistic behavior.  Paradox databases are
slightly unusual in that they consist of many files and they are opened
and closed quite frequently.
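The "juggling" can be made concrete with a toy model of the behaviour Andrew describes in his post below. While exactly one handle is open, the file is treated as exclusively held and a lock check is a yes/no answer. Once a second handle opens, the server falls back to a list of record locks that has to be searched. The C++ sketch below is entirely hypothetical (`OplockModel`, `RecordLock` and `breaks()` are invented names). It simulates the described behaviour, not Windows' actual oplock implementation. As a simplification, it assumes a broken oplock is not handed back to handles that are still open, so the model only returns to the fast path after every handle has closed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy model of the behaviour described in this thread:
//   one open handle  -> file treated as exclusively held (fast path)
//   a second opener  -> server falls back to tracking record locks
struct RecordLock {
    std::size_t offset;
    std::size_t length;
};

class OplockModel {
public:
    void open() {
        ++handles_;
        if (handles_ == 2 && !broken_) {
            broken_ = true;   // second connection: give up the fast path
            ++breaks_;
        }
    }

    void close() {
        if (handles_ == 0) return;
        if (--handles_ == 0) {
            broken_ = false;  // nobody left: the next opener starts fresh
            locks_.clear();
        }
    }

    // Simplification: the fast path is not restored while any handle that
    // saw the break is still open.
    bool exclusive() const { return handles_ == 1 && !broken_; }

    // Fast path: nobody else can conflict, so no search is needed.
    // Shared path: every existing record lock is checked for overlap.
    bool tryLockRecord(std::size_t offset, std::size_t length) {
        if (!exclusive()) {
            for (const RecordLock& l : locks_) {
                bool overlaps = offset < l.offset + l.length &&
                                l.offset < offset + length;
                if (overlaps) return false;
            }
        }
        locks_.push_back({offset, length});
        return true;
    }

    std::size_t breaks() const { return breaks_; }

private:
    std::size_t handles_ = 0;
    std::size_t breaks_ = 0;
    bool broken_ = false;
    std::vector<RecordLock> locks_;
};
```

Counting `breaks()` over an open/close pattern shows why a program that keeps opening and closing tables while another user has them open would pay for the switch again in every session.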

---

I'm slightly misquoted when you say "don't use any peer to peer stuff."
Obviously peer-to-peer is one way in which Paradox is used quite frequently,
and I've not seen any distinction between peer-to-peer and server-based
systems.

In my experience the presence of corruption-problems is not really
linked to load characteristics, although I rarely encounter systems that
are under extremely heavy loads anyway.  The problem usually appears when
"something changed" in the environment.  A system can hum along for
years and then, "something changed."

---

The "file I/O caching library" you describe in the last paragraph sounds
very interesting.

Andrew wrote:

> Replies inline -

> Liz wrote in message <3B126674.FC018...@aros.net>...
> >Interesting.  So, you're saying that WinNT / Win2K servers
> >and peer-to-peer Windows networks do NOT need their
> >oplocking turned off, given all the other stuff you mention
> >is properly configured?

> Ok - there is a big difference between opportunistic locking and
> write caching.

> Opportunistic locking is a (surprisingly for Microsoft) clever
> method to improve performance when record locking is in
> operation.

> When you use record locking it imposes a much bigger overhead
> than file-level locking. The OS has to search a list of locks
> for that file, whereas for file locking it's a yes/no answer - no
> searching involved. If only one connection is talking to a file
> then this searching for record locks is unnecessary. Hence when
> there is only one connection to a file WinNT/2000 actually locks
> the file instead. When a second connection to the file is
> established the OS then has to change to record locking - whilst
> it goes about its housekeeping and updating there is a small
> delay. It would appear for some reason that this delay on some
> systems causes a problem with Paradox. Disabling op-locking is
> not really going to involve a big performance hit.

> >Would you say the same for Linux/Samba?

> Unix (and hence Linux)  doesn't use op-locking. Record locking
> in Unix is actually done at a much higher level in the file
> system.

> Write caching is the process where writes to the disk are queued
> so that the application isn't blocked waiting for each
> individual write to complete. It also allows the OS to group a
> lot of small writes together into one much bigger write, which is
> much quicker and reduces disk thrashing. Virtually all operating
> systems written in the last decade have used write caching -
> even Windows 3.1 (via SMARTDRV, where the default was to write-cache
> hard disks - although you could override it to do read
> caching only).

> > This (in regards to Windows) isn't what I've
> >heard from others on the groups, but it may be that this is
> >carry-over advice from before all the necessary fixes were
> >available, or based on the experience of some who didn't
> >have all the other stuff set up properly.  Or something...

> Ok - there are a couple of things here.

> Windows 95B (OSR2 / 2.1 and I think 2.5) had a fatal bug in the
> vredir.vxd networking file that caused havoc with network data
> transfers (not just with Paradox). This is when the "index out of
> date" problem really reared its ugly head. (We had countless
> arguments with Microsoft, who denied there was a problem - even
> though our stuff worked fine on Win95a. Then they released an
> updated vredir.vxd which fixed that problem!) The vredir.vxd
> supplied with Win98 is fine.

> However, this was not the end of the matter. Paradox still
> presents "index out of date" errors - on some systems it's
> unusable.

> Now, as the representative from Sundial Services says, if you set
> up all the clients so that they are IDENTICAL (same version of
> the BDE with the same settings, same network drive mappings for
> all database drives, identical versions of Win98, etc.) - and don't
> use any peer to peer stuff - then you can greatly reduce the
> number of errors. We reduced the problems by 95% this way -
> however, this is not good enough for commercial deployment.

> Running NetBEUI as the primary protocol also seems to help with
> some of our customers (TCP/IP has a much bigger overhead, though
> in timing terms this should be negligible), as does using a
> faster server. As we don't have the Paradox/BDE source code we
> are not sure why. One possible reason: if the BDE calculates how
> long updates should take and there is a sudden unexpected delay,
> then the BDE/Paradox combination may decide there is a problem
> and abort the index update. This delay could be caused by
> op-locking swapping from file to record locking, or by the write
> cache buffer being full and blocking all further writes until
> it's emptied. This is, of course, entirely speculation.

> Another Paradox problem we have is if you try to mix Win98 and
> WinNT/2000 clients whilst talking to a WinNT/2000 server it
> takes about 20 minutes to fall over.

> >[FYI: I've always worked on Novell servers, have no idea
> >whether oplocking is on or off on them, but Paradox runs
> >beautifully on these, with no performance problems at all -
> >of course, comparing server OSs is another thread, but it's
> >always been interesting to me that the folks who complain
> >about Paradox corruption or speed issues are generally
> >running Windows networks...]

> >Liz

> I've no idea about the underlying locking mechanisms of Novell -
> I know our customers with Novell servers never have "index out of
> date" problems, even when using WinNT clients (and in some cases a
> mixture of WinNT and Win98). However, the performance of some of
> their applications is significantly slower on Novell than on
> WinNT Server. Whether this is due to the client interface on the
> workstation or Novell itself I have no idea - we have a file I/O
> caching library I wrote that is transparently linked into our
> programs when they are going to be used on a Novell server.

> regards,
> Andrew

--
------------------------------------------------------------------
Sundial Services :: Scottsdale, AZ (USA) :: (480) 946-8259
mailto:i...@sundialservices.com  (PGP public key available.)

> Fast(!), automatic table-repair with two clicks of the mouse!
> ChimneySweep(R):  "Click click, it's fixed!" {tm}
> http://www.sundialservices.com/products/chimneysweep

 

Re:corruption causes (technical explanations)


Sundial Services wrote in message <3B12B37C.5...@sundialservices.com>...

>Superb, Andrew.

>And it might well be that there's something differentially important in
>your comment about "a slight delay."  Some programs might open a table
>and =immediately= start hitting it, whereas other programs might open
>the tables and not start trying to hit them for a second or more.  They
>might also leave the table connections open for a long time.

>For instance, Delphi programs and Paradox programs might exhibit very
>different usage patterns.

Certainly our CPP Builder and Paradox programs do - my CPP Builder
programs open the tables, hammer out 1000 records and close them
straight away. The Paradox side here tends to be interactive record
updating, hence a few records get changed every now and then.

>It could be argued that some applications intrinsically present the
>server with a scenario where it might juggle back-and-forth between
>opportunistic and non-opportunistic behavior.  Paradox databases are
>slightly unusual in that they consist of many files and they are opened
>and closed quite frequently.

I'd say we're thinking along the same lines here. One of our
applications had 6 indexes to update for every transaction
(and a lot of transactions for each job - a contiguous burst of
about 500 records). Running multiple copies of this resulted in
recurrent "index out of date" errors. Obviously, as the second
instance of the app opened the table, the server would have to
switch to proper record locking. This was a CPP Builder
application, so I introduced some sequencing control using manual
file locking, i.e. the application instances argued amongst
themselves as to who got exclusive access to the table and in
what order. This totally eliminated all "index out of date" errors
on our system in the office. Alas, if I introduce a Win2000
workstation into the equation I start getting problems again.
Some of our customers still get the odd "index out of date" error
with the application, but it's helped a lot.
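Here is a minimal sketch of that kind of sequencing control, assuming a POSIX system. Each application instance takes an exclusive `flock()` on a shared sidecar file before its burst of updates and releases it afterwards. POSIX calls are swapped in here for illustration only. Andrew's CPP Builder version on Windows is not shown and would have used Win32 file locking instead. The class name `SequenceLock` and any lock-file path are hypothetical. Note that `flock()` makes no promise of first-come-first-served ordering, so if the order of instances matters, something extra would be needed (for example, a ticket number kept in the file).

```cpp
#include <fcntl.h>
#include <string>
#include <sys/file.h>
#include <unistd.h>

// Hypothetical "sequencing control": every application instance takes an
// exclusive lock on a shared sidecar file before its burst of updates, so
// only one instance writes to the table (and its indexes) at a time.
// POSIX flock() is swapped in here for illustration.
class SequenceLock {
public:
    explicit SequenceLock(const std::string& path)
        : fd_(::open(path.c_str(), O_RDWR | O_CREAT, 0644)) {}

    ~SequenceLock() {
        if (fd_ >= 0) ::close(fd_);   // closing the descriptor drops the lock too
    }

    SequenceLock(const SequenceLock&) = delete;
    SequenceLock& operator=(const SequenceLock&) = delete;

    bool valid() const { return fd_ >= 0; }

    // Blocks until this instance holds the lock.
    bool acquire() { return fd_ >= 0 && ::flock(fd_, LOCK_EX) == 0; }

    // Returns false at once if another instance already holds the lock.
    bool tryAcquire() { return fd_ >= 0 && ::flock(fd_, LOCK_EX | LOCK_NB) == 0; }

    bool release() { return fd_ >= 0 && ::flock(fd_, LOCK_UN) == 0; }

private:
    int fd_;
};
```

An instance would call `acquire()`, write its ~500 records and their index updates, then call `release()`. A second instance blocks in `acquire()` until then.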

>I'm slightly misquoted when you say "don't use any peer to peer stuff."

Sorry - I added that myself as it's one of our recommendations -
the rest of them I think you already mentioned.

>In my experience the presence of corruption-problems is not really
>linked to load characteristics, although I rarely encounter systems that
>are under extremely heavy loads anyway.  The problem usually appears when
>"something changed" in the environment.  A system can hum along for
>years and then, "something changed."

>---

Like someone swapping to an NT server......

To be honest, most of our Paradox systems are less than 3 years
old - except a few really old ones which were all on Novell
servers. Back in those days there was no WinNT and Microsoft's
"operating systems" were awful - you either used Novell or, if
you were rich and wanted to be different, BSD (Unix) on a 68K-based
machine.

Until recently we didn't use off-the-shelf database systems very
often because, with the hardware available, they weren't quick
enough. The hardware has advanced so much in the last couple of
years that we can now do so for a lot of our work. This makes
our development times much shorter, and Paradox really fits well
with the sort of work we do.

>The "file I/O caching library" you describe in the last paragraph
>sounds very interesting.

But alas it's not much use unless you're a C++ developer - I
wrote it as a C library for our C/C++ applications (which are, to
be honest, still 60% of our systems - this includes our own
custom high-performance database stuff, which is effectively
custom-written per application).
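Andrew's library itself isn't shown in the thread. As a hedged illustration of the write-caching idea he describes earlier, here is a C++ sketch of a write-behind buffer. The idea is to queue small writes so the application isn't blocked, then hand the OS one larger write. `CoalescingWriter` and its `Sink` callback are hypothetical names. The sink stands in for whatever real write call a program uses, and nothing here is taken from Andrew's code.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical write-behind buffer: small writes are queued in memory and
// passed to the real writer as one larger chunk, either when the next write
// would not fit or when flush() is called.
class CoalescingWriter {
public:
    // Stand-in for the real write call (e.g. write() or WriteFile()).
    using Sink = std::function<void(const char* data, std::size_t size)>;

    CoalescingWriter(std::size_t capacity, Sink sink)
        : capacity_(capacity), sink_(std::move(sink)) {
        buffer_.reserve(capacity_);
    }

    ~CoalescingWriter() { flush(); }   // never leave queued data behind

    CoalescingWriter(const CoalescingWriter&) = delete;
    CoalescingWriter& operator=(const CoalescingWriter&) = delete;

    void write(const std::string& data) {
        if (buffer_.size() + data.size() > capacity_) flush();
        if (data.size() > capacity_) {
            sink_(data.data(), data.size());   // too big to queue: write through
            return;
        }
        buffer_.insert(buffer_.end(), data.begin(), data.end());
    }

    void flush() {
        if (buffer_.empty()) return;
        sink_(buffer_.data(), buffer_.size());   // one physical write
        buffer_.clear();
    }

private:
    std::size_t capacity_;
    Sink sink_;
    std::vector<char> buffer_;
};
```

With a 64-byte buffer, 25 four-byte writes reach the sink as two writes (64 bytes, then 36 on the final flush) rather than 25. A cache like this still has to respect the locking discussed in this thread, because data sitting in a private buffer is invisible to other users of a shared table.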

regards,
Andrew.

Re:corruption causes (technical explanations)


The op-lock article does not make it clear whether the switch occurs
when a second user connects, or when a second file handle is opened
(e.g. by the same user).  Logic says to me that a second open handle
would be sufficient.

In which case, a simple solution would be to open all of the main tables
when the application starts ... and keep those handles open until the
app is through.  A data-model in a Delphi program would be sufficient to
do this.  For a Paradox application, handles could be opened through the
library.

Or simply make sure that the application waits a fraction of a second
before it starts doing serious heavy-duty work against the tables it just
opened.  Anything to gloss over a timing window that might still exist
somewhere.
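Both suggestions can be sketched together in C++, assuming POSIX file calls. The idea is to open every main table file once at start-up, keep the descriptors until the application exits, and optionally pause before heavy work begins. `TableHandles` and its settle-delay parameter are hypothetical. In a Delphi program the equivalent would be tables opened in a data module, as suggested above. Whether holding handles open actually avoids the op-lock switch is exactly the open question here, so treat this as an experiment rather than a fix.

```cpp
#include <chrono>
#include <cstddef>
#include <fcntl.h>
#include <string>
#include <thread>
#include <unistd.h>
#include <vector>

// Hypothetical helper: open every main table file once at start-up and keep
// the descriptors for the life of the application, so the number of open
// handles does not keep rising and falling while the program runs.
class TableHandles {
public:
    TableHandles(const std::vector<std::string>& paths,
                 std::chrono::milliseconds settle) {
        for (const std::string& p : paths) {
            int fd = ::open(p.c_str(), O_RDONLY);
            if (fd >= 0) fds_.push_back(fd);
            else failed_.push_back(p);
        }
        // Optional pause before heavy work starts, to step over any
        // timing window right after the opens.
        std::this_thread::sleep_for(settle);
    }

    ~TableHandles() {
        for (int fd : fds_) ::close(fd);
    }

    TableHandles(const TableHandles&) = delete;
    TableHandles& operator=(const TableHandles&) = delete;

    std::size_t openCount() const { return fds_.size(); }
    const std::vector<std::string>& failed() const { return failed_; }

private:
    std::vector<int> fds_;
    std::vector<std::string> failed_;
};
```

A program would construct one instance at start-up (for example, with a settle time of a few hundred milliseconds) and let it live until shutdown. `failed()` lists any table that could not be opened.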

Your strategy of 'sequencing control' sounds very similar to the idea of
acquiring a Write-lock (table level) before performing a series of
updates all covered by that one lock; vs. record-locks.  Did your CPP
program do this sort of thing before?

[ Ahh, when a discussion like this one starts up on a newsgroup, it
does make all the "trolling" worthwhile ... best source of information
in the known universe, for all of us, I think.  Thanks, all. ]

Re:corruption causes (technical explanations)


Sundial Services wrote in message <3B13AD5D.1...@sundialservices.com>...

>The op-lock article does not make it clear whether the switch occurs
>when a second user connects, or when a second file-handle is opened
>(e.g. by the same user).  Logic says to me that a second open handle
>would be sufficient.

I suppose that would depend on any caching done by the BDE. It
would be sensible if all connections to a remote table (on the
server) from the one machine were handled by the BDE with only 1
file lock - it seems daft to open several locks and connections
where the data would have to make multiple trips backwards and
forwards to the same BDE on the same machine - but of course in
reality who knows?

This may mean that several open handles on the same machine may
not have the desired effect. Time to fire up the Win2000
monitoring tools and count the locks......
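Andrew's guess can be illustrated with a hypothetical per-process registry in C++. POSIX calls are assumed, and `SharedHandleRegistry` is an invented name. Nothing here describes the BDE's real internals, which neither poster has seen. Every request to open the same path reuses one descriptor and bumps a reference count, so the server would see a single open no matter how many handles the program thinks it holds. If the BDE did something like this, extra handles opened by the application would not produce the second connection that forces the switch.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <map>
#include <string>
#include <unistd.h>

// Hypothetical per-process registry: repeated opens of the same path share
// one descriptor, so the file server sees a single open however many parts
// of the program think they have the table open.
class SharedHandleRegistry {
public:
    SharedHandleRegistry() = default;
    SharedHandleRegistry(const SharedHandleRegistry&) = delete;
    SharedHandleRegistry& operator=(const SharedHandleRegistry&) = delete;

    ~SharedHandleRegistry() {
        for (auto& entry : entries_) ::close(entry.second.fd);
    }

    // Returns the shared descriptor, or -1 if the file cannot be opened.
    int acquire(const std::string& path) {
        auto it = entries_.find(path);
        if (it != entries_.end()) {
            ++it->second.refs;         // already open: just count another user
            return it->second.fd;
        }
        int fd = ::open(path.c_str(), O_RDWR);
        if (fd < 0) return -1;
        entries_.emplace(path, Entry{fd, 1});
        return fd;
    }

    void release(const std::string& path) {
        auto it = entries_.find(path);
        if (it == entries_.end()) return;
        if (--it->second.refs == 0) {
            ::close(it->second.fd);    // last user gone: really close the file
            entries_.erase(it);
        }
    }

    // Number of descriptors actually open, i.e. what the server would see.
    std::size_t realOpens() const { return entries_.size(); }

private:
    struct Entry {
        int fd;
        std::size_t refs;
    };
    std::map<std::string, Entry> entries_;
};
```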

>Your strategy of 'sequencing control' sounds very similar to the idea of
>acquiring a Write-lock (table level) before performing a series of
>updates all covered by that one lock; vs. record-locks.  Did your CPP
>program do this sort of thing before?

Well, the reason we moved some database development to CPP
Builder is that you are SUPPOSED to be able to just use the data-aware
controls on forms and never have to write a single database API
call - it's all taken care of "in the wash", so to speak. Of course,
in reality it doesn't work like that. I've got "DbiCheckRefresh()"
and "->Refresh()" programmed into function keys as macros because
they are needed so often. The DIY sequencing control routines are
needed because of the way you have to use multiple table components
pointing to the same table if you're using lookup controls in CPP
Builder. If you tried a tableX->LockTable() it would fail because
the second instance of that table on the form would have the table
open as well.
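One way round the multiple-component problem is sketched below in C++ as a hypothetical `InstanceSequencer`, under the same POSIX assumptions as a plain `flock()` lock. All table components in one running instance share a single sequencer. It takes the cross-instance file lock on the first `tryEnter()` and only counts nested entries after that, so a lookup component pointing at the same table does not stop the data-entry component from getting the lock. This is an illustration, not how `LockTable()` or the BDE behaves, and it assumes single-threaded use, as in a typical form-based program.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <string>
#include <sys/file.h>
#include <unistd.h>

// Hypothetical per-instance sequencer: all table components in one running
// application share it, so the instance takes the cross-instance file lock
// once no matter how many components (data entry, lookups) use the table.
// Nested entries from other components only bump a counter.
class InstanceSequencer {
public:
    explicit InstanceSequencer(const std::string& lockPath)
        : fd_(::open(lockPath.c_str(), O_RDWR | O_CREAT, 0644)) {}

    ~InstanceSequencer() {
        if (fd_ >= 0) ::close(fd_);
    }

    InstanceSequencer(const InstanceSequencer&) = delete;
    InstanceSequencer& operator=(const InstanceSequencer&) = delete;

    // Non-blocking: returns false if another instance holds the lock
    // (or the lock file could not be opened).
    bool tryEnter() {
        if (depth_ == 0 && ::flock(fd_, LOCK_EX | LOCK_NB) != 0) return false;
        ++depth_;
        return true;
    }

    // The file lock is released only when the last component leaves.
    void leave() {
        if (depth_ == 0) return;
        if (--depth_ == 0) ::flock(fd_, LOCK_UN);
    }

    std::size_t depth() const { return depth_; }

private:
    int fd_;
    std::size_t depth_ = 0;
};
```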

regards,
Andrew
