Status 2 file damage

Last post 06-26-2012 5:53 AM by nkhughes. 5 replies.
Page 1 of 1 (6 items)
Sort Posts: Previous Next
  • 06-21-2012 5:50 AM

    Status 2 file damage

    Sorry if this is a bit handwavy or veering into "how long is piece of string" territory, but I'm looking for some opinions/advice on avoiding database file damage.

    I don't get too many phone calls nowadays regarding status 2 errors in the legacy software I support (like the many phantom status 2s back in the days of P.SQL 2000) , but when it happens it tends to be quite dramatic (this week, for example, 9000+ records lost in 2 files when they were rebuilt, which meant that we ended up restoring from a 3-week old backup when it was discovered that the damage existed all the way back to then but had only started giving status 2 errors this week).

    It has to be said that after 10+ years of clients using the software, the files involved tend to be the same ones and are tables into which the most rows are being inserted each day. Therefore I imagine they are getting cached the most. In this week's case, the records lost accounted for about 1% and 4% of the total number of records in the files. In all cases, as far as I can remember, the damage has never been traced back to a bug in the applications or (phantom 2s excepted) the database engine - when a culprit has been identified it's been a hardware failure, server crash or network issue.

    What concerns me is that despite having backup plans and modern hardware/operating systems, I still find myself having to explain to clients that they are going to have to re-enter data because records have been lost. I was wondering if people on here have any views on the following:

    1) File size

    Is there a linear relationship in a Pervasive.SQL database file between the number of records lost and the total number of records in the file, i.e. a 10 year old file with a million records is likely to lose more records than a 10 year old file with 100K?


    2) File version

    The files in question this week were still in version 6 format, even though the database engine involved was 11. Are later file versions more damage-resistant?


    3) Caching

    With caches getting larger and more tables sitting in RAM to gain performance, is this asking for trouble and a possible reason for the heavier file damage I am seeing? Is the system/OS cache inherently more risky for the Pervasive.SQL engine to use and should it be avoided if possible? Do people see more issues when they combine the system and Pervasive cache?

  • 06-21-2012 9:37 AM In reply to

    Re: Status 2 file damage

    I get files to repair from clients all over the world, and have written many tools to make this job easier and recover more data than the typical BUTIL -RECOVER, so I think I am fairly up on this topic.

     1) No relationship exists between file size and corruption.  Files under 10MB get corrupted just as frequently as large files.  Now, the math suggests that a file with 1M records will be likely to lose more data than a file with 100 records (since 100% loss on a 100-record file is still only 100 records), but as a percentage of records, the numbers are reversed.  Much depends on the type of corruption -- overwriting 12 pages with the contents of an EXE file due to a file system cross-linked cluster is still going to impact those 12 pages -- and all data on those pages will be lost.

    2) Btrieve 5.x is much more likely to have problems, but 6.x files (when used with an engine version of 6.15.451 or higher) are pretty darned stable.  Newer file formats are no more resistant to damage, as there is no special error correction data in the file that might indicate this.  (Having said that, PSQLv9.00 did have some problems mangling files over 6GB on the v9.x file format, but this was fixed by either v9.10 or v9.50.)  Running out of disk space can cause 7.x files (or higher) to segment on an incorrect extent boundary on some versions, too, so I guess 6.x is the MOST stable file format, if you're OK with the 4GB file size limit, smaller page sizes (Max=4K instead of 16K) and lower performance (without TWA).

    3) No.  Data is cached for read purposes.  For writing, the background writers (i.e. I/O Threads) are monitoring the data files and issuing a system transaction every 10s (by default, see your Initiation Time Limit), so the database changes is always getting flushed to disk.  The system cache should be avoided on servers which have heavy disk writes, because it tends to slow down the write process, causing the engine to delay from time to time under heavy load.

  • 06-21-2012 9:51 AM In reply to

    Re: Status 2 file damage

    1) File size - some.  The larger the file the more file there is to be damaged.  Also the larger the file typically the more active the file.  As to the number of records lost, not really as it is more or less random inside the file.  Knowing more about the file internals would shed light on it. 

    If you want a better tool for recovering records you may want to get brecover.  It can often retrieve all but the actually damaged records so instead of losing 9000 it may have been only 9.

    http://cs.pervasive.com/files/folders/tools/entry47393.aspx

    2) File version - not really for any V6 or later file format.  V6 and later all share approximately the same file integrity protection.  The one caveat about V6 is that it does not have the option for a SYSKEY for transaction logging if the file does not already have a unique key.

    3) Caching - should not be.  The default settings for how often dirty pages are written out have been the same for about a dozen years that I know of.  Dirty pages should be written out in a timely manner no matter how much cache you have.  If the machine being used does not have ECC memory though bit errors may creep in.

    4) system and Pervasive cache?  I have heard tales of such but have never seen it and from knowledge of how things should work would not expect it.  From a performance perspective though it is a bad idea.  There is of course the "Cache Size" fixed cache.  Any record read or written is done from the fixed cache.  If the record is not in cache it is a cache miss and is fetched then is read from the cache.  Writes are also done to the fixed cache and are flushed as background writes based on the cache flush settings and sometimes other triggers in high volume environments.  Both the Pervasive "System Cache" and "Max MicroKernel Memory Use" settings can use the remainder of memory for that "last chance" cache.  Using both leads to double caching and will hold less data in memory than picking one or the other, so pick one.

  • 06-22-2012 3:32 AM In reply to

    Re: Status 2 file damage

    BtrieveBill:
    1) No relationship exists between file size and corruption.  Files under 10MB get corrupted just as frequently as large files.  Now, the math suggests that a file with 1M records will be likely to lose more data than a file with 100 records (since 100% loss on a 100-record file is still only 100 records), but as a percentage of records, the numbers are reversed.  Much depends on the type of corruption -- overwriting 12 pages with the contents of an EXE file due to a file system cross-linked cluster is still going to impact those 12 pages -- and all data on those pages will be lost.

    OK...it was the degree of damage, rather than the probability of damage, that I was thinking about. About 10+ years ago I would have been more up to speed on the mysteries of the Btrieve file format (PATs?) but it's become a black box to me nowadays.

    2) Btrieve 5.x is much more likely to have problems, but 6.x files (when used with an engine version of 6.15.451 or higher) are pretty darned stable.  Newer file formats are no more resistant to damage, as there is no special error correction data in the file that might indicate this.

    ...so I guess 6.x is the MOST stable file format, if you're OK with the 4GB file size limit, smaller page sizes (Max=4K instead of 16K) and lower performance (without TWA).

    Once I discovered that the site in question, like most of the sites I still support, had version 6 files I started thinking that maybe now was a good time to get everyone to update their file versions sooner rather than later. Not a simple task since most are running 24/7...so I'm glad I can put that on hold for now and continue to rely on file conversions (due to updates to the software) to gradually update the files.

    I don't think I've ever been able to prove that updating the file version has brought about a noticeable performance improvement, so on its own it's not a compelling reason to update. The site in question upgraded to Pervasive.SQL 11 because they wanted to replace their unsupported 2000i installation.

    3) No.  Data is cached for read purposes.  For writing, the background writers (i.e. I/O Threads) are monitoring the data files and issuing a system transaction every 10s (by default, see your Initiation Time Limit), so the database changes is always getting flushed to disk.  The system cache should be avoided on servers which have heavy disk writes, because it tends to slow down the write process, causing the engine to delay from time to time under heavy load.
    Thanks...there's some stuff there I can talk to the system admin. guy about. When they had a support contract with my previous employer we would have configured the 2000i server engine, but the upgrade to version 11 was done entirely by them since at the time they were no longer getting any support. It'll be interesting to see how things are configured.

    If the files are getting damaged having already been written back to, e.g. disk errors, I wouldn't expect the 2 files in question to be affected more than the others. Since they are in use and written back to more than the other files, it feels like there's something going on between the application updating the data and the files having been updated. Hence my (mistaken) thinking about the cache.

  • 06-22-2012 3:52 AM In reply to

    Re: Status 2 file damage

    lharvey:
    1) File size - some.  The larger the file the more file there is to be damaged.  Also the larger the file typically the more active the file.  As to the number of records lost, not really as it is more or less random inside the file.  Knowing more about the file internals would shed light on it. 

    If you want a better tool for recovering records you may want to get brecover.  It can often retrieve all but the actually damaged records so instead of losing 9000 it may have been only 9.

    http://cs.pervasive.com/files/folders/tools/entry47393.aspx

    Thanks...I'll try recovering the file again and see what happens.

    2) File version - not really for any V6 or later file format.  V6 and later all share approximately the same file integrity protection.  The one caveat about V6 is that it does not have the option for a SYSKEY for transaction logging if the file does not already have a unique key.

    Fortunately, we made the decision to always have at least one unique key, usually an index number, in (I think) all of the tables when the database was first designed Smile

    3) Caching - should not be.  The default settings for how often dirty pages are written out have been the same for about a dozen years that I know of.  Dirty pages should be written out in a timely manner no matter how much cache you have.  If the machine being used does not have ECC memory though bit errors may creep in.

    OK...so it looks like I can get rid of this mental picture of data sitting in a cache for a while, waiting to be written out. Thank you (and Bill) for mentioning this.

    4) system and Pervasive cache?  I have heard tales of such but have never seen it and from knowledge of how things should work would not expect it... ...

    Using both leads to double caching and will hold less data in memory than picking one or the other, so pick one.

    Thanks...I obviously need to check how the site in question has configured their version 11 engine.

     

  • 06-26-2012 5:53 AM In reply to

    Re: Status 2 file damage

    nkhughes:

    lharvey:
    If you want a better tool for recovering records you may want to get brecover.  It can often retrieve all but the actually damaged records so instead of losing 9000 it may have been only 9.

    http://cs.pervasive.com/files/folders/tools/entry47393.aspx

    Thanks...I'll try recovering the file again and see what happens.

     

    As suggested, brecover does seem to have done a much better job. Only a handful of records lost instead of thousands. It's possible that had I used this we might have been able to avoid restoring from backup, by analysing which auto-increment index values were missing to see if the lost records were recent or ancient history (and therefore something we could deal with in time).

     

Page 1 of 1 (6 items)