All the Perl that's Practical to Extract and Report

  • It appears that the IDE controller (or the kernel, talking to the IDE controller) completely dies, with many 'lost interrupt' messages on the console for three of the four drives in the machine.

    I saw this once, so I bought a PCI controller to replace the one built in on the mobo. It didn't help, because the real problem was a bad hard drive. It gets much, much worse if the hard drive is part of a RAID.

    What is happening is that some blocks on the drive are failing, and the drive takes so long to remap them that it doesn't answer DMA requests in time. (Or... something like that ;-))

    If you have the time/hardware... you can run hdparm on the drive to make sure it is getting the proper transfer rates... then run badblocks across the entire drive (don't waste time writing a log file) and wait for the 'lost interrupt' messages to appear. If they do, you've found the drive.

    Also, check with hdparm afterwards to make sure the transfer rate is still where it should be. I've seen some instances where the drive would pass the badblocks test... but DMA got turned off and the transfer rate went from 66 MB/s to 3 MB/s. This was the response to the drive taking so long to remap the bad blocks.

    This causes all holy hell to break out when the drive is part of a RAID and the sibling(s) are still running with DMA, and at a much faster transfer rate.

    Also, running the vendor's disk diagnostics did very little good since the drive could remap the bad blocks. It wasn't until I left badblocks running for a week(!) that I was able to consume all of the spare blocks and make it fail the vendor diagnostics. But the drive would consistently drop its DMA settings when you ran badblocks on it and it hit the trouble spot on the platter. (I did this because the vendor insisted that they needed the code from the diagnostics disk before they would replace the drive. I had already replaced it in the system with a spare.)
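
The before/after hdparm check described above could be scripted roughly like this. It is only a sketch: the function names are made up for illustration, the 0.5 threshold is an arbitrary assumption, and the regular expressions assume the usual `using_dma = 1 (on)` and `... = 21.33 MB/sec` lines that `hdparm -d` and `hdparm -t` print, which may differ between hdparm versions.

```python
import re
import subprocess


def run_hdparm(flag, device):
    """Run `hdparm <flag> <device>` (needs root) and return its text output."""
    result = subprocess.run(
        ["hdparm", flag, device], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_dma_flag(hdparm_d_output):
    """Return True/False from a 'using_dma = 1 (on)' line, or None if absent."""
    match = re.search(r"using_dma\s*=\s*(\d)", hdparm_d_output)
    return None if match is None else match.group(1) == "1"


def parse_read_rate(hdparm_t_output):
    """Return the MB/sec figure from `hdparm -t` output, or None if absent."""
    match = re.search(r"=\s*([\d.]+)\s*MB/sec", hdparm_t_output)
    return None if match is None else float(match.group(1))


def degraded(dma_before, dma_after, rate_before, rate_after, min_ratio=0.5):
    """True if DMA was switched off or the read rate fell below min_ratio.

    min_ratio is an assumed threshold, not something hdparm defines.
    """
    dma_lost = dma_before is True and dma_after is False
    rate_dropped = rate_after < rate_before * min_ratio
    return dma_lost or rate_dropped
```

The idea is to record `parse_dma_flag(run_hdparm("-d", dev))` and `parse_read_rate(run_hdparm("-t", dev))` once before the badblocks pass and once after it, then hand both pairs to `degraded()`. A drive that went from 66 MB/s with DMA on to 3 MB/s with DMA off would be flagged.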