  • That's the problem with software - failure really is an option. It's not like we're building bridges or hospitals.

    Case in point: today we discovered a bug in my spam-scanning software that has been there for years. Hundreds of thousands of mails have triggered this bug, yet we only just noticed it because the failure wasn't a total showstopper. Creating the software with a tool like Alloy would probably have caught the bug, but it would also have taken a hell of a lot longer to get the software written.
    • Depending upon what you're doing, failure may not be an option. Consider the Therac-25 [wikipedia.org], a well-known radiation therapy machine which killed at least 5 patients due to a software bug.

      Or how about the doctors who were indicted for murder [baselinemag.com] because they didn't double-check the results of some software and had several patients die as a result?

      On a less lethal scale, tests can be used to prevent software flaws from reappearing, but if the underlying design of the software is flawed, the fixes that go in place

      • Consider the Therac-25, a well-known radiation therapy machine which killed at least 5 patients due to a software bug.

        The Therac-25 is a really important story, but it is an outlier, and ultimately not relevant to most discussions about bugs, reliability, or catastrophic failure. There is no general lesson to learn from it, except to be extremely careful when working on a system where life is on the line (medical, embedded, or otherwise).

        Case in point: I've worked on many online publishing systems in my day, and the absolute worst-case scenario is that somehow the overcomplicated pile of mud crashes and the system goes offline. But even in those rare circumstances, there's always a backup plan to get something working immediately and restore service. The second-worst scenario is that too much or too little content reaches paying customers. Better too little than too much, since "too much" may mean everything, or embargoed news that could lead to a contract violation with a data provider. Ultimately, as bug-ridden as these systems tend to get, no software problem is worth losing sleep over; no one is going to die, and no one is going to lose their job over the odd bug.

        The more important problem is increasing and insurmountable complexity. Yet even that isn't critical, certainly not to the level of a Therac-25. Sure, a system can get so complex that it becomes unmaintainable. Either a business is healthy enough for a rewrite/upgrade/port to be feasible, or it's not a healthy business. Companies do not go under because of bugs and big unmaintainable systems; they go under because they are not flexible enough to evolve as the industry evolves around them. I witnessed one company migrate one app from dialup access to a VAX to a webapp; numerous shops went from DOS to Win3.1 to Win95 to webapp development, shepherding the same project all the way through. Whoever says rewrites aren't necessary or prudent is looking at small time scales or just fooling themselves.

        The Therac-25 isn't relevant here because sometimes the price of total catastrophic failure really isn't all that bad. (Forgive me, Gene Kranz, but not every project is life-or-death.) I worked for an engineering firm once, and we had a project where we needed to install a curved piece of extruded aluminum into a showpiece entryway. We got the extrusion, sent it to the benders, and it didn't fit. Why? Someone sent the wrong measurements to the bender, and the bender did exactly what they were told. Total failure. Cost to the company? One extra, expensive piece of bent aluminum; if that ate the entire profit margin for the project, that's bad management, not the end of the world.

        Eliminating failure is a worthwhile and noble pursuit, but not when the effort is unwarranted or unprofitable.

        • I wondered how Brooks' distinction between accidental complexity and essential complexity fits into this distinction between acceptable and unacceptable failures.

          Whether the failure is acceptable or not depends on the values of the clients, I think. Or does it?

          I was thinking accidental complexity comes from the problem that the software is supposed to solve, but it looks like Brooks didn't think this way.

          He said use of a high-level language frees a program from much of its accidental complexity.

          I think I ha
          • Actually, you bring up a very good point.

            In the systems I can remember at the moment, catastrophic failure related to essential complexity is intolerable. Catastrophic failure related to accidental complexity is accepted as part of the "cost of doing business". Prime example: IIS and Windows servers instead of something more solid, like VMS, TrustedSolaris or something even more paranoid that can run a webapp. :-)

            You could make a convincing case that the inherent complexity of a computer is a part of the
        • No, this is the worst-case scenario: vulnerabilities in SAP [cansecwest.com], or perhaps this: Who turned out the lights [cansecwest.com]. The price of catastrophic failure really can be that bad. The problem is that the same components you used in your publishing house, or to bend sheet metal, are being used everywhere else as well - and they suck!

          Now for my lovely little anecdote to debunk the rest of your point. Back before I worked for ActiveState, I was an IT consultant to a very large forestry company (which shall remain nameless
          • mock! How've you been? Where have you been hiding yourself?

            Bugs matter.

            Yes, they do, but not all bugs have equal weight. Not even security-related bugs. Do I care if a package has a known buffer overflow if it's running inside my firewall? OK, I care, but do I care as much as I would if it were in the DMZ or on a public site? Do I care enough to patch inside the firewall first, leaving a public machine wide open?

            We can trade anecdotes all day about how bugs matter or don't. In the end, thou

            • Well, I'm still kicking around Vancouver, though you might see me in London or Tokyo as well. I founded MailChannels [mailchannels.com] with another former ActiveStater, and I've been making the bits go for CanSecWest [cansecwest.com] and associated conferences for the last few years. Right now I'm reworking our conference registration system, which entailed an audit of all the bits I was planning on using. I'm not really pleased with what I found.

              While I don't disagree that perspective is necessary, obviously when limited resources are