Mark Leighton Fisher (4252)

http://mark-fisher.home.mindspring.com/

I am a Systems Engineer at Regenstrief Institute [regenstrief.org]. I also own Fisher's Creek Consulting [comcast.net].

Friday November 20, 2009
12:11 PM

10,000+ Exceptions/Hour

Although the details are .NET-specific, "Are you aware that you have thrown over 40,000 exceptions in the last 3 hours?" is a good overview of what happens when you use exceptions for non-exceptional circumstances...

(Just Say No to using exceptions for flow control.)
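
As a rough sketch of the idea (in Python rather than .NET, with made-up function names and data), compare a lookup that treats an ordinary cache miss as an exception with one that treats it as a normal outcome. The counts are only there to make the number of raised exceptions visible:

```python
def count_hits_with_exceptions(cache, keys):
    """Treat a missing key as an exception: every ordinary miss raises KeyError."""
    hits = 0
    raised = 0
    for key in keys:
        try:
            cache[key]
            hits += 1
        except KeyError:
            raised += 1  # the "exceptional" path runs on every routine miss
    return hits, raised


def count_hits_with_checks(cache, keys):
    """Treat a missing key as a normal outcome: no exception is raised at all."""
    hits = sum(1 for key in keys if key in cache)
    return hits, 0


cache = {"a": 1}
keys = ["a", "b", "c", "a"]
print(count_hits_with_exceptions(cache, keys))  # (2, 2)
print(count_hits_with_checks(cache, keys))      # (2, 0)
```

Both functions find the same hits; the first one simply pays for an exception on each miss, which is the pattern the linked article warns about.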

Friday November 13, 2009
11:56 AM

Checklists, Recipes and Algorithms

Checklists, Recipes and Algorithms draws from medicine, cooking, and programming to make the point that sometimes you just need to write down what you are going to do. You don't want to impose too much structure (that way lies spending a week to write a 2-line program (see analysis paralysis)), but you do need structure -- and explicit structure is easier to get right because it is easier to analyze.

(The Cardiac Arrest algorithm in the article is especially nicely presented.)

Wednesday September 23, 2009
01:12 PM

Regular Expressions: The Ultimate in Lack of Redundancy

The concepts in regular expressions are simple -- one of anything, a numeric character, a class of characters -- so why do so many people have problems with regular expressions? I think it is the lack of redundancy.

Each concept in regular expressions is expressed in 1 or 2 characters -- "." is one of anything, "*" is zero or more of the preceding thing, and so on. Compare this with C, where matching against a 'b' in a string could be coded compactly as:

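match = 0;
for (; *c != '\0'; c++) {   /* walk the string up to its terminator */
    if (*c == 'b') {        /* found the character we are looking for */
        match = 1;
        break;
    }
}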

Although a modern language would cut down the size of that code, it still wouldn't come close to the one character of the corresponding regular expression. And therein lies the problem -- we humans rely on redundancy when interpreting information. In theory, you would never need more than one character to represent any concept in a computer language. Yet, the programming languages that gain general popularity are languages with some amount of redundancy built in -- Java, Perl, Python, C# -- the list goes on and on. Given that we've had minimally-redundant programming languages since programming languages were first conceived of in the 1950s (APL, anyone?), if minimally-redundant programming languages were going to take over the world, it would have happened by now -- and it hasn't happened.
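
To make the size difference concrete, here is a small sketch in Python (picked only for illustration; the function names are made up). The explicit loop and the one-character regular expression answer the same question:

```python
import re


def contains_b_loop(text):
    """Spell out the search: look at each character in turn."""
    for ch in text:
        if ch == "b":
            return True
    return False


def contains_b_regex(text):
    """The whole search is the one-character pattern "b"."""
    return re.search("b", text) is not None


for sample in ["abc", "xyz", ""]:
    print(sample, contains_b_loop(sample), contains_b_regex(sample))
```

The loop is longer, but each line says one thing; the pattern packs the same meaning into a single character.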

Another example of our need for redundancy is in driving directions. The best driving directions always contain some redundancy -- "you turn at Capitol, which is between Senate and Illinois" -- instead of "you turn at Capitol", as "you turn at Capitol" gives you no idea of where Capitol actually is -- it could be several miles down the road, or 1 block after the previous turn. (I have wondered if giving good directions is a skill similar to that of programming.)

Reading may be another example -- you can usually get the gist of a paragraph of English text even when only the first and last letters of each word are in their right places (thereby demonstrating that the other letters are mostly redundant).

Music is possibly another example of the human need for redundancy. Whether it is the de-de-de-dah motif of Beethoven's 5th symphony, the distinctive drum line of Led Zeppelin's "Immigrant Song", or the chorus of Green Day's "21 Guns", music relies on redundancy through repetition. In theory, you should only need to hear each part of a song once to derive full musical enjoyment from the song. But instead, in Beethoven's 5th symphony (where there are no words to require musical backing) Beethoven repeats the motif over and over again. And Beethoven's 5th symphony is widely regarded as one of the crowning achievements of music -- yet it is filled with redundancy through repetition, although there is no theoretical reason for that level of repetition. Or is there?

Truth is, we humans need a certain level of redundancy in our information before a concept is firmly planted in our heads, whether it is a popular song or the clauses of our national constitution. The reason that I and so many others have found such success with the Head First book series is that Head First's use of redundancy (presenting each piece of information in several different ways) helps to ensure that you retain the information in the Head First books.

Perl's "/x" modifier may turn out to be one of the most significant advances in regular expression syntax, because /x enables the splitting-up and commenting of your regular expressions -- operations that increase the readability and redundancy of your regular expressions (redundancy because, in the common case, each part of your regular expression is now represented by a whitespace-bounded line of text instead of just the regular expression characters).
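
Python's re.VERBOSE flag is a close analogue of Perl's /x, so the effect can be sketched in Python. The date pattern below is just an illustrative example, not something from a real program:

```python
import re

# Terse form: every concept is one or two characters.
DATE_TERSE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# Verbose form: the same pattern, one part per line, each with a comment.
DATE_VERBOSE = re.compile(
    r"""
    (\d{4})   # four-digit year
    -         # literal separator
    (\d{2})   # two-digit month
    -         # literal separator
    (\d{2})   # two-digit day
    """,
    re.VERBOSE,
)

text = "Posted 2009-09-23"
print(DATE_TERSE.search(text).groups())
print(DATE_VERBOSE.search(text).groups())
```

Both patterns match the same strings; the verbose one just says everything twice, once for the regex engine and once for the human reader.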

(Why we humans need all this redundancy is better left to another day, although I will give you a hint: why do humans still have appendixes?)

Friday July 17, 2009
12:16 PM

iMacros: Automation for Firefox

iMacros for Firefox is the Web automation solution I have been looking for. Why iMacros? Because:

  1. You record what you actually do, rather than trying to reconstruct what you did from memory;
  2. The end product is a vanilla ASCII macro language editable by vi (or the editor of your choice);
  3. The macro language works at a high level -- for example, HTML element absolute and relative positioning are among the macro language features included;
  4. You can extract data from a page as text or HTML;
  5. An iMacro can be used as a Firefox bookmark; and
  6. The Firefox add-on version is freeware.

To give an idea of how exciting I found iMacros for Firefox: immediately after installing it, I wrote 2 iMacros that I had been wanting to write forever (a downloader for the titles of my blog posts and a front end to our local real estate browser to jump right to the properties in our county).

I have only scratched the surface of what can be done with the freeware version of iMacros -- if you need web automation, iMacros is worth a look.

[Flash and Java can be supported through the DirectScreen command (as yet untested by me, as DirectScreen is only available in the paid editions).]

[Ob.Perl: Once designed, something like iMacros would be relatively easy to write in a Firefox-embedded Perl.]

Thursday July 02, 2009
11:23 AM

Test-Driven Development: Some Hard Numbers

Realizing quality improvement through test driven development: results and experiences of four industrial teams analyzes the TDD experiences of 4 teams at IBM and Microsoft. Nothing surprising here to those who have already experimented with test-driven development (pre-release defect density decrease of 40%-90% combined with a 15–35% increase in initial development time), but it is good to have some hard numbers on TDD rather than relying solely on anecdotes and hearsay.

Friday June 26, 2009
12:02 PM

Why Big Software Projects Fail: The 12 Key Questions

Why Big Software Projects Fail: The 12 Key Questions by Watts Humphrey clearly talks about how:

  1. Requirements management is hard; and that
  2. When those who perform the work (programmers, graphic designers, etc.) tell you that a task will take X amount of time, you need to listen to them.

Agile has become popular because finding all requirements at the beginning of a project is nearly impossible for large software projects, especially green-field projects (projects that are the first of their kind). Agile, properly executed, lets you discover requirements by using a running system as a usable prototype for the eventual finished system.

Agile also requires everyone's involvement in the scheduling, rather than having an arbitrary schedule handed down from on high -- a tactic which I think has contributed to the majority of project failures (real Project Management classes will tell you just how big a no-no schedule imposition is).

(Mr. Humphrey is worth listening to -- while he led the OS/360 development team, they met their schedule for all 19 releases he oversaw.)

Friday April 24, 2009
01:23 PM

Pretend Project Management

Pretend Project Management is when management of the project ignores reality -- from the making of the schedule, to the tracking of actual vs. planned time/money spent, all the way down to the project post-mortem (and mortem it usually is, as the project is often D.O.A.). Cargo Cult Methodology: How Agile Can Go Terribly, Terribly Wrong is a real-life example of Pretend Project Management, one worth examining in more detail.

The first red flag is not hiring a system administrator, while simultaneously not allocating the time for system administration in the schedule. The Iron Triangle of scheduling cannot be violated without doing violence to the schedule -- you cannot have system administration work to do without scheduling time for someone to do that work. So, without a system administrator, the schedule has to change so that other people will get that work done. Generally (IMHO), you cannot take an FTE's amount of work in a schedule and say, "Oh, we'll just do that work during slack times." This always comes back to bite you, usually towards the end of the project when you can least afford it. The Project Manager should have modified the schedule to allow time for system administration by whatever means: cutting back features (scope), adding a system administrator (cost), or stretching out the schedule (time). If management does not let you modify the schedule, then Pretend Project Management is what is actually being practiced.

Another red flag was "Agile Development" but no time in the schedule for quick incremental deliveries (intervals measured in weeks). Perhaps the essence of Agile Development is quick iterations with immediate responses by the customer. If the iterations are not quick, or the responses are not immediate, then the development process is not Agile, despite protestations to the contrary. (And calling quick iterations "silly", as management did, shows a serious misunderstanding of the Agile Nature.) Quick iterations maximize the value delivered to the customer, as the feedback from quick iterations keeps development on the correct path. If the feedback loop is too long because of lengthy iterations, management introduces the risk that development will produce a product not needed or wanted by the customer. Agile Development without quick iterations is Pretend Agile Development, and Project Management of Pretend Agile Development is Pretend Project Management, as time has not been allocated in the schedule for real Agile Development. (Hint: 4 weeks rather than 4 months is closer to a useful iteration length.)

The lack of continuous integration is yet another red flag. Agile development should proceed at a fairly steady pace. Continuous integration helps steady the pace, by preventing small, relatively simple build problems from growing into huge, intractable build problems. In the pre-Ethernet days, I once had to integrate months of changes -- trust me, you really don't want to have to do that.

Those of you with exposure to Project Management training will notice another, massive red flag -- no flexibility in scope, time, or resources. When I have managed projects in the past, resources were fixed, time was fairly-well fixed, while scope was the most variable part of the project. I suspect (without proof) that this is a common pattern, as you sacrifice features to get the project "done" (by some measure).

From what I have read, Agile Development seems to plan for a fixed number of people (resources) while varying the time and scope of the project. Usually, changing the project scope is the topic for those writing about how Agile differs from other development styles. (This may be because Agile developers have the attitude, "It will be done when it is done, and not a moment before.")

What conclusions can we draw from this example?

  1. Half of the problems had nothing to do with Agile development. No flexibility along any axis in the schedule (scope, time, resources) and lack of (system administration) time in the schedule are just varieties of problems that project managers ran into 50+ years ago.
  2. Unfortunately, some in management do not grasp that all schedules are approximations -- the Pyramid blocks are not delivered on time, the 1942 warship review for the good ship "X" is cancelled because there is no longer a good ship "X" (or "Y", or "Z"...), Microsoft delays a service pack to fix a security bug, and so on and so on. If you get far enough into using Microsoft Project (as an example tool), you will see multiple start and end times for each task (phrases like early start, late start, scheduled start, actual start). Only when you drive the project risk down to zero can you be certain that each task will start and end on time. Not accounting for project risks leads to deciding there will be no flexibility in the schedule, and inflexible schedules lead to project problems (even if your requirements are up-front perfect).
  3. The other killer classic scheduling mistake (one that was undoubtedly seen back in Roman times and before) is "well, we can't hire someone to do this -- we'll just do it during our slack times". As that work usually tends to pile up, the effects are usually felt at the end of the project when you can least afford it (as mentioned before).
  4. If you cannot do weekly-to-monthly incremental deliveries, IMHO you are developing in another way -- not in the Agile Way.
  5. In Extreme Programming (an Agile style), they talk about "Sustainable Pace". This means a pace that team members can comfortably keep up for months or years at a time. Continuous Integration is an essential lubricant for sustainable pace -- without CI, you will waste a lot of time catching up on huge blocks of changes that also change the build process (adding a module/assembly/DLL, etc.). Whether or not you are doing Extreme Programming, "Sustainable Pace" is a goal worth shooting for.
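
Item 2 above mentions the several start times a scheduling tool tracks for each task. As a rough sketch of where "early start" and "late start" come from, here is a forward and backward pass over a tiny dependency graph in Python. The task names, durations, and the function name are all invented for illustration, and the sketch assumes the dependencies contain no cycles; it is not how any particular tool computes its schedule:

```python
def schedule_windows(durations, depends_on):
    """Return {task: (early_start, late_start)} for an acyclic task graph."""
    # Order the tasks so that every task comes after its prerequisites.
    order = []
    visited = set()

    def visit(task):
        if task in visited:
            return
        visited.add(task)
        for dep in depends_on.get(task, []):
            visit(dep)
        order.append(task)

    for task in durations:
        visit(task)

    # Forward pass: a task can start once all its prerequisites have finished.
    early = {}
    for task in order:
        early[task] = max(
            (early[dep] + durations[dep] for dep in depends_on.get(task, [])),
            default=0,
        )
    project_end = max(early[t] + durations[t] for t in order)

    # Backward pass: the latest start that does not delay the project end.
    late = {}
    for task in reversed(order):
        successors = [t for t in order if task in depends_on.get(t, [])]
        late_finish = min((late[s] for s in successors), default=project_end)
        late[task] = late_finish - durations[task]

    return {task: (early[task], late[task]) for task in order}


durations = {"design": 3, "build": 5, "docs": 2, "release": 1}
depends_on = {"build": ["design"], "docs": ["design"], "release": ["build", "docs"]}
print(schedule_windows(durations, depends_on))
```

In this made-up project, "docs" can start anywhere between day 3 and day 6 without delaying the release, while the other tasks have no slack at all. Even so, the window only holds if every duration estimate is right, which is exactly the risk that an inflexible schedule ignores.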

Thursday April 02, 2009
11:52 AM

A Pattern of Troubleshooting

Troubleshooting works through an example troubleshooting situation (possible cardiac problem), then extracts the pattern to follow when you are troubleshooting, whether you are responding to a medical emergency or debugging a Perl program.

Worth a look.

Friday March 27, 2009
12:59 PM

Grabbing With Your Presentations

If you have ever wondered why some presentations grab you while others leave you cold, this TechRepublic download of a chapter from Cliff Atkinson's Beyond Bullet Points may provide insight. (The chapter feels like a Head First book chapter to me.)

Friday March 20, 2009
03:36 PM

Code Contracts for .NET

Code Contracts for .NET "provide a language-agnostic way to express coding assumptions in .NET programs" (from their website). This lets .NET programmers -- whether C#, VB.NET, IronPython, or whatever -- verify coding assumptions both statically and dynamically. (This is similar to Design by Contract in Eiffel.)

Code Contracts includes a static checker program for verifying both explicit and implicit (null references, array bounds, etc.) code contracts. Runtime (dynamic) contract checking can use a marked series of if-then-throw guard clauses, as in:

if ( x == null )
    throw new ArgumentNullException("x");
if ( y < 0 )
    throw new ArgumentOutOfRangeException(...);
Contract.EndContractBlock();

(so you don't waste perfectly good guard clauses) as well as the standard, explicit code contracts like:

    Contract.Invariant(this.y >= 0);
    Contract.Assert(this.x == 3,
     "Why isn't the value of x 3?");
    Contract.Requires(x != null,
     "DANGER -- missiles fired!");

Code Contracts defaults at runtime to throwing an exception (System.Diagnostics.Contracts.ContractException) when a contract is violated (this behavior is configurable).
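
For readers outside .NET, the runtime half of the idea can be sketched in a few lines of Python. Everything here (ContractError, requires, ensures, integer_sqrt) is a hypothetical stand-in made up for illustration, not the Code Contracts API, and it does nothing like the static checking described above:

```python
class ContractError(Exception):
    """Raised when a contract is violated (hypothetical, not the .NET type)."""


def requires(condition, message):
    """Precondition: what the caller must guarantee."""
    if not condition:
        raise ContractError("Precondition failed: " + message)


def ensures(condition, message):
    """Postcondition: what the function promises in return."""
    if not condition:
        raise ContractError("Postcondition failed: " + message)


def integer_sqrt(n):
    """Return the largest integer whose square does not exceed n."""
    requires(isinstance(n, int) and n >= 0, "n must be a non-negative integer")
    root = 0
    while (root + 1) * (root + 1) <= n:
        root += 1
    ensures(root * root <= n < (root + 1) * (root + 1), "root is floor(sqrt(n))")
    return root


print(integer_sqrt(10))
try:
    integer_sqrt(-1)
except ContractError as err:
    print(err)
```

The contracts state the assumptions in code rather than in comments, so a violation fails loudly at the point where the assumption was broken.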

I have not tried Code Contracts (or any code contract mechanism) yet, but the idea is intriguing because it lets the computer do something it does well (exhaustive examination of your code in tedious detail) thereby freeing you to work on the higher-level aspects of your program, just as C freed us from assembly language bookkeeping and Perl/Java/VB.NET etc. free us from C language bookkeeping.

If anyone has experience with code contracts for any language (positive or negative), please comment.

(Ob. Perl ref. -- see Moose and Class::Contract among others...)