
All the Perl that's Practical to Extract and Report


Mark Leighton Fisher (4252)

http://mark-fisher.home.mindspring.com/

I am a Systems Engineer at Regenstrief Institute [regenstrief.org]. I also own Fisher's Creek Consulting [comcast.net].
Friday March 06, 2009
01:49 PM

Windows Vista and Multi-Level Security

Mark Russinovich's Inside Windows Vista User Account Control includes many interesting tidbits for those like me who develop for Microsoft Windows, but to me Windows Vista Integrity Levels are DoD-style Multi-Level Security by another name.

This is ironic, as the Department of Defense seems to be moving away from MLS systems, instead going towards PCs where each PC is at one level of security. (DoD developers, feel free to speak up at this point.)

Worth a look for Windows developers and OS enthusiasts.

Friday February 27, 2009
12:34 PM

Dispose, Finalization, and Resource Management

As Perl moves into Garbage Collection territory with Perl 6, Dispose, Finalization, and Resource Management -- even though it was written about the .NET GC -- is worth a look because all garbage-collected languages must deal with these issues.

If you ignore these issues, you will spend your time debugging memory allocation/release problems instead of delivering functionality to your customers -- and your customers pay you for the functions you deliver (they only put up with your debugging to get those functions).

(One way to think about Garbage Collection is that GC is like Perl for memory allocation/release -- GC makes easy memory alloc/release easy, and makes hard memory alloc/release possible.)
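
A minimal sketch of the dispose idea, in Python rather than .NET or Perl 6: the scarce resource is released deterministically when a block exits, instead of whenever the garbage collector gets around to it. The Resource class and its open_count counter are hypothetical stand-ins for something like a file handle or database connection; nothing here comes from the linked article.

```python
class Resource:
    """Hypothetical stand-in for a scarce resource (file handle, DB connection)."""

    open_count = 0  # number of resources currently held

    def __init__(self, name):
        self.name = name
        self.closed = False
        Resource.open_count += 1

    def close(self):
        # Idempotent: a second close() must not release the resource twice.
        if not self.closed:
            self.closed = True
            Resource.open_count -= 1

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit and when an exception propagates.
        self.close()
        return False  # do not swallow the exception


def use_resource():
    """Return how many resources are held while inside the block."""
    with Resource("report.tmp"):
        return Resource.open_count
```

The release happens in close(), and the with block only guarantees that close() gets called. Making close() idempotent is a design choice that lets callers release early without worrying about a second release at block exit.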

12:19 PM

Top 25 Most Dangerous Programming Errors

For those who have not seen this -- Top 25 Most Dangerous Programming Errors.

Tuesday February 17, 2009
12:38 PM

Zotero -- Open Source Super-EndNote in your Firefox

A comment by Jon Duke here at Regenstrief led me to Zotero, an Open Source Super-EndNote in your Firefox (EndNote helps you collect and manage citations). Although Zotero was originally developed for humanities researchers, Zotero is useful for anyone who researches on the Web (whether for publication or software development), as it provides easy collecting, organizing, and searching of your personal citation list (think "bookmarks on anabolic steroids"). Zotero provides more functionality if the web page is designed for Zotero (see Make your site zotero ready), but any web page can be linked or captured into Zotero for later use. Some Zotero features:
  • Zotero will automatically gather citation information if it is present on the page. You can collect any page with Zotero, but you may need to fill in some information if the automatic citation info is not present. Note that Amazon.com among other popular websites provides Zotero-compatible citation info.
  • You can collect either just the link to the page, or a snapshot (copy of) the page.
  • Notes let you annotate your citations to any level of detail. Notes can also stand alone (i.e. notes not attached to a citation).
  • A Zotero citation can have zero or more attachments.
  • Collections let you gather related items together. A citation can exist in more than one collection.
  • Tagging lets you group your citations in arbitrary ways. Zotero may automatically grab the LC subject headers for book citations and keywords for article citations.
  • You can work with Zotero when off-line (airline travel, anyone?) although linked citations will of course be unavailable except for their Zotero metadata.

As a personal example, RELMA is moving from VB6 (1998 technology) to VB.NET (2008 technology), so there are lots of good additional features in VB.NET to learn about. I have started using Zotero to track my .NET Web citation links for RELMA so those links are ready for when I need them (such as the ability to use ASP.NET as a text template engine outside of IIS, which is handy for internationalizing RELMA's HTML output). Try Zotero -- you may like it!

Wednesday June 18, 2008
05:58 AM

The Important Numbers of Testing: 0, 1, and Many

Although there are an infinity of numbers to use in software testing, the 3 important numbers are 0, 1, and Many.

0, the number of nothingness, comes into play when you don't have anything. C enshrined 0 as the null pointer, though other languages and systems had represented nothing by a memory address of 0 before C. (There were other representations of null -- they make for interesting reading.) Customers without orders, Webpages without links, forests without oak trees -- all of these are most easily represented by a 0 inside a computer. Even inside your computer, your programs are not a closed system. You can run out of memory (although Perl eliminates the silly cases of this), you might forget and make a directory unreadable (0 files), a compiler error could skip an allocation statement (0 elves) -- the list can go on and on. If you don't consider the case of 0, eventually your software will fail. (Conversely, I once wrote a server in Perl 4 that ran for months at a time because I did extensively consider and test the case of 0.)

1, the first number, is seen when you only have one of something, an idea so common that it becomes the Singleton pattern in languages that need a special representation of one and only one instantiation of a class. With 1 of something, everything has to be instantiated, but you don't have the problems of multiple copies of the item in question. If the item is part of a collection of items, I have occasionally seen defects where the collection is not allocated if there is only 1 item. It is probably an artifact of my coding style, but I don't see many defects in my code specific to 1 and only 1 item. When the common cases are 0 and many, though, I have seen code that fails on the edge case of exactly 1 item.

"Many" often just means "more than 1". Usually, your code does nothing different for the 472nd item than it does for the 2nd item. There are times, though, when code(2nd) != code(472nd) (3-column display code comes to mind here). Handling many items involves sizing their containers appropriately. Almost any allocation algorithm can give you space for 1 item -- only correctly constructed allocation algorithms will always yield the right number of places to contain your items. The familiar fence-post error of array management is but one example of a failed allocation algorithm (and failure to properly test for many items).

defined() is the special case of accessing an item before it is initialized. A real-world case is a restaurant without any customers. Attempts to access any customer data will only find undefined values. Undefined values can occur when you grab large blocks of data for performance reasons -- the example restaurant and all its customers from before -- where you grab so much data in one fell swoop that not all of it is initialized. Incomplete data is not defined. A customer without a cellphone (or without a landline phone) would have an undefined value for that phone field. A broken tire pressure sensor could yield an undefined value when read. Sometimes you can just ignore undefined values, but other times you have to explicitly handle them (think running sums or some statistical operations).
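
As a rough illustration of handling undefined values explicitly in a statistical operation, here is a small Python sketch that uses None where Perl would use undef. The function name mean_defined and the choice to return None when nothing is defined are my assumptions for the example, not a prescription.

```python
def mean_defined(readings):
    """Average the readings, skipping undefined (None) values.

    Returns None when no reading is defined, so the caller can tell
    "no data" apart from "the average is 0".
    """
    defined = [r for r in readings if r is not None]
    if not defined:
        return None
    return sum(defined) / len(defined)


# A broken tire pressure sensor reports None for one wheel.
pressures = [30.0, None, 32.0]
average = mean_defined(pressures)  # averages only the two defined readings
```

Note that a reading of 0 is defined and is counted; only None is skipped. Testing for truthiness instead of "is not None" would silently drop legitimate zeros, which is the 0 case and the undefined case colliding.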

Although you may have other special numbers to test, you will likely have to test at least 0, 1, and many. The multiples of many, the somethingness of 1, and the nothingness of 0 will need to be tested to ensure adequate test coverage of your code.
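
To make the 0, 1, and many cases concrete, here is a minimal Python sketch built around the 3-column display example. The helper rows_of is hypothetical; the point is only that the same function gets exercised with no items, exactly one item, and many items, including counts that do not divide evenly into rows.

```python
def rows_of(items, columns=3):
    """Split items into rows of at most `columns` entries each."""
    if columns < 1:
        raise ValueError("columns must be at least 1")
    # Stepping by `columns` avoids the fence-post error of computing
    # the row count by hand and being off by one on partial rows.
    return [items[i:i + columns] for i in range(0, len(items), columns)]


def check_zero_one_many():
    """Exercise the three important numbers of testing."""
    # 0: nothing to display yields no rows, not one empty row.
    assert rows_of([]) == []
    # 1: a single item still gets its own (partial) row.
    assert rows_of(["a"]) == [["a"]]
    # Many: a full row plus a partial row.
    assert rows_of([1, 2, 3, 4]) == [[1, 2, 3], [4]]
    # Many: the 472nd item lands alone on the last row.
    big = rows_of(list(range(472)))
    assert len(big) == 158
    assert big[-1] == [471]
    return True
```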

05:57 AM

The Golden Rule of Data Manipulation

The Golden Rule of Data Manipulation can be summed up as "Concatenation is Easy, But Parsing Is Hard". But we are talking really, really hard here -- not just lifting a dining room hutch hard, but lifting the Empire State Building hard (in the end game). That hardness has been a large barrier in natural language communication for computers, as parsing an arbitrary sentence is ludicrously hard. AIML et al. have worked around the problem by restricting both the domain of discourse and the variety of sentences recognized, but they have only worked around the problem, not solved it. If you start from a point of concatenating simple, nearly atomic data, your programming task will be much easier (and much more likely to lend itself to later parsing, rather than starting at arbitrary parsing of your data). Anyway, read the article!
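
A small Python sketch of the asymmetry, using made-up sample fields: joining fields into a line is a one-liner, but getting the same fields back out takes a real parser as soon as a field contains the delimiter. The helper names write_record and parse_record are hypothetical; the csv module calls are from the Python standard library.

```python
import csv
import io

# Made-up sample record; the first field contains the delimiter.
fields = ["Smith, Pat", "Springfield", 'said "hi"']

# Concatenation is easy...
naive_line = ",".join(fields)

# ...but naive parsing does not round-trip: this yields 4 pieces, not 3.
naive_fields = naive_line.split(",")


def write_record(record):
    """Serialize one record as a CSV line, quoting fields as needed."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(record)
    return buf.getvalue()


def parse_record(line):
    """Parse one CSV line back into its list of fields."""
    return next(csv.reader(io.StringIO(line)))


# With quoting on the way out and a real parser on the way in,
# the original atomic fields survive the round trip.
round_tripped = parse_record(write_record(fields))
```

Keeping the data as a list of atomic fields for as long as possible, and concatenating only at the edge, means the hard parsing step is either avoided or delegated to a well-tested parser.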

Friday May 30, 2008
12:04 PM

PageRank is Precomputed Relevancy Ranking

Google's PageRank is precomputed relevancy ranking, where the heavy lifting of actual relevancy ranking is done by us humans. Why is this important? I was re-reading A new comparison between conventional indexing (MEDLARS) and automatic text processing (SMART), which lays out how computerized indexing can beat the best manual indexing by:

  • Using a stop-word list;
  • Using a thesaurus (synonyms); and
  • Relevancy ranking.

(It's more complicated than that, but you get the idea.) Relevancy ranking is the hardest part of the indexing job, as there are no clear-cut algorithms for relevancy ranking with both excellent precision and excellent recall (getting all of the documents you want and none of the documents you don't want). Google's PageRank works around the difficulty of relevancy ranking by handing the hardest part -- the ranking of individual documents -- to us humans. You can get good results from proper metadata, but metadata is useful only in environments where no one has interest in gaming the metadata. (I wonder if it should be called "The Semantic Intranet"? That's where Semantic Web technologies really make sense to me.)
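
For readers who have not met precision and recall before, here is a minimal Python sketch of how the two measures are usually computed for a single query. The document IDs are invented for illustration, and returning 0.0 for an empty set is my assumption (conventions differ).

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision: fraction of retrieved documents that are relevant
               ("none of the documents you don't want").
    recall:    fraction of relevant documents that were retrieved
               ("all of the documents you want").
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Assumption: treat an empty denominator as a score of 0.0.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: 4 documents returned, 3 documents actually relevant,
# 2 documents in common.
p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 6])
# p is 2/4 and r is 2/3
```

Returning every document in the collection gives perfect recall and terrible precision, while returning one sure hit gives perfect precision and poor recall, which is why a ranking that does well on both is the hard part.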

The original paper is worth a read, especially if you work on software that incorporates search -- and these days, I suspect that almost any non-embedded program could grow to a point where it incorporates a search mechanism (and an email client, and a web browser -- you get the point).

Friday May 23, 2008
12:04 PM

Good Inheritance and Bad Inheritance

Inheritance is evil, and must be destroyed is the slightly overwrought title of an article by BernieCode that, nonetheless, expresses an idea that I've long held -- that most use of inheritance is better represented by either composition (HAS-A rather than IS-A) or by interface implementation/Perl 6 roles (ACT-AS rather than IS-A).

Inheritance works well for classes that are actually closely related (the canonical example of classes that represent the relationship of various species springs to mind here). What you often want (in my experience) are classes that can act in a certain way -- for example, a horse and a dog that can act like a pet. The EventManager example in the article above is a particularly good example of where a Perl 6 role/Java interface/etc. solves a problem much more neatly and clearly than inheritance does.
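
The horse-and-dog case can be sketched in Python with typing.Protocol, which plays roughly the part of a Java interface here (it is not a Perl 6 role, and a role can also supply method bodies, which this sketch does not show). The class and method names are hypothetical; the point is that Horse and Dog share no base class yet both can act as a Pet.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Pet(Protocol):
    """ACT-AS: anything that can greet its owner can act as a pet."""

    def greet_owner(self) -> str: ...


class Horse:
    # No inheritance from Pet; matching the method is enough.
    def greet_owner(self) -> str:
        return "nickers"


class Dog:
    def greet_owner(self) -> str:
        return "wags tail"


class Rock:
    """Does not act like a pet: no greet_owner method."""


def welcome_home(pets):
    """Collect a greeting from everything that acts like a pet."""
    return [pet.greet_owner() for pet in pets]


greetings = welcome_home([Horse(), Dog()])
```

Because Pet describes behavior rather than ancestry, a species hierarchy (if you need one) can stay a separate, genuinely IS-A tree, and "can act like a pet" never has to be wedged into it.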

By the way, Solving compositional problems with Perl 6 roles (which I just discovered) also looks like a pretty good resource on this topic, especially for us Perl users.

Friday March 07, 2008
01:09 PM

pmtools-1.10 Release

Now at a CPAN mirror site near you -- pmtools-1.10. Tom "spot" Callaway of Fedora Core let me know that the Fedora folks were concerned about the fact that pmtools was only licensed under the Perl 5 Artistic License (they were concerned about how well the Artistic License 1.0 would stand up in court). So, pmtools (starting with v1.10) is now dual-licensed like Perl (Artistic and GPL). (My other public Perl stuff is also dual-licensed.) I also added my copyright to pmtools, as I had not added my name to the copyright when I took it over.

Off-hand, I don't recall why Tom Christiansen used only the Artistic License for pmtools. Anyone with a clue, please drop me a line. (That of course includes you, Tom.)

Friday January 25, 2008
12:17 PM

Navigational Spaghetti -- What are your thoughts?

Navigational Spaghetti -- What are your thoughts? presents the dilemma of making program navigation both easy and flexible. Somehow MVC/MVP come to mind here...