

Journal of schwern (1528)

Friday March 16, 2007
05:02 PM

The modern view on commenting

[ #32714 ]

I keep writing these off-the-cuff essays to people in response to questions and I really should archive them somewhere and turn them into something. For now, I'll stick them here.

This one is where a friend of mine is learning how to program on the job. She's a biologist, which explains all the gene-centric examples I use (and I apologize for my mangling of the entire field of genetics in the process). She's figuring out how and when to comment and write documentation. I answered the commenting part and largely ignored the documentation part because, really, I was sick of writing.

Antonia Mayer wrote:
> > Would you have a perl script with exemplary documentation for me? I
> > still don't have a good sense of what is too much and what too little
> > comments.
> >
> > That is, what I've heard here on the question is my direct supervisor
> > saying: I suggest you learn and use POD.
> >
> > And another co-worker saying: and there is no "too little comments".
> > Comment every line and every regex if you will. We don't know
> > whether whoever is maintaining this code after you will have had any
> > experience with Perl. Any decrease of legibility of the code by
> > having all these comments will be balanced by the time spent trying to
> > figure out little things and changes.
> >
> > So, d'you have an example of exemplary documentation at hand, or any
> > other advice or comments?

First off, comments and documentation are two different beasties. Or rather,
there's internal documentation, usually done with comments, and external
documentation, usually done with POD. They have different audiences, and it's
important to know your audience.

Internal documentation is for people who are going to read and modify the code
after you. It might be the next maintainer, it might be you a few days from
now after having forgotten how it works.

External documentation is for the user of the program. It assumes the user
has no knowledge of how the program is written or of programming at all. The
program is a black box. The docs explain why you want to use the program,
what it does and does not do, quirks and bugs, frequently asked questions,
common mistakes and how to fix/avoid them, what you pass in, how to pass it
in, what gets done to it and what comes out.
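
To make the split concrete, here is a minimal sketch of one script carrying
both kinds of documentation. The script name count_bases, the sub
count_bases() and what it does are all hypothetical, made up for
illustration; only the POD directives (=head1, =cut) are standard Perl. The
POD talks to the user and never mentions how the code works; the comment
talks to the next maintainer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

=head1 NAME

count_bases - count how often each base occurs in a sequence (hypothetical example)

=head1 SYNOPSIS

    count_bases AGGCTA

=head1 DESCRIPTION

Prints each base found in the sequence and the number of times it
occurs, one base per line.  Case is ignored.

=cut

# Internal documentation starts here: it is for whoever edits this next.
sub count_bases {
    my $sequence = shift;

    # Upper-case first so that "a" and "A" are counted as the same base.
    my %count_for;
    $count_for{$_}++ for split //, uc $sequence;

    return %count_for;
}

# Fall back to a sample sequence so the sketch runs with no arguments.
my $input     = @ARGV ? shift @ARGV : 'AGGCTA';
my %count_for = count_bases($input);
print "$_: $count_for{$_}\n" for sort keys %count_for;
```

Running perldoc on a file like this shows only the POD, which is the
black-box view the user needs.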

Internal documentation is normally written with the assumption that the
audience knows Perl. If they don't, it's not your job to teach them and there
are plenty of better ways to learn. Unfortunately it sounds like you can't
make that assumption at your work place. However, it doesn't mean you should
comment what every line is doing. It would be silly to write:

        my $thing = 4; # set the lexical variable $thing to 4

Both because it's clutter and because, and this is why too much commenting is
frowned upon, if you change the line you have to remember to change the
comment. And you won't remember. And then the comment will lie.

Comments should not say *what* is happening and *how* it is happening but
rather *why* it is happening. The what and how can be figured out from the
code itself, and if it's written properly it should be clear. But the why is
not in the code, so you have to add it in with annotations. Let me show the
difference.

        my $thing = 4; # the length of a gene sequence

That tells you *why* you're assigning 4 to $thing, which is the important
thing. It is even possible to eliminate the need for annotative comments by
making the why implicit in the code. This is done with good use of names.

        my $gene_sequence_length = 4;

That needs little or no explanation (except, maybe, why you set it to 4).
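
When the name covers the what, a comment only has to carry the why behind
the value. A small sketch of that combination; $codon_length and the sample
sequence are invented for illustration and are not from the original
exchange.

```perl
use strict;
use warnings;

# A codon is three bases long, so the sequence is read in groups of three.
my $codon_length = 3;

my $sequence = 'ATGGCCTAA';

# Split the sequence into fixed-width chunks of $codon_length characters.
my @codons = unpack "(A$codon_length)*", $sequence;

print "@codons\n";
```

If the 3 ever changes, the name still tells the truth and the one comment
that explains the number sits right next to it.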

Good variable and subroutine names largely eliminate the need for comments and
help you to structure your code better. Let's say you have a dense series of
regexes which transforms a sequence from type A to type B (I'm making shit up).

        # Transform from type A to type B
        $sequence =~ s/A+/B+/g;
        $sequence =~ s/gafoooty/feribble/;
        $sequence .= " this is the end my friends.";

You can eliminate the need for the comment AND structure the code better by
moving that block into a subroutine.

        sub transform_from_type_A_to_type_B {
                my $sequence = shift;

                $sequence =~ s/A+/B+/g;
                $sequence =~ s/gafoooty/feribble/;
                $sequence .= " this is the end my friends.";

                return $sequence;
        }

        $sequence = transform_from_type_A_to_type_B($sequence);

That might seem like more code, but when reading a big wad of code it's nice to
be able to just look and say "oh, they're transforming from type A to type B"
instead of reading 4 lines about how you're doing that. A literary analogy
would be:

He drew a square. [1]

[1] See appendix 5 for details.

vs

He decided to draw a square. He grasped his pencil firmly about the middle
and pressed the sharp, black end against the paper with enough force to leave
a mark but not to tear tender white skin. His arm lashed upwards leaving a
straight black line! Then a turn to the right and another! Another turn,
another line! Finally, his masterpiece nearly complete, he dragged the pencil
back to its starting point closing the cycle like an old man after a full life
returning to the womb of death.

Even regular expressions can be named.

        my $non_amino_acids = qr{[^AGCT]+}i;

        $sequence =~ s/$non_amino_acids//g;
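
A named regex can also carry its own annotations if you compile it with the
/x flag, which lets you put whitespace and comments inside the pattern. A
minimal sketch, assuming a made-up input string; the variable name
$non_base_letters is hypothetical.

```perl
use strict;
use warnings;

# Compiled once with qr{} and given a name that says what it matches.
my $non_base_letters = qr{
    [^AGCT]+    # one or more characters other than A, G, C or T
}xi;

# A messy sequence with a dash, a space, unknown "n" bases and a newline.
my $sequence = "AG-CT nnTA\n";
$sequence =~ s/$non_base_letters//g;

print "$sequence\n";
```

The substitution line now reads almost like a sentence, and the detail of
what counts as junk lives in exactly one place.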

Finally, sometimes you write code in a non-obvious way and you need to write
down why you did it that way and what it's doing. Usually it's because 1)
you're working around a bug, 2) it's faster to do it the non-obvious way, or 3)
you're making use of some little-known feature.
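
Here is a sketch of the kind of comment that earns its keep. The hash-slice
idiom below is standard Perl but not obvious to a newcomer, so the comment
says why it is there and what it relies on. The sample data and variable
names are invented for illustration.

```perl
use strict;
use warnings;

my @samples = qw(AGCT TTGA AGCT CCGA TTGA);

# Why a hash slice instead of a loop: assigning to the slice records every
# sample as a hash key in one statement, and duplicate keys collapse on
# their own. The values are never used, only the keys.
my %seen;
@seen{@samples} = ();
my @unique_samples = sort keys %seen;

print "@unique_samples\n";
```

Without the comment, the next reader has to work out that @seen{...} is
about removing duplicates; with it, they can move on.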

This is all intermediate level stuff that you usually work out after you've
got a decent grasp of the language, but it's nice to keep in mind for now. At
the point you're at, work on getting your names right and err on the side of
too many comments.

As for good examples of external documentation... prove isn't bad.
http://search.cpan.org/~petdance/Test-Harness-2.64/bin/prove

dbiprof is another
http://search.cpan.org/~timb/DBI-1.54/dbiprof.PL

They're both a little scanty and assume the reader already knows about testing
and profiling, respectively. They probably assume too much.

Test::Simple's documentation is pretty good, even though it's a library and not
a program.
http://search.cpan.org/~mschwern/Test-Simple-0.70/lib/Test/Simple.pm

  • A perfect example to illustrate one of your points: I was working on some code when I ran across several instances of a variable named $number. Since this was one of our ubiquitous several hundred line subroutines that we are cleaning up, it was very, very frustrating to see such a useless variable name. After a bit of hunting through the code, I finally found the following:

    # $number is invoice number
    if ( defined $number ) {
        my @results = $dbh->selectrow_array(...);
        ...

    H

  • I'd like to recommend the debugger's comments and documentation, even if I did write them: http://ibiblio.org/mcmahon/perl5db.html for the formatted POD and http://ibiblio.org/mcmahon/perl5db.pl for the raw source. I worked very hard to make them work as an integrated whole (the POD summarizes, the comments detail).
  • I keep writing these off-the-cuff essays to people in response to questions and I really should archive them somewhere and turn them into something.

    Yes, you should.
    --
    Kirrily "Skud" Robert perl@infotrope.net http://infotrope.net/
  • Nice writeup, and well put.

    Another point that I would have made is: often, novices structure code in weird ways, with procedures that aren’t very useful outside the context in which they were conceived because they are big and do too many things. They use variables that get recycled over too many intermediate calculations, or conversely fail to break calculations into intermediate steps saved in variables.

    Naming is a good detector for such problems. If you can’t think of a good precise name

    • When I name an array after the objects it contains, I don't know whether to use the plural, or the singular, ie whether to call it objects, or object.

      @persons means I start saying $persons[0], $persons[2]. @person reads nicer when referring to the elements, $person[1], $person[2].

      But the second might confuse a reader of the code?
      • No, don't name it after the members. Name it after the collection. Thus, the array becomes @person. When read out loud, it becomes "the person array".

        This avoids ever having to ask the "is it singular, is it plural?" question. Name all your variables in the singular, even if, in some corner cases it comes out sounding a little weird.

        If everything is always singular, no exceptions, you never have to stop and pause to think about its name. This optimisation pays off big time in the long run.

        • My array and hash names are always singular, but not my scalars. When a scalar is intended to store an arrayref, I use the plural, and when it’s intended to store a hashref, its name ends in _for.

          I agree otherwise, though.

          • Can you sketch an example?

            I can't see how it can be a win over a blanket "no plurals" regardless of type and/or content. But you're no dummy, so I'd like to see if I can be convinced :)

            • It’s basically the same as people who like to append _ref or _r to the names of their reference-storing scalars. The only difference is that with this scheme, the code reads more like plain English.

      • For arrays and hashes, I always use the singular form. The plurality is already implied by the sigil – putting it in the name as well would be repetition. Note also that there are many cases where you use the collection as a whole, which read nicer with a singular name, such as push @person, $last_record (OK, I suppose that can be argued either way) or keys %person_name.

    • I read in a software engineering textbook about a South African software company that went broke, because its code had been written by Portuguese speakers, who used Portuguese for their variable names. The Portuguese programmers left the company, and were replaced by Afrikaans-speaking ones, who couldn't maintain the code, because they couldn't understand Portuguese.

      I found this hard to believe, but I've never had the experience.
      The textbook writer, from South Africa, said names must be in English.

      I was thi
  • would be an awful lot of fun. :) "It was a dark and stormy night in front of the green glow of the monitor as I snacked on pop tarts and contemplated what I am about to write to you....". One of my English teachers in high school was his great-great granddaughter, which made for a lot of fun at times. :)
  • Hi Schwern!

    Thanks for taking the time for writing this and thanks for sharing it with us. I couldn't have said it better myself.<tm>.
  • "I keep writing these off-the-cuff essays to people in response to questions and I really should archive them somewhere and turn them into something."

    What can I do to help make "Dear Schwern:" a reality?