
hex (3272)

http://downlode.org/

Perl, RDF and wiki hacker, London, UK. This is my former Perl blog; I now write at Earle's Notebook [downlode.org].

Journal of hex (3272)

Friday November 09, 2007
08:04 AM

A simplified parseable format for Changes files

[ #34864 ]
There's a lot of discussion going on at the moment about machine-readable Changes (or CHANGES) files: miyagawa, LTjake. hanekomu put together a new module, Module::Changes, to parse a "Changes.yml" file; RGiersig made some suggestions for the content of that file.

Discussion so far has mainly been around the use of YAML. Points raised:

  • YAML is less expressive than RDF (me)
  • RDF is hard to write (miyagawa)
  • People want a simple format (everyone)
  • The format should be transformable from human to machine (everyone)
  • Even YAML can have too much chrome (Alias)

Thinking about all of these, I propose the following. Design constraints were (a) granularity (including Skud's suggestions of what to mention), (b) an absolute minimum of chrome, and (c) trivial to transform into other formats (such as RDF).

    v! 1.3
    @ 2007-11-08T11:15
    # This version was codenamed Muffin because we were listening to Frank Zappa at the time.
    m! This project is now maintained by ZIRCON (of Zircon Software fame).
    l! We have switched licenses. This software now uses the Greater Zork Software License.
    Please ensure that you have read the new license before using this software.
    a! New frobnitz() method - save 50 lines of manual frobnitzing by using this instead!
    b! Fixed the error in quack() where it would actually moo instead of quack. [RT 1234]
    c! The calling convention for rumpelstiltskin() has CHANGED. See perldoc.
    t! Test coverage is now 100%! Go us!

    v 1.3_01
    @ 2007-11-07T09:20
    # Developer preview for 1.3 and the CPAN testers.

    v 1.2.1
    @ 2007-11-02T20:08
    d Fixed some POD formatting mistakes.
    c Refactored accessors into AUTOLOAD. Makes no external difference.
    r Removed the deprecated honkhonkhonk() method as warned several versions ago.

As you can see, each version is represented by a block of lines, and double line breaks separate versions. Each line begins with a token denoting what it describes, optionally suffixed with an exclamation mark, which means "important". Applied to a version number, the exclamation mark implies "major release". (Applied to a date or comment, it is meaningless and should be ignored by any parser.) The token is followed by whitespace (\s+). If an item is split onto multiple lines, it is understood to continue until a new token or block break is reached.

These are the tokens:

    @  Release date. In W3C datetime format (ISO 8601).
    #  A comment.
    a  An addition to the code.
    b  A bugfix. Linking to a ticket here would be nice if it exists.
    c  A change to existing code.
    d  A change to documentation.
    l  A change to licensing.
    m  A change to the maintainer.
    r  A removal of something from the code.
    t  A change to tests.
    v  A version number.
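
To show how little machinery the format needs, here is a minimal parser sketch in Python. It is only an illustration of the rules described above, not a reference implementation; the names `TOKEN_LINE` and `parse_changes` and the dictionary layout are invented for this example. Note that it inherits the draft's weakness: a continuation line that happens to start with a token letter followed by whitespace is read as a new item.

```python
import re

# One token letter, an optional "!" (important), whitespace, then the text.
TOKEN_LINE = re.compile(r"^([@#abcdlmrtv])(!?)\s+(.*)$")


def parse_changes(text):
    """Parse the draft format into a list of version blocks.

    Each block is a list of entries, and each entry is a dict with the
    keys "token", "important" and "text" (a layout assumed for this sketch).
    """
    blocks = []
    # Double line breaks separate versions.
    for raw_block in re.split(r"\n\s*\n", text.strip()):
        entries = []
        for line in raw_block.splitlines():
            line = line.strip()
            if not line:
                continue
            match = TOKEN_LINE.match(line)
            if match:
                token, bang, body = match.groups()
                # "!" on a date or comment is meaningless, so ignore it there.
                important = bool(bang) and token not in "@#"
                entries.append(
                    {"token": token, "important": important, "text": body}
                )
            elif entries:
                # No leading token: treat as a continuation of the last item.
                entries[-1]["text"] += " " + line
        if entries:
            blocks.append(entries)
    return blocks
```

From this structure, emitting YAML, RDF or anything else is a matter of walking the list of blocks.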

I haven't gone quite as far as RGiersig did in his specification, as I felt that was a bit heavy. For example, my scheme has no separate field for release stability: that should be implied by the version number, following the existing convention of underscored version numbers for developer releases.
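
As a rough sketch of that inference (in Python, with a made-up function name), a tool reading the file could decide stability from the version string alone:

```python
def is_developer_release(version):
    """Return True if the version string follows the underscore convention.

    Assumes the CPAN habit mentioned above: an underscore in the version
    number (for example 1.3_01) marks a developer release.
    """
    return "_" in version
```

With the example file above, `1.3_01` would be treated as a developer release, while `1.3` and `1.2.1` would not.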

Vague other thoughts - case-insensitive tokens? And maybe a standard block of comments at the beginning of the file explaining what the tokens are to new readers.

Thoughts? I actually like this enough that I might start using it myself.

Update: There's a second draft now.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • in most programming languages prefix ! is a negation so I think using postfix ! is a better choice.

    just my 0.02 EUR
  • Unfortunately, due mainly to lack of indenting I don't like your format.

    Of course, I dislike all the other proposals just as much.
    • The spec allows you to do this if you want:

      a!
        Added some groovy new feature.
      b
        Fixed that stupid little bug in the gnomon.

      Any better?

  • Each line begins with a token denoting what it describes ... The token is followed by \s+. If an item is split onto multiple lines, it is understood to continue until a new token or block break is reached.

    Maybe I missed something, but what do you do if the word 'a' is the first word on a continuation line of an item split onto multiple lines? How does the parser know that's not a token?

    • Ooh, good catch. As it stands, it wouldn't. The workaround is not to split a line before an "a" :-)

      If anyone can think of a patch to the spec to fix that without adding complexity (I can't off the top of my head) I'd be interested to hear it.
      • As a format which could conceivably be written in other (human) languages, can you guarantee that none of them will have the same issue? Or that someone might refer to their 'd' subroutine and mess things up?

        Maybe subsequent lines could be indented or the preceding line could end in a backslash?

        • confound on IRC suggested starting continued lines with a '.', but that's more chrome to impede a quick visual scan of the document, as are backslashes. On the other hand, the backslash is a well-known line continuation indicator. I prefer your suggestion of indenting, though. Leading whitespace already seems to be commonly used on CPAN to indicate a continued comment.

          a We added a new shiny feature that you'll all love:
            a magic automatic doodad configurator.
          b! A major bug got fixed. Really m
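
          A minimal Python sketch of the indentation rule, assuming that token lines start in the first column and that any line with leading whitespace continues the previous item. The names `TOKEN_LINE` and `parse_block_indented` are invented for illustration, and the sketch also accepts a bare token on its own line followed by an indented body.

```python
import re

# Token letter, optional "!", then optionally whitespace and the item text.
TOKEN_LINE = re.compile(r"^([@#abcdlmrtv])(!?)(?:\s+(.*))?$")


def parse_block_indented(block):
    """Parse one version block where indented lines are continuations.

    Returns a list of [token, important, text] lists.
    """
    entries = []
    for line in block.splitlines():
        if not line.strip():
            continue
        if line[0].isspace():
            # Leading whitespace: always a continuation, never a token,
            # even when the line starts with a word such as "a".
            if not entries:
                raise ValueError("continuation line before any token")
            joined = entries[-1][2] + " " + line.strip()
            entries[-1][2] = joined.strip()
            continue
        match = TOKEN_LINE.match(line)
        if match is None:
            raise ValueError("unindented line without a token: %r" % line)
        token, bang, body = match.groups()
        entries.append([token, bool(bang), body or ""])
    return entries
```

          Because only unindented lines are checked for tokens, a continuation beginning with "a magic ..." is no longer ambiguous.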

  • Looking at your list of abbreviations, it makes a lot of sense to me, except for one detail: the "d". I believe I'm not the only one to expect the "d" to mean "delete". Instead, "d" is for docs, "r" is for remove. So you've solved it by choosing another word instead of "delete"...

    But I'm sure mistakes are bound to be made: people will accidentally use "d" instead of "r".

    Instead I'd prefer to use another letter for "documentation", but I can't think of any other word.
    • Hmm. "d" for "delete" is a good point, however I find the phrasing "I deleted a feature" a little awkward.

      How about we take a leaf out of diff -u's book and circumvent the issue of which word to use?

      -  removed something
      +  added something

      There's no ambiguity in that...

      • I find the alphabetical codes rather unreadable. They mix too much with the text. Having tags for version number and date seems redundant when those two items are essential (and currently standard anyway.) In my rendition, the version and date are a non-indented header over a set of indented paragraphs which start with sigils.

        v0.1.1 2007-11-10

            + added new thing() method

            - removed old deal() method

            * fixed bug #12578

            % changed code for blah()
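
        For comparison, a minimal Python sketch of reading this rendition. The meanings attached to the sigils are inferred from the example lines above and are an assumption, as are the names `SIGILS`, `HEADER` and `parse_sigil_release`.

```python
import re

# Sigil meanings guessed from the example; not part of any agreed spec.
SIGILS = {"+": "added", "-": "removed", "*": "fixed", "%": "changed"}

# Unindented header line: "v<version> <YYYY-MM-DD>".
HEADER = re.compile(r"^v(\S+)\s+(\d{4}-\d{2}-\d{2})$")


def parse_sigil_release(text):
    """Parse one release: a header line followed by indented sigil items."""
    release = {"version": None, "date": None, "items": []}
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            match = HEADER.match(line.strip())
            if match is None:
                raise ValueError("bad header line: %r" % line)
            release["version"], release["date"] = match.groups()
        else:
            sigil, _, body = line.strip().partition(" ")
            if sigil not in SIGILS:
                raise ValueError("unknown sigil in line: %r" % line)
            release["items"].append((SIGILS[sigil], body.strip()))
    return release
```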

         
        • I thought the metaphor for the maintainer symbol was that someone changed their “hat” (as in “putting on my group leader hat, I say that […]”). That seemed funnily apt to me.

  • I think that *incompatible changes* and *security fixes* are very important to indicate separately in a machine readable way. Just like security fixes make you install the new version asap, incompatible changes make you wait until you have tuits for updating your code. (And when they're there together, good luck.)

    For these, I suggest "i" and "s".

    Actually, single letters make bad identifiers. How about the following self-descriptive tags:

    new
    fix
    doc
    incompatible
    license
    maint
    security
    tests

    Where changes and removal
    • I agree with Juerd on all points, but most especially that using a word rather than a letter helps a lot with readability. So, what he said.
      --
      Kirrily "Skud" Robert perl@infotrope.net http://infotrope.net/
    • Compatibility, security, fix: agree that these are necessary splits to "bug fix" ("b" in my original scheme).

      Timestamps: these follow the format specified in ISO 8601 [wikipedia.org], where the "T" is a mandatory separator. I'd like to stick to an existing standard of date representation if possible.

      I think uppercasing is too shouty... adding the important marker would make you end up with "FIX! SECURITY! NEW!". It's a bit tabloid newspaper. :-)

      With all this in mind I'm going to post a revised spec shortly for a seco

      • I think the important marker itself is not important if you split out security/incompatible. If something new is important, bump the version number.

        As for the timestamp, you'd have two things, whitespace separated, instead of one. dateTtime may be the standard, but date time is much more commonly seen in the wild. And for a very good reason.
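
        A parser does not have to choose between the two spellings. As a hedged sketch in Python (the function name `parse_release_date` and the list of accepted layouts are assumptions for illustration), it could try the ISO 8601 "T" form first and then fall back to the space-separated and date-only forms:

```python
from datetime import datetime

# Layouts tried in order: ISO 8601 with "T", space separated, date only.
ACCEPTED_LAYOUTS = ("%Y-%m-%dT%H:%M", "%Y-%m-%d %H:%M", "%Y-%m-%d")


def parse_release_date(value):
    """Return a datetime for any accepted layout, or raise ValueError."""
    cleaned = value.strip()
    for layout in ACCEPTED_LAYOUTS:
        try:
            return datetime.strptime(cleaned, layout)
        except ValueError:
            continue
    raise ValueError("unrecognised release date: %r" % value)
```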