
cyocum (7706)


An American post-graduate student living in Scotland.

Journal of cyocum (7706)

Friday May 21, 2010
02:31 PM

Demystifying OCaml's Functors

Since I started using OCaml, functors have been rather mystifying to me. They are OCaml's highest form of type abstraction through modules and, for beginners like myself, they can be downright frustrating to understand. So, I took it upon myself to learn how to write one of these things by hook or by crook. Here is what I took away from that experience.

What I will do here is take you through a trivial and probably completely useless example of a functor which creates types of linked lists. You will need to know about OCaml before you dig into this so I recommend having a look over here.

So, first you need to have a module type for which you will create a functor:

module type S = sig
  type t
end

This creates a module type with only one thing in it: an abstract type. Basically, this says: if a module is of type S, then it will have a type t defined in it.

Now for the functor itself:

module Make (LinkedList : S) = struct
  type t = Empty | Node of LinkedList.t * t ref

  (* A list is a mutable reference to its head node. *)
  let make () =
    ref Empty

  (* Push x onto the front of the list. *)
  let insert x ll =
    let new_node = Node (x, ref Empty) in
    match new_node, !ll with
    | Node (_, _), Empty ->
        ll := new_node
    | Node (_, next), (Node (_, _) as head) ->
        next := head;
        ll := new_node
    | Empty, _ -> raise (Invalid_argument "Empty Argument")

  (* Return true if x occurs anywhere in the list. *)
  let rec search x ll =
    match !ll with
    | Empty -> false
    | Node (y, next) ->
        if x = y then true
        else search x next
end

What you should notice here is that the functor contains real code. Basically, any code that can be written without knowing the concrete type of the elements should be put here. You should also notice the LinkedList.t in the type expression at the top of the module. (LinkedList : S) means that the argument LinkedList is a module of type S, so LinkedList.t is the element type supplied by that argument and is a different type from Make's own type t (remember that Make's body is itself a kind of module).

Now, we want to make LinkedList.t a concrete type. We do this by applying the Make functor to a module that matches the module type S. The result is a new module containing the code in Make, specialised to that module's type t. So, let's do that:

module IntLinkedList = Make(struct type t = int end);;

See the struct inside the Make? That is the module that you want to attach the code in Make to. Because the module must be of type S, you must declare what type t is. The same goes for ANY values or functions listed in module type S: the argument module has to define each of them. You can also define the module elsewhere with a name and pass that name into Make.
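As an aside, those last two points can be sketched as follows. This is only an illustration under assumptions: IntElt, NamedIntList, Ordered, MakeMax and IntMax are hypothetical names that do not appear in the example above, and Make is trimmed down so that the sketch runs on its own.

```ocaml
(* The module type and a trimmed-down functor, repeated so that this
   sketch is self-contained. *)
module type S = sig
  type t
end

module Make (LinkedList : S) = struct
  type t = Empty | Node of LinkedList.t * t ref

  let make () = ref Empty

  (* Push x onto the front of the list. *)
  let insert (x : LinkedList.t) ll = ll := Node (x, ref !ll)
end

(* 1. A named argument module instead of an inline struct
      (IntElt is a hypothetical name). *)
module IntElt = struct
  type t = int
end

module NamedIntList = Make (IntElt)

(* 2. A hypothetical module type that also demands a function: any
      argument module must now supply compare as well as t. *)
module type Ordered = sig
  type t
  val compare : t -> t -> int
end

module MakeMax (O : Ordered) = struct
  let max a b = if O.compare a b >= 0 then a else b
end

module IntMax = MakeMax (struct
  type t = int
  let compare = Stdlib.compare
end)

let () =
  let ll = NamedIntList.make () in
  NamedIntList.insert 1 ll;
  Printf.printf "%d\n" (IntMax.max 3 7)
```

Leaving compare out of the struct passed to MakeMax is a compile-time error, which is the whole point of giving the functor argument a module type.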

Now, when you enter the IntLinkedList definition above into the OCaml toplevel, you should get this signature:

module IntLinkedList :
  sig
    type t = Empty | Node of int * t ref
    val make : unit -> t ref
    val insert : int -> t ref -> unit
    val search : int -> t ref -> bool
  end

Note how LinkedList.t in Make's line type t = Empty | Node of LinkedList.t * t ref has been replaced by the type that you specified in the call to Make. This means that this is now a module that creates linked lists of ints.
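To see the resulting module in use, here is a minimal sketch. The functor is repeated in condensed form (insert is shortened to a single assignment, which is my simplification rather than the code above) so that the block runs on its own.

```ocaml
module type S = sig
  type t
end

(* Condensed version of the functor, repeated so this runs on its own. *)
module Make (LinkedList : S) = struct
  type t = Empty | Node of LinkedList.t * t ref

  let make () = ref Empty

  (* Push x onto the front of the list. *)
  let insert (x : LinkedList.t) ll = ll := Node (x, ref !ll)

  (* Walk the list until x is found or the end is reached. *)
  let rec search x ll =
    match !ll with
    | Empty -> false
    | Node (y, next) -> x = y || search x next
end

module IntLinkedList = Make (struct type t = int end)

let () =
  let ll = IntLinkedList.make () in
  List.iter (fun x -> IntLinkedList.insert x ll) [1; 2; 3];
  Printf.printf "3 present: %b\n" (IntLinkedList.search 3 ll);
  Printf.printf "42 present: %b\n" (IntLinkedList.search 42 ll)
```

Trying IntLinkedList.insert "three" ll is rejected by the type checker, because the element type has been fixed to int.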

You can now stack functor-created modules like so:

module IntSet = Set.Make(struct type t = int let compare = compare end);;
module IntSetLinkedList = Make(IntSet);;

module IntSetLinkedList :
  sig
    type t = Make(IntSet).t = Empty | Node of IntSet.t * t ref
    val make : unit -> t ref
    val insert : IntSet.t -> t ref -> unit
    val search : IntSet.t -> t ref -> bool
  end

Now, you may be wondering to yourself: why has he gone through all this trouble when he could have just used type parameters? That's the canonical way to write linked lists in OCaml. It is just as type safe and probably faster. Yes, I am using a sledgehammer to crack a nut. This small example, however, shows all the steps that you need to go through to write a functor in the first instance. It is also small enough to understand. I hope it helps those out there grappling with the idea of functors.
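For comparison, here is a rough sketch of what the type-parameter version might look like. It keeps the same mutable representation as the functor; the type name linked_list is my own choice, and everyday OCaml code would more likely just use the built-in immutable 'a list.

```ocaml
(* The same mutable list, parameterised by a type variable instead of a
   functor argument. *)
type 'a linked_list = Empty | Node of 'a * 'a linked_list ref

let make () : 'a linked_list ref = ref Empty

(* Push x onto the front of the list. *)
let insert x ll = ll := Node (x, ref !ll)

let rec search x ll =
  match !ll with
  | Empty -> false
  | Node (y, next) -> x = y || search x next

let () =
  (* One set of functions now serves every element type. *)
  let ints = make () in
  insert 1 ints;
  insert 2 ints;
  let strings = make () in
  insert "a" strings;
  Printf.printf "%b %b\n" (search 2 ints) (search "b" strings)
```

No functor application is needed here: the element type is inferred at each use.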

Friday August 14, 2009
05:24 PM

LaTeX, the Humanities, and PDF commenting

One of the biggest problems when working with Humanities scholars (or scholars in other areas) is that they are accustomed to using the commenting features of their favored format (usually Word). Now, PDF has the facility to allow commenting. The only problem is that Adobe keeps the keys to this particular kingdom pretty tight. No open source PDF thing (other than PDFEdit, which isn't very stable from what I have read around the web) can add comments to a PDF in a graphical environment.

This brings us to closed source but free products that allow you to add comments graphically to a PDF. There are two that I know of and both work on Windows: Foxit Reader and PDF-XChange Viewer. Foxit has a free Linux version but it doesn't allow commenting yet. However, Foxit does work under Wine but I have not tried it yet.

This takes down one more barrier to adopting LaTeX in the humanities if you can get your University or supervisor to support a non-Adobe PDF reader.

I cannot wait for something like GNU Juggler or maybe something using the PoDoFo library to allow real commenting on PDFs.

Tuesday February 10, 2009
03:35 PM

Scholarly Citation in a Digital World: Some Thoughts

Amazon has released the newest version of the Kindle. This event has caused me to re-evaluate the relationship of Humanities scholarship and its most basic and technical part, to wit, citation. Citation is the bedrock that lets scholars in the Humanities (and the Sciences) build and comment on each other's work in publication. It is also the source of much of scholarship's tedium. While thinking about the Kindle and the way in which the system works (it seems that the inner format is a limited form of HTML), I came to the conclusion that citation will become a great battleground in the future of scholarship. The fundamental problem is, simply stated, this: HTML does not guarantee the placement of a particular piece of text anywhere in the document or on the screen.

The foundation of citation is the page number or, even, the concept of the page. This is, of course, taken from the idea of a book. This, however, does not hold within the realm of markup languages. The renderer of a markup page is generally allowed great freedom to present information on the computer screen. As the idea of a page is broken down by this and by the fact that the Kindle and other ebook readers do not have a standard page size, the technicalities of citation in scholarship will take greater precedence than before.

To forestall much criticism, the idea of the document fragment as part of the SGML and HTML standards is not specific enough of a tool for what it is worth. This is especially true with handcrafted HTML which is created by a not so technically inclined scholar who might forget to put the appropriate anchor tags in the proper places in their text. In terms of XHTML, this might be overcome by the use of the "id" attribute and XPath to create a new form of link that would allow a reader to link directly to a specific paragraph by its "id" attribute. In more general terms, the XML standard XLink, which is sadly not widely implemented, might allow this. In terms of PDF, as it is an electronic facsimile of a book, the concept of the page is still useful in citation and does not cause much difficulty in this regard.

Whither then the citation? While some citation guides have added electronic citation to their standards, they tend to use full URLs and date accessed (see MLA for an example). This is inadequate when many in the developed world are moving to devices like the Kindle and the concept of a page becomes much more nebulous. This also affects the citation of electronic resources like, which I feel is the future model for the online scholarly journals. While use of PDF with its page numbers may be satisfactory in the PDF realm, when scholarly publishers move to more fluid models of text presentation and digital-only publication, the citation of these resources will become difficult. One way around this would be the ubiquitous use of DOI, but this does not solve the underlying issue of how to cite specific parts of the text. In the end, there are no easy answers but a discussion must take place and a forum created where solutions can be proposed.

I would be very happy to hear any ideas about how one might solve this puzzle.

Sunday December 07, 2008
09:45 AM

Perl6 Lives!

Ok, this has probably been done by someone else who is better than me elsewhere but I just wanted to show off some of Perl6 and the fact that you can code in Perl6 and that is just damn cool.

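use v6;

say factorial(10);

sub factorial(Int $int) {
  # Inner helper: $acc carries the running product, so the recursive
  # call is the last thing the sub does.
  my $fac_times = sub (Int $n, Int $acc) {
    if $n == 0 {
      return $acc;
    }

    return $fac_times($n - 1, $acc * $n);
  };

  die "Wrong argument!!" if $int < 0;
  return $fac_times($int, 1);
}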

What this code does is translate the code from the Wikipedia article on tail recursion from Scheme to Perl6 for the factorial function. It even runs pretty well on an unoptimized build of parrot+rakudo. Thank you Parrot/Perl6 People!

update: I added the die and removed the useless "else" in the inner sub.
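For readers who came here from the functor entry, the same accumulator-passing shape can be sketched in OCaml. This is my own rendering for comparison, not a translation taken from the Wikipedia article; the names factorial and fac_times simply mirror the Perl6 code above.

```ocaml
(* Accumulator-passing factorial: the recursive call is the last thing
   fac_times does, so it runs in constant stack space. *)
let factorial n =
  if n < 0 then invalid_arg "factorial: negative argument";
  let rec fac_times n acc =
    if n = 0 then acc else fac_times (n - 1) (acc * n)
  in
  fac_times n 1

let () = Printf.printf "%d\n" (factorial 10)
```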

Monday November 03, 2008
07:41 AM

LaTeX + verse package note

When using the verse package in LaTeX, remember that if you put a square bracket right after a \\[ret] (which appears at the end of every verse line except the last), you will get an error. To get around that error, put the \null command just before the square bracket. I have no idea why it does that or why it works with the no-op \null there but it does.

Saturday September 13, 2008
10:01 AM

Journal Article Database?

One of the neat things about Google Books is that I can keep track of my library digitally (I can do the same with LibraryThing as well). One thing I would really love to have is a similar system for my journal articles. The main reason for this is that I have a large number of photocopied journal articles and it is kind of difficult to keep track of them (not helped by the fact that I keep them in a huge pile rather than trying to sort them). If anyone knows of such a thing, please let me know.

What would be really neat would be to have a system that hooked into BibTeX so that it would automatically add a hyperref link to the article online rather than just to the bibliography at the end of your book/article.

Monday March 24, 2008
08:36 AM

Another Note about LaTeX and Comments in the Humanities

As many of you know, I have discussed this problem before. I have not found a general solution yet but I wanted to highlight one other problem that has manifested recently. When I get comments back from my supervisors, I have noticed that they either reference their comments by section number or page. Those that have worked with LaTeX know that the section numbering is automatically generated and you do not know the page number of something until you have compiled it to its final form. In this case, I generally lean on EMACS' search function to find where I should change something.

I am now very near the end of my PhD so it is less of a problem than before. However, I have a friend in Education to whom I taught LaTeX but she had to give it up because all of her professors use the comment features of Word to give feedback. I think that if LaTeX or PDF had an easy (and non-expensive) method for obtaining feedback, the Humanities might be more willing to give up its Word habit.

This causes me to wonder how people in Computer Science and Mathematics make comments on a LaTeX-produced paper. Hopefully, I will come up with some kind of solution when I have more time to think about it.

Monday March 10, 2008
09:00 AM

Reading List Management

Well, as many of you know from your own experience, having a reading list for your research can get a bit tedious. At first, I just had a plain text file with the title of the book and the shelfmark number for my university's library. The problem with this solution was that I had to manually erase stuff as I read it and I had duplicate entries because the list was getting pretty large. In addition, my university is moving from the Dewey Decimal System to the Library of Congress System, which means that my shelfmark numbers sometimes go out of date and I have to go look the item up again. So, I decided that it was time to get my computer to manage the list for me. This way I could reduce duplicate entries and I could write a way of picking random books to read. One other goal was to integrate journal articles in the list as well.

At first, I thought that something like LibraryThing might be the easiest solution. While it was fairly easy to get book information, it did not allow me to enter journal article information or other information that might be of interest to scholarly users. So, after playing around with it, I decided that writing my own would be the best idea and allow me to flex my programming muscles again.

The first problem that I thought about was file format for the list. As it is just a list of hashes that will be stored in an array, I first thought I would use something like YAML::XS. It compiled fine on my system (AMD64) so I thought it would work (come to find out it segfaults on large data structures and I had to move to YAML::Syck which worked perfectly; I will do some more investigation before filing a bug).

With the file format out of the way, I had to think about data entry. I hate data entry. So, I did a quick search for something that would allow me to interface with the Library of Congress or the British Library. Well, I discovered a module called ZOOM which implements the Z39.50 protocol for library information. As I am using Ubuntu, I thought that I could install it fairly easily. Nope, Ubuntu has an old version of the YAZ library which ZOOM depends on and does not work with the newest version of ZOOM so I downloaded the library source and compiled it myself. ZOOM then installed perfectly and it works. That is one thing I really love about using Linux.

One of the problems here again is the MARC21 format which is what the Library of Congress spits out on a successful ISBN search of their database. The main hurdle is that it is difficult (or I do not know enough about the format) to determine author vs. editor of a book. From the documentation for the MARC21 format, it seems that the author could be tag 100, 110, or 111 and the editor could be tag 700 or not; I am not sure. So I have some code to look at each of these tags, using MARC::Record, such that I can get everyone in the output correctly (and even then I get it wrong sometimes). I also looked at Dublin Core Metadata, which is in XML and can be produced by the Library of Congress Z39.50 gateway. I had a very similar problem (again it could be that I do not know the format or that I am being an idiot) as there is no tag for author or editor, just a "creator" tag, which is fine but I would really like to know if the "creator" is an author or an editor of a book.
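The tag-by-tag check described above can be sketched roughly as follows. This is an illustration in OCaml, not the Perl code I actually use: the field record is a hypothetical, heavily simplified stand-in for what a MARC library hands back, and the sample record contents are made up.

```ocaml
(* Hypothetical, simplified view of one MARC 21 field: its three-digit tag
   and the text it carries. Real records also have indicators and subfields. *)
type field = { tag : string; value : string }

type role = Main_entry | Added_entry

(* 100, 110 and 111 are main-entry fields (personal, corporate and meeting
   names); 700 is an added personal-name entry. The tag alone does not say
   whether a 700 name is an editor, which is the ambiguity described above. *)
let role_of_tag = function
  | "100" | "110" | "111" -> Some Main_entry
  | "700" -> Some Added_entry
  | _ -> None

(* Keep only the fields that might name an author or editor. *)
let contributors fields =
  List.filter_map
    (fun f ->
      match role_of_tag f.tag with
      | Some role -> Some (role, f.value)
      | None -> None)
    fields

let () =
  (* Made-up record contents, for illustration only. *)
  let record =
    [ { tag = "245"; value = "Some Title" };
      { tag = "100"; value = "Author, A." };
      { tag = "700"; value = "Person, B." } ]
  in
  List.iter
    (fun (role, name) ->
      let label =
        match role with
        | Main_entry -> "main entry"
        | Added_entry -> "added entry"
      in
      Printf.printf "%s: %s\n" label name)
    (contributors record)
```

Telling an editor from a co-author would need more than the tag, so this sketch deliberately stops at "main entry" versus "added entry".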

Otherwise, it works fine (I love getting the correct Library of Congress Call Number as well as a good Dewey one). One of my last problems is that there seems to be no single metadata repository for scholarly journals. I can imagine one fairly easily. There are two competing standards: Digital Object Identifier (DOI) and Serial Item and Contribution Identifier (SICI). JSTOR supports both DOI and SICI but, as JSTOR does not cover my discipline very well, it is a bit useless to me. For now, I have to enter the information manually, which is a pain, as I would like to just copy and paste a number and then have the information automatically inserted into the list.

Also, if anyone knows anything about the MARC21/Dublin Core formats and could give me some pointers (or show me where I am being stupid), that would be most appreciated. Also, if anyone knows a metadata repository for scholarly material in journals that is fairly comprehensive, that would be most helpful.

Saturday February 02, 2008
07:19 AM

Microsoft and Yahoo

I was reading about the merger proposal by Microsoft to Yahoo. I am rather firmly opposed to it. When Gates likes to talk about "innovation", it rings rather hollow when Microsoft does not "innovate"; it buys someone in the space they want to enter. Honestly, I would rather see them continue their own search engine than buy Yahoo. Can they not compete by writing their own? I am rather hoping that they get turned down by the various regulatory bodies or by Yahoo's own shareholders.

Monday December 17, 2007
09:37 AM

On Learning Lisp

Lisp has always been one of those languages which causes many strong emotions, love or loathing. For me, it has always been one of those mythical languages which only exceptionally smart people use (either in academic computer science or elsewhere). I have tried a couple of times to learn the language but I finally found a book (Practical Common Lisp) that explains it in a way that I can understand.

One of the most hated things about Lisp is the use of parentheses for delimiting constructs and a complete lack of syntax. On the other hand, the use of macros (both reader macros and normal macros) allows for completely redefining the language at runtime.

I found that the human mind is a very flexible thing if given the chance. The use of parentheses, while at first intimidating, is only a hindrance if you allow it to be (or if you are looking for a reason not to use Lisp). As you continue using the language, the parentheses tend to fade into the background as you marshal your functions into the right forms.

While I am still learning the language (I am doing toy-like programming, like reading the RIFF header from WAV files and writing small macros to work with CLOS), I still have much to learn about functional programming and the Lisp/Scheme style of programming in general. This does not mean that I will be giving up Perl any time soon but I hope it will teach me a few new tricks that I can use in other programming situations.