
cyocum (7706)

http://cyocum.blogspot.com/

An American post-graduate student living in Scotland.

Journal of cyocum (7706)

Monday March 10, 2008
09:00 AM

Reading List Management

[ #35872 ]

As many of you know from your own experience, keeping a reading list for your research can get tedious. At first, I just had a plain text file with the title of each book and its shelfmark number in my university's library. The problem with this solution was that I had to erase entries manually as I read them, and duplicate entries crept in as the list grew. In addition, my university is moving from the Dewey Decimal System to the Library of Congress system, which means that my shelfmark numbers sometimes go out of date and I have to look the item up again. So I decided that it was time to get my computer to manage the list for me. This way I could reduce duplicate entries and write a way of picking random books to read. One other goal was to integrate journal articles into the list as well.
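
A minimal sketch of the two list operations described above, duplicate removal and random selection, written in Python for illustration. The field names (`title`, `shelfmark`, `read`), the sample entries, and the choice of a case-insensitive title as the duplicate key are all assumptions, not the layout of my actual file.

```python
import random

def dedupe(entries):
    """Return entries with later duplicates dropped, keyed on a normalised title."""
    seen = set()
    unique = []
    for entry in entries:
        # Lower-case and collapse whitespace so trivial variants compare equal
        key = " ".join(entry["title"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique

def pick_random(entries, rng=random):
    """Pick one unread entry at random, or None if nothing is left to read."""
    unread = [e for e in entries if not e.get("read", False)]
    return rng.choice(unread) if unread else None

# Invented sample data, for illustration only
reading_list = [
    {"title": "An Example Monograph", "shelfmark": "X1"},
    {"title": "an example  monograph", "shelfmark": "X2"},  # duplicate title
    {"title": "Another Example", "shelfmark": "X3", "read": True},
]
reading_list = dedupe(reading_list)
next_book = pick_random(reading_list)
```

Keying on the title alone is the simplest choice; once entries carry an ISBN, that would probably make a more reliable key.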

At first, I thought that something like LibraryThing might be the easiest solution. While it was fairly easy to get book information, it did not allow me to enter journal article information or other information that might be of interest to scholarly users. So, after playing around with it, I decided that writing my own would be the best idea and allow me to flex my programming muscles again.

The first problem that I thought about was the file format for the list. As it is just a list of hashes stored in an array, I first thought I would use something like YAML::XS. It compiled fine on my system (AMD64), so I thought it would work. As it turned out, it segfaults on large data structures, and I had to move to YAML::Syck, which worked perfectly; I will do some more investigation before filing a bug.
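
The "array of hashes on disk" idea can be sketched as a save/load round trip. Python's standard library has no YAML module, so this sketch substitutes JSON as the file format; that is a swapped-in technique rather than what I actually use, and the file name and helper names are hypothetical.

```python
import json
import os
import tempfile

def save_list(path, entries):
    """Write the list to a temporary file, then rename it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(entries, handle, indent=2, ensure_ascii=False)
    # os.replace swaps the file in one step, so a crash cannot leave half a list
    os.replace(tmp, path)

def load_list(path):
    """Return the stored list, or an empty list if the file does not exist yet."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "reading_list.json")  # hypothetical file name
    save_list(path, [{"ISBN": "1851821872", "editors": ["Poppe, Erich.", "Ross, Bianca."]}])
    restored = load_list(path)
```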

With the file format out of the way, I had to think about data entry. I hate data entry. So I did a quick search for something that would let me interface with the Library of Congress or the British Library, and I discovered a module called ZOOM, which implements the Z39.50 protocol for library information. As I am using Ubuntu, I thought that I could install it fairly easily. Nope: Ubuntu ships an old version of the YAZ library, which ZOOM depends on, and that version does not work with the newest ZOOM, so I downloaded the library source and compiled it myself. ZOOM then installed perfectly, and it works. Being able to do that is one thing I really love about using Linux.
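
Z39.50 servers are commonly queried with a Prefix Query Format (PQF) string, in which Bib-1 use attribute 7 selects the ISBN index. The sketch below only checks an ISBN-10 and builds that string; it does not open a connection, and the function names are hypothetical. Whether a particular server supports the ISBN attribute is an assumption to check against that server's documentation.

```python
import re

def normalise_isbn10(text):
    """Strip hyphens and spaces; return the 10-character ISBN or raise ValueError."""
    isbn = text.replace("-", "").replace(" ", "").upper()
    if not re.fullmatch(r"[0-9]{9}[0-9X]", isbn):
        raise ValueError(f"not an ISBN-10: {text!r}")
    # Weighted checksum: positions are weighted 10 down to 1, and 'X' counts as 10
    total = sum((10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(isbn))
    if total % 11 != 0:
        raise ValueError(f"bad ISBN-10 check digit: {text!r}")
    return isbn

def isbn_pqf_query(text):
    """Build a PQF query against the Bib-1 ISBN index (use attribute 7)."""
    return "@attr 1=7 " + normalise_isbn10(text)

query = isbn_pqf_query("1-85182-187-2")
# query is now "@attr 1=7 1851821872"
```

Validating before querying means a mistyped number fails locally instead of coming back as an empty result set.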

The next problem is the MARC21 format, which is what the Library of Congress returns on a successful ISBN search of its database. The main hurdle is that it is difficult (or I do not know enough about the format) to tell the author of a book from its editor. From the MARC21 documentation, it seems that the author could be in tag 100, 110, or 111, and the editor might or might not be in tag 700; I am not sure. So I have some code that looks at each of these tags, using MARC::Record, so that everyone ends up in the right place in the output (and even then I get it wrong sometimes). I also looked at Dublin Core metadata, which is in XML and can be produced by the Library of Congress Z39.50 gateway. There I had a very similar problem (again, it could be that I do not know the format or that I am being an idiot): there is no tag for author or editor, just a "creator" tag, which is fine, but I would really like to know whether the "creator" is an author or an editor of a book.
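
One possible approach to the author/editor question is to look at the relator information MARC21 allows on name fields: subfield `e` (relator term, for example "ed.") and subfield `4` (relator code, for example `edt`). The sketch below assumes a record already reduced to plain `(tag, subfields)` pairs, which is a hypothetical structure for illustration and not the MARC::Record API. Many records carry no relator at all, so the fallback (a 1XX name is treated as an author, a 7XX name as unknown) is a guess, and the sample names are invented.

```python
NAME_TAGS = {"100", "110", "111", "700", "710", "711"}
EDITOR_TOKENS = {"edt", "ed", "eds", "editor"}
AUTHOR_TOKENS = {"aut", "author"}

def classify_names(fields):
    """Sort names from MARC name fields into authors, editors and unknowns."""
    roles = {"authors": [], "editors": [], "unknown": []}
    for tag, subfields in fields:
        if tag not in NAME_TAGS or "a" not in subfields:
            continue
        name = subfields["a"]
        # Gather relator term ($e) and relator code ($4), then compare whole tokens
        relator = " ".join(subfields.get(code, "") for code in ("e", "4")).lower()
        tokens = {token.strip(".,") for token in relator.split()}
        if tokens & EDITOR_TOKENS:
            roles["editors"].append(name)
        elif tokens & AUTHOR_TOKENS:
            roles["authors"].append(name)
        elif tag.startswith("1"):
            roles["authors"].append(name)  # main entry: assumed to be the author
        else:
            roles["unknown"].append(name)  # added entry without a relator
    return roles

# Invented sample record, for illustration only
sample = [
    ("100", {"a": "Doe, Jane."}),
    ("245", {"a": "An invented title"}),
    ("700", {"a": "Roe, Richard,", "e": "ed."}),
    ("700", {"a": "Poe, Alex."}),
]
roles = classify_names(sample)
```

Keeping an explicit "unknown" bucket at least separates the names that need a manual check from the ones the record actually labels.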

Otherwise, it works fine (I love getting the correct Library of Congress Call Number as well as a good Dewey one). One of my last problems is that there seems to be no single metadata repository for scholarly journals, although I can imagine one fairly easily. There are two competing standards: Digital Object Identifier (DOI) and Serial Item and Contribution Identifier (SICI). JSTOR supports both, but as JSTOR does not cover my discipline very well, that is of little use to me. For now, I have to enter the information manually, which is a pain, as I would like to just copy and paste a number and have the information automatically inserted into the list.
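
For the copy-and-paste workflow, a first step could be working out what kind of number was pasted. The sketch below is an assumption-laden illustration: it treats anything shaped like "10.", a numeric prefix, a slash and a suffix as a DOI, treats a 10- or 13-character number as an ISBN without verifying its check digit, and does not attempt SICI at all. The function name and the sample DOI are invented.

```python
import re

# Assumed shapes, not full validators for either standard
DOI_PATTERN = re.compile(r"10\.[0-9]{4,9}/\S+")
ISBN_PATTERN = re.compile(r"[0-9]{9}[0-9X]|[0-9]{13}")

def classify_identifier(text):
    """Return (kind, cleaned_value) where kind is 'doi', 'isbn' or 'unknown'."""
    candidate = text.strip()
    if candidate.lower().startswith("doi:"):
        candidate = candidate[4:].strip()
    if DOI_PATTERN.fullmatch(candidate):
        return ("doi", candidate)
    compact = candidate.replace("-", "").replace(" ", "").upper()
    if ISBN_PATTERN.fullmatch(compact):
        return ("isbn", compact)
    return ("unknown", candidate)

kind, value = classify_identifier("1-85182-187-2")
```

A book could then be routed to the Z39.50 lookup and an article to whatever DOI resolver turns out to be usable, with "unknown" falling back to manual entry.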

Also, if anyone knows anything about the MARC21/Dublin Core formats and could give me some pointers (or show me where I am being stupid), that would be most appreciated. Also, if anyone knows a metadata repository for scholarly material in journals that is fairly comprehensive, that would be most helpful.

  • Please note that the MARC format is just a format, specifically aimed at data interchange between library databases. It doesn't define what and how things are cataloged.

    As to your example of the 100, 110 and 111 tags, which one is used depends on whether the author is a person, a company or a meeting. There is also the 130 tag, the "uniform title", which overrides those three in most catalog displays. Here's a page that describes the differences between the three: http://www.loc.gov/marc/bibliographic/ecbdmain.html

    --
    xoa

    • Thanks for this info! I had read bits here and there about the format but I was more interested in pulling certain types of information out of it. Yes, going all MARC is really overkill for what I want. I have appended one of my entries which pretty much has everything that I need so that you get a taste of what I am doing.

      -
        DDN: 270.2/092
        ISBN: 1851821872
        LCCN: BR1720.M33
        editors:
          - Poppe, Erich.
          - Ross, Bianca.
        place: Blackrock, Co. Dublin, Irel

  • Depending on how much data you need, you might try: http://isbndb.com/
    • I had looked at that but I wanted an information source that was more "authoritative" if there is such a thing, which is why I went with the Library of Congress and the British Library.