Journal of autarch (914)

Friday October 24, 2008
08:32 PM

I'd Like to Be Dead Like Perl

The "Perl is Dead" meme has been going around for some time. It seems like one of those self-reinforcing things that people keep repeating, but where's the evidence? The other half of the meme is that other dynamic languages, specifically Ruby, Python, and PHP, are gaining market/mind share.

That is true. I hear a lot more about Python, Ruby, and even PHP these days than I did five or ten years ago. Does that mean Perl is dead? No, it just means Python, Ruby, and PHP are doing better now than in the past. That's not bad for Perl. On the contrary, my theory is that a rising "dynamic languages" tide will lift all boats.

Tim Bunce wrote about job posting trends in February of 2008, and it's interesting reading. Unsurprisingly (to me), job postings for Perl, PHP, Ruby, and Python are all growing, and while Ruby and Python are growing faster than Perl, Perl is still way ahead of them. My guess is that eventually they'll level out around Perl's percentage and start growing more slowly.

Today I was thinking about Perl's reported morbidity, in the context of a relatively stupid "Perl 6 is Vaporware" article that I don't care to link to because it was lame.

Perl could have a lot of jobs and still be dead. After all, COBOL has a lot of jobs, but no one thinks of COBOL as a "living" language; it's just undead.

I decided to take a quick look at books instead. My theory is that if people are buying books on a topic, it must have some life, because that means someone wants to learn about said topic.

The flagship Perl book is O'Reilly's Learning Perl. The fifth edition was just released in June of this year.

It's currently #3,984 amongst all books, which isn't bad. Even more impressive, it's #1 in the Amazon category of "Books > Computers & Internet > Programming > Introductory & Beginning". This would be more impressive if this category included Learning Python, but I don't think it does.

O'Reilly's Learning Python is also doing well, at #3,357 among all books. In fact, this is the highest-ranked book of those I looked at.

O'Reilly's Learning Ruby is at #194,677, which I can only assume reflects the book, not Ruby itself. The best-selling intro-level Ruby book is (I think) Beginning Ruby: From Novice to Professional, at #23,024.

So Perl seems to be holding its own, and for some reason the intro Ruby books aren't selling well.

On to O'Reilly's Programming Perl, which is the Perl reference, despite being rather old (8 years). It's at #12,428.

O'Reilly's Programming Python is at #32,658. I would've expected Dive Into Python to do much better than #177,394. It has very high ratings, much better than Programming Python, and I've heard good things about it on the net. Go figure.

O'Reilly's The Ruby Programming Language is at #5,048 and Programming Ruby is at #13,125. My guess is that many people skip the intro level Ruby books in favor of these two.

So what's the summary? Each of these three languages has at least one book in the top 10,000, and the best selling books for each language are all relatively close. Certainly, Perl is looking pretty good in this light.

Another interesting thing about the Perl book market is the sheer number of niche Perl books out there, one of which I co-wrote. Compare O'Reilly's Python book page to their Perl page. Of course, the Python page has more recent books, but maybe they're just catching up on topics Perl had covered years ago.

This is all quite unscientific, but I think there's some value here. My conclusion is that Perl is not quite dead yet, and is in fact doing reasonably well. While it may not have the same buzz that the new kids have, people still want to learn it.

Cross-posted from House Absolute(ly Pointless).

Tuesday October 21, 2008
05:35 PM

But I Like Docs, Roy!

Roy Fielding, the inventor of REST, wrote a blog post recently titled REST APIs must be hypertext-driven. It's quite hard to understand, being written in pure academese, but I think I get the gist.

The gist is that for an API to be properly RESTful it must be discoverable. Specifically, you should be able to point a client at the root URI (/) and have it find all the resources that the API exposes. This is a cool idea, in theory, but very problematic in practice.

A consequence of this restriction is that any sort of documentation that contains a list of URIs (or URI templates, more likely) and documentation on accepted parameters is verboten.

Presumably, if I had a sufficiently smart client that understood the media types used in the application, I'd point it at the root URI, it'd discover all the URIs, and I could manipulate and fetch data along the way.

That's a nice theory, but it has very little to do with how people want to use these APIs. For a simple example, let's take Netflix. Let's assume that I want to use the Netflix API to search for a movie, get a list of results and present it back for a human to pick from, and add something from that list to my queue.

Without prior documentation on what the URIs are, how would I implement my client? How do I get those search results? Does my little client go to the root URI and then look at the returned data for a URI somehow "labeled" as the search URI? How does my client know which URI is which without manual intervention?

If I understand correctly, this would somehow all be encoded in the definition of the media types for the API. Rather than define a bunch of URI templates up front, I might have a media type of x-netflix/resource-listing, which is maybe a JSON document containing label/URI/media type triplets. One of those triplets may be "Search/http://...". Then my client POSTs to that URI using the x-netflix/movie-search media type. It gets back an x-netflix/movie-listing entity, which contains a list of movies, each of which consists of a title and URI. I GET each movie URI, which returns an x-netflix/movie document, which contains a URI template for posting to a queue? Okay, I'm lost on that last bit. I can't even figure this out.
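
To make that discovery dance concrete, here's a minimal Perl sketch of what such a client might look like. Everything here is invented for illustration: the root URI, the JSON structure, and the key names are assumptions, not anything Netflix actually serves, and proper media type negotiation is omitted.

    use strict;
    use warnings;
    use LWP::UserAgent;
    use JSON qw( decode_json );

    my $ua = LWP::UserAgent->new;

    # Step 1: GET the root URI and parse the hypothetical resource listing,
    # assumed here to be JSON triplets of label/URI/media type.
    my $root = decode_json(
        $ua->get('http://api.example.com/')->decoded_content );
    my ($search_uri) = map { $_->{uri} }
        grep { $_->{label} eq 'Search' } @{ $root->{resources} };

    # Step 2: POST a search, getting back a hypothetical movie listing.
    my $listing = decode_json(
        $ua->post( $search_uri, { title => 'Brazil' } )->decoded_content );

    # Step 3: GET each movie's URI separately for its full representation.
    for my $movie ( @{ $listing->{movies} } ) {
        my $full = decode_json(
            $ua->get( $movie->{uri} )->decoded_content );
        print "$full->{title}\n";
    }

Even in this toy version, note the extra round trips: one GET just to find the search URI, plus one more GET per movie.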

Resource creation and modification seems even worse. To create or modify resources, we would have a media type to describe each resource's parameters and type constraints, but figuring out how to create one would involve traversing the URI space (somehow) until you found the right URI to which to POST.

Of course, this all "just works" with a web browser, but the whole point of having a web API is to allow someone to build tools that can be used outside of a human-clicks-on-things-they're-interested-in interface. We want to automate tasks without requiring any human interaction. If it requires human intervention and intelligence at each step, we might as well use a web browser.

I can sort of imagine how all this would work in theory, but I have trouble imagining this not being horribly resource-intensive (gotta make 10 requests before I figure out where I can POST), and very complicated to code against.

Worse, it makes casual use of the API much harder, since the docs basically would say something like this ...

"Here's all my media types. Here's my root URI. Build a client capable of understanding all of these media types, then point it at the root URI and eventually the client will find the URI of the thing you're interested in."

Compare this with the Pseudo-REST API Fielding says is wrong, which says "here is how you get information on a single Person. GET a URI like this ..."

Fielding's REST basically rules out casual implementers and users, since you have to build a complete implementation of all the media types in advance. Compare this to the pseudo-REST API he points out. You can easily build a client which only handles a very small subset of the API's URIs. Imagine if your client had to handle every URI properly before it could do anything!
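
For contrast, here's roughly all the code a casual user needs against the documented pseudo-REST style, where the URI template is published up front (the host and field name below are made up):

    use LWP::UserAgent;
    use JSON qw( decode_json );

    # One GET against a known, documented URI template; no discovery needed.
    my $ua     = LWP::UserAgent->new;
    my $person = decode_json(
        $ua->get('http://api.example.com/person/42')->decoded_content );
    print $person->{name}, "\n";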

In the comments in his blog, Fielding throws in something that really makes me wonder if REST is feasible. He says,

A truly RESTful API looks like hypertext. Every addressable unit of information carries an address, either explicitly (e.g., link and id attributes) or implicitly (e.g., derived from the media type definition and representation structure). Query results are represented by a list of links with summary information, not by arrays of object representations (query is not a substitute for identification of resources).

Look at the last sentence carefully. A "truly RESTful API", in response to a search query, responds not with the information asked for, but with a list of links! So if I do a search for movies and I get a hundred movies back, what I really get is a summary (title and short description, maybe) and a bunch of links. Then, if I want to learn more about each movie, I have to request each of 100 different URIs separately!

It's quite possible that I've completely misunderstood Fielding's blog post, but I don't think so, especially based on what he said in the comments.

I'm not going to argue that REST is something other than what Fielding says, because he's the expert, but I'm not so sure I really want to create true REST APIs any more. Maybe from now on I'll be creating "APIs which share some characteristics with REST but are not quite REST".

Cross-posted from House Absolute(ly Pointless).

Saturday October 11, 2008
11:01 AM

You don't need to scale

Programmers like to talk about scaling and performance. They talk about how they made things faster, how some app somewhere is hosted on some large number of machines, how they can parallelize some task, and so on. They particularly like to talk about techniques used by monster sites like Yahoo, Twitter, Flickr, etc. Things like federation, sharding, and so on come up regularly, along with talk of MogileFS, memcached, and job queues.

This is a lot like gun collectors talking about the relative penetration and stopping power of their guns. It's fun for them, and there's some dick-wagging involved, but it doesn't come into practice all that much.

Most programmers are working on projects where scaling and speed just aren't all that important. It's probably a webapp with a database backend, and they're never going to hit the point where any "standard" component becomes an insoluble bottleneck. As long as the app responds "fast enough", it's fine. You'll never need to handle thousands of requests per minute.

The thing that developers usually like to finger as the scaling problem is the database, but fixing this is simple.

If the database is too slow, you throw some more hardware at it. Do some profiling and pick a combination of more CPU cores, more memory, and faster disks. Until you have to have more than 8 CPUs, 16GB RAM, and a RAID5 (6? 10?) array of 15,000 RPM disks, your only database scaling decision will be "what new system should I move my DBMS to". If you have enough money, you can just buy that thing up front.

Even before you get to the hardware limit, you can do intelligent things like profiling and caching the results of just a few queries and often get a massive win.
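
As a sketch of what that looks like in practice, here is the classic check-the-cache-first pattern using Cache::Memcached. The key name and the stubbed-out query are placeholders for whatever hot spot your profiling actually turns up:

    use strict;
    use warnings;
    use Cache::Memcached;

    my $cache = Cache::Memcached->new( { servers => ['127.0.0.1:11211'] } );

    sub get_top_movies_from_db {
        # Placeholder for the slow query identified by profiling.
        return [ 'Brazil', 'The Sting' ];
    }

    sub top_movies {
        # Serve from the cache when we can ...
        my $movies = $cache->get('top_movies');
        return $movies if $movies;

        # ... otherwise hit the database and cache the result for 5 minutes.
        $movies = get_top_movies_from_db();
        $cache->set( 'top_movies', $movies, 300 );
        return $movies;
    }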

If your app is using too much CPU on one machine, you just throw some more app servers at it and use some sort of simple load balancing system. Only the most short-sighted or clueless developers build apps that can't scale beyond a single app server (I'm looking at you, you know who).

All three of these strategies are well-known and quite simple, and thus are no fun, because they earn no bragging rights. However, most apps will never need more than this. A simple combination of hardware upgrades, simple horizontal app server scaling, and profiling and caching is enough.

This comes back to people fretting about the cost of using things like DateTime or Moose.

I'll be the first to admit that DateTime is the slowest date module on CPAN. It's also the most useful and correct. Unless you're making thousands of objects with it in a single request, please stop telling me it's slow. If you are making thousands of objects, patches are welcome!
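
If you think DateTime is your bottleneck, measure before complaining. A quick comparison with the core Benchmark module, along these lines, will tell you whether object construction actually matters at your volume (the two workloads here are just stand-ins for whatever your code really does):

    use strict;
    use warnings;
    use Benchmark qw( cmpthese );
    use DateTime;

    # Run each sub for at least 2 CPU seconds and compare the rates.
    cmpthese( -2, {
        datetime  => sub { my $dt = DateTime->now },
        localtime => sub { my @t  = localtime },
    } );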

But really, outside your delusions of application grandeur, does it really matter? Are you really going to be getting millions of requests per day? Or is it more like a few thousand?

There are a whole lot of sites and webapps that only need to support a couple hundred or thousand users. You're probably working on one of them ;)

Cross-posted from House Absolute(ly Pointless).

Friday September 26, 2008
07:59 PM

Perl with debugging symbols on Win32?

I'm trying to figure out what's causing a new test failure for Moose on Win32. I've reproduced it with Strawberry 5.8.8 on a Windows XP VM, and also found that it does not happen with 5.10.0.

It's a segfault in perl.exe, and without a perl.exe with debugging symbols, all I can get out of the failure is a stack trace in assembly, which is clearly not very helpful.

Are there any Win32 experts out there who could do some investigating? Just getting a stack trace with the Perl core function names in the trace would probably be very helpful. Actually fixing the problem even more so ;)

Saturday September 06, 2008
11:00 AM

Cross-posting from Movable Type to use Perl

If you're seeing this on use Perl then the cross-poster is working. You can get it from my svn. You'll also need to install WWW::UsePerl::Journal, which I monkey patch like crazy in the plugin. I have submitted patches to barbie, though, so hopefully that'll go away in the future.
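
For the curious, the monkey patching is nothing fancy, just redefining subs in WWW::UsePerl::Journal's namespace at runtime, along these lines (the method name here is made up):

    # Replace a sub in another package, keeping the original around
    # in case we still want to call it.
    no warnings 'redefine';

    my $orig = \&WWW::UsePerl::Journal::some_method;
    *WWW::UsePerl::Journal::some_method = sub {
        my $self = shift;
        # ... adjust arguments or fix behavior here ...
        return $orig->( $self, @_ );
    };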

The plugin isn't too smart, so if you save the same entry it'll re-crosspost each time. Patches welcome, of course.

Cross-posted from House Absolute(ly Pointless).

Wednesday September 03, 2008
05:43 PM

Answering "What is Moose" and "Why Moose"

I now have my own blog elsewhere. I finally got some programming content on it: New Moose Docs Aim to Answer "What is Moose?" and "Why Moose?".

As an aside, anyone have a plugin for Movable Type to crosspost to a use Perl journal?

Saturday August 16, 2008
10:33 AM

Frozen Perl mini-documentary on Current

My friend Gabe Cheifetz made a mini-documentary at Frozen Perl 2008 for Current TV. You can check it out at current.com.

Unfortunately, Current made some additional changes to Gabe's final product, apparently mostly to introduce factual errors. Doh! You'll notice them as soon as you watch it, but I left a comment on their site with corrections for non-geeks.

Nonetheless, I think it's neat.

Friday August 15, 2008
09:28 AM

Looking for Co-maintainers (Log::Dispatch, maybe others)

I have a lot of modules on CPAN. There are way too many for me to give them all the attention they deserve, so often patches get dropped and bugs ignored.

In particular, there are a few that could use the attention of someone interested in helping maintain them. Log::Dispatch is used by a lot of people, and could definitely use some attention. There are a number of open bugs in RT, and I have more patches & bugs in my email inbox.

Another distro that could use some love is Alzabo. I don't think it has a lot of users, but if you are a user of it and want to see it better maintained, let me know.

If there is some other distro of mine that you'd like to help maintain, especially one with open bugs, please let me know.

Write to autarch@urth.org and tell me what you'd like to help with. If I don't know you, I'll probably ask you to start by submitting a patch. All of my code is in my svn repo, so getting the latest version is easy (https://svn.urth.org/svn/Distro-Name/trunk).

Friday February 29, 2008
06:25 PM

So you want to host a Perl workshop ...

Frozen Perl has come and gone, and I think it was a great success. There were no major catastrophes, nor any minor ones, really, and the feedback we've gotten has been very positive. We're already talking about doing this again next year.

This writeup is intended to help remind us what we can do better next year, and also to give people thinking of organizing a similar event some advice on how to do it. First, check out the writeup I did for the yapc.org site on planning a workshop. That will tell you how TPF can help you out, and give you some ideas of the basics you need to plan for.

First Things First - Venue

The absolute most important thing to do, as early as possible, is to book your venue. This was a source of pain for us, though the venue we ended up at was great. If you're planning to do an event at a university, see if there's a student group who can help you book the venue. We worked with the Bioinformatics group at the University of Minnesota, and as a result saved nearly 50% on the venue. This had a huge impact on our budget, as it basically saved us about 17% of our income!

Of course, you need to double-check things like audio-visual for your venue, though nowadays any new venue should have all the AV equipment you'd want.

Second Things Second - Sponsorship

The other thing that you can't do early enough is start contacting sponsors. If you're planning to keep the event very cheap, then you should plan to pay for most things with sponsorship. For our event, approximately 2/3 of the total expenses were paid by sponsors. Some sponsors will be very quick to respond, and some will take some cajoling. Another thing to realize is that you'll ask many, many more places than will respond. We contacted 30+ companies, and got 8 sponsors total.

Most of those sponsors were very quick to say yes. My conclusion from this is that your time is best spent contacting as many different companies as possible, rather than focusing on repeated contacts with a few.

If you're in the US, sponsors will be able to treat their sponsorship as a tax-deductible donation, though most of our sponsors didn't seem too concerned about this. Keep in mind that sponsors are getting something of value for their sponsorship, such as free admission and advertising. That should be accounted for in any thank-you letters you write.

Other sponsors may want to treat the sponsorship as a marketing expense, and as such may want an invoice for the full amount which breaks down the items they're getting. You can probably put things in this invoice like "20 free admissions", even if the sponsor isn't going to use them. They just want to be able to break down the expense in some way. Don't be afraid to ask them exactly what they want.

Budgeting

One initial mistake I made was to forecast our income and expenses for 80 attendees, assuming all 80 were paying. Obviously, this isn't the case, because organizers and speakers don't pay. We ended up with approximately 95 attendees, of whom maybe 80 paid. This includes a few day-of registrations (5 or so). We also had 6 or so people who paid but didn't show up, but this didn't really matter for the budget, since we had already ordered their meals and t-shirts. It did mean the day-of registrants got lunch, though.

Also keep in mind that with very low ticket prices, each additional attendee ends up being a net loss. Our expenses per attendee were close to $60 each, but the highest ticket price was $40, so every extra paying attendee cost us roughly $20 that sponsorship had to cover. This wasn't a problem, but if we'd had 150 attendees, it would've been a disaster. Keep this sort of dynamic in mind when doing your budget forecasts.

Here's the final budget breakdown for the curious:

Income

Registration fees: $2,190 - a few folks opted to pay more than the ticket price, though that only added $100 or so to this amount
Sponsorship: $5,000
Total: $7,190

Expenses

Venue: $1,250 - this was really cheap for such a nice venue
Event catering: $3,290.04 - included a continental breakfast, sandwiches for lunch, and an afternoon snack
Wireless access for the event: $0? - we may be getting this for free, otherwise it'll probably be around $200
Hackathon venue: $357.15 - included $107.15 for internet, yeesh
Hackathon food: $245.15 - included a surprising 20% service charge
T-shirts: $871.10 - these were 4-color silkscreens on black, American made (aka no sweatshop) shirts
Saturday dinner: $200 - I paid for everyone's appetizers and salads, knowing that we had a decent surplus
Total: $6,213.43

This leaves us with a net surplus of $976.57. We'll probably do some sort of targeted grant with this money.

One thing to keep in mind is that some expenses may not be 100% fixed until after the event. For example, we paid for some of the food at the event based on consumption. It's a good idea to make sure you expect to have at least a few hundred dollars in surplus after the event.

Registration

ACT's registration process is damn confusing, though hopefully this will be fixed in the future. Basically, what it calls "registration" is the act of making an account or indicating interest in the event using an existing account. It is not paying.

We forgot to tell sponsors that if they wanted to use their free admissions, they needed to have people register in advance. We were fortunate that we had enough no-shows on the day of the event that we had enough meals! I almost forgot to account for our keynote speaker too, which really would have been lame.

Make sure that all of these people have accounts in the system and are marked as attending so that you include them in meal counts.

Random Notes

We were awfully late to make our shirt design, which meant a scramble to get them printed, and it also meant that Stephen Perkins had to pay for them out of pocket and get reimbursed by TPF.

We may consider closing registration a bit earlier next year to ameliorate this problem. We also did a bad job of making sure people got a shirt of the size they requested, and ended up not having shirts for some folks.

The lightning talks slide dance is a bit of a time waster. This year Ken put most of the talks on his laptop, but some folks had trouble using it because they're not familiar with Macs. A KVM might be a better solution.

I hope all this information is useful to anyone out there planning their own event.

Thursday February 21, 2008
08:25 PM

TPF, money, and grants

Something I've talked about recently with a few folks is that TPF has "too much" money. Specifically, TPF has reported an increasing balance at the end of each of its last couple of fiscal years. Nonprofits are not supposed to consistently make a profit (no kidding), and this has a bad smell. I don't think there's anything fishy going on, mind you; it just doesn't look right.

Part of the problem comes from a large $35k grant paid to TPF by NLNet for Parrot. TPF got the money from NLNet back in 2005 (IIRC) and it took a while before that money started flowing out of TPF.

In discussions with a couple folks in TPF, they've said that they'd really like to spend the money, but they don't have avenues to do so. As most folks know, conferences and workshops are generally profitable for the organization (Frozen Perl netted around $1k), so that's not a way to spend money.

Then there's the grants program. I'm on the grants committee list and I've seen all the grants that've come through for the past couple years. The problem with the grants system is that there aren't nearly enough grant applications coming in. Then, of the applications that do come in, many are simply unrealistic. They are too vague, too niche, or too hard, and so don't get approved.

I've been thinking about this recently and I think that this failure mode is basically built-in to the current grants system. First, TPF has a sort of unwritten rule that it won't fund travel, because there are so many Perl folks who'd like to go to so many Perl events that it would be hard to handle. I think there's definitely some truth to this, though I could see a use in funding travel specifically for project hacking (which TPF has done from time to time).

Another unwritten rule is that individual grants will not be more than $10k. This also makes sense, as $10k is a lot of money to give to one proposal. So what's the problem?

I, for one, am unlikely to ever apply for a grant under the current scheme (ignoring the fact that I'm ineligible because I'm a grant manager), even though I could probably come up with something TPF would be willing to fund.

The problem is that I just can't see how a grant could be an incentive for me. I already put a fair bit of my time into FS/OSS projects just because I want to do so. I'd love to put in much more time, but I have things like a mortgage and family to consider. Realistically, the only way I'm going to put more time into my projects is to take a sabbatical from work, or at least work part-time for a while.

But if you look at the work/money ratio for past grants, there's no way that could happen. Even if I aimed for something like 60-80% of my current full-time income, a grant could not come close. It'd be more like 20-30%. So the grant provides no incentive for me. I suppose I could still apply for one anyway, but I don't feel right about that, because it would just be funding work I would do anyway!

I'm sure many other developers have the same issue. So what is the point of the grants program, if not to make work happen that wouldn't otherwise happen? The acknowledgement of one's work is nice, but I already get that in many other ways, and if I'm looking for acknowledgement in the form of cash, I'd expect a heck of a lot more than a couple thousand dollars (Amazon and others, contact me privately for an address to which you can mail a big fat check).

Personally, I'd be perfectly happy to see TPF fund three months of full-time work on Parrot, Perl 5, or some other project likely to be of great benefit. Of course, the number of people eligible for this sort of grant is small, but it might achieve more in the long run.