I come from a C++ background and I'll admit that Perl's object system really confused me initially. The Camel book - bless its heart - really doesn't explain things well at all. I've occasionally thought about learning Moose, but whenever I try to wrap my head around it, I find myself more interested in just using traditional hash-based or inside-out objects. The notion of requiring thousands of lines of code to use an object system just doesn't sit well with me. I'd rather use an object system that's 'native' to the language.
Hash-based objects are, of course, the traditional object system in Perl. So far, they're all I've actually used in 'production' code. As you can imagine, the class structure for most of my code is pretty simple, so sticking with the hash-based object system, despite its deficiencies, has worked out well for me.
The C++ programmer in me really likes the encapsulated feeling of inside-out objects. (I know, the data is not REALLY totally private, but it's good enough for me.) The extra rigmarole required to get good memory management and threading-aware capabilities is somewhat annoying, but I don't really write the sorts of scripts for which these would be major concerns. Still, for some reason, I've never actually played around with an inside-out class system. Maybe I'll use it in my next project, just so I can get a feel for it...
Anyway, the purpose of this post was to state that I, for my part, have really come to enjoy Perl5's traditional object system. (This applies to both hash-based and inside-out objects.) I say this because I have figured out how to use it in ways I never could have used C++'s object system. Anybody well versed in the Gang of Four will say that I could have just used a Decorator class for my purposes, and they're right, but the amazing thing to me is that Perl5's object system supports decorator-like behavior 'out-of-the-box,' without any special setup.
Let me illustrate what I mean. I've been working on a simulator for the Kuramoto model in which I want to observe the behavior of subsets of my given population of oscillators. I wrote three classes: one to hold the simulation internals, one to take subsets, and one that manages a collection of subsets, which I call a Stack. For purposes of my work, the Stack is really the class of interest. My directory structure looks something like this:
My/Compute.pm - simulation internals
My/Slices.pm - sub-set capabilities
My/Stack.pm - manages collections of sub-sets
My/Animator.pm - enables SDL animations
My/Record.pm - enables data recording
My/SaveTo/Piddle.pm - enables saving data to a PDL piddle
Notice those bits that talk about enabling? The Animator, Record, and SaveTo::Piddle modules extend My::Stack in useful ways by adding functions to the My::Stack package and therefore to Stack objects. The great thing is that if I want to record data, I simply use My::Record; somewhere in my script, and if I don't want to record data, I don't use it. Similarly, if I ever want to extend my recording capabilities with a grander extension, I can write a new one in, say, My::Record::Advanced and then use it instead. This is great because it allows me to incrementally add capabilities to my Stack class without having to subclass it, modify the internals of the original class, or break pre-existing code.
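Here is a minimal sketch of the idea, squeezed into one file so it can be run directly. Only the package names and start_recording come from my actual code; the constructor, n_slices, and is_recording are made up for illustration, and in a real project the My::Record part would sit in its own My/Record.pm file.

```perl
use strict;
use warnings;

# Core class: knows nothing about recording.
package My::Stack;

sub new {
    my ($class, %args) = @_;
    my $self = { slices => $args{slices} || [] };
    return bless $self, $class;
}

# Hypothetical accessor, just so the example has something to report.
sub n_slices { return scalar @{ $_[0]{slices} } }

# Extension: normally loaded with "use My::Record;".
# It defines subs directly in the My::Stack package.
package My::Record;

sub My::Stack::start_recording {
    my ($self) = @_;
    $self->{recording} = 1;
    return $self;
}

# Hypothetical helper to query the recording flag.
sub My::Stack::is_recording { return $_[0]{recording} ? 1 : 0 }

package main;

my $stack = My::Stack->new(slices => [ 'a', 'b' ]);
$stack->start_recording;
print "recording ", $stack->n_slices, " slices\n" if $stack->is_recording;
```

Scripts that never load the extension simply never see start_recording, which is the whole point.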
Does this system have deficiencies? You bet it does, especially the way I've used it. In particular, this system is very difficult to use as a base class for other systems if I use two non-orthogonal packages, like My::Record and My::Record::Advanced. This is because both modules would add conflicting definitions for the start_recording function. The result is that whenever using this sort of arrangement, you should try as best you can to keep each extension orthogonal to previous extensions. This is not reasonable for large projects, in which case you should use the Decorator design pattern, but for small projects like mine it is quite reasonable and leads to really simple code that Gets the Job Done.
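One way to at least notice such a clash early is to check whether the method name is already taken before installing it. This is only a sketch and not what my modules actually do; extend_stack is a hypothetical helper and the 'mode' field is invented for the example.

```perl
use strict;
use warnings;
use Carp;

package My::Stack;
sub new { return bless {}, shift }

package main;

# Hypothetical helper: install a method into My::Stack only if the name is free.
sub extend_stack {
    my ($name, $code) = @_;
    croak "My::Stack already has a '$name' method" if My::Stack->can($name);
    no strict 'refs';
    *{"My::Stack::$name"} = $code;
    return 1;
}

# First extension claims the name.
extend_stack(start_recording => sub { $_[0]{mode} = 'basic'; $_[0] });

# A second, non-orthogonal extension asking for the same name is refused.
my $ok = eval {
    extend_stack(start_recording => sub { $_[0]{mode} = 'advanced'; $_[0] });
    1;
};
print "conflict detected: $@" unless $ok;
```

Without a guard like this, the later definition quietly replaces the earlier one (with a "Subroutine redefined" warning if warnings are on).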
This idea of extending the class with additional modules came to me by studying the PDL source code. I really like this capability. If poorly managed, it could of course lead to major headaches, but if you have a broad but shallow (and orthogonal) set of capabilities that you want to add to a class, like with PDL, this is a great way of doing it. This is a perfect example of Perl getting out of my way and just letting me write code.
I've written a write-up on PDL's wiki about using PDL and SDL to create responsive, animated simulations. You can read more about it here: http://sourceforge.net/apps/mediawiki/pdl/index.php?title=Animation_with_SDL. The script at the bottom of the page gives you a simulation of a bunch of noninteracting particles in a box, for which you can change the particle size, change the time step, or even reverse time. The computational code is cleanly separated from the animation code, so it should be straightforward to add particle-particle interactions if desired.
One of the members of the pdl-users mailing list asked for some help for an upcoming talk. He'd like to compare PDL and Matlab. I used to call myself a C++ programmer, and I've written a number of analysis scripts in Matlab before getting fed up with all of it and switching to Perl/PDL. So, well, here goes...
The single greatest advantage that Matlab/PDL offers over the standard hammers - C, C++, FORTRAN, and Java - is that Matlab and PDL scripts require much, much less overhead to get them to actually do something useful. I once wrote a 2000 line C++ simulation program, of which only about 150 lines were actually interesting to me as a scientist. Also, Perl scripts tend to be less of a 'my program can do everything!' solution, because Perl scripts are so cheap to write. (Most Perl programmers put their effort into writing modules that do everything for them, and then put them on CPAN, which makes their code far more reusable. But I digress.)
The single greatest advantage that PDL offers over Matlab is that PDL is an extension of Perl, a vastly superior language for systems programming, file handling, data aggregation, etc. This means you'll write even less infrastructure code with PDL than you would with Matlab and it'll be easier to comprehend. In my opinion, Perl is also more expressive than Matlab.
First a little bit of overview for numerical computing. Matlab and PDL both try to solve similar problems. As we all know, you could do all your numerical work with C, C++, FORTRAN, or Java code and - assuming you had great patience and skill - it would run really fast. However, you pay for your fast code with increased developer hours. Scripting languages like Bash, Python, and Perl take the opposite approach, assuming that you would do better to reduce your developer hours at the expense of computational speed. We all know this is why we like Perl, because it allows us to get to solving our problems quickly. This trade-off often favors scripting languages, but numerical programming is an important exception. Matlab and IDL, as well as Numpy and PDL, are attempts to find a happy medium between rapid development and rapid execution.
The happy medium for all of these languages lies in optimizing vectorized operations. In other words, if you have two data sets of 100 elements apiece that you want to sum, in C you would write a for loop to iterate over all the elements, but in a numerical language you would simply write $a + $b and it would give a vector/array whose elements are the element-by-element sums. That sum, and many other operations, is computed using compiled C-code that iterates over the elements of $a and $b just as fast as hand-rolled C-code. The reason that PDL, Numpy, Matlab, and IDL have had such great success is that most numerical operations can actually be implemented in this vectorized way of thinking.
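To make that concrete, here is a small sketch of the same 100-element sum done both ways. It assumes PDL is installed; I've used $x and $y rather than $a and $b only because the latter are special to Perl's sort.

```perl
use strict;
use warnings;
use PDL;

my $n = 100;
my $x = sequence($n);      # 0, 1, 2, ..., 99
my $y = ones($n) * 2;      # 2, 2, 2, ...

# Vectorized: one expression, with the loop done in compiled C code.
my $sum = $x + $y;

# The same thing the way a C programmer would think of it, in plain Perl.
my @x = (0 .. $n - 1);
my @y = (2) x $n;
my @sum_loop;
for my $i (0 .. $n - 1) {
    $sum_loop[$i] = $x[$i] + $y[$i];
}

print "element 5: ", $sum->at(5), " (PDL) vs $sum_loop[5] (loop)\n";
```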
Note that C and FORTRAN programmers often think in terms of for loops. If you ever write a for-loop in PDL or Matlab, you're probably not doing it right, unless you're iterating through a list of files. (If you only need to act on some of your data, take a slice - using PDL::NiceSlice if you're in PDL. If you absolutely must iterate, look into MEX for Matlab and PDL::PP for PDL.)
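As a rough illustration of acting on part of your data without a loop, the sketch below uses the plain slice method, which takes a string; with PDL::NiceSlice loaded the same slice could be written $data(2:5). The data here is made up.

```perl
use strict;
use warnings;
use PDL;

my $data = sequence(10);                # 0 .. 9

# Select by position with a slice instead of a for loop.
my $middle = $data->slice('2:5');       # elements 2, 3, 4, 5
print "sum of middle elements: ", $middle->sum, "\n";

# Select by condition rather than by position.
my $big = $data->where($data > 6);      # elements 7, 8, 9
print "number of elements above 6: ", $big->nelem, "\n";
```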
So, using a vectorized approach makes your PDL/Matlab code almost as fast as C or FORTRAN, but it has a huge, rarely mentioned advantage: writing your code using vectorized expressions is actually more expressive and easier to understand. With fewer for loops hogging up screen space, you've got less extraneous code, fewer off-by-one errors, and can see more of your logic in one screen.
At this point, those who do their numerics in Java and C++ will get restless and note that they could achieve similar levels of expressiveness with vector/array objects and operator overloading. They are correct, and the rest of their code would run faster than the Matlab or PDL code. But then they'd be doing all of their infrastructure code in C++ or Java, which is just a headache for me and most Perl programmers. And they probably can't do data slicing like PDL or Matlab.
At this point, even if you still plan on using your standard hammer, hopefully you understand why I prefer to work with PDL.
Now onto explaining the differences between PDL and Matlab. Here's a list of what Matlab does better than PDL:
The most jarring difference for a simple Matlab (or IDL) user will be the lack of an IDE. PDL offers a partial replacement, however, with the PDL prompt. The PDL prompt is an interactive shell, and it's possible to set up an auto-load directory in much the same way that you could set up autoloading for Matlab or IDL. Perhaps someday PDL could have an extension for Padre that would actually make it feel more like a full IDE, but until that time comes, PDL developers will work with an open perldl or bash shell, editing scripts in their preferred editors.
PDL has 2D and 3D graphics for visualizing data, but they're not nearly as developed as in Matlab. So the simple answer to this is, "It can be done, but it may not be as easy." From what I understand, PGPLOT is a standard plotting tool for astronomers, so it can't be all that bad. Still, Matlab makes visualization pretty darn easy and is (rightly) a major selling point.
From the numeric standpoint, Matlab has many more features than PDL. For example, there is a standard wavelet analysis toolbox in Matlab, and as far as I know, nobody has written a wavelet analysis tool for PDL. The vast wealth of toolboxes for Matlab is another major selling point for Matlab. Again it could be done in PDL, but you may have to do it yourself.
Now, here's a list of what PDL does better than Matlab:
Basically put, PDL has an incredible base upon which to build a numerical suite. As far as I am aware, PDL's data-flow and "pdl-threading" capabilities are unmatched anywhere, including Python and C++. They are also very poorly explained, which is why they are so under-appreciated. To perpetuate the problem, I will not attempt to explain them here because I don't know them well enough myself to explain them to a beginner. But there are other advantages.
Here's why I left Matlab: file handling. Handling nontrivial filenames in Matlab is a huge pain. Yes, they do trivial, trivial file globbing and they offer regular expressions, but I only thought to look for these after I switched to Perl. It didn't even occur to me as a possibility while I was a regular Matlab user. If you need to process a server log and extract lots of data from it, you wouldn't even think to write the extraction portion in Matlab. This is Perl's bread and butter, and if you use PDL you can include the numerical analysis in the same script.
The difference in cost between Matlab and PDL is pretty big, too. This is a major selling point for PDL, so to speak. I'm told Matlab's price can impede collaboration, though I've never actually seen it happen. Still it's nice to have free (beer) software.
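For a taste of what I mean, here is a sketch that pulls parameters out of descriptive file names with a single regular expression. The file names and the N/K/trial naming scheme are invented for the example; in a real script the list would come from something like glob('run_*.dat').

```perl
use strict;
use warnings;

# Hypothetical file names that encode their parameters.
my @files = (
    'run_N=100_K=0.50_trial3.dat',
    'run_N=250_K=1.25_trial1.dat',
    'notes.txt',
);

my @runs;
for my $file (@files) {
    # Pull the parameters straight out of the name; skip anything else.
    next unless $file =~ /^run_N=(\d+)_K=([\d.]+)_trial(\d+)\.dat$/;
    push @runs, { file => $file, N => $1, K => $2, trial => $3 };
}

printf "%-30s N=%d K=%.2f trial=%d\n", @{$_}{qw(file N K trial)} for @runs;
```

From there, each file could be read into a piddle and analyzed in the same script.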
Those cool features that I'll point at but won't get into - data-flow and "pdl-threading" - are possible because of PDL::PP. This is a very advanced feature of PDL, but it is a major difference between PDL and Matlab. In Matlab, you can write C functions and compile them into MEX files, which is nice. You can do the same for plain-old Perl, too, using a variation of C called XS. What's more, you can do this inline with your code using the Inline module. It's all very nice. But with PDL, it's even better.
I mentioned that with PDL and Matlab, you try to replace for loops with vector operations. In PDL::PP, you specify what those vector operations should do, using loops and all, with C-like code (it eventually gets turned into C code and gets compiled), but PDL::PP HANDLES HIGHER DIMENSIONS FOR YOU, as well as multiple data types. It is far easier to write custom-rolled C-code for PDL than for Matlab. The only problem is that the documentation for PDL::PP is difficult to get through, assumes you're a PDL expert, and assumes you're willing to give it a number of reads in order to get it. (By the way, the documentation that Tuomas and company did write has a nice style and covers a lot of material, but let's just say it's not quite like reading the Camel book.)
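Here is a rough sketch of what that looks like, using Inline::Pdlpp so the PDL::PP code can sit in the same script. It assumes Inline, Inline::Pdlpp, and a C compiler are all available, and my_sumsq is a made-up function (a sum of squares along the first dimension) written for this example. Treat it as a starting point and check it against the PDL::PP documentation for your version.

```perl
use strict;
use warnings;
use PDL;

# The PDL::PP definition, compiled the first time the script runs.
use Inline Pdlpp => <<'EOPP';
pp_def('my_sumsq',
    Pars => 'a(n); [o]b();',
    Code => q{
        double tmp = 0;
        /* Only the loop over the first dimension is written by hand. */
        loop(n) %{
            tmp += $a() * $a();
        %}
        $b() = tmp;
    },
);
EOPP

my $vector = sequence(3);       # 0, 1, 2
my $matrix = sequence(3, 2);    # rows 0,1,2 and 3,4,5

# The same code works on both: a single value for the vector,
# one value per row for the matrix.
print $vector->my_sumsq, "\n";
print $matrix->my_sumsq, "\n";
```

Notice that nothing in the definition mentions the second dimension of the matrix; PDL::PP supplies that outer loop.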
Finally, PDL has a great group of developers. They're just nice people. I'm sure you could find the same thing with Matlab, but I never even thought to look. Perl is just so much more of a culture-oriented language.
Recap: what does PDL have that Matlab doesn't? It's based in Perl, it's got all of CPAN at its disposal, it's got the amazing PDL::PP at its core, it's free, and it's got cool developers. Where does Matlab win out? Great visualization, huge user base, clean IDE, and a large tool set.
Chris Marshall has recently released PDL v2.4.5, a stepping-stone in the Perl Data Language. Chris has led the charge to get cross-platform OpenGL support for PDL. In the past, PDL has used its own GL code, but with Chris's work it now uses Perl's OpenGL module, affectionately known as POGL.
You can install PDL from the CPAN without any manual work. However, you'll be missing a lot of goodies, like plotting, OpenGL, and the GSL. In this journal entry, I go through the steps necessary to get PDL installed and working, goodies and all.
The Perl Data Language is a language extension that adds number-crunching and numerical array capabilities to Perl that are on par with C in terms of speed and memory consumption. If you are familiar with IDL or Matlab, think of this as Perl's extension that makes it work essentially like either of those. Having these numerical capabilities is good, but the project is quite complicated and some manual work is probably necessary to get PDL to work the way you'd like. That's what I'll discuss here. I will perform this tutorial using my cpan shell, but hopefully the instructions I give will be decent enough that you can perform the installation manually if you cannot use cpan.
Note to ActiveState Perl users: sisyphus (one of the PDL developers/users) has created ppm packages for PDL. I have appended a comment with details, so if this is you, scroll down to the bottom of the page.
Step 0: Install Perl's OpenGL dependencies. The OpenGL module requires freeglut. If you're using Strawberry Perl on Windows, you don't have to do this because POGL (Perl's OpenGL) will download everything it needs. If you're on Linux, you'll need to check your flavor's package manager for its library or get it from Sourceforge.
Step 1: Install Perl's OpenGL module. You should be able to get this on cpan without trouble by issuing:
If you run into trouble, look at the INSTALL file by typing
in cpan and then examining the file. Hopefully a few manually issued commands
will solve your problem.
If you don't use cpan, you can still get the tarball from cpan's web page. If you download the tarball, you'll need to unpack it somewhere, open a command prompt in the unpacked directory, and issue the following commands:
The last command will probably have to be run with root privileges.
Note that on some platforms, you may need to replace make with a similarly named utility. I believe it's dmake for Strawberry Perl. If it's not, you probably know who you are and what it's called. I'm sure Google can help. (If you've got ActiveState's Perl, there's a PPM discussed in the comments at the bottom of this post.)
Step 2: Check out the new PDL code (but don't install yet!). From the cpan shell, simply type get PDL. If cpan is not available to you, you can download the source code from cpan's web page for PDL or from sourceforge. Either way, you'll get a tarball that you'll have to open up somewhere.
Step 3: Take care of the prerequisites. As long as you've got a C compiler, PDL should work out-of-the-box. However, you can enable a great deal of optional functionality by ensuring you've got a few dependencies installed. To figure out what these dependencies are, go to the directory containing the (untarred) distribution (in cpan, you do that by typing look PDL) and examine the DEPENDENCIES file. Make a list of the various dependencies that you'll have to install. I recommend installing these before you install PDL. Here's an overview of the dependencies that I deem important. To read about all of the dependencies, check out the DEPENDENCIES file. If you have any questions about dependencies I don't discuss here (or about stuff I do discuss here, for that matter), you can send a query to the PDL mailing list, which you can find here.
(I had GSL and FFTW2 on this list, but it turns out that PDL has its own FFT implementation and does not need FFTW2, and I'm not sure that GSL should come so highly recommended. Don't get me wrong - it's a great library and I've used it before, but I think it would be more appropriate to install it only if you'll actually use it. Don't worry: if you use either of these libraries, you'll have it installed and PDL should detect it and compile the PDL bindings for it.)
Remember: PDL has many other optional features that have their own prerequisites, such as SLATEC and Minuit. Please check the DEPENDENCIES file for details on what's available and what it requires.
Step 4: Install PDL. Now that you've got all the prerequisites out of the way, go ahead and install PDL. You can use the same commands you used when you installed POGL above. If you run into trouble here, take a look at the perldl.conf file, which will allow you to tweak the installation process a bit more to your liking.
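Once the install finishes, a quick sanity check along the following lines can tell you which pieces actually made it onto your system. This is just a sketch: the list of modules is my own pick for illustration, so adjust it to whatever optional features you were hoping for.

```perl
use strict;
use warnings;

# Modules to look for; edit this list to match what you installed.
my @modules = qw(PDL OpenGL PDL::Graphics::PGPLOT PDL::GSL::RNG);

my %status;
for my $module (@modules) {
    # Turn Some::Module into Some/Module.pm so require can take a string.
    (my $path = "$module.pm") =~ s{::}{/}g;
    if (eval { require $path; 1 }) {
        $status{$module} = $module->VERSION // 'unknown version';
    }
    else {
        $status{$module} = undef;    # failed to load
    }
}

for my $module (@modules) {
    printf "%-24s %s\n", $module,
        defined $status{$module} ? "ok ($status{$module})" : 'not found';
}
```

A module showing up as not found here is a hint to go back and look at perldl.conf and the build output.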
The build process can take a while, but you'll want to pay attention. The process automatically detects various libraries. This is particularly important if it misses an important library for you, in which case you will probably need to tweak the perldl.conf file. The file has lots of internal self-documentation so hopefully you'll be able to figure out what to try to fix your problem. However, if you run into trouble, check out the BUGS file for instructions on how to report bugs or other trouble to the PDL
Step 5: Learn more about PDL. The documentation for PDL isn't great for beginners, but it gets the job done. Check out the wiki, which not only has documentation for PDL but also has links to some good external documentation. If you find some feature to be somewhat under-documented, please let me know and I'll try to tackle it some time in one of my journals. Then, hopefully, I'll propagate the documentation back to the wiki and the POD files.
I hope this helps those of you who have been curious about PDL but have not put in the effort to get it to work. If you have any questions, the readers of the PDL mailing list are usually happy to help. Check here for those email addresses.
I started to give serious time to learning and using Perl this summer. It all started with my frustrations at Matlab. I am a graduate student in Physics and I was trying to analyze some data I had taken. My Matlab scripts did an excellent job analyzing data, but I continually found new parameters that needed to be tweaked or held fixed. Since I believe my filenames should be descriptive, the names of my data files kept changing in nontrivial ways.
Those familiar with Perl are aware that Perl was built to process files. Though I did not know much about Perl at the time, I knew that Perl made so easy that which seems so horribly difficult in Matlab. As I beat my head more and more against Matlab's poor file globbing and even poorer regular expression handling, I began to think to myself, "If only I could do numerical programming in Perl..." Then I (re)discovered PDL.
That discovery came in June. Since that time I have aimed to master Perl as best I could. I read the Camel book and PBP, then I got through Perl Testing and just last night I finished reading Perl Hacks. I find myself in a peculiar situation: What do I do next?
The next book on my list will probably be Higher Order Perl, but I decided that the time has come for me to write about what I know. Since my focus is largely on PDL, I find myself drawn to writing about it. This will likely include editing and updating the POD documentation and the wiki, but I also decided that a blog is in order. So, I've signed up for an account on use Perl, and we'll see how it goes.