Yes, threads suck. And in several different ways. Slow, memory-bloating, etc etc etc.
However, threads work.
Installing them from the CPAN on Windows, Mac and Linux works.
Using them as they are intended to be used works on Windows, Mac and Linux.
Integration with Wx works on Windows, Mac and Linux.
And using them to saturate a 4 CPU core development machine works, without resorting to having both them AND external processes.
And they work well enough that nobody has yet had an itch strong enough to step up and replace our task management code with something else that can support more than one CPU (and since our task system is derived from Process.pm, it is specifically designed to make it easy to use alternative backends).
So threads work, but they suck. Or at least, interpreter-copy sucks.
And in an IDE scenario, where your process needs to load 50-100 meg of code just to drive the cooler IDE functions, they suck even harder.
Fortunately, Padre has a rather forgiving attitude to things sucking.
We would rather have someone commit something that works and sucks, than not have it committed at all. And Padre is full of all kinds of features that work but suck. And slowly, gradually, they suck less.
The most recent couple of releases have come with stability warnings due to the arrival of our second-generation "Slave Driver" threading model.
The Slave Driver mechanism attempts to specifically contain and reduce the problems associated with threads.
During startup, we load the minimum number of modules required to conduct communication across threads, and then immediately spawn off a master thread which will remain unused while our main thread continues onward and loads up as much code as it likes.
Later on, when we need to do background work, this master thread is then further cloned into a slave thread which will bloat out incrementally, loading only the code it needs to execute the task.
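To make the master/slave flow concrete, here is a rough sketch in Python. It only shows the control flow. Python threads share one interpreter, so it cannot reproduce the memory saving of cloning a small Perl interpreter. `SlaveDriver`, `submit` and the per-task module list are hypothetical names, not Padre's API. The master is started before anything heavy is loaded and never runs tasks itself. Each request gets a fresh slave that imports only the modules its task declares.

```python
import importlib
import queue
import threading


class SlaveDriver:
    """Hypothetical sketch: an early, idle master that spawns one slave per task."""

    def __init__(self):
        self._requests = queue.Queue()
        # Created first, while almost nothing is loaded yet.
        self._master = threading.Thread(target=self._master_loop, daemon=True)
        self._master.start()

    def _master_loop(self):
        while True:
            request = self._requests.get()
            if request is None:  # shutdown signal
                return
            threading.Thread(target=self._slave, args=request, daemon=True).start()

    @staticmethod
    def _slave(modules, task, reply):
        try:
            # Load only what this particular task says it needs.
            loaded = {name: importlib.import_module(name) for name in modules}
            reply.put(("ok", task(loaded)))
        except Exception as exc:  # report failures instead of leaving the caller waiting
            reply.put(("error", exc))

    def submit(self, modules, task):
        """Queue a task; returns a one-slot queue that receives (status, value)."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((modules, task, reply))
        return reply

    def shutdown(self):
        self._requests.put(None)
        self._master.join()


# Start the master early, before the (hypothetical) heavy application imports.
driver = SlaveDriver()
reply = driver.submit(["json"], lambda mods: mods["json"].dumps({"answer": 42}))
status, value = reply.get(timeout=5)
driver.shutdown()
```

In Python the closer memory analogue would be the `multiprocessing` module's "forkserver" start method, which forks workers from a small server process rather than from the fully loaded parent.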
While the slave driver mechanism itself landed a few weeks ago, the final step of pushing the master spawn point up into the start-up code just landed today.
The result of this is a change in our per-thread cost from 34meg per thread to about 20meg per thread (for a total reduction in memory in a low-usage case from 90meg to 60meg).
If not for the fact that loading Wx is an all or nothing proposition, we could probably cut this in half again. And while a collection of half a dozen 35meg threads (about the maximum we are likely to need to saturate 4 CPU cores) in a thread pool is a nasty amount of RAM, even for an IDE, half a dozen 10meg threads is a lot closer to a tolerable memory cost.
And if we can drop this further to around 5 meg, we get close to the memory cost of a forking/process model, which is the only other parallelism model available in the short term that supports many cores transparently.
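As a back-of-envelope check of those numbers, here is a pool of six threads (the "half a dozen" above) at each per-thread cost mentioned. This is a rough sketch only. It ignores the main thread and any memory the threads share.

```python
# Approximate per-thread costs quoted above, in megabytes.
PER_THREAD_MB = {
    "full interpreter copy": 35,
    "slave driver today": 20,
    "if Wx were not all-or-nothing": 10,
    "fork/process-like target": 5,
}
POOL_SIZE = 6  # "half a dozen" threads, about enough to saturate 4 cores


def pool_cost_mb(per_thread_mb, threads=POOL_SIZE):
    """Extra RAM for a pool of identical threads (main thread not included)."""
    return per_thread_mb * threads


for label, mb in PER_THREAD_MB.items():
    print(f"{label:>32}: {pool_cost_mb(mb):4d} MB")
```

That works out to roughly 210, 120, 60 and 30 MB respectively.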
With Miyagawa's cpanminus (or as I like to think of it, cpantiny, but hey it's his module) now officially the New Shiny of the moment, the same old arguments are resurfacing about less is more, and worse is better.
And the great wheel of opinion circles again.
Rather than bother commenting on cpanminus itself (I suspect many could guess my opinions) I thought I would add some caveats to the praise, that hopefully can help you identify and engineer your own equivalent successes.
1. These "subset" modules are truly successful only when you make the accuracy trade-offs worth it.
Thus, the installation must be effortless, the user interface must be super-simple, zero-conf is an absolute must, and the module must succeed in all those niches where the "real" application is too hard, too big, or too clumsy.
If your accuracy is going to suck, NOTHING else is allowed to suck.
2. As I hinted with "real", these subset modules work best when they are an alternative solution, not the only solution.
If you are inaccurate and the only solution, you are an annoying limitation.
If you are inaccurate and an alternate solution, you are handy because you create a kind of user-pays situation. Simple people with simple use cases get a simple solution. Hardcore people with hardcore needs get a hardcore solution.
So if you want to make a small solution, it has to be a subset of something larger that works better. It can't be the only solution.
3. 99% is just a marketing number, but the real number does matter.
Why? Take a look at the Heavy 100 index (http://ali.as/top100/) and you'll notice that many major and popular CPAN modules (like, say, Catalyst) have more than 100 dependencies.
With a 99% success rate (using stupidly naive "statistics") every single module on that Top 100 list will fail to install. In a large system with lots of recursive dependencies, it doesn't take much for your install count to grow to 20 or 50 or 100 dependencies.
So you need to be sure about WHICH subset you want to support, so you can be sure what percentage you REALLY need.
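Here is a way to make the "naive statistics" concrete. Suppose every dependency installs independently with the same success rate p. Then the chance of a clean install of n dependencies is p to the power n. The sketch below uses that deliberately simplistic independence assumption:

```python
def clean_install_probability(per_module_success, dependencies):
    """Chance that every one of `dependencies` independent installs succeeds."""
    return per_module_success ** dependencies


def expected_failures(per_module_success, dependencies):
    """Average number of failing dependencies per install attempt."""
    return (1 - per_module_success) * dependencies


def required_success_rate(target_probability, dependencies):
    """Per-module success rate needed for a whole tree to install cleanly."""
    return target_probability ** (1 / dependencies)


for deps in (20, 50, 100):
    p = clean_install_probability(0.99, deps)
    print(f"{deps:3d} deps at 99%: {p:.1%} clean, "
          f"{expected_failures(0.99, deps):.1f} expected failures")

print(f"needed for 95% clean with 100 deps: {required_success_rate(0.95, 100):.4%}")
```

With 99% per module and 100 dependencies, the chance of a clean install is only about 37%, and one failure is expected on average. Keeping a 100-dependency tree at 95% clean needs each module to succeed roughly 99.95% of the time.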
For example, despite all the work to support it, there is only one single module currently using Bzip2. Would it be worth it to remove bzip2 support entirely from the CPAN and help that author to convert? Probably.
The biggest benefit of these alternate solutions is often not just what they do, but that they demonstrate what is possible, pushing the competitors to do it as well.
This is just a short promotional post.
If you are interested in the Perl/PostGIS http://geo2gov.com.au/ application we took second prize with during the Mashup Australia competition (or the subject of open government data in general), my "Nerds for Democracy" partner in crime Jeffery Candiloro will be giving a talk as part of the Ignite Sydney speaking event on Tuesday night Sydney time.
You should be able to watch a live stream of the talk on the Ignite video website.
In other Nerds for Democracy news, work is now underway on our entry for the NSW apps4nsw competition. It's looking pretty awesome, in particular because it will feature a set of government data we uncovered by accident, and which we think the public has pretty much never seen before.
At work, about 50% of my job fits into the category of "The Spice Must Flow".
After 2 years of work (incorporating about 15 mini-projects) we have finally got our downtime for the year down to about 32 minutes (none of which we were the root cause for).
The biggest remaining threat, and the cause of a number of near-miss almost-downtimes, is load regressions. Our code is quite complex, and it doesn't take much for someone to accidentally introduce an extra O(log n) factor on top of some existing O(n log n) code and load-spike something they shouldn't.
To try and deal with this, in addition to our regular pre-release load testing runs, we've started to accumulate a benchmark suite, using the same structure and for the same reasons we have a test suite.
The idea is to produce several dozen or hundred individual benchmarks that run nightly in a controlled environment (one might term it "smoke benching") to catch performance regressions as they occur. Currently they are only caught just before (or just after) release.
Our current efforts are breaking down at only four or five regression tests, for a variety of reasons. So I've started to experiment with a modified (but completely compatible) version of Benchmark::Timer I'm calling internally Benchmark::Lilburne (named after our team member that does our performance testing, who just happens to also have a name starting with "B").
B:Lilburne already comes with tracking of statistical certainty, courtesy of Benchmark::Timer. To this base I've added a maximum iteration count and a maximum runtime, to stop benchmarks running too long in the face of unreliable performance. Unreliable performance is common in our setup, and without these limits benchmarks can run for hours trying to reach statistical certainty.
I've also provided a mechanism to integrate with "enterprisey" code, which will often do its own timing capture. The new ->add method lets you add an elapsed time that has been captured independently, outside of the benchmark script, while still retaining the statistics-driven iteration of trials.
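Here is a rough sketch of those stopping rules, written in Python rather than as the actual Benchmark::Timer or B:Lilburne code. All class, method and parameter names are hypothetical. The confidence interval uses a normal approximation rather than a t-distribution. The timer keeps sampling until the interval's half-width is within an error percentage of the mean. It gives up at a maximum iteration count or runtime. It also accepts externally measured durations through `add`.

```python
import statistics
import time


class BenchTimer:
    """Hypothetical statistics-driven timer with hard iteration/runtime limits."""

    # Two-sided z-scores for a few confidence levels (normal approximation).
    Z_SCORES = {90.0: 1.645, 95.0: 1.960, 97.5: 2.241, 99.0: 2.576}

    def __init__(self, confidence=97.5, error=2.0, min_samples=5,
                 max_iterations=1000, max_runtime=60.0):
        self.z = self.Z_SCORES[confidence]
        self.error = error                  # allowed CI half-width, % of mean
        self.min_samples = min_samples
        self.max_iterations = max_iterations
        self.max_runtime = max_runtime      # seconds of wall-clock time
        self.samples = []
        self._started = time.monotonic()

    def add(self, elapsed):
        """Record a duration measured elsewhere, e.g. by the code under test."""
        self.samples.append(float(elapsed))

    def run(self, func):
        """Time one call of `func` and record it."""
        start = time.perf_counter()
        func()
        self.add(time.perf_counter() - start)

    def need_more_samples(self):
        if len(self.samples) >= self.max_iterations:
            return False                    # hard cap on trials
        if time.monotonic() - self._started >= self.max_runtime:
            return False                    # hard cap on wall-clock time
        if len(self.samples) < self.min_samples:
            return True
        mean = statistics.mean(self.samples)
        half_width = self.z * statistics.stdev(self.samples) / len(self.samples) ** 0.5
        return half_width > abs(mean) * self.error / 100


bench = BenchTimer(max_iterations=200, max_runtime=5.0)
while bench.need_more_samples():
    bench.run(lambda: sum(range(10_000)))
```

For "enterprisey" code that reports its own timings, you would call `bench.add(seconds)` in the loop instead of `bench.run(...)`, and the same stopping rules still apply.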
Finally, we've reached the point where we absolutely have to get rid of Benchmark-style formatted output. Instead, B:Lilburne comes with options to output to STDOUT for capture by an external harness, in the same way the Test:: family of modules uses a protocol (TAP) to report test results.
Longer term I'll probably switch to JSON so we can include less tabular data, such as a "verbose" option to spit out the detailed timings. For simplicity I'm just using CSV as my output format in the short term, since that doesn't require me to define a META.yml-like tree structure.
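This sketch shows both ends of that CSV protocol. The benchmark writes one row per benchmark to STDOUT, and the harness parses the captured text back into records. The column layout (name, sample count, mean, min, max in seconds) is an assumption for illustration, not B:Lilburne's actual format.

```python
import csv
import io
import sys

FIELDS = ["name", "samples", "mean", "min", "max"]  # hypothetical column layout


def report(name, samples, stream=sys.stdout):
    """Benchmark side: one CSV row per benchmark, times in seconds."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow([
        name,
        len(samples),
        f"{sum(samples) / len(samples):.6f}",
        f"{min(samples):.6f}",
        f"{max(samples):.6f}",
    ])


def collect(captured_stdout):
    """Harness side: parse captured STDOUT back into one dict per benchmark."""
    reader = csv.reader(io.StringIO(captured_stdout))
    return [dict(zip(FIELDS, row)) for row in reader if row]


buffer = io.StringIO()
report("select_by_id", [0.010, 0.012, 0.011], stream=buffer)
rows = collect(buffer.getvalue())
```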
Structurally, our bench suite is laid out similar to a test suite.
We have a benchmark directory, with a collection of files ending with
I'll report more on our experiments as they continue, but if you know any other prior art in this area, feel free to link me to it in the comments.
If you'd like to see the changes I've made so far, you can see the merge of B:Lilburne features back into Benchmark::Timer in my repository.
To my great joy, the "Chocolate Perl" concept I defined so long ago has finally started to crystalise. Curtis Jewell's blog details the specifics.
Having switched over to the new Alpha, I can see there's clearly work yet to do.
I've applied a few fixes to my Perl::Shell to support proper exception handling, but wxCPANPLUS crashes on startup, and the Tk-based POD browser is hellishly slow to turn on the documentation tree control.
We've also reached the point where the start menu is straining to breaking point, so it appears we'll need to do some sub-dividing to make that a bit more usable.
But on the whole, it's really starting to feel like what I'd imagined it to be.
You can see a future echo of something that we could hand to a first time Windows Perl newbie and expect them to actually get something done.
Now we just need to polish like hell so it doesn't just work, but works competently.
One of the downsides of Strawberry Perl's move from the InnoSetup
Our headline installer went from 17meg to 32meg overnight.
If you were paying (stupidly) close attention to the latest release, you might have noticed that Curtis managed to drop the installer by 3meg, without changing (at all) the compression mechanism and while adding slightly more content to the package.
Via the curious method of just changing the order in which he added the files to the archive, sorting by file extension instead of sorting by file name.
The grouping (even at a naive level) of similar types of content into the same area of the resulting file improved dictionary efficiency so much that it resulted in nearly a 10% improvement over plain deflate (which is almost as good as switching to bz2).
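The effect shows up whenever files are compressed as one continuous stream. Deflate can only refer back 32 KiB, so similar files need to sit near each other. (A ZIP archive compresses each entry separately, so ordering there doesn't matter in the same way.) The sketch below compares name order with extension-then-name order. Its corpus is synthetic and hypothetical, sized to exaggerate the effect. Files of the same type share most of their bytes, and alphabetical order interleaves the two types.

```python
import os
import random
import zlib


def solid_size(files, key):
    """Compressed size when files are concatenated into one stream in `key` order."""
    blob = b"".join(files[name] for name in sorted(files, key=key))
    return len(zlib.compress(blob, 9))


def by_name(name):
    return name


def by_extension(name):
    # Group by extension first, then by name within each extension.
    return (os.path.splitext(name)[1], name)


def demo_files(count=8, size=20_000, seed=1):
    """Hypothetical corpus: same-type files share content, names interleave types."""
    rng = random.Random(seed)
    base = {ext: bytes(rng.getrandbits(8) for _ in range(size)) for ext in (".pm", ".pod")}
    files = {}
    for i in range(count):
        ext = ".pm" if i % 2 == 0 else ".pod"
        name = f"file{i:02d}{ext}"
        files[name] = name.encode() + base[ext]
    return files


files = demo_files()
print("by name:     ", solid_size(files, by_name))
print("by extension:", solid_size(files, by_extension))
```

In name order, each file's nearest look-alike is about 40 KB back, outside deflate's window. Grouped by extension, it is only about 20 KB back. Swapping `zlib` for the standard library's `lzma` module would likely find these matches in either order, because its default dictionary is far larger than 32 KiB.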
What would be even more awesome would be combining this change with LZMA as well (which builds dictionaries across much bigger areas of the file).
And if you could do it in something less than O(n^2) time, it might also be interesting to test pairs of files directly, to brute-force discover which file order was most efficient for feeding into the compression routine.
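One simple form of that pairwise idea is a greedy nearest-neighbour heuristic. It measures how many extra compressed bytes each candidate costs when appended after the current file, and always takes the cheapest next. This is not an exhaustive search, and it still needs O(n^2) pairwise compressions, so it only illustrates the approach. The corpus and names are hypothetical.

```python
import random
import zlib


def added_cost(prev, nxt):
    """Extra compressed bytes needed to append `nxt` right after `prev`."""
    return len(zlib.compress(prev + nxt, 9)) - len(zlib.compress(prev, 9))


def greedy_order(files):
    """Greedy nearest-neighbour ordering: O(n^2) pairwise compressions."""
    remaining = sorted(files)
    order = [remaining.pop(0)]
    while remaining:
        last = files[order[-1]]
        best = min(remaining, key=lambda name: added_cost(last, files[name]))
        remaining.remove(best)
        order.append(best)
    return order


def family_files(seed=7, per_family=3, size=4_000):
    """Hypothetical corpus: two unrelated 'families' with interleaved names."""
    rng = random.Random(seed)
    bases = [bytes(rng.getrandbits(8) for _ in range(size)) for _ in range(2)]
    files = {}
    for i in range(per_family * 2):
        family = i % 2
        files[f"f{i}_{'ab'[family]}"] = bases[family] + bytes([i])
    return files


order = greedy_order(family_files())
print(order)
```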
Why is PHP so much easier for newbies?
Why does Java have the best IDE tools?
Why is Ruby prettier than Perl?
Why does Perl have the best package repository?
As I've played through Mass Effect 2 over the last few weeks, I see some interesting parallels.
In the Mass Effect universe, human technology is bootstrapped by the discovery of an ancient abandoned alien observation outpost on Mars, and the further discovery that the dwarf planet Charon is really an abandoned but active interstellar jump gate covered in ice.
Other similar species have done the same, resulting in a galactic community of around a dozen civilisations all based around the same basic technological underpinnings.
Despite these civilisations believing a recently (50,000 years) extinct civilisation built the gates, it turns out the technology is perhaps millions of years old.
Every 50,000 years, the synthetic AI race that built them returns from hiding in intergalactic space to wipe out all of the existing advanced species based on "their" technology, and reset the galaxy for the next set of civilisations to rise.
In a conversation between the game's protagonist and one of these old AIs, the AI lambasts us for taking the shortcut on their technology. The jump gates and other technology are left in place intentionally, so that each new generation of civilisations takes a controlled and predictable development path, making it easier to destroy them.
The AI posits that it is the overcoming of adversity on your own that drives true technological advancement, and that easy routes make you (technologically) weak.
I think you can see something similar in the development of the different programming languages.
Java is long and wordy, taking a long time to type. The need to work around this limitation resulted in the proliferation of powerful IDEs, resulting in the annual 20 million line of code Eclipse release train.
PHP as a web language would have been stillborn if it didn't deal competently and quickly with the need to easily deploy code, the result of which is that you can effortlessly just change
Python's need to gain mindshare against an entrenched Perl led to a huge focus on being easy to learn, to a simplification of the language, and to hugely popular things such as the PyGame library and game competitions.
Faced with the lack of a truly great package repository, and with a web-heavy community, Ruby became the "prettiest" language. Creating an elegant website is both expected and required if you are going to gain mindshare for an idea.
And Perl's messy syntax and difficulties in the area of maintaining large codebases, combined with a pragmatic sysadmin-heavy community, resulted in an unmatched packaging system that allowed code to be maintained in small pieces, with enormous volumes of support infrastructure around it.
Conversely, the ease of publishing and the trend towards smaller packages that the CPAN allowed mean that the Perl community has never really needed pretty and elaborate websites. The smaller package size also means we lack the giant headline libraries that make website work pay off.
Our bias towards a pragmatic tech-savvy sysadmin userbase means we haven't really provided anything like the focus on learnability that has driven Python's gradual dominance in the mindshare of the young. It takes a certain rigour in your prioritisation to intentionally remove and dumb down existing powerful features so that the language is easier to learn.
Even for Strawberry, which focuses on the userbase with the lowest traditional knowledge, we intentionally have the smallest and most maintainable website possible and we don't even have the kind of introductory screencasts that we really really need (which should be easy but which I never seem to find the time to do).
If you throw a bunch of Perl coders against some PHP coders in a website competition, it is not unexpected that when both sides play to their strengths you will see something like http://geo2gov.com.au/html?location=e.g.+1+Oxford+Street from the Perl coders and something like http://www.hackdays.com/knowwhereyoulive/postcodes/view/2000 from the PHP coders.
The former required a massive amount of data extraction, transformation, aggregation, a gigabyte-sized PostGIS database, and deployment via a Linux virtual appliance to Amazon EC2 to allow for strategic load-shedding.
The latter required the ability to turn data into presentable and understandable information for real humans, and to make it pretty enough that they WANT to look at it.
Driving true technological progress, then, may often be about identifying weaknesses that are hard to solve but aren't completely impossible (and don't have any crippling long-term conceptual flaws at an economic or project-management level).
The three best projects I have driven - PPI, Strawberry, and (in part) Padre - all share this property. All three of these represent hard but not impossible problems, and require an awareness about which issues are intractable and which issues merely exist because there's been no need to solve them any better.
Padre in particular has suffered greatly from issues with Wx quality and threading. But given the low takeup of both threading and Wx it was reasonable to move forward on the basis that these would be fixed once there was something depending on them, and driving a need to fix them.
All of our early problems are gone now, and there is continued pressure to find ways to improve our use of (and the efficiency of) Perl's native ithreads.
Similarly, the creation of Strawberry required a lengthy year-long process of fixing Win32 bugs in all kinds of toolchain and low level modules, because we'd never had a proper working developer feedback loop before.
Similarly, Perl's current push for marketing, blogging and websites is a direct result of Python's success in mindshare capture.
So my question for you to ponder this week is the following:
What can you see that Perl as a whole struggles to do well, for which a good solution is not impossible, and which is only being held back by smaller problems that would go away if a working candidate solution were put in place that needed those small problems solved?
In the spirit of trying to jam as many speed hacks into Padre as possible, I've finally taken it upon myself to take the awesome work demonstrated in ORLite::Array (which uses ARRAY based objects instead of HASH based objects) and move it into the ORLite core.
The removal of the need for a hash slice in DBI doubles the speed of SELECT statements, reduces the memory cost of objects, and makes accessors quicker.
I've also integrated support for Class::XSAccessor, so now not only can you have faster ARRAY-based objects (if you want them), but the accessors are all XS-accelerated as well.
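Here is a loose Python analogue of the ARRAY-based approach, assuming a hypothetical `file` table. Each row stays exactly as the database driver returns it (a tuple), with no per-row dict built from a hash slice. Generated accessors are fixed-index lookups. In CPython, `operator.itemgetter` is implemented in C, which plays a role loosely similar to Class::XSAccessor here. This illustrates the idea and is not ORLite's code.

```python
import operator
import sqlite3


def array_class(name, columns):
    """Build a row class whose objects are plain tuples with index accessors."""
    attrs = {"__slots__": ()}  # no per-object dict, just the tuple storage
    for index, column in enumerate(columns):
        attrs[column] = property(operator.itemgetter(index))
    return type(name, (tuple,), attrs)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file (id INTEGER PRIMARY KEY, path TEXT, size INTEGER)")
conn.executemany("INSERT INTO file (path, size) VALUES (?, ?)",
                 [("lib/Padre.pm", 1200), ("Makefile.PL", 300)])

cursor = conn.execute("SELECT id, path, size FROM file ORDER BY id")
File = array_class("File", [col[0] for col in cursor.description])
rows = [File(row) for row in cursor]  # wrap each fetched tuple as-is
print(rows[0].path, rows[1].size)
```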
As a bonus to this I've altered the ORLite::Mirror sub-class to always use ARRAY-based objects by default, which means that all of the ORDB family of modules just instantly got doubled in speed without me needing to do any new releases.
So all up, Padre 0.56 is looking awesome.
To quote one person in the #padre channel, "padre is so fast to start now that the splashscreen is an irritation"
Now that the new resource locking system has been landed for a couple of weeks and some of our worst performance bugs have been resolved, we've been able to uncover and fix a ton of routine performance problems.
Padre 0.55 landed the first pass of these, but the upcoming Padre 0.56 is looking to be incredibly fast and has ditched much of the weight that you expect from an IDE. It's actually starting to feel light like an editor again, instead of heavy like an IDE.
Amongst the improvements, we have a new tricksy startup mechanism that can apply startup preferences without having to load the user's config file at all (or any of the accompanying weight).
The startup process has become so fast that we're seriously considering ditching the splash screen (because it slows down the startup too much), and opening files in an existing Padre has become nearly instantaneous.
Opening files is at least twice as fast thanks to an encoding detection shortcut for simple ASCII files. Changing tabs is 5+ times faster due to the removal of some filesystem operations, and to refresh locking. Closing a file is at least twice as fast, and 3-4 times faster if you use the new preference to disable Padre's remembering of cursor locations in files.
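The shape of such a shortcut might look like this sketch, which is not Padre's code. A single cheap scan recognises pure 7-bit ASCII, and only other files pay for full detection. `slow_detector` is a hypothetical stand-in for an expensive heuristic detector, and its fixed answer is a placeholder.

```python
def detect_encoding(data, full_detector):
    """Return an encoding name for `data` (bytes), trying the cheap check first."""
    if data.isascii():          # one linear scan, no heuristics needed
        return "ascii"
    return full_detector(data)  # only non-ASCII files take the slow path


slow_calls = []


def slow_detector(data):
    slow_calls.append(len(data))  # record that the expensive path ran
    return "utf-8"                # placeholder answer for the sketch


print(detect_encoding(b"use strict;\nprint 1;\n", slow_detector))
print(detect_encoding("café".encode("utf-8"), slow_detector))
```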
Closing groups of files ("Close This Project", "Close Other Projects", etc.) has gained locking and now works incredibly quickly. With cursor memory off, the time between clicking close and the Padre window vanishing is now almost instantaneous.
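Here is a generic sketch of that kind of locking, with hypothetical names rather than Padre's actual locker API. While the lock is held, refresh requests are only remembered. The GUI is refreshed once, when the outermost lock is released, so closing N files costs one refresh instead of N.

```python
class RefreshLock:
    """Hypothetical nested lock that defers refreshes until the outermost release."""

    def __init__(self, refresh):
        self._refresh = refresh
        self._depth = 0
        self._pending = False

    def request_refresh(self):
        if self._depth:
            self._pending = True  # remember it, do it once at the end
        else:
            self._refresh()

    def __enter__(self):
        self._depth += 1
        return self

    def __exit__(self, *exc_info):
        self._depth -= 1
        if self._depth == 0 and self._pending:
            self._pending = False
            self._refresh()
        return False  # never swallow exceptions


refreshes = []
lock = RefreshLock(lambda: refreshes.append("refresh"))


def close_file(name):
    # ... closing the editor tab for `name` is omitted in this sketch ...
    lock.request_refresh()


with lock:  # e.g. "Close Other Projects"
    for name in ["a.pm", "b.pm", "c.pm"]:
        close_file(name)
```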
Finally, we've also added a number of new refresh shortcuts to different tools and widgets, which makes Padre in general much snappier when moving around inside the same project or doing operations that trigger refresh cascades within the same file.
The sum total of all these improvements is that Padre feels almost like a whole new editor. It's fresh, snappy, and generally a joy to use.
And even more fancy improvements are in the pipeline. Mattias and Steffen between them have spawned a special new "slave master" threading branch, which is already able to save 4-5meg of RAM per background thread. It does this by spawning off a master thread very early in the Wx startup process, while our list of loaded modules is still very small. As a bonus, because it doesn't have to fork the entire Wx application tree, the slave master branch also fixes the long-time "Leaked Scalar" bug.
0.56 is definitely shaping up as something special and when it comes out I highly commend it to everyone to take a look, especially if you haven't had a look at Padre in the last 10 or so releases.
Because it works entirely through the creative use of regular expressions, it turns out that it is relatively easy to port as long as you have good enough regex support.
But this gives you an alternative solution for smaller fragments that doesn't require any server side work at all and allows pure inline client-side templating, a kind of Jemplate.Tiny.
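To show why regex-only templating ports so easily, here is a tiny engine in that spirit, sketched in Python. The `[% name %]` and `[% IF x %]...[% END %]` tag syntax is borrowed from Template Toolkit as an assumption. IF blocks cannot be nested, and this is not the implementation the text refers to. The whole engine is two substitution passes.

```python
import re

# Hypothetical tag syntax: [% name %], [% user.name %], [% IF x %]...[% END %]
IF_BLOCK = re.compile(r"\[%\s*IF\s+([\w.]+)\s*%\](.*?)\[%\s*END\s*%\]", re.DOTALL)
VARIABLE = re.compile(r"\[%\s*([\w.]+)\s*%\]")


def lookup(stash, path):
    """Resolve a dotted path like 'user.name' against nested dicts."""
    value = stash
    for part in path.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(part)
    return value


def render(template, stash):
    # Pass 1: keep or drop each (non-nested) IF block based on its condition.
    text = IF_BLOCK.sub(
        lambda m: m.group(2) if lookup(stash, m.group(1)) else "", template)

    # Pass 2: replace the remaining variable tags; missing values become "".
    def substitute(m):
        value = lookup(stash, m.group(1))
        return "" if value is None else str(value)

    return VARIABLE.sub(substitute, text)


print(render("[% IF user %]Hi [% user.name %]![% END %]", {"user": {"name": "Ann"}}))
```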