A couple of years ago, I first came up with the idea of doing basic image recognition using the Perl regular expression engine.
Using Imager for the underlying cross-platform image handling, I got a basic first version working by converting each pixel to an HTML colour code, building the image string and a search regex, and then converting the character match positions back into pixel terms.
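The original scheme works roughly along these lines (a minimal sketch from memory; the sub names and layout are illustrative, not the real Imager::Search internals). Each pixel becomes a seven-character `#RRGGBB` cell, the needle's scanlines are joined with `.{gap}` jumps over the rest of each haystack row, and a match's character offset divides back down into (x, y):

```perl
#!/usr/bin/perl
# Illustrative sketch of the #RRGGBB search approach.
use strict;
use warnings;

# Encode an image (arrayref of rows of [r, g, b] triples) as one long
# string of #RRGGBB cells, scanlines simply concatenated.
sub encode_rrggbb {
    my ($image) = @_;
    return join '', map {
        my $row = $_;
        map { sprintf '#%02X%02X%02X', @$_ } @$row;
    } @$image;
}

# Build a regex for a small image inside a big image of known width:
# each needle scanline, separated by a jump over the remaining
# (big_width - small_width) cells of the big row, at 7 chars per cell.
sub build_pattern {
    my ($small, $big_width) = @_;
    my $small_width = scalar @{ $small->[0] };
    my $gap = ($big_width - $small_width) * 7;
    return join ".{$gap}", map { quotemeta encode_rrggbb([$_]) } @$small;
}

# Translate a character offset of a match back into pixel terms.
# Every cell starts with '#', so match offsets are always multiples of 7.
sub offset_to_xy {
    my ($offset, $big_width) = @_;
    my $cell = $offset / 7;
    return ($cell % $big_width, int($cell / $big_width));
}
```

With this layout a successful `$haystack =~ /$pattern/` gives you the match start in `$-[0]`, which `offset_to_xy` turns back into screen coordinates.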
And it worked great, until I tried to scale it up to the type of image you'd most want to search: screenshots. At the time I needed to monitor OS X machines for any unexpected popups, because these were advertising displays, and that sort of thing is embarrassing.
Once I tried to apply the search technique to something the size of a 1024x768 screenshot, the overheads of transforming a million pixels in Perl started to bite hard.
While the regexp search might take 0.01 seconds, building the search image was taking something like 10 seconds. That was just barely acceptable in my passive monitoring case, but not good for much else (like, say, writing a solitaire bot).
While on the flight home from YAPC::EU (via Iceland) I finally spent some time reorganising the search code into a driver API, and trying what I think is a much better approach.
If the most expensive part of the job is converting the search image, why not use a native image format for which a fast C encoder already exists, build the image in memory, and run the regexp directly over those bytes? We shift more work into generating the search expression, and a lot more into working out where the hell the match is in pixel terms, but hopefully for a large net win.
And it turns out that 24-bit Windows BMP files make a reasonably decent search image format. Of course, there's the small problem of the scanlines running from bottom to top, the fact that the red/green/blue bytes are the other way around, and the small matter of each scanline being dword-aligned, leaving between 0 and 3 useless bytes at the end of each line depending on its width. But it turns out that with some funky post-match math, judicious use of quotemeta, and some post-processing of matches to remove false positives, you can generate a search expression that will quite comfortably find an image inside a native Windows BMP file.
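The post-match math comes down to scanline arithmetic. Here's my own hedged reconstruction of it (not the module's actual code): the offset is taken relative to the start of the BMP pixel data, each pixel is 3 bytes in B,G,R order, each scanline is padded up to a 4-byte boundary, and the row index has to be flipped because BMP rows run bottom-up:

```perl
#!/usr/bin/perl
# Illustrative sketch of the 24-bit BMP coordinate math.
use strict;
use warnings;

# Bytes per scanline, rounded up to a 4-byte (dword) boundary.
sub bmp_scanline_bytes {
    my ($width) = @_;
    my $raw = $width * 3;                # 3 bytes (B,G,R) per pixel
    return $raw + ((4 - $raw % 4) % 4);  # plus 0..3 padding bytes
}

# Gap (in bytes) between successive needle scanlines inside the
# haystack: the rest of the big row, plus that row's padding.
sub bmp_row_gap {
    my ($big_width, $small_width) = @_;
    return bmp_scanline_bytes($big_width) - $small_width * 3;
}

# Map a byte offset within the pixel data back to (x, y), flipping
# the row because BMP scanlines are stored bottom-to-top.
sub bmp_offset_to_xy {
    my ($offset, $width, $height) = @_;
    my $line = bmp_scanline_bytes($width);
    my $row  = int($offset / $line);     # row counted from the bottom
    my $x    = int(($offset % $line) / 3);
    my $y    = $height - 1 - $row;       # convert to top-down coordinates
    return ($x, $y);
}
```

The padding bytes at the end of each line are exactly why raw matches need post-processing: a candidate match that starts mid-pixel or spans the padding wrongly is a false positive to be discarded.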
Whereas the old time to capture and prepare a 1024x768 search image as #RRGGBB was around 10-12 seconds, the time to capture and prepare a 1024x768 native BMP file is in the vicinity of 150 milliseconds, which is almost two orders of magnitude faster.
Combining a 0.15 second capture cost with a 0.01 second regexp cost and another 0.01 seconds to generate the search expression, the result is that using the new default "BMP24" driver, Imager::Search can monitor the desktop for images at around 5 frames a second!
This has the makings of a truly awesome Solitaire bot, or maybe even something like pinball...
This has got me thinking, though: could I make it faster still?
I'm now pondering the idea of a platform-specific Windows or X11 driver that would bypass Imager for the capture step altogether, and instead generate a search expression for whatever the raw byte string is that comes out of the screen capture system call...
The other major feature I need to add is transparency support (just 1-bit transparency for now) in the search image.
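One reason 1-bit transparency should fall out of the regex approach almost for free: a "don't care" pixel is just a wildcard over that pixel's three bytes. A rough sketch of the idea in BMP byte terms (my own speculation, not implemented in the module; note the pattern must be applied with the /s modifier, since pixel bytes can include "\n"):

```perl
#!/usr/bin/perl
# Speculative sketch: transparent pixels as regex wildcards.
use strict;
use warnings;

# Build a pattern fragment for one scanline. Opaque cells are
# [b, g, r] byte triples (BMP order); undef marks a transparent
# cell, which becomes ".{3}" -- match any three bytes.
sub scanline_pattern {
    my (@cells) = @_;
    return join '', map {
        defined($_) ? quotemeta(pack 'C3', @$_) : '.{3}'
    } @cells;
}
```

A transparent pixel costs nothing at match time; it just loosens the expression, so the existing false-positive filtering would matter a little more.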
But for now, I'm happy enough with the new fast Imager::Search version to stamp it as 1.00 and release it.