P.S. This entry was originally posted to my own blog site as http://blog.agentzh.org/#post-105
The XUL format is the best among the three.
Just as the topic of the talk suggests, we're migrating from Firefox clusters to WebKit ones. I'll post more details here in the near future.
Enjoy!
P.S. This entry was originally posted to my own blog site as http://blog.agentzh.org/#post-104
P.S. This entry was originally posted to my personal blog site: http://blog.agentzh.org/#post-102
Yeah, Q4 is really crazy! I've been hacking on several company projects in parallel over the last few weeks. Fortunately they're all very interesting stuff.
We've just kicked OpenResty 0.5.2 out of the door and I'm preparing for the 0.5.3 release right now. My teammate xunxin++ has quickly implemented the YLogin handler for OpenResty, via which users can use their Yahoo! ID to log in to their own applications on OpenResty. Our Yahoo! registration team helpfully worked out a sane design to allow us to reuse the Yahoo! Login system, which effectively turned Yahoo! ID into something like a passport, at least from the perspective of OpenResty users.
Meanwhile, some guys from Sina.com are doing their personal projects in OpenResty. They said they really appreciated the great opportunities provided by the OpenResty architecture, since various kinds of clients (e.g. web sites, cellphones, desktop apps, etc.) could share the same set of APIs via OpenResty's web services. They also sent a handful of useful feedback and suggestions regarding OpenResty's design and implementation.
I've also been working on an intelligent crawler cluster based on Firefox, Apache mod_proxy/mod_cache, and OpenResty. The crawler itself is a plain Firefox extension named List Hunter:
http://agentzh.org/misc/listhunter.xpi
It's an enhanced version of the Haiway List Recognization Engine used by my SearchAll extension and is also built with my XUL::App framework. You can install it in your Firefox and play with it if you like.
Turning such a Firefox extension into tens or even hundreds of Firefox crawlers running on a bunch of production machines requires a lot of work. I devised a prefetching system which prefetches HTML pages and the CSS files they include, and caches the headers and contents for a fixed amount of time, so that Firefox crawlers can later load the pages and CSS files directly from the same cache on our local network, thus significantly reducing the page loading time in Gecko. The cache is a heavily patched version of Apache2's mod_cache with mod_disk_cache as the backend storage. The prefetchers and crawlers interact with the Internet and the cache via HTTP proxies based on Apache2's mod_proxy. Pipelining the prefetching and crawling processes requires OpenResty with PgQ enabled. Well, I'm still working on this cluster and my goal is 2 pages/sec for every single Firefox process. Firefox 3.1's amazing performance boost (more than 30% faster according to my own benchmark) makes me very confident in abusing Gecko to build efficient crawlers that take advantage of the rich rendering information.
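To make the prefetch-then-crawl idea concrete, here is a minimal Python sketch of a cache whose entries expire after a fixed amount of time. It is only an in-memory model of the idea: the real setup described above uses a patched mod_cache behind mod_proxy, and the names `TTLCache`, `prefetch`, and `crawl` are hypothetical, invented for this illustration.

```python
import time


class TTLCache:
    """Toy in-memory stand-in for the shared cache (hypothetical, not mod_cache)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._store = {}            # url -> (expires_at, headers, body)

    def put(self, url, headers, body):
        self._store[url] = (self.clock() + self.ttl, headers, body)

    def get(self, url):
        entry = self._store.get(url)
        if entry is None:
            return None
        expires_at, headers, body = entry
        if self.clock() >= expires_at:
            # The fixed lifetime is over: drop the stale entry.
            del self._store[url]
            return None
        return headers, body


def prefetch(cache, url, fetch):
    """Prefetcher side: hit the network once and fill the cache."""
    headers, body = fetch(url)
    cache.put(url, headers, body)


def crawl(cache, url, fetch):
    """Crawler side: serve from the cache when fresh, else go to the network."""
    hit = cache.get(url)
    if hit is not None:
        return hit, True
    return fetch(url), False
```

The `fetch` argument is any callable returning a `(headers, body)` pair, so the sketch stays independent of a particular HTTP client.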
Another Firefox crawler project haunting my head is a similar one that automatically recognizes and extracts user comments from arbitrary web pages (if any comments appear, of course). Such tasks would be hard if my code had to run without the geometric information for every DOM node provided by the browser rendering engine (in the form of the offsetWidth, offsetHeight, offsetTop, and offsetLeft attributes of DOM elements). Some other colleagues in Alibaba's Search Tech Center are wrapping their heads around Cobra, a pure Java HTML renderer. But I doubt that it would run more correctly or more efficiently than Gecko. Oh well, I'm not a Java guy anyway...
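As a rough illustration of why the geometry matters, the Python sketch below groups boxes that share a left edge and width and are stacked on top of each other, which is how a list of comments tends to look once rendered. This is an assumed heuristic made up for illustration, not the algorithm used by List Hunter or the planned comment extractor; the `Box` fields simply mirror offsetLeft, offsetTop, offsetWidth, and offsetHeight.

```python
from collections import namedtuple

# One rendered element; the fields mirror offsetLeft/offsetTop/offsetWidth/offsetHeight.
Box = namedtuple("Box", "tag left top width height")


def _continues(prev, box, tolerance):
    """True if `box` is aligned with `prev` and sits below it."""
    aligned = (abs(box.left - prev.left) <= tolerance
               and abs(box.width - prev.width) <= tolerance)
    below = box.top >= prev.top + prev.height - tolerance
    return aligned and below


def stacked_runs(boxes, min_len=3, tolerance=2):
    """Return runs of at least `min_len` aligned, vertically stacked boxes."""
    runs, current = [], []
    for box in sorted(boxes, key=lambda b: b.top):
        if current and not _continues(current[-1], box, tolerance):
            if len(current) >= min_len:
                runs.append(current)
            current = []
        current.append(box)
    if len(current) >= min_len:
        runs.append(current)
    return runs
```

A pure HTML parser sees only the tag tree, where a sidebar and a comment list can look alike; with the rendered boxes, the repeated same-width blocks stand out.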
Finally, just a short note: I had a wonderful time with clkao and Jesse Vincent at Beijing Perl Workshop 2008. I learned quite a lot about the Prophet internals during the hackathon after the conference, and Jesse quickly hacked out a stub OpenResty model API for Prophet. Then we went to the Great Wall the next day. I was amazed to find Jesse hacking crazily on the Great Wall and enjoying the sunshine alone... Wow.
Enough blogging... back to hacking.
P.S. This journal was originally posted to my own blog site as http://blog.agentzh.org/#post-97
P.S. This journal was originally posted to my personal blog site: http://blog.agentzh.org/#post-93
I've just uploaded UML::Class::Simple 0.10 to CPAN with the highlight of the XMI format support. It will appear on the CPAN mirror near you in the next few hours.
Thanks to Maxim Zenin for contributing this feature.
Oct 18 (to Jack Shen~)
I wrote a UML class diagram generator based on GraphViz. it can parse arbitrary perl OO modules and obtain the inheritance relationships and method/attribute lists automatically. it's called UML::Class::Simple. And it's much easier to use than StarUML. you know, dragging the mouse around to draw diagrams is really painful. yay for automatic image generation!
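The general technique (introspect the classes, then emit GraphViz input) can be sketched in a few lines. The example below is Python rather than Perl and is not how UML::Class::Simple is implemented; `classes_to_dot` and the two demo classes are hypothetical names used only for illustration. It writes DOT text that the `dot` tool can render.

```python
import inspect


def classes_to_dot(classes):
    """Emit a GraphViz DOT class diagram for the given classes."""
    lines = [
        "digraph uml {",
        "  node [shape=record];",
        "  edge [arrowhead=empty];",   # hollow arrow, as in UML inheritance
    ]
    names = {cls.__name__ for cls in classes}
    for cls in classes:
        # Only methods defined directly on the class, skipping private ones.
        methods = sorted(
            name for name, member in vars(cls).items()
            if inspect.isfunction(member) and not name.startswith("_")
        )
        body = "\\l".join(methods) + ("\\l" if methods else "")
        lines.append('  %s [label="{%s|%s}"];' % (cls.__name__, cls.__name__, body))
    for cls in classes:
        for base in cls.__bases__:
            if base.__name__ in names:
                lines.append("  %s -> %s;" % (cls.__name__, base.__name__))
    lines.append("}")
    return "\n".join(lines)


# Demo classes (hypothetical).
class Animal:
    def speak(self):
        pass


class Dog(Animal):
    def fetch(self):
        pass
```

Saving the result of `classes_to_dot([Animal, Dog])` to a file and running `dot -Tpng` on it should give a two-box diagram with an inheritance arrow from Dog to Animal.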
(Here is one of the sample outputs: http://svn.berlios.de/svnroot/repos/unisimu/fast.png.)
Oct 18 (to Sal Zhong~)
i'm planning to upload UML::Class::Simple to cpan once it's mature enough. will you test it for me? bug reports and patches are most welcome.
it's still undecided how to differentiate perl classes' properties from other ordinary methods. i'm also pondering the idea of adding relationships other than inheritance. i'll be delighted if you have some ideas on these matters.
Note that i'm ignoring the Autodia module on CPAN since i'm not in favor of XML and a quite different approach has been taken in my project. anyway, i have to admit it would be wise to talk to Autodia's author and merge these efforts. finally, i must thank Alias for creating PPI and suggesting the use of Class::Inspector. they're invaluable when one wants to extract meta info from the perl world.
Oct 19 (to Jack Shen~)
I've only finished the slides for the recap. there are already 44 of them and the number is still growing. alas, still wondering what to say in the next talk on the design of methods and subroutines.
Oct 19 (to Cherry Chu~)
Thanks. the talk went pretty well. it's interesting to see that i had the feeling just before the talk that you would not come. so i was not very surprised by your absence. no problem, there's always ``the next time''.
i've been busy making slides for tomorrow's talk. they're still not finished. sigh. have to make more slides during the daytime tomorrow. producing so many slides is quickly getting tedious. hehe, you know that feeling, right?
Oct 22 (to He Shan~)
> hi! I've found a book. IT is so nice that i have been
> reading about it all the afternoon. it is great, just
> like an extended version of "The Practice of
> Programming". it's named "Code Complete".
I've got the feeling that you are currently on the *right* way. you'll definitely become a good hacker if you keep going. hmm, hopefully you'll join us perl camels soon.
Oct 22 (to Jack Shen~)
...LOL. apparently you are not a VB guy. inserting images into ppt slides is straightforward once you know how to record VBA macros in the PowerPoint environment and browse the generated code in its VB IDE. Another way to get an answer is to search the web. iirc, the method should be AddPicture or something like that. not sure though, computers are out of my reach right now.
...Python is even more powerful than MATLAB, Maple, and Haskell? i doubt that.
...I was exclusively hacking on the new tokenizer for Makefile::Parser and completely forgot that i had C# classes tonight. anyway, the next major release of M::P takes precedence over anything else.
Oct 23 (to Sal Zhong~)
I've just started to rewrite M::P's codebase (which will hopefully be released as M::P 1.00 soon). Yes, it's long overdue. I've had a pretty good plan for a scalable and extensible gmake implementation based on M::P for a long time.
The new M::P API will offer parsing results at two different levels:
Makefile DOM tree
It's a syntax-oriented data structure which preserves every single bit of info in the original makefile (including whitespace and comments). So one can modify some part of the DOM tree and write the updated makefile back to disk. I think it's useful to GUI apps which want to edit makefiles via menus, and it's also beneficial to the gmake => PBS translator.
Makefile AST
The AST desugars the handwaving parts of the DOM tree down to a semantic-oriented data structure for make-like tools to ``run'' it or for some visualizer (e.g. my Makefile::Graphviz) to depict the underlying dependency relations. For the PBS emitter, I think we should work out a special AST for it since the desugaring must be lossless, much like a program correctness proving system.
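The two levels can be illustrated with a small Python sketch. It is a line-based toy under stated assumptions, not the M::P or MDOM API (which follows PPI's token-based design in Perl); `Node`, `parse_dom`, `to_source`, and `to_ast` are hypothetical names, and only comments, `=` / `:=` assignments, single-colon rules, and tab-indented commands are recognized.

```python
import re
from dataclasses import dataclass


@dataclass
class Node:
    kind: str   # "comment", "assignment", "rule", "command", or "other"
    raw: str    # the original text of the line, newline included


ASSIGN_RE = re.compile(r"^\s*([A-Za-z_][\w.]*)\s*(:=|=)\s*(.*?)\s*$")
RULE_RE = re.compile(r"^([^\s:#=][^:=#]*):(?!=)\s*(.*?)\s*$")


def parse_dom(text):
    """Syntax level: classify every line but keep its text untouched."""
    nodes = []
    for raw in text.splitlines(keepends=True):
        line = raw.rstrip("\n")
        if raw.startswith("\t"):
            kind = "command"
        elif line.lstrip().startswith("#"):
            kind = "comment"
        elif ASSIGN_RE.match(line):
            kind = "assignment"
        elif RULE_RE.match(line):
            kind = "rule"
        else:
            kind = "other"
        nodes.append(Node(kind, raw))
    return nodes


def to_source(nodes):
    """Write the DOM back out; lossless because each node keeps its raw text."""
    return "".join(node.raw for node in nodes)


def to_ast(nodes):
    """Semantic level: drop comments and layout, keep variables and rules."""
    variables, rules, current = {}, {}, None
    for node in nodes:
        line = node.raw.rstrip("\n")
        if node.kind == "assignment":
            name, op, value = ASSIGN_RE.match(line).groups()
            variables[name] = (op, value)
            current = None
        elif node.kind == "rule":
            match = RULE_RE.match(line)
            current = {"prereqs": match.group(2).split(), "commands": []}
            for target in match.group(1).split():
                rules[target] = current
        elif node.kind == "command" and current is not None:
            current["commands"].append(line.strip())
    return variables, rules


SAMPLE = (
    "# build settings\n"
    "CC := gcc\n"
    "CFLAGS  =  -O2\n"
    "\n"
    "all: main.o util.o\n"
    "\t$(CC) -o app main.o util.o\n"
)
```

With this split, `to_source(parse_dom(SAMPLE))` gives back the sample unchanged, odd spacing and comment included, while `to_ast` keeps only what a make-like tool or a dependency visualizer would need.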
I'm currently working on the M::P tokenizer and will finish the DOM tree constructor these days. The process should be going pretty fast since it is mostly test-driven.
The first goal is to implement the new M::P APIs and get my pgmake utility to pass most of the gmake tests so that I can kick M::P 1.00 out of the door.
I'm stealing a lot of source code and pod from Alias's PPI module. I've noticed that the basic structure of PDOM trees can also fit my needs very well. it's called MDOM in my M::P though.
Oct 24 (to Sun Xin~)
Take care. translating may drive you mad some day. just have an appropriate amount of fun, dude!
Oct 26 (to Jack Shen and Sal Zhong~)
my GNU Makefile DOM builder now supports most kinds of rules, 2 flavors of variable assignments, macro interpolations, and various command and comment syntaxes. Now it's trivial to add new node types and extend the DOM parser.
i'll add support for double-colon rules, the define/vpath/include/ifeq/ifneq/ifdef/ifndef/... directives, and other missing structures tomorrow. After these additions, the DOM parser will be quite complete and will serve as the solid ground that we keep standing on. constructing the Makefile AST will be much easier if we keep a DOM tree handy.
yay for test-driven development! without TDD or Alias' PPI, i wouldn't have progressed so rapidly.
Oct 29 (to Sal Zhong~)
When and where shall we take the Java exam?
...Oops, it seems impossible to release UML::Class::Simple tonight. still have several missing features to implement and the pod needs love too. hmm, christopher may be unhappy since i promised him earlier that i would make the release by *this* weekend. sigh. hopefully i'll get some cycles tomorrow.
...nod nod. but i also gotta review the data mining textbooks for the coming exam. furthermore, i'm planning to hack on two expert systems in the next week. i'll be programming in Prolog, CLIPS, and Perl simultaneously, which must be a lot of fun! yay!
Oct 30 (to Sal Zhong~)
I've just talked to Alias, the author of PPI, on #perl. he said that i could borrow as much source code from PPI as i wanted for my Makefile::DOM module. PPI::Element, PPI::Node, PPI::Token, and PPI::Dumper can be reused by my MDOM directly without many changes. i also briefly introduced the two-level ASTs to him and expressed my appreciation of PPI. It has given me plenty of inspiration on how to push my Makefile::Parser further.
This journal was originally posted as http://agentzh.spaces.live.com/blog/cns!FF3A735632E41548!128.entry