All the Perl that's Practical to Extract and Report

use Perl

agent (5836)


Agent Zhang (章亦春) is a happy Yahoo! China guy who loves Perl more than anything else.

Journal of agent (5836)

Friday April 29, 2005
04:51 AM Crashed!

[ #24444 ]

=from 2005.4.29.2:05.PM
=to ...4.29.2:45.PM

I got really annoyed when I ran my script, only to find that the Educational Administration System (EAS) had changed its web interface, which made my hacker script crash completely. Furthermore, the new version seems to have fixed the well-known "bug" in account secrecy, so we may no longer be able to check other people's info.

I should have written this student-ID generator a little earlier. It would not have been too late as long as the EAS web site hadn't changed its face. But now there is really nothing I can do with it! Bang!

Fortunately, the web site of our university's library didn't change its page templates. I promptly migrated my script to fit the library user system's web interface, and the new version worked perfectly. I have already downloaded thousands of users' account info, including each user's real name, profession, sex, personal ID number, record of books borrowed, violation history, and so on. My little script is still busily capturing more readers' records at the time of this writing. Oh, no problem: there was still more than 600 MB of free space on my disk when I left my machine at noon.

How should we process the huge amount of data gained from the web? The current plan is as follows:

  • Use an HTML parser (written in Perl) to filter the key info out of the HTML tables
  • Save it in a customized compact format
  • Convert that data to more specialized and more standard formats, such as CSV or even SQL commands
  • Import everything into a relational database system, such as MS Access or SQL Server 2000
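
The plan above could be sketched roughly as follows. This is only an illustration under assumptions, not the script described in this entry: it is written in Python rather than Perl, it runs on a made-up sample table, it skips the intermediate compact format, and it uses the standard-library sqlite3 module as a stand-in for MS Access or SQL Server 2000. The names TableRows, html_to_rows, rows_to_csv, load_csv and the loans table are all hypothetical.

```python
import csv
import io
import sqlite3
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row, self._cell = [], None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row, self._cell = None, None


def html_to_rows(html):
    """Step 1: filter the cell text out of the HTML table."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Step 3: convert the rows to CSV text (quoting is handled by csv)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def load_csv(conn, csv_text):
    """Step 4: import the CSV into a relational table (hypothetical schema)."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    conn.execute("CREATE TABLE IF NOT EXISTS loans (title TEXT, due TEXT)")
    conn.executemany("INSERT INTO loans VALUES (?, ?)", reader)
    conn.commit()


# Made-up sample page, for illustration only.
SAMPLE = """
<table>
  <tr><th>title</th><th>due</th></tr>
  <tr><td>Programming Perl</td><td>2005-05-20</td></tr>
  <tr><td>Perl Cookbook, 2nd ed.</td><td>2005-06-01</td></tr>
</table>
"""

if __name__ == "__main__":
    csv_text = rows_to_csv(html_to_rows(SAMPLE))
    print(csv_text, end="")
    conn = sqlite3.connect(":memory:")
    load_csv(conn, csv_text)
    print(conn.execute("SELECT COUNT(*) FROM loans").fetchone()[0], "rows imported")
```

Parameterized INSERT statements are used instead of generating SQL text by hand, so commas and quotes inside a cell cannot break the import. Real pages with nested tables or missing end tags would need more careful handling than this sketch provides.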

The whole process can't be too difficult from the viewpoint of us Perl programmers! Right?
