
All the Perl that's Practical to Extract and Report

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Hypocrisy. (Score:2, Insightful)

    by solhell (772) on 2002.03.01 17:41 (#5326)
    Let me make sure I understand this correctly. Google doesn't let a program query their website, retrieve the search results, parse them, and use them, the way a metasearch script extracts information from multiple search engines and combines it. They supposedly don't allow people to do that.
    Let's rephrase that: a remote program (a web browser, in a sense) visits their web page, parses the data to keep only the URLs of web pages that Google obviously doesn't own, and uses only that information, which Google doesn't own. So why would this be a problem? And how is this different from Google crawling people's web pages and caching their data and images?
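
    To make the "keep only the URLs" step concrete, here is a minimal sketch in Python. It is an illustration only, not how any real metasearch script works: the host name `search.example`, the sample page, and the helper names `LinkCollector` and `external_links` are all made up for the example, and no network request is made.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in the HTML that is fed in."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def external_links(html, own_host):
    """Return absolute http(s) links that do not point at own_host.

    own_host is the (hypothetical) search engine's host; links to it
    or to its subdomains are dropped, as are relative links.
    """
    collector = LinkCollector()
    collector.feed(html)
    kept = []
    for link in collector.links:
        parts = urlparse(link)
        host = parts.hostname or ""
        if parts.scheme not in ("http", "https"):
            continue  # relative or non-web link, i.e. the engine's own page
        if host == own_host or host.endswith("." + own_host):
            continue  # link back into the search engine itself
        kept.append(link)
    return kept


# Hypothetical results page; a real one would be fetched over HTTP.
SAMPLE_PAGE = """
<html><body>
<a href="http://search.example/about">About</a>
<a href="/preferences">Preferences</a>
<a href="http://www.perl.org/">Perl</a>
<a href="http://use.perl.org/">use Perl</a>
</body></html>
"""

if __name__ == "__main__":
    for url in external_links(SAMPLE_PAGE, "search.example"):
        print(url)
```

    What comes out is just the list of third-party URLs; everything that belongs to the search engine's own site is thrown away.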

    This is not a DoS attack. You don't crawl Google iteratively in parallel. It is a simple one-page query.

    You might argue that people can put up some files to prevent search engines from indexing their pages. Don't forget, Google extracts copyrighted material from others' pages, and a metasearch script extracts only the data Google doesn't own at all.
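
    The "files" in question are presumably robots.txt files. As a rough sketch of how a well-behaved program could check one before fetching a page, Python's standard `urllib.robotparser` module can be used as below. The rules in `SAMPLE_ROBOTS`, the host `search.example`, the agent name `metasearch-bot`, and the helper `allowed` are invented for the example and are not any real site's policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: everything is open except the /search path.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /search
Allow: /
"""


def allowed(robots_text, user_agent, url):
    """Return True if robots_text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://search.example/search?q=perl",
                "http://search.example/about"):
        print(url, allowed(SAMPLE_ROBOTS, "metasearch-bot", url))
```

    The same mechanism works in both directions: a site owner can use it to opt out of a crawler, and a search engine can use it to opt out of automated queries.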
    • Hmm, let me make sure *I* understand this correctly. You are proposing that it is OK to steal their results (i.e., cost them money) without *any* compensation? How exactly do you expect them to stay in business? Their business model is simple: either you pay per query (like Yahoo), or it's free and you "pay" (in the aggregate) by viewing/clicking on ads. Automated queries without payment don't make any sense economically.

      Re your comment on ownership of the information, you're missing the point too--Google does