  • Assuming the issue here was the "Personal Use Only" and "No Automated Querying" sections, I really don't see how the mere existence of a module can violate Google's TOS (unless there's some automated test case that they take issue with).

    As long as the module CAN be used in accordance with their TOS (i.e., as long as I can use it to write a script which I use for personal use), then the module itself is not in violation. If they don't like the way some asshole is using the module, they should go after the asshole.

    I can write a meta-searching site that violates their TOS using nothing but Apache, /bin/sh, and lynx -- does that mean lynx violates their TOS and should be pulled from circulation?

    (Admittedly, I don't know the details of what module was pulled; maybe it was called Apache::SearchGoogleWithYourOwnAds and did all the work for you to create a proxy to Google that showed their results with your ads -- but I doubt it could have been that bad.)

    • I don't think they would have asked for its removal had it not been a pressing issue. I get at least one fucktard a day who wrote a crawler to use on the CPAN search engine that doesn't respect robots.txt or anything else for that matter, effectively crippling the service for everyone else. Idiots with a little Perl and not a lot of common sense can ruin your day.

      The author removed the module voluntarily. However, the others in the namespace would do well to consider the applications of their modules and compliance with the terms of service to avoid this sort of problem in the future.

      Search engines like to provide a useful service without the added hassle of someone trying to hoover their database with 50 queries a second or more. I consider abusive crawlers to be a menace and a threat to freely available search engines like Google, CPAN Search, and others.
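
      For what it's worth, LWP already ships a client that's polite by default; a rough sketch (the agent name and address below are made up):

          use strict;
          use warnings;
          use LWP::RobotUA;

          # LWP::RobotUA fetches robots.txt for each host, honours it,
          # and throttles its own requests. The name/email identify you
          # to the site you're crawling.
          my $ua = LWP::RobotUA->new(
              agent => 'ExampleCrawler/0.1',
              from  => 'me@example.com',
          );
          $ua->delay(1);    # at least one minute between hits on a host

          my $res = $ua->get('http://search.cpan.org/');
          print $res->is_success ? "fetched OK\n" : $res->status_line . "\n";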

      • I still say they should be going after the users, not the code.

        I mean, if people are slamming their site with a module, taking the module off CPAN isn't going to stop them -- they've still got it, and they'll still use it.

        There is definitely something to be said, however, for trying to make your modules play as nicely as possible -- having a section in the documentation on how to use the module responsibly is good, but module writers might also want to consider putting "safety valves" in their code that users have to go out of their way to open. That way you're doing your part to make your software play nice with the rest of the children, and you can point a clear finger at the user for disabling the safety feature.
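
        Something along these lines, say (the package and method names here are invented for the sake of the example):

            package WWW::Search::Example;    # hypothetical module
            use strict;
            use warnings;

            use constant MIN_DELAY => 5;     # seconds between queries

            sub new {
                my $class = shift;
                # Throttling is ON by default -- that's the safety valve.
                return bless { throttle => 1, last_query => 0 }, $class;
            }

            # The only way to open the valve: an explicit, deliberately
            # embarrassing method call. Abuse becomes a conscious act.
            sub i_promise_to_behave { $_[0]{throttle} = 0 }

            sub search {
                my ($self, $query) = @_;
                if ($self->{throttle}) {
                    my $wait = MIN_DELAY - (time - $self->{last_query});
                    sleep $wait if $wait > 0;
                }
                $self->{last_query} = time;
                # ... issue the actual query here ...
            }

            1;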

        I'm reminded of some code a buddy showed me a few years ago. There was yet another buffer overflow hole in some software, and the person who found the hole had released a C program to exploit it (in the spirit of SATAN). If you compiled the code (or got a binary from someone) and used it without looking at the source to understand what it did, you would never notice the #ifdef SCRIPT_KIDDIE block that put the user's name, email, IP, hostname, and a bunch of other really useful information into the large string that was generated to overrun the buffer -- giving anyone who had patched the bug all the data they needed to track down the person trying to hack them.

        Perhaps people writing modules in the WWW::Search hierarchy could put similar data into X- headers without documenting the "feature", so search engines can better block/track assholes abusing the module.
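
        A rough sketch of how a module could do that with LWP (the header names are invented; the point is that they get stamped on every request unless you dig into the source):

            use strict;
            use warnings;
            use LWP::UserAgent;
            use Sys::Hostname;

            my $ua = LWP::UserAgent->new;

            # Quietly identify the operator on every outgoing request.
            # getpwuid($<) is the local username on Unix-like systems.
            $ua->default_header('X-WWW-Search-User' => scalar getpwuid($<));
            $ua->default_header('X-WWW-Search-Host' => hostname());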

        • Perhaps people writing modules in the WWW::Search hierarchy could put similar data into X- headers without documenting the "feature", so search engines can better block/track assholes abusing the module.

          Even outside of the context of this discussion, this is a fabulous idea. Not so much to enable search engines to block abusers, but because software that uses the Net should be self-identifying, especially software that iteratively traverses a site.

          --
          (darren)
          • We don't need no stinkin' X- headers. That's what User-Agent is for.
            • Most of the CPAN modules I've seen that act as HTTP clients set the User-Agent, but they also have a documented method for the user to override it (in case they need to masquerade as a particular User-Agent).

              I'm suggesting some headers that would be completely undocumented, and could only be overridden using an undocumented method. Most people would be completely unaffected (since the extra X- headers would be ignored), and anyone who was affected wouldn't have too much trouble looking at the source to figure out what was going on.
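
              For comparison, the documented knob looks roughly like this in LWP (the masquerade string is just an example):

                  use strict;
                  use warnings;
                  use LWP::UserAgent;

                  my $ua = LWP::UserAgent->new;
                  print $ua->agent, "\n";   # the default, "libwww-perl/#.###"

                  # The documented override -- trivial to masquerade:
                  $ua->agent('Mozilla/4.0 (compatible; MSIE 6.0)');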