
Matts (1087)

I work for MessageLabs [messagelabs.com] in Toronto, ON, Canada. I write spam filters, MTA software, high performance network software, string matching algorithms, and other cool stuff mostly in Perl and C.

Journal of Matts (1087)

Monday April 15, 2002
03:13 PM

Google's new API

[ #4212 ]

So it seems Google's new API has everyone going all gaga over it. Well, I for one am sick of hearing about it.

You see, before they decided to release this, they had a nice, simple XML API. Instead of sending your query to http://google.com/search?q=foo, you sent it to http://google.com/xml?q=foo. Simple, huh? The results you got back were always well-formed XML. The syntax, though, was horrible, with single-letter tag names. I'm guessing the guy who designed it hated XML and thought it was overly verbose and wasted bandwidth.

Yet now we have an "API". Apparently having an API makes it much easier for everyone. I'm forced to agree with this, because Google's original XML output format was horribly terse, to the point of being unusable. XML is supposed to be self-documenting - if the tags had been named things like <results><result>..., it would have been dead easy to use. And guess what? You would have seen Java, Perl and other language APIs popping up left, right and center to make use of these simple HTTP queries. Because this isn't rocket science. And it doesn't require large, fat modules to build something around it.
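
To see how little would actually be needed, here's a rough sketch of what the old-style query could look like from Perl, assuming a made-up document shape of <results><result> with <title> and <url> children (Google never published anything with these names; the point is that well-named XML needs nothing heavier than LWP and XML::Simple):

    use LWP::Simple qw(get);
    use URI::Escape qw(uri_escape);
    use XML::Simple qw(XMLin);

    # Plain HTTP query against the old /xml endpoint; no toolkit required.
    my $query = 'foo';
    my $xml   = get('http://google.com/xml?q=' . uri_escape($query))
        or die "couldn't fetch results\n";

    # With sane tag names, XML::Simple turns the document straight into
    # a hash of arrays - no custom parser to write.
    my $data = XMLin($xml, ForceArray => ['result']);
    for my $result (@{ $data->{result} }) {
        print "$result->{title}\n    $result->{url}\n";
    }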

But SOAP has "mindshare". And it's a "standard". So I'm obviously wrong, right? I do have a talk on this for OSCon, so I'd be very happy to solicit pre-conference feedback.

  • There's one significant difference between Google's XML results and Google's SOAP API: the XML results gave you a document and left you on your own; the SOAP API gives you pre-digested data structures.

    Giving everyone the same (poorly-tagged) XML document left them on a very low plateau: everyone who wanted to use the XML Search needed to write a parser for that document (or re-use someone else's). It's a nice solution, but only if you're not very lazy, or if you just happen to like XML.

    The SOAP solution, on the other hand, hands everyone the same pre-digested data structure, so nobody has to write (or go hunting for) that parser.

    • So this is mostly because other languages' XML parsers suck? If you stuck something containing <results><result>... into XML::Simple, you'd get a simple array-ref. So this becomes as simple as XMLin("http://google.com/xml?q=foo") (assuming you've turned on XML::Parser's LWP support).
      • So this is mostly because other languages' XML parsers suck?

        I don't know if it's an issue of suckiness or not. XML::Simple or a tree-based parser still shows an XML-centric data structure. SOAP shows an application-centric data structure for any application using SOAP. While you have to ignore a lot of bogosity to get to the <result> element in a SOAP envelope, once your SOAP library has done that (on your behalf), you're left with something that (likely) maps very well to your problem space, rather than to the shape of the XML.
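
        In Perl terms that's roughly the following (a sketch from memory of the interface as Google published it; the key is a placeholder and the long positional argument list is the part most likely to be slightly off):

        use SOAP::Lite;

        my $key   = 'your-google-api-key';   # issued when you register
        my $query = 'foo';

        # The WSDL ships with the developer kit; it also lives at
        # http://api.google.com/GoogleSearch.wsdl
        my $google = SOAP::Lite->service('http://api.google.com/GoogleSearch.wsdl');

        # key, query, start, maxResults, filter, restrict, safeSearch, lr, ie, oe
        my $result = $google->doGoogleSearch(
            $key, $query, 0, 10, 'false', '', 'false', '', 'latin1', 'latin1'
        );

        # The library has already dug the data out of the envelope for you.
        for my $hit (@{ $result->{resultElements} }) {
            print "$hit->{title}\n    $hit->{URL}\n";
        }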

        • What has me wondering about SOAP is why we didn't end up with CORBA over HTTP (rather than IIOP). I guess ultimately it was the pig-headedness of the OMG that slowed CORBA down - if they'd had some foresight about firewall smashing and leveraging the web, they might have won out.

          The sad fact is that SOAP is simply slow in Perl, so for SOAP applications where performance matters you need to use another language. And that's when it stops being fun.
  • This is a perfect application for doing it "the old way".

    Done the old way we could have had caching and all sorts of other neat things for free, too. And they could have used standard HTTP auth for authentication instead of having to reinvent that as well.
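
    Something like this, say (sketch only - the host, realm and account details are invented):

    use LWP::UserAgent;

    my $ua = LWP::UserAgent->new;

    # Plain old HTTP Basic auth: nothing for clients to reinvent.
    $ua->credentials('api.google.com:80', 'Google Search',
                     'my-account', 'my-password');

    my $resp = $ua->get('http://api.google.com/xml?q=foo');
    die $resp->status_line, "\n" unless $resp->is_success;
    print $resp->content;

    And because it's a plain GET over HTTP, proxies and caches get to do their job for free.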

    --

    -- ask bjoern hansen [askbjoernhansen.com], !try; do();

    • But HTTP authentication is soooo second millennium. All the cool kids are doing neat-o kewl SOAP stuff now! And all of the k-rad SOAP APIs support the new widgets, not the krufty old HTTP auth.

      :-)

  • I don't think the guy who made the schema they were using hated XML. In fact, I'm sure it's the same person who made the SOAP API (at least, I wouldn't be surprised).

    Thinking that XML is verbose is no reason to use unreadable element names, though. The schema was designed that way for the same reason form parameters are unreadable in most search engines: if you get several million requests a day, naming a form parameter q instead of query gains you some bandwidth. In fact, it gains you quite a lot at that scale.
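
    (Back-of-the-envelope, with invented numbers: shortening half a dozen element names by five characters or so saves about ten bytes per open/close pair, so roughly 60 bytes per result and 600 bytes across ten results - at a few million XML queries a day that's on the order of a couple of gigabytes of response traffic.)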

    --

    -- Robin Berjon [berjon.com]

    • I think some Perl users wouldn't really care if they had SOAP or XML. All that matters is that they can:

      use WWW::Search::Google::type;

      my $results = WWW::Search::Google::type->new($key, $query);

      etc.

      where type is either SOAP or XML or HTML or whatever.

      Heck, it could even work with the type being transparent. You just specify that you want to use Google, and it picks whichever protocol it can use, depending on what modules the user has installed.

      So long as there's a nice package that someone has written, and they all present the same interface, it doesn't much matter what's underneath.
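
      The protocol-picking bit could be as dumb as this (the module names are just made up for illustration):

      package WWW::Search::Google;

      # Pick whichever backend's prerequisites are actually installed.
      sub backend {
          return 'WWW::Search::Google::SOAP' if eval { require SOAP::Lite;  1 };
          return 'WWW::Search::Google::XML'  if eval { require XML::Simple; 1 };
          return 'WWW::Search::Google::HTML';    # last resort: scrape the HTML
      }

      sub new {
          my ($class, $key, $query) = @_;
          my $backend = backend();
          eval "require $backend; 1" or die $@;
          return $backend->new($key, $query);
      }

      1;
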
      --
        ---ict / Spoon
      • I guess the difference is that Java and .NET people don't have CPAN. So they'd either expect Google to release a package for doing the queries - one each for Java and .NET - or they'd simply prefer the SOAP way. And since Google would probably prefer not to develop APIs for lots of different languages, I think SOAP is more appealing to them.

        One thing that intrigues me, though, is the possibility of doing server-side XSLT so that you can provide both a SOAP API and a plain XML API from the same server, using the same code.
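
        With XML::LibXSLT that'd be something along these lines (very rough - the stylesheet names and the surrounding plumbing are made up):

        use XML::LibXML;
        use XML::LibXSLT;

        sub render_results {
            my ($raw_results_xml, $want_soap) = @_;   # internal results doc + which flavour was requested

            my $parser = XML::LibXML->new;
            my $xslt   = XML::LibXSLT->new;

            my $doc = $parser->parse_string($raw_results_xml);

            # One internal document, two stylesheets: wrap it in a SOAP
            # envelope or just tidy it up as plain XML.
            my $style = $want_soap ? 'to-soap-envelope.xsl' : 'to-plain-xml.xsl';
            my $sheet = $xslt->parse_stylesheet($parser->parse_file($style));

            return $sheet->output_string($sheet->transform($doc));
        }
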
        • How efficient would that be, processing-wise, compared to having two separate implementations?

          Just curious, since I'm sure Google likes to keep things efficient.
          --
            ---ict / Spoon