
All the Perl that's Practical to Extract and Report


Ovid (2709)

http://publius-ovidius.livejournal.com/

Stuff with the Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.

Journal of Ovid (2709)

Tuesday June 30, 2009
06:48 AM

Never Let Them Read From Your Database

[ #39192 ]

An imaginary conversation synthesized from past discussions and the responses I wish I had made.

  • Customer: We need read-only access to your database.
  • Ovid: No.
  • Customer: Please?
  • Ovid: No.
  • Customer: But I need ad-hoc queries.
  • Ovid: Your ad-hoc cartesian join returning 12 billion rows was real fun.
  • Customer: I promise I won't do it again.
  • Ovid: That's what you said about the ad-hoc cartesian join returning 10 billion rows.
  • Customer: But this time I mean it.
  • Ovid: So do I. We guarantee backwards-compatibility in our API, not our database. If we move a field from one table to another, your queries will break.
  • Customer: Then tell us when you do that.
  • Ovid: We did that with another team and had to keep delaying releases while they updated their system.
  • Customer: Then you can provide views to maintain backwards-compatibility.
  • Ovid: We do that already. "View" as in "Model-View-Controller". It's part of our REST API; you should check it out.
  • Customer: But your REST API doesn't provide all the information I need.
  • Ovid: It provides more than the information you need because much of it represents knowledge not stored in the database. If you need more information, let's see what we can do to add this to the API.
  • Customer: Why are you being so difficult?
  • Ovid: Because your temporary convenience is not more important than my long-term pain.

Don't let external customers read directly from your database. Just don't. The usual justification is the need to support ad-hoc queries. Get a few samples and try to figure out a general mechanism to support their actual business needs. If you let them read from your database, they will become dependent on this and beg you to hold off database changes or complain if you don't. As your project grows larger, the pain grows more severe. They will have the best of intentions, but good intentions mean nothing when you need to coordinate your internals with people who should know better than to violate encapsulation.
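To make the encapsulation point concrete, here is a minimal sketch in Python using an in-memory SQLite database. Everything in it is hypothetical and invented for illustration (the `customer` and `contact` tables, the `get_customer_*` functions, the sample row); it is not the schema or API discussed in this post. It only shows the general shape of the problem: when a field moves from one table to another, an accessor behind the API can be updated to return the same result, while a query the customer wrote against the old table layout breaks.

```python
import sqlite3

# Hypothetical schema, version 1: email lives on the customer table.
def make_v1():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
        INSERT INTO customer VALUES (1, 'Alice', 'alice@example.com');
    """)
    return conn

# Hypothetical schema, version 2: email has moved to a separate contact table.
def make_v2():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE contact (customer_id INTEGER, email TEXT);
        INSERT INTO customer VALUES (1, 'Alice');
        INSERT INTO contact VALUES (1, 'alice@example.com');
    """)
    return conn

# API accessor as it might look against schema v1.
def get_customer_v1(conn, customer_id):
    row = conn.execute(
        "SELECT id, name, email FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    return {"id": row[0], "name": row[1], "email": row[2]}

# Same accessor rewritten for schema v2; callers see the same result shape.
def get_customer_v2(conn, customer_id):
    row = conn.execute(
        "SELECT c.id, c.name, k.email FROM customer c "
        "JOIN contact k ON k.customer_id = c.id WHERE c.id = ?",
        (customer_id,),
    ).fetchone()
    return {"id": row[0], "name": row[1], "email": row[2]}

# A query an external customer might have written against the v1 tables.
CUSTOMER_AD_HOC_QUERY = "SELECT email FROM customer WHERE id = 1"

def raw_query_works(conn):
    try:
        conn.execute(CUSTOMER_AD_HOC_QUERY).fetchone()
        return True
    except sqlite3.OperationalError:
        # SQLite reports the missing column once the field has moved.
        return False
```

In this sketch the two accessors return identical dictionaries across the schema change, while `raw_query_works` succeeds on the first schema and fails on the second. That asymmetry is the whole argument: the API is the part you promised to keep stable.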

As a side note, ad-hoc queries can be dangerous even when they don't cause performance problems, if the people writing them aren't really thinking them through. The problem is twofold. First, they might not be paying attention to their core business needs (this is subtle and hard to explain, but common). Second, they might very well be making a query that your API already supports, but because they don't rely as much on your API, they don't know it.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Why not give them a dump of your whole database and let them load it on their own server? It won't keep up to date, but if ad hoc queries are all they need then that won't matter. Reinventing the wheel by implementing your own home-grown query language instead of SQL may make sense in some cases, but it's not necessarily the best way.
    --
    -- Ed Avis ed@membled.com
    • Thought about that, but they need live data. Our data changes rapidly and being even one day out of date is like playing the stock market by reading a day-old newspaper (well, ok, not quite that severe :). It would be good to have a series of read-only slave servers, but that still puts us in the position of them insisting that we can't make that important database change just yet. We've had that happen enough times that we have nasty hacks in our code and database to work around these issues.

      • How about an interface that lets them submit arbitrary SQL queries, but checks them against a whitelist first? So for example your customer might say 'we need to SELECT COUNT(*) FROM FOO' and you would say 'that seems fine, I will add it to the list'. The next day they ask for 'SELECT FRED FROM BAR' and you decide no, the FRED column is an implementation detail I don't want to support forever, so I will not allow them to make that query. That way you have control over what's happening.

        If they want a parti

        --
        -- Ed Avis ed@membled.com
  • At Qwest we had a snapshot of the production database for people to run reports off of. It had a few limitations (you couldn't snap some of the larger field types), but it worked pretty well.

    Also, with Oracle you can set IO limits on a per-account basis, so queries like the one you mention would simply time out after a while.

    Does MySQL provide such features?

  • ... good intentions mean nothing when you need to coordinate your internals with people who should know better than to violate encapsulation.

    Perhaps Perl 5 should have a REST API for writing extensions.

  • I work on a project where the API is three years out of date and has gone through five licenses in those three years. Direct database access wins, sorry.
    • I don't know what "three years out of date" means in this context, but if you mean that neither management nor developers have bothered to address customer concerns for three years, then there are far larger problems than direct database access. If you meant something else, then I guess I can't respond to that :)

      Nor do I know what five licenses have to do with the situation.

  • You're telling me. Is that different than the norm? I find it to be very normal, as meeting all customer needs in a given release cycle is impractical. The 5 licenses are important for accessing the API. Under some of them, the customer cannot use it without paying, while under others, they can. If my code used the API instead of the database, I'd have to rewrite during those changes. I didn't even mention the second (also incomplete) API that's struggling for relevance at Layer 8/9. I suspect you're talk
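
The query whitelist suggested in the comments above (approving statements such as 'SELECT COUNT(*) FROM FOO' one at a time and refusing ones such as 'SELECT FRED FROM BAR') could be sketched roughly as follows. This is a minimal illustration in Python under stated assumptions: the names `ALLOWED_QUERIES`, `normalize`, and `is_allowed` are invented here, and matching is done on the normalized query text only, so it does not parse SQL. Because it lowercases the whole statement, it would also treat string literals case-insensitively, which a real gatekeeper would need to handle.

```python
# Hypothetical list of queries the database owner has agreed to support.
ALLOWED_QUERIES = {
    "select count(*) from foo",
}

def normalize(sql):
    # Drop a trailing semicolon, collapse runs of whitespace, and lowercase,
    # so trivially different spellings of an approved query still match.
    return " ".join(sql.strip().rstrip(";").split()).lower()

def is_allowed(sql):
    # Exact match against the approved list; anything else is refused.
    return normalize(sql) in ALLOWED_QUERIES
```

With this sketch, `is_allowed("SELECT COUNT(*) FROM FOO")` is true and `is_allowed("SELECT FRED FROM BAR")` is false. Whether this is preferable to adding the same capability to the API is exactly the trade-off debated in the thread.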