Journal of andy.sh (8643)

Thursday July 31, 2008
01:57 PM

Collection of job interview questions (1)

[ #37072 ]

I have been interviewing people for various positions related to web development. Not that it is a routine process, but over time I have built up a collection of my favourite questions which I ask often. Yesterday I added one more question to it. Here it is.

Suppose that you have a website with deeply nested sections. Each section has its own part of the URL between slashes, and may be a subsection of a higher-level section. There is no depth limit, and there are no restrictions on duplicating section names. So we may face URLs like /alpha/, or /alpha/beta/, or /alpha/beta/gamma/, and also /alpha/gamma/beta/, which is a separate leaf of the site tree.

Every section's URL part is stored in the database in a table with three fields: the ID of the section, its URL part, and the ID of the parent (if present), like this: (1, 'alpha', 0), (2, 'beta', 1), ..., (N, 'beta', M). The two 'beta' rows are different sections that share the same piece of URL.

The task is to propose a fast algorithm for parsing the URL and obtaining the corresponding section ID. The IDs of the parents are not needed, and neither is the depth level; just find the ID of the section described by the URL. Note that storing full URLs with slashes in the database is prohibited.
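
To make the data model concrete, here is a minimal sketch of such a table, filled with rows for the example URLs above. It assumes SQLite via DBI, and the table and column names (sections, id, uri, parent_id) are made up for illustration only.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # In-memory SQLite database via DBI (assumed for the sketch).
    my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                           { RaiseError => 1, AutoCommit => 1 });

    $dbh->do(q{
        CREATE TABLE sections (
            id        INTEGER PRIMARY KEY,
            uri       TEXT    NOT NULL,   -- one URL part, no slashes
            parent_id INTEGER NOT NULL    -- 0 means a top-level section
        )
    });

    # Rows for /alpha/beta/gamma/ and /alpha/gamma/beta/ from the example:
    # two different sections are called 'beta' (and two are called 'gamma').
    my $ins = $dbh->prepare(
        'INSERT INTO sections (id, uri, parent_id) VALUES (?, ?, ?)');
    $ins->execute(@$_) for (
        [1, 'alpha', 0], [2, 'beta', 1], [3, 'gamma', 2],
        [4, 'gamma', 1], [5, 'beta', 4],
    );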

  • Your algorithm is already forced on you by your data structure. You've left little choice except the obvious split followed by querying each piece in turn in a loop. Your time will be dominated by the time for round trips to the database. The two ways to improve that are to avoid the round trips altogether, e.g. by caching information locally (memcached comes to mind), or to keep the round trips inside the database (you can do this with a stored procedure).

    Assuming that the paths are relatively

    • I expect to hear at least two things: 1) repeated correct queries should be resolved without recursive algorithms (even if the DB is used for every request), and 2) results must be cached to avoid DB calls for those URLs which look well-formed but lead to a 404 page. There is no single right answer, so both correct and incorrect answers are acceptable. The main goal is to understand whether a person is able to think about the bottlenecks of his/her algorithm.
      • The "should be resolved without recursive algorithms" sounds to me like you believe some myth about performance.

        The database doesn't care what the code looks like that makes the database calls, it just cares what the calls are. So it can't even see that the code is recursive.

        You can see the recursion in the code that runs on the webservers. There the performance differences are marginal, and your webservers should not be a bottleneck, since you can always just add more webservers.

        Therefore the dominating conce

        • Maybe I did not explain my thought clearly, but I am not against recursion in general; many things look easy in the form of recursion :-)

          As for the /url/parsing/mechanism itself, only one loop (no recursion) is needed to get the last ID: for (@uri_part) { select id ... where uri = $_ and parent_id = $prev_id } (a fuller sketch appears after this thread).

          But for typical websites (where there are only a few sections with duplicated names) it is even easier not to make one query per level. It might be faster to select all the data for all URI parts in one query usi
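
Here is a minimal sketch of the loop-per-level lookup described in this thread, reusing the hypothetical sections table from the sketch above. It issues one query per URL part and gives up as soon as a part cannot be resolved.

    # Resolve a URL like '/alpha/beta/gamma/' to a section ID,
    # one query per level; returns undef for unknown paths (404).
    sub section_id_for_url {
        my ($dbh, $url) = @_;

        my @parts = grep { length } split m{/}, $url;
        return undef unless @parts;

        my $sth = $dbh->prepare(
            'SELECT id FROM sections WHERE uri = ? AND parent_id = ?');

        my $prev_id = 0;                       # 0 marks top-level sections
        for my $part (@parts) {
            $sth->execute($part, $prev_id);
            my ($id) = $sth->fetchrow_array;
            $sth->finish;
            return undef unless defined $id;   # unknown segment
            $prev_id = $id;
        }
        return $prev_id;                       # ID of the deepest section
    }

    # With the sample rows above:
    # section_id_for_url($dbh, '/alpha/beta/gamma/')  -> 3
    # section_id_for_url($dbh, '/alpha/gamma/beta/')  -> 5

The result, including the "not found" case, could then be cached (e.g. in memcached, as suggested above), so that well-formed URLs which lead to 404 pages do not hit the database on every request.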

  • One option would be to build SQL for a self-join at run-time (a sketch of generating such a query appears after this thread).

    So /alpha/beta/ would produce a query:

    select f2.id
    from foo f1, foo f2
    where f1.parent = 0 and f1.name = 'alpha'
        and f2.parent = f1.id and f2.name = 'beta'

    This won't work if the path can have extra information after the structure, e.g. /alpha/beta/productid/add

    If the tree can be deeply nested, I'd go for a different data structure.

    • Yes! :-) When you have three, four or more nested parts, you get an overly complicated SQL query, which makes it very easy to mount a DDoS attack with bogus URLs.
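
For completeness, here is a sketch of generating the self-join query described above at run time, again with the hypothetical sections/uri/parent_id names from the earlier sketches. The URL parts are passed as bind values, so only the number of joined copies of the table depends on the depth of the path.

    # Build one self-join query for a whole path; returns the SQL text
    # and the bind values (the URL parts themselves).
    sub build_self_join {
        my @parts = @_;
        die "empty path" unless @parts;

        my (@from, @where);
        for my $i (0 .. $#parts) {
            my $alias  = "f$i";
            my $parent = $i == 0 ? '0' : 'f' . ($i - 1) . '.id';
            push @from,  "sections $alias";
            push @where, "$alias.parent_id = $parent", "$alias.uri = ?";
        }

        my $sql = sprintf 'SELECT f%d.id FROM %s WHERE %s',
            $#parts, join(', ', @from), join(' AND ', @where);
        return ($sql, @parts);
    }

    # Usage: one query, however deep the path is.
    # my ($sql, @bind) = build_self_join('alpha', 'beta');
    # my ($id) = $dbh->selectrow_array($sql, undef, @bind);

As the reply above notes, the query text still grows with the depth of the path, so arbitrarily deep bogus URLs remain a concern with this approach.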