I used to be interviewing people for various positions which relate to web development. Not to say that it is a routine process, but with time pass I have a collection of my favourite questions which I ask often. Yesterday I put one more question into it. Here it is.
Suppose that you have a website with deeply nested sections. Each concrete section has its own part in the URL between slashes, and may be a subsection of the section of higher level. No depth limit is said, neither any restrictions are allowed in duplicating section names. Thus we may face with URLs like
Every section's URL part is stored in the database in a table with three fields: ID of the section, its URL and an ID of the parent (if present), like this: (1, 'alpha', 0), (2, 'beta', 1),
The task is to propose fast algorithm for parsing the URL and obtaining corresponging section ID. IDs of parents are not necessary, neither is the depth level. Just find an ID of the section described by an URL. Note that storing in the database full URLs with slashes is prohibited.
What am I missing? (Score:1)
Your algorithm is already forced on you by your data structure. You've left little choice except the obvious split followed by querying each piece in turn in a loop. Your time will be dominated by the time for round trips to the database. The two ways to improve that are to avoid round trips to the database - eg by caching information locally (memcached comes to mind), or by not going out of the database for the round trips (you can do this with a stored procedure).
Assuming that the paths are relatively
Re: (Score:1)
Re: (Score:1)
The "should be resolved without recursive algorithms" sounds to me like you believe some myth about performance.
The database doesn't care what the code looks like that makes the database calls, it just cares what the calls are. So it can't even see that the code is recursive.
You can see the recursion on the code that runs on webservers. There the performance differences are marginal, and your webservers should not be a bottleneck since you can always just add more webservers.
Therefore the dominating conce
Re: (Score:1)
Maybe I did not explain my though clear, but I am not against recursion in general; many things look easy in a form of recursion :-)
What about /url/parsing/mechanism, than only one loop (no recursion) is needed to get the last ID,
for (@uri_part) {select id ... where uri = $_ and parent_id = $prev_id}.But for common websites (where there are only few sections with duplicated names) it is even easier not to make one query per level. It might be faster to select all the data for all URI parts in one query usi
self-join (Score:1)
One option would be to build SQL for a self-join at run-time.
So /alpha/beta/ would produce a query:
select f2.idfrom foo f1, foo f2
where f1.parent = 0 and f1.name = 'alpha'
and f2.parent = f1.id and f2.name = beta'
This won't work if the path can have extra information after the structure. eg. /alpha/beta/productid/add
If the tree can be deeply nested, I'd go for a different data structure.
Re: (Score:1)