Disclaimer: I imagine if you are reading this that you know the difference between the adjacency list and nested set methods for storing hierachical data. If not, you should read this first: http://dev.mysql.com/tech-resources/articles/hierarchical-data.html
I’ve been working on a symfony project with a hierarchical category structure, fairly standard sort of a thing. At the start I tried to use nested set and gave up due to problems getting fixtures to load – something thats crucial in order to run the project’s test suite, but also to retain the ability to easily rebuild the model and reload the data.
I went on my merry way, using the adjacency list model instead (parent_id style), and all was fine. Until the system went into production – it started to slow down. Upon looking, the number of categories now in the system and the depth of the tree meant that my methods for getting all children of a category were generating massive numbers of sql queries or, with a few changes, were generating huge queries with many joins. Neither performed that well, and whilst I could see ways to improve the situation, it felt too complicated and not ideal, plus I knew nested set would be much better and simpler.
So, some might say stupidly, I decided I had to throw a couple of hours at getting nested set fixtures to load under propel. The forum posts etc I had looked at last time still didn’t have a solution and I couldn’t find another solution either – so I went down my own path.
And, I did kind of get it to work. Sure, I cheated a bit (as you’ll see) but it gets the job done, and the cheat only impacts on data load – functionally the nested set is untouched.
Here’s what I came up with:
schema.yml for the table:
And a couple of examples from the fixtures:
You may have noticed what I’m up to. I’ve got an adjacency list style parent_id column in the model, as well as the scope/lft/rght fields for nested set, and am using the former to store the tree structure. The only exception is the root category, which has root: 1 specified – this causes it to be the nested set root. When this data is loaded, all of the other categories get added as a child of root it according to the nested set fields (but we still have correct info using the parent_id column).
(NOTE: I’ve found that loading a lot of data into the category table using this mechanism is slow – thousands of records take 10 minutes plus. Assume the nested set stuff is adding some overhead. Not good for a dev environment where you quickly need to load data. Acceptable to me as I have a separate set of data for testing/dev that only has a small number of categories.)
What we have now is still useless – we have a data model that contains two mechanisms for representing the hierachy, one we want to use and one we don’t, and our data is in the one we don’t want to use. However, we can write a quick function to use the adjacency list data to populate the nested set data.
I added the following to my CategoryPeer:
This recurses down the tree as per the adjacency model and sets the nested set fields. Once done, we can use the nested set methods to manage and query the data – this method only needs to be run when the data is loaded. This means that from then onwards we can forget this hack and use nested set normally.
I already had a symfony task I had to run post data load to do several things, rebuilding a lucene index and clearing a cache – so I’ve made this task now also call CategoryPeer::rebuildNSTree().
To get back to my reason for doing this – I was able to get the complicated/many query operations to find all children down to a single simple query using WHERE lft>x AND rght < y – which is much much faster/cleaner etc.
Hope this helps – hopefully this issue will get resolved in the future and we’ll be able to load the fixtures properly.
(Tested against symfony 1.2.2)
2 Responses
jon bennett
07|May|2009 1Morning Ben,
Sounds like this http://book.cakephp.org/view/91/Tree would be ideal, if only you could switch to #cakephp.
haha, just joshing, but seriously, if you need N level depth and speed, you want MPTT.
j
ben
07|May|2009 2Ah, my first ever non porn spam comment, even if it is bigging up Cake PHP
MPTT is what the Propel (symfony’s orm) nested set behaviour implements. Its just that it seems incapable of loading the fixtures data in properly without this horrible hack to set up the lft/rght values.
I have used the Cake one before – its how I knew the performance gain was big enough to be worth messing about doing the above for!
Symfony also allows you to use Doctrine as the ORM – the nested set behaviour on doctrine seems to have fewer issues.
Leave a reply