by kraada » Thu Dec 10, 2009 10:09 am
As far as I understand it, there are two different levels of fragmentation that we're talking about here.
First is the low-level hard drive fragmentation. This is what your defragger is fixing - the file system often writes a single file in several pieces scattered across the disk (it uses whatever free space is available at the time), and the defragger rearranges those pieces so that each file sits in one contiguous stretch of the physical drive.
Then there's the "higher level" PostgreSQL fragmenting. PostgreSQL saves its database data in your data/base/ directory, with one subdirectory per database and each table and index in its own file (named after its "relfilenode" number and split into 1 GB segments as it grows). I will admit all of the technical details are beyond me, but the main purpose of clustering is to physically rewrite a table so that its rows are stored in the order of one of its indexes. That way rows with neighboring index values end up in the same or adjacent pages, and a query that reads a range of them has to touch far fewer pages.
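To make the clustering idea concrete, here's a toy Python model. It is not PostgreSQL code, just a sketch: the page size of 100 rows, the `pages_touched` helper, and the key range are all made up for illustration. It counts how many pages a range lookup has to read when the rows are in arbitrary order versus key order.

```python
import random

PAGE_SIZE = 100  # rows per page; an arbitrary figure for this toy model


def pages_touched(rows, lo, hi, page_size=PAGE_SIZE):
    """Count the distinct pages holding rows whose key lies in [lo, hi]."""
    return len({pos // page_size
                for pos, key in enumerate(rows)
                if lo <= key <= hi})


random.seed(0)
heap_order = list(range(10_000))
random.shuffle(heap_order)       # rows stored in arbitrary order
clustered = sorted(heap_order)   # rows rewritten in key order, roughly what CLUSTER does

scattered_pages = pages_touched(heap_order, 2000, 2199)
clustered_pages = pages_touched(clustered, 2000, 2199)
print("unclustered:", scattered_pages, "pages")
print("clustered:  ", clustered_pages, "pages")
```

In the clustered layout the 200 matching rows sit in just 2 pages, while in the shuffled layout they are spread over most of the table.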
You can see that these two functions are similar in concept but not actually related - so you can have a hard drive that needs defragging with a perfectly clustered PostgreSQL install, and vice versa. When you have a defragged HD and CLUSTER runs, the reason your hard drive gets more fragmented is simply the normal way hard drives get fragmented - files are written (CLUSTER writes a fresh copy of the table to new files). When you have a perfectly clustered database, the reason the database gets "unclustered" is that you add, update and remove data: CLUSTER is a one-time operation, and rows written afterwards go wherever there is free space rather than in index order.
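The "unclustering" can be sketched with the same kind of toy Python model. Again this is only an illustration under loud assumptions: the page size is invented, and new rows are simply appended to the end of the table, which is a simplification of where PostgreSQL really puts them.

```python
import random

PAGE_SIZE = 100  # rows per page, chosen arbitrarily for this toy model


def pages_touched(rows, lo, hi):
    """Count the distinct pages holding rows whose key lies in [lo, hi]."""
    return len({pos // PAGE_SIZE
                for pos, key in enumerate(rows)
                if lo <= key <= hi})


random.seed(1)
table = list(range(0, 20_000, 2))   # freshly clustered: even keys in order
before = pages_touched(table, 4000, 4398)

# Simplifying assumption: new rows land at the end of the table,
# not next to their neighbors in key order.
new_rows = random.sample(range(1, 20_000, 2), 2_000)
table.extend(new_rows)
after = pages_touched(table, 4000, 4398)

print("pages read right after clustering:", before)
print("pages read after 2,000 inserts:   ", after)
```

The same range lookup needs more pages after the inserts, which is why CLUSTER has to be re-run from time to time on tables that change a lot.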
I hope that clears things up for you guys.