Page 1 of 4

is this for real? (import speed)

PostPosted: Thu Nov 19, 2009 10:41 am
by almaplayera
Image

92 hours and counting. this is on a server box running only PT3 & postgres with 2GB RAM and 15k rpm disk.

5 million hands is not a lot of hands! i want to import 50 million hands...like...regularly... i need to be able to deal with databases this size without so much waiting around.

i've seen the performance tuning stuff and i'll try to tweak some of the settings, but i imagine they have a minor impact. is anyone able to import millions of hands in a matter of hours rather than a matter of days/weeks? if so, you must be doing something fundamentally different than me, and i'd love to know what it is.

Re: is this for real? (import speed)

PostPosted: Thu Nov 19, 2009 10:44 am
by kraada
The last time I ran a manual import of 2M hands clustering took between 5 and 6 hours. I don't think my computer's specs are all that impressive, so there must be something else slowing things down for you. What sort of anti-virus/anti-spyware/firewall software are you using?

Re: is this for real? (import speed)

PostPosted: Thu Nov 19, 2009 10:46 am
by almaplayera
no firewall, a/v, or any other software at all is installed. windows firewall is enabled... PT3 and postgresql are the only apps installed.

Re: is this for real? (import speed)

PostPosted: Thu Nov 19, 2009 11:24 am
by kraada
What version of PostgreSQL do you have installed? Also, do you still see files being created/edited/changed in the C:\Program Files\PostgreSQL\8.3\data\base directory?

Re: is this for real? (import speed)

PostPosted: Thu Nov 19, 2009 11:37 am
by almaplayera
postgresql v8.4.1, and yes there is still plenty of write activity to that directory

Re: is this for real? (import speed)

PostPosted: Thu Nov 19, 2009 12:01 pm
by kraada
I'll look to see if I can find anything different about the 8.4.x cluster mechanism; my tests were using 8.3.x.

If there's still write activity, it's still really clustering so definitely let it finish. Oh, how much hard drive space do you have free?

Re: is this for real? (import speed)

PostPosted: Thu Nov 19, 2009 12:58 pm
by almaplayera
the size of the db is about 40GB (btw, is that the expected size for a db w/ 7-8 million hands?). until a couple hours ago, there was only about 20gb of free disk space. i moved some files and now there is 44GB of free space.

i've read that not having enough space to make a complete copy of the db can slow clustering. but the import itself took 30+ hours for 4 million hands (+ 1 million duplicates). and even if the cluster was twice as fast with more free disk space, that means the whole process would be 60+ hours. and it still has to do vacumn, analyze, and cache update. supposedly the cache update can take as long as the cluster. i mean, this is ridiculous. i'm looking at 100+ hours to import 5 million hands? i need to be able to import 50 million in 5 hours.

Re: is this for real? (import speed)

PostPosted: Thu Nov 19, 2009 1:39 pm
by kraada
That wouldn't have been enough free space and certainly slowed things down for you.

Updating the cache should not take as long as cluster; it takes about the same amount of time on smaller databases, as databases get larger updating the cache is still the second longest function but it shouldn't take nearly 30 hours (but expect it to take a few hours).

We are still investigating what changes were made in the 8.4 revision of PostgreSQL to see if we can streamline anything on our end; last I heard there was nothing particularly useful there.

One thought for you, in terms of performance: If you make the move to SSDs, you won't need to run cluster at all. That would save you a ton of time, though SSDs are not cheap. To do back-of-the-envelope calculations, if your hands are full ring, 1M hands = 10G on disk. so 50M hands would be 500G on disk - that would be quite a lot of space in SSDs, though prices are getting cheaper all of the time.

Re: is this for real? (import speed)

PostPosted: Thu Nov 19, 2009 2:00 pm
by almaplayera
thing is, the 15k disk i'm running on now performs way better than a 7200RPM disk. it's not as fast as SSD, but it's not that far away.

forget 50 million, just looking at 5 million: you said clustering 2M took 6 hours for you on an average PC. so i assume the import was about the same, and the cache update was maybe 2-3 hours? so a total of 15 hours for 2 million. so on an average PC, users should expect importing & housekeeping for 5 million hands to take upwards of 40 hours?

Re: is this for real? (import speed)

PostPosted: Thu Nov 19, 2009 2:14 pm
by kraada
No, I certainly wouldn't expect that.

But you already admitted that there was not enough free space on the disk, which distinctly slows things down by a large margin. Now that you've freed the space up things should go more quickly. Also, once the database has been clustered the benefit from the cluster will remain for future clusterings, so the next one wouldn't take nearly as long. The other normal reasons for slower results don't seem to apply (e.g. having an antivirus read every file that you open), and we certainly haven't changed anything; cluster is a PostgreSQL process, not a PT3 process.