converting multiple databases

Experiencing technical difficulties? Think you've found a problem with PokerTracker 3? Report it here.

Moderator: Moderators

converting multiple databases

Postby Hercules » Wed Feb 13, 2008 7:27 pm

So far it seems that beta4 has solved the import/converting crashes for me.
As an experiment, I want it to convert all my PT2 databases into 1 big mean mf'ing PT3 database, containing >25 million hands.

1. Is there any way I can set beta4 to convert multiple databases at once, so it can run while I sleep?

After converting the first database, i noticed a >100% increase in database size; the old PT2-database is 1.9GB, the new beta4-database is 4.1GB.
2. I have ~100GB in 30 PT2-databases, should I expect to have a 200GB beta4-database when I am done?
3. While clustering, beta4 needs nearly twice as much diskspace. Do I need a 400GB HD for this experiment?
Hercules
 
Posts: 78
Joined: Fri Jan 25, 2008 2:38 pm

Re: converting multiple databases

Postby Josh » Wed Feb 13, 2008 9:16 pm

1. Not at this time.
2. Did you cluster after the conversion? This does a vacuum, which reclaims a lot of space taken with the UPDATE statements PT3 does
3. Twice as much diskspace as what? I'm not sure I understand.
Josh
Site Admin
 
Posts: 6222
Joined: Sat Dec 08, 2007 6:03 pm

Re: converting multiple databases

Postby Hercules » Thu Feb 14, 2008 12:37 pm

Yes, I clustered after the conversion, but any beta4-database is twice as big as the PT2-database it was converted from.
When I export all the hands from the (1.9GB) PT2-database and import them in a new beta4-database, the beta4-database is twice as big as well.

In my example, the beta4-database was ~4.2GB before clustering.
While clustering the beta4-database expanded to ~8GB.
After clustering finished, the beta4-database was ~4.1GB.

I guess clustering creates a copy of the database before it actually starts clustering, and deleting that copy might be the last thing it does before clustering is complete.
That's also why I asked if a 200GB database needs a 400GB harddisk...

Oh well, I'll find out soon enough... in an hour I will begin creating this huge database on an empty 500GB drive.
Hercules
 
Posts: 78
Joined: Fri Jan 25, 2008 2:38 pm

Re: converting multiple databases

Postby _dave_ » Thu Feb 14, 2008 6:04 pm

[quote="Hercules"6dq]I guess clustering creates a copy of the database before it actually starts clustering, and deleting that copy might be the last thing it does before clustering is complete.
[/quote6dq]

That is pretty much exactly what it does. CLUSTER is very much like a "defragment" on a hard disk. A contrived example - after a big import all the rows for player "Hercules" are scattered all over the filesystem, mixed up with all the other data of other players. After CLUSTER, it will hopefully have re-ordered the filesystem so all "Hercules" hands are right next to each other in order on the disk - so when it comes to retrieve your stats, it can just suck them up freom the disk in almost a single swoop. sort of :)

here's the docs that explain it properly:

http://www.postgresql.org/docs/8.2/stat ... uster.html

Josh - interesting thing on there regarding the alternative method just game me a thought about those experiencing *very* lengthy cluster and posting about it... maybe they are low on RAM (or running Vista) so they are using Windows swap (likely on the same disk as the database)? This has potential to cause massive grinding of the disk, making CLUSTER execution very slow indeed...
_dave_
 
Posts: 1147
Joined: Sun Dec 09, 2007 6:19 pm

Re: converting multiple databases

Postby APerfect10 » Thu Feb 14, 2008 6:07 pm

[quote="_dave_"90s]Josh - interesting thing on there regarding the alternative method just game me a thought about those experiencing *very* lengthy cluster and posting about it... maybe they are low on RAM (or running Vista) so they are using Windows swap (likely on the same disk as the database)? This has potential to cause massive grinding of the disk, making CLUSTER execution very slow indeed...[/quote90s]

I would venture to guess that is exactly the problem...

-Derek
APerfect10
Site Admin
 
Posts: 4466
Joined: Sat Dec 08, 2007 6:03 pm

Re: converting multiple databases

Postby Hercules » Thu Feb 14, 2008 7:34 pm

Win2K, 4GB RAM, no swapping here.
swapfile should be on another drive when there's gonna be swapping.

I accidentally converted a tiny 853-hand database to the beta4-database with 1 million hands.
auto-clustering after import took almost 2 hours. :o
Hercules
 
Posts: 78
Joined: Fri Jan 25, 2008 2:38 pm

Re: converting multiple databases

Postby APerfect10 » Fri Feb 15, 2008 12:06 am

That does not surprise me if your drive is a 7200RPM. It takes me approximately 30 minutes to cluster 500K hands with 2x10K in RAID 0 so that sounds about right...

Best regards,

Derek
APerfect10
Site Admin
 
Posts: 4466
Joined: Sat Dec 08, 2007 6:03 pm

Re: converting multiple databases

Postby _dave_ » Fri Feb 15, 2008 12:28 am

[quote="APerfect10"ws6]That does not surprise me if your drive is a 7200RPM. It takes me approximately 30 minutes to cluster 500K hands with 2x10K in RAID 0 so that sounds about right...
[/quotews6]

I suspect significant gains can be had from TABLESPACE to a dedicated partition.
[codews6]
postgres@newdb /home/dave $ time psql -d "PT3a31-DB1" -c "CLUSTER;"
CLUSTER

real 1m25.845s
user 0m0.002s
sys 0m0.004s
[/codews6]

~40K hands on a single 160GB 7200 drive, low end CPU, 2GB RAM. Not cached (nor even accessed) beforehand.

CLUSTER should be pretty much linear, no?
_dave_
 
Posts: 1147
Joined: Sun Dec 09, 2007 6:19 pm

Re: converting multiple databases

Postby APerfect10 » Fri Feb 15, 2008 1:42 am

I am not about to even think I have a remote clue as to the algorithm they are using but based on the fact that the database is essentially being copied, you would think that it would be near linear. Quite frankly, the entire clustering algorithm does not make much sense to me but I am sure they have a reason for it. I do not understand why they would not simply do a sequential scan like any defragment, move out/move in the files. That seems to be a much faster approach (or at least a much more storage friendly approach). I am not sure why PostgreSQL would expect the user to only fill 1/2 of their hard drive with the data leaving enough room to replicate and cluster.

Best regards,

Derek

PS. Or maybe the copy, move-in only method is faster but it still does not make sense from a storage stand point, IMHO.
APerfect10
Site Admin
 
Posts: 4466
Joined: Sat Dec 08, 2007 6:03 pm

Re: converting multiple databases

Postby Hercules » Fri Feb 15, 2008 9:26 pm

FWIW:

Yesterday I installed PostgreSQL 8.3 on an empty (7200PRM) drive.
So far I have converted 7 PT2 databases, totalling 3,530,677 hands. These 7 databases use 12.2 GB of diskspace.

In the PT3beta4-database are 3,530,509 hands. The database is 18,5 GB.
Clustering obviously takes longer after every converted database, last time it still needed <1 hour.

For some reason things work out a helluvalot better now compared to the 1st attempt:

[bt0p]1st attempt[/bt0p]: empty 7200RPM 250GB SATA drive, using PostgreSQL 8.2.
3,5 million hands PT2 databases use 12.2 GB of diskspace, the new PT3beta4 database was 25.5 GB. (+109%)
I nuked PT3beta4 when clustering the database was running for >5 hours.

[bt0p]2nd attempt[/bt0p]: empty 7200RPM [color=#FF0000t0p]500GB[/colort0p] SATA drive, using PostgreSQL [color=#FF0000t0p]8.3[/colort0p].
3,5 million hands PT2 databases use 12.2 GB of diskspace, the new PT3beta4 database is 18.5 GB. (+52%)
Clustering the database just completed in 56 minutes and 48 seconds.

These differences are weird. I can't imagine that PostgreSQL 8.3 is doing THAT much better.

Anyhoo, more databases are screaming to be converted. The goal is 27,5 million hands. Should be done within a week.
Hercules
 
Posts: 78
Joined: Fri Jan 25, 2008 2:38 pm

Next

Return to Technical Support / Bug Reports [Read Only]

Who is online

Users browsing this forum: No registered users and 55 guests

cron
highfalutin