Faster or not?

by **whiskyjohn** » Thu May 14, 2015 12:42 am

Hello,

Can someone familiar with Postgresql/Pokertracker internals tell me if there is any advantage to changing the order of terms in a summation?

For example, let's say I have a stat with the following conditions:

sum(if[...
AND cash_hand_player_statistics.enum_f_cbet_action = 'C'
AND cash_hand_player_statistics.enum_t_cbet_action = 'C'
AND cash_hand_player_statistics.enum_r_cbet_action = 'R'
, 1, 0])

Now since we humans play poker in order of Flop/Turn/River, I have written the summation in this order. However, it is clear
that the frequency of these things happening is in the reverse order. That is, a player will raise a River cbet far less often than
he will call a Turn cbet, than he will call a Flop cbet.

So my question is should I look to write these summations in a reverse order to speed up cache rebuild, report generation,
HUD population, etc. For one stat and one player, it won't make much difference, but for a large database and many stats,
is there a noticeable difference? Has anyone tested this?

Thanks for any comments,
John

by **WhiteRider** » Thu May 14, 2015 2:42 am

If your stat is going to be cached then it won't make any difference, other than to a cache rebuild.
Without anything to back it up, my thought would be that testing for the thing least likely to be true would be slightly more efficient (i.e. river first). I don't know how that actually translates to performance of PostgreSQL queries though.

by **whiskyjohn** » Thu May 14, 2015 3:07 am

WhiteRider wrote:If your stat is going to be cached then it won't make any difference, other than to a cache rebuild.
Without anything to back it up, my thought would be that testing for the thing least likely to be true would be slightly more efficient (i.e. river first). I don't know how that actually translates to performance of PostgreSQL queries though.

Yeah, that's basically my thinking too. In some programming languages there are debugging/optimizing tools you can use that
show you where a program is spending all its time. I don't know what's available for Postgresql or how to use it.

Changing the order around can make a stat far less readable for the human, which violates a cardinal rule of programming.
However, if speed is of the issue, some rules can be broken if the performance benefit is substantial. I will venture to guess
that the performance benefit will not be substantial, so I may tinker with this a little and then revert to what is most readable,
unless someone else can offer a better analysis.

John

by **kraada** » Thu May 14, 2015 7:29 am

The difference in speed is going to be so small that it's likely not worth worrying much about honestly.

by **whiskyjohn** » Thu May 14, 2015 8:06 am

kraada wrote:The difference in speed is going to be so small that it's likely not worth worrying much about honestly.

I'm not so convinced of that kraada. I have a fairly large database and I have maxed out the number of stats that can be cached
in one table.

The first thing I did was rewrite all stats not to use and string comparisions and then I did some initial reordering for Preflop
actions that put the rarest ones first. This did seem to improve the cache rebuild time noticeably. However, I can't really
profile a stat as I would like to because I have no experience using any Postgresql optimizations. The only thing I can do right
now is check the relative frequency of one action to another.

For example, look at this one simple case:

Action / Hands
enum_f_cbet_action = 'C' 51538
enum_t_cbet_action = 'C' 22803
enum_r_cbet_action = 'C' 9578

These numbers are taken from one of the most active players in my database. If I write a traditional Flop/Turn/River summation,
I will make about an order of magnitude more tests than if I simply reverse them. Since there is no reason to suspect this ratio
differs much for any player in my database, then it stands to reason that caching this one stat in an optimal order can potentially
be an order of magnitude faster, all other things being equal.

Of course, there are other factors involved, but I have created stats where the ratios are far more desparate than above. The
problem in optimizing stats is at least two-fold: 1) I don't have a good way to test the efficacy of my optimizations and 2)
rebuilding the cache is already a long, tiresome process.

Hence, any suggestions/comments by those more knowledgeable about Postgresgql or PT internals would help. It is easy to
say optimizations may not make much difference, but without more details, I will continue with a few experiments anyway.
Like I said, I have noticed a difference from my original constructions.

That said, I should also note that I was using Postgresql 9.3.x but have since upgraded to 9.4.1 for testing purposes. It could
be that the latest version is more efficient also. One hopes so, anyway.

John

by **kraada** » Thu May 14, 2015 9:28 am

I'd wager that removing the string comparisons contributed more than the ordering, but I'm interested in hearing the results of your further testing - please do keep us apprised.

by **whiskyjohn** » Thu May 14, 2015 10:10 am

kraada wrote:I'd wager that removing the string comparisons contributed more than the ordering, but I'm interested in hearing the results of your further testing - please do keep us apprised.

Yes, you are probably right about that. But I will continue with the testing.

In many cases, I can use one of two boolean or enumerated statements to check a condition per street, such as above.
Sometimes, I will need three:

AND cash_hand_player_statistics.amt_t_bet_facing > 0 AND not(cash_hand_player_statistics.flg_t_check_raise)
AND cash_hand_player_statistics.flg_t_fold

This is an example of a Heads Up Player who check/folds the Turn. I think that's the shortest way to write the action
without involving any string comparisions. But again, I could reorder them because t_fold is less frequent than t_bet_facing.
I put them in this order for readability.

BTW, it would be nice to have both check_raise_opp and check_raise_action. The former is mostly for symmetry but
the latter can make testing more concise and should speed up caching and the like.

John

by **kraada** » Thu May 14, 2015 2:15 pm

You'd need more to guarantee that your snipped has someone out of position - that player could raise in position when facing a bet and fold to a 3bet and that would qualify. For that matter, multiway he could call the bet and fold to a raise.

To guarantee a turn check/fold without testing strings this works:
cash_hand_player_statistics.cnt_t_raise = 0 and cash_hand_player_statistics.cnt_t_call = 0 and cash_hand_player_statistics.flg_t_check and cash_hand_player_statistics.flg_t_fold

If you didn't raise or call and checked and folded you couldn't have done anything else.

by **whiskyjohn** » Thu May 14, 2015 11:24 pm

kraada wrote:You'd need more to guarantee that your snipped has someone out of position - that player could raise in position when facing a bet and fold to a 3bet and that would qualify. For that matter, multiway he could call the bet and fold to a raise.

To guarantee a turn check/fold without testing strings this works:
cash_hand_player_statistics.cnt_t_raise = 0 and cash_hand_player_statistics.cnt_t_call = 0 and cash_hand_player_statistics.flg_t_check and cash_hand_player_statistics.flg_t_fold

If you didn't raise or call and checked and folded you couldn't have done anything else.

Sorry, I should have mentioned that position was already established before those lines that I wrote. Basically, I have written a lot of stuff of the form:

sum(if[ preflop checks for various Heads Up conditions...
flop conditions...
turn conditions...
river conditions...
,1, 0])

Now it is just a matter of ordering all these conditions to produce the fewest comparisions. The relative frequencies will be about the same for all players,
such as River conditions less frequent than Turn conditions, and so on. But sometimes there are a condition like a Turn re-raise may be far less frequent
than a River check/fold. So one could "optimally order" these conditions, but making sense of the stat by looking at it might prove difficult for the human eye.

But getting back to string comparisions, it is my assumption from many long years of coding, that integer comparisions are far far more efficient than
string comparisions. The overhead of calling a string function alone is slower, regardless of what it does. There are a couple places in my stats that I
have begrudingly used a regular expression for certain board combinations, but that is only to avoid a bewildering array of if/then/else conditionals that
would drive me batty! If PT would provide #define macro type ability that I have talked about before, it would all be far nicer to code and read.

John

by **BillGatesIII** » Fri May 15, 2015 8:57 am

Without any sources to backup my ideas, I'm pretty sure PostgreSQL optimizes queries so it doesn't matter if you write 'A and B and C' or 'C and A and B'. If for example A is used many times in different columns, the optimizer will first check the value of A before checking B and C (and maybe it's doing it parallel, I don't know) and store the result. So in my opinion, the best way to optimize query times is to be consistent with the database fields you use.

And like kraada said, I don't think you will notice any difference. PostgreSQL configuration, hardware and database design and size have a bigger influence on the speed of inserting, updating and retreiving records.

Faster or not?

Faster or not?

Re: Faster or not?

Re: Faster or not?

Re: Faster or not?

Re: Faster or not?

Re: Faster or not?

Re: Faster or not?

Re: Faster or not?

Re: Faster or not?

Re: Faster or not?

Who is online

14 day free trial
download now

PT4 Supported Sites

PT4 Videos

Press Center

follow us on