BillGatesIII wrote:With all respect, whatever performance improvement you might achieve (if any), I think it's way too little to spend much energy on.
But, because you asked, here are some answers
whiskyjohn wrote:...In my initial post, I noted that if I simply reverse the order of summation, I can reduce
the number of comparisions because of the relative frequency of Flop/Turn/River actions.
Reversing the order does not mean it will be executed in that order by PostgreSQL. Like I said, it might be a parallel execution or in whatever order the optimizer thinks will be best. Just for the fun, I did rewrite a stat so it looked like this.
- Code: Select all
sum(case
when enum_r_cbet_action = 'C'
then case
when enum_t_cbet_action = 'C'
then case
when enum_f_cbet_action = 'C'
then 1
else 0 end
else 0 end
else 0 end)
I'm not even sure if it did force PostgreSQL to execute from top to bottom because the execution plan and times where the same as for the original stat
So maybe it always evaluates everything, which I doubt because the documentation says otherwise, I don't know.
Regardless of the execution plan, I would not expect to see much change in execution time unless millions of records
are processed. Assuming for a moment that Postgres did execute in the summation order written, the idea is that the
effect would be cumulative. As I wrote in response to kraada, one of the issues I have is that I cannot adequately test
if any of these changes produce a measurable effect because rebuilding the cache is already a long, tiresome process.
I wouldn't mind if a few simple reorderings saved an hour on a 24-hour rebuild but I am not going to run a lot of tests
just to see if anything changed.
BillGatesIII wrote:whiskyjohn wrote:...Across all players, the order Ci,Cj,Ck in S1 is the same, but of course their values are all different because the players
are all different. Does Postgresql/PT process stats S1..Sn one player at a time before proceeding to the next player?
The short answer is no. The long answer depends on what you're looking at. Retrieving non-cached stats? Updating the cache with the last hand played? Rebuilding the cache? Running a report? Retrieving a cached stat?
For now, I am content just to discuss rebuilding the cache and retreiving a cached stat for the HUD. Is there reason to think that
doing the same for a non-cached stat would execute differently? I have maxed out the number of cached stats so I do use a mix
of both.
BillGatesIII wrote:whiskyjohn wrote:...Likewise, for the same player, the order of Ci,Cj,Ck may be the same in all stats Sn where they are used, but again the values
are all different because the preceding conditions C0..Ci-1 may be different. But perhaps this is not the right way to think
about a summation. That is, proceeding from top to bottom, getting smaller in each successive check, until finally arriving
at the result with the last term in the summation. This is how I traditionally think of it, and why reordering is efficient.
Instead, I suppose all conditions could be counted as true or false for all hands in the database, and then the "intersection"
of various combinations of conditions can produce the final result of summation. If that is true, perhaps this is what you
mean by "If for example A is used many times in different columns, the optimizer will first check the value of A before checking
B and C (and maybe it's doing it parallel, I don't know) and store the result."
Can you comment a bit more please? It is certainly possible that Postgresql is looking at how many times Ci, Cj, and Ck are
referenced across all stats Sn in order to "store the result". But without knowing a priori how one datatype compares to
another datatype, it would seem to require processing each datatype (player in this case) at a time, and also would require
that the summation operation proceed using the intersection-like method.
This part of your post I do not fully understand.
Well, I would like to find a more clear way to describe what is going on, but I would need to know more about Postgresql to do it.
You are basically saying that don't count on conditions to be executed in the order that I write them. I watched the video that
kraada referenced and there the concepts were more about how Postgresql decides the best way to search though database
records, depending on if a table is indexed or not, it's size, and other pieces of information. And the other part of the video
was about how tables could be merged.
These are all important concepts to managing database queries but it does not seem to address what I am after with summation
order. If I know a priori that Ck is true an order of magnitude less than Cj, and Cj is true an order of magnitude less than Ci, then
the best way to execute these conditions is Ck->Cj->Ci and not Ci->Cj->Ck. However, this assumes some kind of sequential search
of the records that reference the specific fields needed for comparisions in Ci, Cj, and Ck. This sequential search also requires
that the number of times Ci, Cj, and Ck are true depends on preceding conditions C0...Ci-1.
What I was trying to describe above is that one could view the entire summation like a kind of Venn Diagram and rather than
starting with condition C0, followed by C1, followed by C2... where evaluation of successive conditions produces fewer and
fewer matching records; instead the "intersection" of all conditions which are true produces the sum. Perhaps it is easier to
think of what I mean if I said color all C0s blue, all C1s green, all C2s yellow, etc. The sum is now all the records of the
database which are "painted" blue & green & yellow(like a bitmask if you know that construction). And furthermore, if I know
C0 is a frequently used condition, I can remember to leave all the C0 records painted blue when I go on to sum fields of another stat.
Now all this would be taking place with some temporary index or construction that requires more detail. That would be the
optimizers job. But I am not yet convinced such a plan is being constructed behind the scenes.
BillGatesIII wrote:Maybe an example will give you some answers. When PT4 updates the cache or retrieves non cached stats, part of the query will look like this.
- Code: Select all
sum(case when condition_a and condition_b and condition_c then 1 else 0 end) as "column_1", sum(case when condition_a and condition_b and condition_d then 1 else 0 end) as "column_2", sum(case when condition_d and condition_e then 1 else 0 end) as "column_3", sum(case when condition_a and not condition_b then 1 else 0 end) as "column_4", sum(case when not condition_a and condition_d and not condition_f then 1 else 0 end) as "column_4"
What I mean by storing (or remembering) a value is that for example if for a specific record the value of condition_a is false, for all of the columns using condition_a the remaining parts within the sum() for those columns don't have to be evaluated. And again, I don't know if PostgreSQL is that smart but it wouldn't suprise me if indeed it is that smart.
Well now here I don't fully understand what you say, but perhaps we are saying the same thing. Remembering/storing that "condition_a is false",
to quote your text above, is the same thing I mean when I say color C0 blue so that if C0 is referenced again, all records for which it is true(or false)
are already colored blue.
I'm not convinced Postgresql is so "smart", however, mainly because the optimizations I learned about were not being done at this level.
BTW, is a full vaccuum/analyze helpful in planning a better optimization path? Is there something more that can be done?
BillGatesIII wrote:whiskyjohn wrote:Thanks for any comments/clarifications.
John
You're welcome, always a pleasure having a conversation with you
Thanks. I don't really look on any of this as a waste of time. I am actually far more interested in the programming than the actual
poker playing. So learning more details about Postgresql, and how to design stats, and better mine the database is all of use to me.
John