by kraada » Mon Sep 03, 2012 9:57 am
Going for custom stats is going to make everything take much longer because we need to calculate in addition every player's value for that stat.
Here's what's going on - any expression in the Filters Expression area is added as a restriction in the WHERE clause of your SQL query. So your normal SQL might be something like:
select X from T where W;
where X is the data you want to get, T is the table(s) involved and W are the restrictions. In this case, we're putting extra restrictions on the query used to get our final data.
The expression we've got here is:
cash_hand_player_statistics.id_hand in (SELECT chps.id_hand from cash_hand_player_statistics chps where chps.flg_p_first_raise and chps.id_player in (SELECT p.id_player from player p, cash_cache cc where p.id_player = cc.id_player group by p.id_player HAVING ((( (CASE WHEN ( sum(cc.cnt_p_3bet_def_opp_when_open_raised)) <> 0 THEN ((sum(cc.cnt_p_3bet_def_action_fold_when_open_raised) * 1.0 )/( sum(cc.cnt_p_3bet_def_opp_when_open_raised))) ELSE 0 END) ) * 100) BETWEEN X and Y)))
So in this case we're restricting to times where a hand (cash_hand_player_statistics.id_hand) matches a subquery (SELECT ....). So what does our subquery do?
We select hands (chps.id_hand) from a clean copy of cash_hand_player_statistics (which we reference as chps) where the player made the first raise (chps.flg_p_first_raise) and the player (chps.id_player) matches a subquery (SELECT ....) which picks out the players who have the stat value we want (in this case folding to 3bet after raise between X and Y).
This final subquery will be fairly quick because the stats only need to retrieve and divide two numbers (assuming the denominator is nonzero). If you use a custom stat it would need to retrieve the player's action for every hand, check to see if it should be added, if so, add it, then after all times and opportunities have been added up do the division. This would need to be done for all players. So I hope it's clear why cached values are much preferable here.
But for completion's sake I'll break down the last subquery:
(SELECT p.id_player from player p, cash_cache cc where p.id_player = cc.id_player group by p.id_player HAVING ((( (CASE WHEN ( sum(cc.cnt_p_3bet_def_opp_when_open_raised)) <> 0 THEN ((sum(cc.cnt_p_3bet_def_action_fold_when_open_raised) * 1.0 )/( sum(cc.cnt_p_3bet_def_opp_when_open_raised))) ELSE 0 END) ) * 100) BETWEEN X and Y))
This selects player id values from a clean player table and cache table. We join so that id_player refers to the same player in both tables. We group by players since we want to be returning player data and we want players that have a certain property (HAVING is different from WHERE because WHERE checks to see that some data matches before it's added to the result set, HAVING checks data in the result set and only displays rows which match the check). The CASE statement makes sure that the denominator of our fraction is not 0 (we'd get divide by zero errors then), and if it doesn't, then we add up the times and opportunities of the columns for our stat and divide and make sure that value is between x and y.
To change it to use a custom stat you'd basically want to change data in the HAVING clause to the SQL to tabulate and divide by your custom stat - but since that will require joining to (at least) the cash_hand_player_statistics table again which is the largest table in the database containing the most data - reading every row for every player is a whole lot of extra number crunching.