Moving From Run Expectancy to Win Probability

Using a Markov framework for estimating conditional Win Probability based on game situation

Oct 16, 2024

In a previous post, we demonstrated how to estimate Run Expectancy (RE) as a function of game state transition probabilities (either uninformed or informed). If you haven’t already, please give that a read as the methodology here will be an extension of that framework.

Previously, we had defined RE using a 28x28 transition matrix, comprised of 24 transient game states and 4 absorbing end-of-inning states. The system must eventually decay to one of the 4 absorbing states, and we calculate RE as the sum of runs scored during all at-bat steps.

Now, we will want to modify the system to calculate full-game (9-inning) RE probabilities rather than single-inning RE. Instead of calculating one average RE value, we will want to compute the probability distributions of each team’s total runs scored. Then, we can consider which score combinations would result in each team winning, which can be aggregated to produce Win Probability.

To do this, we will modify the game state definition to include the inning and the total number of runs scored. We will now define game state as:

\(Game State = \{inning\}-\{outs\}-\{runner_{{1b}}\}-\{runner_{{2b}}\}-\{runner_{{3b}}\}-\{totalruns\}\)

When one inning ends, the system transitions to the start state for the next inning (no outs and bases empty). For example, with two outs and no runners on base in the first inning, we expect a strikeout to produce a transition from 1-2-0-0-0-0 to 2-0-0-0-0-0.

Since we are only considering 9-inning Win Probability (no extra innings), we will consider all states in the 10th inning to be absorbing. Like before with our end-of-inning states, all game states 10-0-0-0-0-X can only transition to themselves. Then, rather than use separate reward and transition matrices, we can simply measure the probability of our system decaying to each absorbing state, since the state contains the final number of runs scored.

Just as in our RE demonstration, we will start with the methodology for calculating uninformed WP before extending to a framework for estimating informed WP. Full Python code (and accompanying CSV files) can be found in our GitHub repository.

As in our RE demonstration, we will start by loading historical MLB transition counts from 2021-2024 (the same used in the RE demonstration, which can be found here). Rather than use inning-specific transition counts, we will instead assume that transitions are equally likely regardless of inning. For one, this significantly increases the sample size (especially for less common starting states), making our transition probabilities more stable. More importantly, though, it removes bias that may arise from the fact that teams usually deploy their top relievers in later innings. We would rather inform our RE calculations deliberately.

This first step is to create synthetic transitions using our observation counts. We will replicate our transition data for every possible inning and starting score combination, modifying the start and end state for each transition based on the configuration and number of runs scored. In doing so, we must set a maximum possible score (we use 20) so that there are a finite number of absorbing states. All transitions that would otherwise result in a state with more total runs than our maximum threshold will be instead be capped.

As before, we will add synthetic absorbing rows so that all 10th inning states transition to themselves. Then, we create the transition matrix:

import pandas as pd
import numpy as np

######

# READ TRANSITION COUNTS FILE
df = pd.read_csv('mlb_transition_counts.csv')

# CLEAN END-OF-INNING STATES
df['inning_end'] = np.where(df['next_state'].isin(['3-0', '3-1', '3-2', '3-3']), 1, 0)
df['next_state'] = np.where(df['inning_end'] == 1, '0-0-0-0', df['next_state'])

# ADD INNING/SCORE TOTAL TO GAME STATES
max_score = 20
df_list = []

for inning in range(1, 10):
    for score in range(max_score + 1):
        df_i = df.copy()

        # INNING CHANGES
        df_i['inning_before'] = inning
        df_i['inning_after'] = np.where(df['inning_end'] == 0, df_i['inning_before'], df_i['inning_before'] + 1)

        # SCORE CHANGES
        df_i['score_before'] = score
        df_i['score_after'] = df_i['score_before'] + df['runs_scored']
        df_i['score_after'] = np.where(df_i['score_after'] > max_score, max_score, df_i['score_after'])

        # GAME STATES
        df_i['game_state'] = df_i['inning_before'].astype(str) + '-' + df_i['game_state'] + '-' + df_i['score_before'].astype(str)
        df_i['next_state'] = df_i['inning_after'].astype(str) + '-' + df_i['next_state'] + '-' + df_i['score_after'].astype(str)

        df_list.append(df_i)

df = pd.concat(df_list).reset_index(drop=True)

# ADD INITIAL/ABSORBENT STATE TRANSITIONS
initial_states = ['1-0-0-0-0-0']
absorbent_states = [f'10-0-0-0-0-{i}' for i in range(max_score + 1)]

initial_rows = [{'game_state': s, 'next_state': s, 'runs_scored': 0, 'transition_count': 0} for s in initial_states]
absorbent_rows = [{'game_state': s, 'next_state': s, 'runs_scored': 0, 'transition_count': 1} for s in absorbent_states]

df = pd.concat([df, pd.DataFrame(absorbent_rows), pd.DataFrame(initial_rows)], axis=0).reset_index(drop=True)

# TRANSITION MATRIX
transitions = pd.pivot_table(data=df, values='transition_count', index='game_state', columns='next_state', fill_value=0, aggfunc='sum')
transitions = transitions / transitions.values.sum(axis=1, keepdims=True)
transitions.fillna(0, inplace=True)

Before we were interested in Run Expectancy, which involved using our transition matrix to compute weighted averages for each starting state. Now that we have redefined our game state definition, we are interested only in which states our system will (eventually) decay to. Since the likelihood of the system being at each state in exactly n steps in found by raising the transition matrix to the nth power, we can find the probability distribution of decaying to each absorbing state by choosing a sufficiently large value of n.

# FINAL STATES
n_steps = 100
final_states = np.linalg.matrix_power(transitions.values, n_steps)
final_states = pd.DataFrame(data=final_states, index=transitions.index, columns=transitions.columns)[absorbent_states]
final_states.columns = [int(x.split('-')[-1]) for x in final_states.columns]

These values correspond to the likelihood of transitioning from every starting state to all absorbing states in 100 steps (more than enough at-bats for one team in any realistic game). Since the only information contained in each absorbing state is the total number of runs scored, we parse out this element and rename the columns accordingly.

We now have the uninformed probability distribution of total runs scored for a team given any possible starting state. However, to calculate Win Expectancy, we need to find the joint distribution of runs scored for both team. From a given game situation (inning, half inning, score, outs, runners), we can find the run distributions for each team. Then, we can make the assumption that the two distributions are independent to produce a joint distribution of final scores. Although this assumption may not be valid for informed RE values (depending on setup), it is valid for uninformed RE values.

There are several pre-processing steps required to create entries for all possible game situations and final scores. Due to the large size of the RE DataFrame, it is performative to drop unnecessary rows and split into smaller DataFrames before conducting join operations. We take the following steps to create a joined DataFrame of all possible game situations and final score combinations, along with the estimated likelihoods of each team scoring that number of total runs .

# MELT PIVOT AND RETAIN ONLY POSSIBLE STATES
final_states = pd.melt(final_states, ignore_index=False, var_name='runs_scored', value_name='probability').reset_index()
final_states = final_states.loc[final_states['probability'] > 0].copy()
final_states[['inning', 'outs', 'runner_1b', 'runner_2b', 'runner_3b', 'team_score']] = np.array(final_states['game_state'].apply(lambda x: np.array(x.split('-'))).to_list()).astype(int)

# DUPLICATE START-OF-INNING ROWS TO ACCOUNT FOR BATTING AND PITCHING STATES
final_states_batting = final_states.copy()
final_states_batting['bat_flag'] = 1

final_states_pitching = final_states.loc[final_states['outs'] + final_states['runner_1b'] + final_states['runner_2b'] + final_states['runner_3b'] == 0].copy()
final_states_pitching['bat_flag'] = 0

final_states = pd.concat([final_states_batting, final_states_pitching], axis=0).reset_index(drop=True)
final_states['pitch_flag'] = 1 - final_states['bat_flag']

# DUPLICATE HOME/AWAY
final_states_home = final_states.copy()
final_states_home.rename(columns={'team_score': 'home_score'}, inplace=True)
final_states_home['half_inning'] = np.where(final_states_home['bat_flag'] == 1, 'bottom', 'top')
final_states_home['effective_inning'] = final_states_home['inning']
final_states_home.set_index('effective_inning', inplace=True)

final_states_away = final_states.copy()
final_states_away.rename(columns={'team_score': 'away_score'}, inplace=True)
final_states_away['half_inning'] = np.where(final_states_away['bat_flag'] == 1, 'top', 'bottom')
final_states_away['effective_inning'] = np.where(final_states_away['pitch_flag'] == 1, final_states_away['inning'] - 1, final_states_away['inning'])
final_states_away.set_index('effective_inning', inplace=True)

# JOIN STATES
home_batting = final_states_home.loc[final_states_home['bat_flag'] == 1][['half_inning', 'inning', 'outs', 'runner_1b', 'runner_2b', 'runner_3b', 'home_score', 'runs_scored', 'probability']]
home_pitching = final_states_home.loc[final_states_home['pitch_flag'] == 1][['home_score', 'runs_scored', 'probability']]

away_batting = final_states_away.loc[final_states_away['bat_flag'] == 1][['half_inning', 'inning', 'outs', 'runner_1b', 'runner_2b', 'runner_3b', 'away_score', 'runs_scored', 'probability']]
away_pitching = final_states_away.loc[final_states_away['pitch_flag'] == 1][['away_score', 'runs_scored', 'probability']]

joined_top = home_pitching.join(away_batting, lsuffix='_home', rsuffix='_away').reset_index(drop=True)
joined_bottom = home_batting.join(away_pitching, lsuffix='_home', rsuffix='_away').reset_index(drop=True)
joined = pd.concat([joined_top, joined_bottom], axis=0)
joined = joined.loc[joined['inning'] < 10].copy()

We can then flag the winning team in each scenario (or mark if it would result in a tie) and multiply the two team likelihoods together to produce the scenario’s joint likelihood.

These values can be pivoted into a more readable (and database-compatible) format. To account for any rounding errors, we re-normalize the probabilities so that they sum to 1.

# PIVOT AND CLEAN
agg = pd.pivot_table(data=agg, index=['half_inning', 'inning', 'outs', 'runner_1b', 'runner_2b', 'runner_3b', 'home_score', 'away_score'], columns=['outcome'], values='joint_probability')
agg = agg / agg.values.sum(axis=1, keepdims=True)
agg.fillna(0, inplace=True)
agg.reset_index(inplace=True)

Similar to our informed RE demonstration, we can calculate informed Win Probability by using tailored transition matrices for each at-bat. Assuming we have already found these transition probabilities, all we need to do is to multiply the transition matrices sequentially (until reaching a sufficiently large number of steps) rather than raise the singular transition matrix to a large power.

However, for performance reasons, it will be more efficient for us to convert these transition matrices into scipy sparse arrays. These types of objects are designed to make mathematic operations faster when dealing with large matrices with many zero values. The difference in speed is more noticeable when doing sequential multiplication than raising the matrix to a power due to the significant increase in operations.

We can generate randomly perturbed transition matrices for each team and matchup by applying random scaling factors to our league-average transition matrix:

from scipy import sparse

######

# CONVERT TO SPARSE
transitions_sparse = sparse.csr_matrix(transitions.values)

# CREATE RANDOM TRANSITION MATRICES FOR EACH TEAM/MATCHUP
matchup_transitions_home = []
matchup_transitions_away = []

for i in range(100):
    # GET RANDOM PERTRUBATION COEFFICIENTS
    x_home = np.random.normal(loc=0, scale=1)
    x_home = np.exp(x_home)

    x_away = np.random.normal(loc=0, scale=1)
    x_away = np.exp(x_away)

    # SCALE TRANSFORMATION MATRICES
    transitions_home = transitions_sparse.power(x_home)
    transitions_away = transitions_sparse.power(x_away)

    transitions_home = transitions_home / transitions_home.sum(axis=1)
    transitions_away = transitions_away / transitions_away.sum(axis=1)

    # APPEND
    matchup_transitions_home.append(transitions_home)
    matchup_transitions_away.append(transitions_away)

Then, we can multiply the matchup-specific transition matrices sequentially to get the equivalent final_states DataFrames for each team:

# FINAL STATES AFTER N STEPS FOR EACH TEAM
n_steps = 100
final_states_home = sparse.csr_matrix(np.identity(transitions.values.shape[0]))
final_states_away = sparse.csr_matrix(np.identity(transitions.values.shape[0]))

for i in range(n_steps):
    final_states_home = final_states_home @ matchup_transitions_home[i]
    final_states_away = final_states_away @ matchup_transitions_away[i]

final_states_home = pd.DataFrame(data=final_states_home.toarray(), index=transitions.index, columns=transitions.columns)[absorbent_states]
final_states_away = pd.DataFrame(data=final_states_away.toarray(), index=transitions.index, columns=transitions.columns)[absorbent_states]

final_states_home.columns = [int(x.split('-')[-1]) for x in final_states_home.columns]
final_states_away.columns = [int(x.split('-')[-1]) for x in final_states_away.columns]

All subsequent logic from our uninformed calculations remains the same. However, since we now have separate run-scoring distributions for each team, we will need to modify the code slightly:

# MELT PIVOT AND RETAIN ONLY POSSIBLE STATES
final_states_home = pd.melt(final_states_home, ignore_index=False, var_name='runs_scored', value_name='probability').reset_index()
final_states_home = final_states_home.loc[final_states_home['probability'] > 0].copy()

final_states_away = pd.melt(final_states_away, ignore_index=False, var_name='runs_scored', value_name='probability').reset_index()
final_states_away = final_states_away.loc[final_states_away['probability'] > 0].copy()

final_states_home[['inning', 'outs', 'runner_1b', 'runner_2b', 'runner_3b', 'home_score']] = np.array(final_states_home['game_state'].apply(lambda x: np.array(x.split('-'))).to_list()).astype(int)
final_states_away[['inning', 'outs', 'runner_1b', 'runner_2b', 'runner_3b', 'away_score']] = np.array(final_states_away['game_state'].apply(lambda x: np.array(x.split('-'))).to_list()).astype(int)

# SPLIT BATTING AND PITCHING STATES FOR EACH TEAM
final_states_home_batting = final_states_home.copy()
final_states_home_batting['half_inning'] = 'bottom'
final_states_home_batting['effective_inning'] = final_states_home_batting['inning']

final_states_away_batting = final_states_away.copy()
final_states_away_batting['half_inning'] = 'top'
final_states_away_batting['effective_inning'] = final_states_away_batting['inning']

final_states_home_pitching = final_states_home.loc[final_states_home['outs'] + final_states_home['runner_1b'] + final_states_home['runner_2b'] + final_states_home['runner_3b'] == 0].copy()
final_states_home_pitching['half_inning'] = 'top'
final_states_home_pitching['effective_inning'] = final_states_home_pitching['inning']

final_states_away_pitching = final_states_away.loc[final_states_away['outs'] + final_states_away['runner_1b'] + final_states_away['runner_2b'] + final_states_away['runner_3b'] == 0].copy()
final_states_away_pitching['half_inning'] = 'bottom'
final_states_away_pitching['effective_inning'] = final_states_away_pitching['inning'] - 1

# JOIN STATES
final_states_home_batting = final_states_home_batting.set_index('effective_inning')[['half_inning', 'inning', 'outs', 'runner_1b', 'runner_2b', 'runner_3b', 'home_score', 'runs_scored', 'probability']]
final_states_home_pitching = final_states_home_pitching.set_index('effective_inning')[['home_score', 'runs_scored', 'probability']]

final_states_away_batting = final_states_away_batting.set_index('effective_inning')[['half_inning', 'inning', 'outs', 'runner_1b', 'runner_2b', 'runner_3b', 'away_score', 'runs_scored', 'probability']]
final_states_away_pitching = final_states_away_pitching.set_index('effective_inning')[['away_score', 'runs_scored', 'probability']]

joined_top = final_states_home_pitching.join(final_states_away_batting, lsuffix='_home', rsuffix='_away').reset_index(drop=True)
joined_bottom = final_states_home_batting.join(final_states_away_pitching, lsuffix='_home', rsuffix='_away').reset_index(drop=True)

joined = pd.concat([joined_top, joined_bottom], axis=0)
joined = joined.loc[joined['inning'] < 10].copy()

We can then use the same post-join code to calculate aggregate Win Probability for each team.

Actually computing these matchup-specific transition matrices will be more complicated than in our single-inning case. While we might have made the assumption that the current pitcher would remain for the entire inning, we can be almost certain that there will be pitching changes throughout a full 9-inning game. Nevertheless, we do have information regarding the inning, outs, runners, and number of times through the order.

We might first seek to predict how deep the starting pitcher is likely to go in the game as a function of inning, times through the order, and runs allowed. This would allow us to denote for matchup and game state whether we expect to see the starter or a reliever. Then, we might decide to consider the entire bullpen as one aggregate pitching entity. Since we would know both “players,” we could follow similar methodology as our calculations of informed RE.

MLB’s Substack

Discussion about this post