Modeling Run Expectancy as a Markov Process
Developing a dynamic framework for predicting runs scored based on game state transition probabilities
Those familiar with the world of baseball statistics have probably heard of Run Expectancy (RE). The goal of RE is to measure how many runs we expect the average team to score given a current game state.
A game state is typically defined by the configuration of outs and runners. Since there are 3 possible out values and 8 possible runner configurations, we have 24 potential starting states that a game can be in at any given moment. For notation purposes, we will define the state as:

Outs - Runner on 1st - Runner on 2nd - Runner on 3rd

where each runner slot is 1 if the base is occupied and 0 if it is empty.
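To make the notation concrete, here is a small helper (our own, purely illustrative) that builds the state string from an out count and base-occupancy flags:

# ENCODE A GAME STATE AS 'OUTS-1B-2B-3B'
def encode_state(outs, first, second, third):
    return f'{outs}-{first}-{second}-{third}'

encode_state(1, 1, 0, 1)  # '1-1-0-1': ONE OUT, RUNNERS ON FIRST AND THIRD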
For a game state with two outs and a runner on second base, we would define the state to be 2-0-1-0. RE assigns a value to each state corresponding to the number of expected runs scored in the remainder of the inning.
Typically, this value is calculated by looking at every time a given state has been reached (over some representative historical sample) and taking the average number of runs scored throughout the rest of the inning. Then, to produce RE estimates tailored to a specific game or situation, users can introduce “adjustments” based on a wide range of factors such as the stadium, date, or lineup position.
Instead, we can take a different approach—one that offers significantly more flexibility. At the uninformed level, we would like the outcome probabilities for each at-bat to be dependent only on the game state. This is reflective of the possibility that our situation could involve any random combination of MLB players and environment. At the informed level, we would like to be able to directly modify the outcome probabilities for potential at-bats based on whichever factors we wish to include.
Both goals are conducive to representing RE as a Markov process, specifically a Markov Reward Model with Absorption. Markov processes model the likelihood that a system transitions from one state to another during a discrete time step. Crucially, the likelihood of each state transition is assumed to be independent of all prior transitions, meaning the probability of a system transitioning from state A to B will be the same regardless of how the system arrived at state A.
In our context, we are concerned with the likelihood of transitioning from each of our 24 game states to every other possible state during an at-bat. If we begin with no outs and no runners (0-0-0-0), there are only 5 states we can reach in one at-bat step: 1-0-0-0, 0-1-0-0, 0-0-1-0, 0-0-0-1, and 0-0-0-0, which correspond to an out, single, double, triple, and home run respectively. There is some non-zero probability associated with each of these possible transitions, and zero probability associated with all others. Because each of the 24 game states can eventually be left, never to be returned to (the inning always ends), we refer to these states as transient.
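To make this concrete, the reachable one-step transitions from 0-0-0-0 can be written out directly (outcome labels are ours; note that walks and hit-by-pitches land in the same state as a single):

# ONE-STEP TRANSITIONS FROM STATE 0-0-0-0
next_states = {
    'out':      '1-0-0-0',
    'single':   '0-1-0-0',  # WALKS AND HBP PRODUCE THE SAME STATE
    'double':   '0-0-1-0',
    'triple':   '0-0-0-1',
    'home run': '0-0-0-0',  # BATTER SCORES; STATE RETURNS TO ITSELF
}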
However, those paying attention should have noticed that for certain game states, it is possible to transition to a state not included in the 24 transient states. For example, if we have two outs and bases empty (2-0-0-0), it is possible that an out is recorded and the inning ends. Once we have transitioned into an inning-ending state, there is no way to return to any of our 24 game states. Simply put, once the inning is over it’s over. From a Markov perspective, these inning-ending states are called absorbing states. Once a system enters an absorbing state there is only one possible transition: to itself.
Take a moment to notice that for all transient state changes, there is a defined number of runs that must be scored in accordance with baseball logic. Since each runner (including the batter) must either make an out, reach base, or score, the number of runs scored in one step is given by:

runs = (runners before + 1) - runners after - (outs after - outs before)
For example, if we transition from state 0-1-1-1 to state 0-0-0-0, we calculate:

runs = (3 + 1) - 0 - (0 - 0) = 4
This, of course, corresponds to a no-out grand slam.
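That baseball-logic reward can be computed directly from the two state strings (a small helper we introduce for illustration):

# RUNS SCORED DURING ONE TRANSIENT STATE TRANSITION
def runs_scored(state, next_state):
    outs, *runners = map(int, state.split('-'))
    next_outs, *next_runners = map(int, next_state.split('-'))
    # (RUNNERS BEFORE + BATTER) - RUNNERS AFTER - OUTS RECORDED
    return (sum(runners) + 1) - sum(next_runners) - (next_outs - outs)

runs_scored('0-1-1-1', '0-0-0-0')  # 4: A NO-OUT GRAND SLAM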
This relationship does not hold for transitions into absorbing states. If we transition from state 2-1-1-1 (two outs and bases loaded) to the inning end, it would be possible to score 0, 1, 2, or 3 runs before making the third out. To address this, we create separate absorbing states corresponding to each number of potential runs scored. For notation, we will name these states 3-0, 3-1, 3-2, and 3-3 respectively.
We now have 28 game states (24 transient and 4 absorbing), and a defined number of runs scored for each transition. The next step is to estimate the likelihood of each state transition. For uninformed RE, we can just count the number of times each transition has occurred over some historical period and convert these counts to percentages. For informed RE, we can model transition probabilities based on whatever factors we desire. We can even define different probabilities based on the batter/pitcher matchup.
We will start by demonstrating the methodology for calculating uninformed RE before extending to a framework for estimating informed RE. Full Python code (and accompanying CSV files) can be found in our GitHub repository.
The first step is to load historical MLB transition counts from 2021-2024 (found here) into a Pandas DataFrame. We take several steps to ensure that all state transitions will have at least one entry (even if the count is 0) and that all absorbing states will transition to themselves.
import pandas as pd
import numpy as np
# READ TRANSITION COUNTS FILE
df = pd.read_csv('mlb_transition_counts.csv')
# ADD ZERO ROWS TO ENSURE SQUARE MATRICES
all_states = sorted(set(df['game_state']).union(df['next_state']))
zero_rows = [{'game_state': s1, 'next_state': s2, 'runs_scored': r, 'transition_count': 0} for s1 in all_states for s2 in all_states for r in range(0, 5)]
df = pd.concat([df, pd.DataFrame(zero_rows)], axis=0)
# ADD ABSORBING STATE TRANSITIONS (IDENTITY)
absorbing_states = [f'3-{i}' for i in range(0, 4)]
absorbing_rows = [{'game_state': s, 'next_state': s, 'runs_scored': 0, 'transition_count': 1} for s in absorbing_states]
df = pd.concat([df, pd.DataFrame(absorbing_rows)], axis=0)
Then, we create two 28x28 matrices: one containing the transition probabilities and one containing the rewards (runs scored) associated with each state transition. In each matrix, the entry at row i and column j corresponds to transitioning from state i to state j.
# TRANSITION MATRIX
transitions = pd.pivot_table(data=df, values='transition_count', index='game_state', columns='next_state', fill_value=0, aggfunc='sum')
transitions = transitions / transitions.values.sum(axis=1, keepdims=True)
transitions.fillna(0, inplace=True)
# REWARD MATRIX
rewards = pd.pivot_table(data=df.loc[df['transition_count'] > 0], values='runs_scored', index='game_state', columns='next_state', fill_value=0)
One of the core properties of Markov transition matrices is that the probability of transitioning from state i to j in exactly n time steps is given by entry (i, j) of the matrix power T^n.
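As a quick sanity check of this property, consider a toy two-state chain (made-up probabilities, unrelated to baseball):

# TOY TWO-STATE CHAIN: STATE 1 IS ABSORBING
T = np.array([[0.9, 0.1],
              [0.0, 1.0]])

# ENTRY (0, 1) OF T^2 GIVES P(IN STATE 1 AFTER EXACTLY 2 STEPS)
T2 = np.linalg.matrix_power(T, 2)
print(T2[0, 1])  # 0.19 = 0.9*0.1 + 0.1*1.0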
Since we know the number of runs scored during each transition, we can use this property to calculate the expected runs scored from each start state during each at-bat step. Because every inning eventually reaches an absorbing state, summing the expected runs over the first 100 at-bat steps captures essentially all of the probability mass.
# RUNS EXPECTED AFTER N STEPS
n_steps = 100
step_runs = [
    (np.linalg.matrix_power(transitions.values, i - 1) @ (transitions.values * rewards.values)).sum(axis=1)
    for i in range(1, n_steps + 1)
]
runs_expected = np.sum(step_runs, axis=0)
runs_expected = pd.DataFrame(data=runs_expected, index=transitions.index, columns=['runs_expected'])
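Since all probability mass should have reached the absorbing states well before step 100, we can sanity-check the truncation (reusing the objects defined above):

# PROBABILITY MASS REMAINING IN TRANSIENT STATES AFTER n_steps TRANSITIONS
transient_cols = ~transitions.columns.isin(absorbing_states)
leftover = np.linalg.matrix_power(transitions.values, n_steps)[:, transient_cols].sum(axis=1)
print(leftover.max())  # SHOULD BE EFFECTIVELY ZERO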
The calculation above produces uninformed RE values for each initial game state. These values correspond to the expected number of runs scored given that we know nothing about the game environment, inning, score, batters, or pitcher. What if we instead wish to estimate informed RE using some of these known factors?
One of the main advantages of the Markov framework is that we are able to represent RE as a function of the underlying transition matrix (since the reward matrix is defined by baseball logic). If we want to model RE given that the game is played at Coors Field (a unique environment due to its high altitude and deep fences), we might simply replace the MLB-wide transition counts with Coors Field-specific counts (found here). Unsurprisingly, this produces higher RE estimates than our uninformed values.
# READ TRANSITION COUNTS FILE
df = pd.read_csv('coors_field_transition_counts.csv')
# ADD ZERO ROWS TO ENSURE SQUARE MATRICES
all_states = sorted(set(df['game_state']).union(df['next_state']))
zero_rows = [{'game_state': s1, 'next_state': s2, 'runs_scored': r, 'transition_count': 0} for s1 in all_states for s2 in all_states for r in range(0, 5)]
df = pd.concat([df, pd.DataFrame(zero_rows)], axis=0)
# ADD ABSORBING STATE TRANSITIONS (IDENTITY)
absorbing_states = [f'3-{i}' for i in range(0, 4)]
absorbing_rows = [{'game_state': s, 'next_state': s, 'runs_scored': 0, 'transition_count': 1} for s in absorbing_states]
df = pd.concat([df, pd.DataFrame(absorbing_rows)], axis=0)
# TRANSITION MATRIX
transitions = pd.pivot_table(data=df, values='transition_count', index='game_state', columns='next_state', fill_value=0, aggfunc='sum')
transitions = transitions / transitions.values.sum(axis=1, keepdims=True)
transitions.fillna(0, inplace=True)
# REWARD MATRIX
rewards = pd.pivot_table(data=df.loc[df['transition_count'] > 0], values='runs_scored', index='game_state', columns='next_state', fill_value=0)
# RUNS EXPECTED AFTER N STEPS
n_steps = 100
step_runs = [
    (np.linalg.matrix_power(transitions.values, i - 1) @ (transitions.values * rewards.values)).sum(axis=1)
    for i in range(1, n_steps + 1)
]
runs_expected = np.sum(step_runs, axis=0)
runs_expected = pd.DataFrame(data=runs_expected, index=transitions.index, columns=['runs_expected'])
Suppose instead that we would like to use information about each batter/pitcher matchup. In accounting for location, we assumed that the effects of Coors Field would be felt equally throughout the game (meaning the transition matrix would be constant for all at-bat steps). Now, we are interested in using different transition probabilities for each at-bat step.
Before diving into approaches for estimating these transition probabilities, we will start by developing the methodology for calculating RE given that we have already found appropriate matrices for each matchup. As a placeholder, we will create 9 modified versions of the league-average transition matrix (using the MLB-wide transition counts), each applying different random scaling factors to all non-zero probabilities.
# CREATE RANDOM TRANSITION MATRICES FOR EACH MATCHUP
matchup_transitions = []
for i in range(9):
    # GET RANDOM PERTURBATION COEFFICIENTS
    x = np.random.normal(loc=0, scale=1, size=transitions.values.shape)
    x = np.exp(x)
    # PERTURB TRANSITION MATRIX AND RENORMALIZE ROWS
    transitions_i = np.power(transitions.values, x)
    transitions_i = transitions_i / transitions_i.sum(axis=1, keepdims=True)
    # APPEND
    matchup_transitions.append(transitions_i)
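Each perturbed matrix must remain row-stochastic, which we can quickly verify:

# EVERY ROW OF EVERY PERTURBED MATRIX SHOULD STILL SUM TO 1
for transitions_i in matchup_transitions:
    assert np.allclose(transitions_i.sum(axis=1), 1.0)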
We now need to decompose our previous RE calculation to understand how we might adapt it to work with different matrices at each at-bat step. Written out, the expected runs contributed by at-bat step i are:

T^(i-1) (T ∘ R) 1

where T is the transition matrix, R is the reward matrix, ∘ denotes the element-wise product, and 1 is a vector of ones that sums across possible ending states.
The first component (T^(i-1)) corresponds to the likelihood of the system transitioning from each initial game state to all other states in exactly i-1 steps. Since the ending state at step i-1 is the starting state at step i, this is the likelihood of beginning the step-i at-bat in each state, given the initial game state. In the case where i=1 (the first step), this is simply the identity matrix, since the game state at the start of the at-bat equals the initial game state. For all subsequent steps, we find this component by sequentially multiplying each prior step's transition matrix.
The second component (T ∘ R, the element-wise product) represents the expected runs scored during the step-i at-bat for every possible single-step transition. The reward matrix R will remain constant, since the number of runs scored is fixed for each transition. We then simply need to replace T with the transition probabilities for our desired at-bat.
Given our set of 9 transition matrices, we can represent RE with this modified approach:
# RUNS EXPECTED AFTER N STEPS
n_steps = 100
prev_transitions = np.identity(transitions.values.shape[0])
runs_expected = []
for i in range(n_steps):
    # SELECT MATRICES
    transitions_i = matchup_transitions[i % 9]
    # GET EXPECTED RUNS
    runs_expected_i = (prev_transitions @ (transitions_i * rewards.values)).sum(axis=1).reshape(-1, 1)
    runs_expected.append(runs_expected_i)
    # UPDATE PREV TRANSITIONS
    prev_transitions = prev_transitions @ transitions_i
# CONSOLIDATE RUNS EXPECTED (SUM STEPS)
runs_expected = np.hstack(runs_expected).sum(axis=1)
runs_expected = pd.DataFrame(data=runs_expected, index=transitions.index, columns=['runs_expected'])
If you go back to the code block that creates our random transition matrices, you will notice that we can modify the magnitude of the perturbations by decreasing the scale parameter in the random generator. As we decrease this parameter, we expect our calculated RE values to converge to the previous uninformed values (since all steps will have nearly the same league-average transition probabilities). We can use this to confirm that our RE calculation works as expected.
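As a concrete sketch of that check (uninformed_re is our hypothetical name for a saved copy of the earlier league-average result):

# REBUILD MATCHUP MATRICES WITH A NEAR-ZERO PERTURBATION SCALE
matchup_transitions = []
for i in range(9):
    x = np.exp(np.random.normal(loc=0, scale=0.001, size=transitions.values.shape))
    transitions_i = np.power(transitions.values, x)
    matchup_transitions.append(transitions_i / transitions_i.sum(axis=1, keepdims=True))

# AFTER RE-RUNNING THE STEPWISE LOOP ABOVE, THE TWO RESULTS SHOULD MATCH
# print(np.abs(runs_expected.values - uninformed_re.values).max())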
Now that we have a framework for calculating informed RE from matchup-specific transition matrices, how do we actually estimate those matrices?
Using years of historical data provides stable league-average transition probabilities, but when considering individual player performance our sample sizes are much smaller (especially if we only want to consider current-year data). This is compounded by the addition of more factors. If we want transition probabilities to be specific to the stadium, batter, pitcher, inning, and current season, we might not have a single historical example that represents a perfect comparison.
There are a variety of straightforward approaches we can take that allow us to include as many known game factors as desired. We could take the weighted average of independently-calculated transition probabilities for each factor, based on sample size or perceived importance. Alternatively, we might use a Bayesian approach and model the transitions from each state using Dirichlet distributions based on the number of observed counts for each transition.
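As a sketch of the Bayesian option (the function and its prior_strength default are our own illustration, not a fitted model):

# DIRICHLET-SMOOTHED TRANSITION PROBABILITIES FOR ONE GAME STATE
def posterior_transition_probs(observed_counts, league_probs, prior_strength=100):
    # PRIOR PSEUDO-COUNTS COME FROM THE LEAGUE-AVERAGE PROBABILITIES;
    # LARGER prior_strength SHRINKS HARDER TOWARD THE LEAGUE AVERAGE
    alpha = np.asarray(league_probs) * prior_strength + np.asarray(observed_counts)
    return alpha / alpha.sum()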
However, as the number and complexity of factors increases, we will likely want to instead train calibrated classification models to predict the probability of each possible transition. This gives us additional flexibility to include different types of factors and to incorporate non-linear interaction effects.
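A schematic version of that modeling approach, assuming a per-plate-appearance dataset with one row per observed transition (the file name and feature columns here are hypothetical):

from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression

# HYPOTHETICAL PLATE-APPEARANCE DATA: ONE ROW PER OBSERVED TRANSITION
pa = pd.read_csv('plate_appearances.csv')
X = pd.get_dummies(pa[['game_state', 'stadium', 'batter_id', 'pitcher_id']].astype(str))

# CALIBRATED MULTINOMIAL MODEL OVER POSSIBLE NEXT STATES
model = CalibratedClassifierCV(LogisticRegression(max_iter=1000), method='isotonic', cv=5)
model.fit(X, pa['next_state'])

# PREDICTED TRANSITION PROBABILITIES (COLUMNS ALIGN WITH model.classes_)
probs = model.predict_proba(X.iloc[[0]])

Each row of the predicted probabilities then becomes one row of a matchup-specific transition matrix that can be slotted directly into the stepwise calculation above.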