Phoenix Suns vs Oklahoma City Thunder Matchup Prediction with Logistic Regression

In this post, I will review some stats from a sportsbook and build a logistic regression classifier to estimate the probability of a basketball team winning a match. For our experiment, we will use the following metadata, collected at 8 am CST from a sportsbook page. We will assume that the data on this page is trustworthy and correct. By no means do I suggest you use this exact code to bet, but the math behind it can help you build a stronger classifier if more data is provided. For now, we will create a binary classifier that targets the probability that Phoenix (PHO) beats Oklahoma City (OKC).

Fig1. Sportsbook Data
I have translated this data into a JSON object we can use to load the needed stats for our model. Let's use the following definition to hold the data:
1 - Load the data into a JSON Object
pho_vs_okc = {
    "matchup": {
        "date": "2025-12-10",
        "time_est": "7:30 PM",
        "venue": "Paycom Center, Oklahoma City, OK, USA"
    },
    "betting_consensus": {
        "spread": {
            "phoenix": "+14.5",
            "oklahoma_city": "-14.5",
            "consensus": {
                "phoenix_pct": 61,
                "oklahoma_city_pct": 39
            }
        },
        "total": {
            "over_pct": 69,
            "under_pct": 31
        },
        "moneyline": {
            "phoenix": "",
            "oklahoma_city": ""
        }
    },
    "offensive_team_records": {
        "phoenix_suns": {
            "points_per_game": 115.88,
            "points_against_per_game": 113.46,
            "first_half_points": 57.00,
            "fg_pct": 46.74,
            "ft_pct": 79.03,
            "three_pt_pct": 36.76,
            "off_rebounds": 8.96,
            "fg_made": 42.17,
            "ft_made": 16.96,
            "three_pt_made": 14.58
        },
        "oklahoma_city_thunder": {
            "points_per_game": 123.04,
            "points_against_per_game": 106.88,
            "first_half_points": 60.38,
            "fg_pct": 49.72,
            "ft_pct": 82.98,
            "three_pt_pct": 37.37,
            "off_rebounds": 12.83,
            "fg_made": 44.13,
            "ft_made": 20.92,
            "three_pt_made": 13.88
        }
    },
    "defensive_opponent_stats": {
        "phoenix_suns": {
            "opp_points_per_game": 113.46,
            "opp_rebounds": 11.17,
            "opp_fg_pct": 47.61,
            "opp_ft_pct": 81.00,
            "opp_three_pt_pct": 35.87
        },
        "oklahoma_city_thunder": {
            "opp_points_per_game": 106.88,
            "opp_rebounds": 11.04,
            "opp_fg_pct": 42.80,
            "opp_ft_pct": 76.20,
            "opp_three_pt_pct": 36.93
        }
    },
    "best_available_odds": {
        "spread": {
            "phoenix": "+14.5 -110",
            "oklahoma_city": "-14.5 -105"
        },
        "total": {
            "over": "225.5 -110",
            "under": "225.5 -110"
        },
        "moneyline": {
            "phoenix": "+675",
            "oklahoma_city": "-1000"
        }
    }
}
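Before using the numbers, a quick transcription sanity check: the consensus percentages in the sportsbook data should each sum to 100. The values below are copied from the dict above so the snippet stands alone:

```python
# Sanity-check the transcribed consensus percentages: each pair should sum to 100.
# (Values copied from the pho_vs_okc dict above so this snippet is self-contained.)
spread_consensus = {"phoenix_pct": 61, "oklahoma_city_pct": 39}
total_consensus = {"over_pct": 69, "under_pct": 31}

assert spread_consensus["phoenix_pct"] + spread_consensus["oklahoma_city_pct"] == 100
assert total_consensus["over_pct"] + total_consensus["under_pct"] == 100
print("consensus percentages are internally consistent")
```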
From the JSON object we will use a limited set of attributes for our model:
- points_per_game (ppg): the average number of points the team scores per game.
- points_against_per_game (papg): the average number of points opponents score against the team; a measure of how good its defense is.
- fg_pct (fgp): field-goal percentage, the share of field-goal attempts the team makes (two-point and three-point shots combined; free throws excluded).
- ft_pct (ftp): similar to fg_pct, but for free throws only.
- three_pt_pct (tpp): the three-point shooting percentage.
- off_rebounds (ofr): offensive rebounds per game; each one extends a possession, and teams that control possession tend to win more.
- odds_feature (oft): a new feature, defined as the difference between the market-implied probabilities for PHO and OKC with the vig removed (we will discuss this later in this post).
- home_feature (hmf): -1 if PHO is the away team (+1 when home).
Let's load all the raw data into some team-named variables:
2 - Load initial team-named variables
import math
# extract stats
pho = pho_vs_okc["offensive_team_records"]["phoenix_suns"]
okc = pho_vs_okc["offensive_team_records"]["oklahoma_city_thunder"]
pho_def = pho_vs_okc["defensive_opponent_stats"]["phoenix_suns"]
okc_def = pho_vs_okc["defensive_opponent_stats"]["oklahoma_city_thunder"]
odds = pho_vs_okc["best_available_odds"]["moneyline"]
3 - Baseline Logistic Regression Model
$$\begin{aligned} \text{Let } & \mathbf{x} = \big( \mathrm{ppg},\ \mathrm{papg},\ \mathrm{fgp},\ \mathrm{ftp},\ \mathrm{tpp},\ \mathrm{ofr},\ \mathrm{oft},\ \mathrm{hmf} \big) \\[6pt] \text{and } & \mathbf{w} = \big( w_{\mathrm{ppg}},\ w_{\mathrm{papg}},\ w_{\mathrm{fgp}},\ w_{\mathrm{ftp}},\ w_{\mathrm{tpp}},\ w_{\mathrm{ofr}},\ w_{\mathrm{oft}},\ w_{\mathrm{hmf}} \big), \ \text{with intercept } w_{0}. \end{aligned}$$
$$z(\mathbf{x}) = w_{0} + w_{\mathrm{ppg}} \cdot \mathrm{ppg} + w_{\mathrm{papg}} \cdot \mathrm{papg} + w_{\mathrm{fgp}} \cdot \mathrm{fgp} + w_{\mathrm{ftp}} \cdot \mathrm{ftp} + w_{\mathrm{tpp}} \cdot \mathrm{tpp} + w_{\mathrm{ofr}} \cdot \mathrm{ofr} + w_{\mathrm{oft}} \cdot \mathrm{oft} + w_{\mathrm{hmf}} \cdot \mathrm{hmf}.$$
$$P(\text{Pho wins} \mid \mathbf{x}) = \sigma\!\left( z(\mathbf{x}) \right) = \frac{1}{1 + e^{-z(\mathbf{x})}}.$$
$$\hat{y} = \begin{cases} 1, & \text{if } P(\text{Pho wins} \mid \mathbf{x}) \ge 0.5, \\[6pt] 0, & \text{otherwise}. \end{cases}$$
Our model is composed of two main components. The first is a linear combination of the variables in x with the weights in w: $$z(\mathbf{x}) = w_0 + \mathbf{w} \cdot \mathbf{x},$$ where $w_0$ is the intercept, x holds our variables from the JSON, and w holds weights calibrated manually. The second component is the logistic function $$\sigma(z(\mathbf{x})),$$ which estimates the probability that PHO wins the game. This means that the probability that OKC wins the game is $$1 - \sigma(z(\mathbf{x})).$$
Why might this work?
Well, this model combines all the selected stats into a single score.
Think of the stats from the JSON: field-goal percentage, free-throw percentage, points per game, offensive rebounds, and so on. Each of these stats gives us a clue about how strong each team is.
The model takes these numbers, multiplies each one by a weight that tells us how much that stat matters, and adds everything together. The result is a single score:
- A positive score means stats in favor of PHO, and a negative score means stats in favor of OKC
This score doesn’t mean “probability” yet — it’s just a raw measure of which team looks stronger on paper. To convert that raw score into something meaningful, we pass it through a “squashing” function. This function takes any number — whether extremely high or extremely low — and compresses it into a value between 0 and 1.
You can think of it like a dial:
If the score strongly favors PHO → the dial turns closer to 1.0
If the score is neutral → the dial settles around 0.5
If the score favors OKC → the dial moves closer to 0.0
That final value is the estimated probability that PHO wins the game; since this is a binary classifier, the probability that OKC wins is simply one minus that value.
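To make the "dial" intuition concrete, here is a standalone check of the squashing function at a neutral score, a strongly positive score, and a strongly negative score:

```python
import math

def logistic(x):
    """The squashing (logistic) function: maps any real score into (0, 1)."""
    return 1 / (1 + math.exp(-x))

print(logistic(0.0))   # neutral score -> 0.5
print(logistic(4.0))   # strongly favors PHO -> close to 1 (about 0.982)
print(logistic(-4.0))  # strongly favors OKC -> close to 0 (about 0.018)
```

Note the symmetry: `logistic(-x)` is always `1 - logistic(x)`, which is exactly why one output can serve as the probability for PHO and its complement as the probability for OKC.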
4 - Convert Moneyline into market probabilities with no vig
This algorithm takes moneyline betting odds and turns them into clean probabilities of each team winning. It begins by converting each team's American moneyline into an "implied probability." A positive moneyline means the team is an underdog, so the formula is 100 divided by (the moneyline plus 100). A negative moneyline means the team is a favorite, so the formula is the absolute value of the line divided by (that value plus 100). This gives a raw probability for each team, but sportsbooks include extra margin, called the vig, so the raw probabilities add up to more than one. To correct this, the algorithm sums the two raw probabilities to get the total implied probability (the overround), then divides each team's raw probability by that total. After this normalization, the probabilities sum to one and represent the implied chance of each team winning without the sportsbook's built-in margin.
The following code performs the conversion and creates odds_feature as the difference between the vig-free probabilities for PHO and OKC. This feature encodes which team the sportsbook believes will win, but we will damp it with a weight so it does not drown out the other variables.
# Convert moneyline odds to market-implied probabilities
def implied_prob_from_moneyline(ml):
    """Convert American moneyline to raw implied probability."""
    if ml > 0:   # e.g. +675 (underdog)
        return 100 / (ml + 100)
    else:        # e.g. -1000 (favorite)
        return abs(ml) / (abs(ml) + 100)

pho_raw = implied_prob_from_moneyline(int(odds["phoenix"]))
okc_raw = implied_prob_from_moneyline(int(odds["oklahoma_city"]))

# Normalize so the probabilities sum to 1 (removes the vig)
vig = pho_raw + okc_raw  # total implied probability, a.k.a. the overround
pho_prob = pho_raw / vig
okc_prob = okc_raw / vig

# Create the odds feature for the model:
# negative if OKC is favored, positive if Phoenix is favored
odds_feature = pho_prob - okc_prob
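Plugging in the best available moneylines from the JSON (+675 for PHO, -1000 for OKC), the numbers work out roughly as follows (recomputed here so the snippet is self-contained):

```python
pho_raw = 100 / (675 + 100)         # +675 underdog -> ~0.1290
okc_raw = 1000 / (1000 + 100)       # -1000 favorite -> ~0.9091
total = pho_raw + okc_raw           # ~1.0381; > 1 because of the vig

pho_prob = pho_raw / total          # ~0.1243
okc_prob = okc_raw / total          # ~0.8757 (sums to 1 with pho_prob)
odds_feature = pho_prob - okc_prob  # ~ -0.7514 (negative: market favors OKC)
print(pho_prob, okc_prob, odds_feature)
```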
5 - Load all variables X (X-Delta)
Now we build our features as the difference between each selected stat for the two teams. This is the first step toward creating the feature vector x:
delta_points = pho["points_per_game"] - okc["points_per_game"]
delta_points_against = pho["points_against_per_game"] - okc["points_against_per_game"]
delta_fg_pct = pho["fg_pct"] - okc["fg_pct"]
delta_ft_pct = pho["ft_pct"] - okc["ft_pct"]
delta_3pt_pct = pho["three_pt_pct"] - okc["three_pt_pct"]
delta_off_reb = pho["off_rebounds"] - okc["off_rebounds"]
home_feature = -1 # Phoenix is the away team
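With the JSON numbers plugged in, every delta except points allowed comes out negative, i.e., Phoenix trails OKC in each offensive category and also gives up more points (the stat values below are copied from the JSON so the snippet stands alone):

```python
# Stats copied from pho_vs_okc["offensive_team_records"] for a standalone check
pho = {"points_per_game": 115.88, "points_against_per_game": 113.46, "fg_pct": 46.74,
       "ft_pct": 79.03, "three_pt_pct": 36.76, "off_rebounds": 8.96}
okc = {"points_per_game": 123.04, "points_against_per_game": 106.88, "fg_pct": 49.72,
       "ft_pct": 82.98, "three_pt_pct": 37.37, "off_rebounds": 12.83}

deltas = {k: round(pho[k] - okc[k], 2) for k in pho}
print(deltas)
# points_per_game: -7.16, points_against_per_game: 6.58, fg_pct: -2.98,
# ft_pct: -3.95, three_pt_pct: -0.61, off_rebounds: -3.87
```

Note that the one positive delta, points_against_per_game, is also bad news for Phoenix: allowing 6.58 more points per game than OKC, combined with its negative weight, pushes the score down.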
6 - Define some cherry-picked weights W for each value of Xi
It's important to mention that logistic regression, when exposed to many rows of data, can estimate the values of W automatically by solving an optimization problem. In this case we only have two sets of numbers from the sportsbook, one line of stats for PHO and one for OKC, so optimization is not an option. This means we need to choose a value for each weight so that it makes sense. This process is usually done by an NBA expert; because I am not one, I used an LLM to help me estimate these parameters. This is the resulting expression:
# Define the model weights (hand-built / interpretable)
weights = {
    "intercept": 0.0,
    "points_per_game": 0.12,           # weight for the best predictor (points)
    "points_against_per_game": -0.10,  # how good the team's defense is
    "fg_pct": 0.08,                    # field goal percentage, almost as important as ppg
    "ft_pct": 0.04,                    # free throw percentage
    "three_pt_pct": 0.03,              # three point percentage
    "off_rebounds": 0.06,              # offensive rebounds (extra possessions)
    "odds_feature": 1.50,              # pho_prob - okc_prob with the vig removed
    "home_feature": -0.60              # whether team A (here PHO) is home or away
}
In a logistic regression model, each weight tells you how strongly a feature pushes the prediction toward a win or a loss. The farther a weight is from zero, the more influence that feature has on the final probability.
When a weight is close to 1, it means that increasing that feature pushes the model strongly toward predicting a higher probability of winning. A positive weight acts like a “boost.” Every time that feature increases by one unit, the model becomes noticeably more confident that the team will win. A weight near 1 means the model treats that feature as very important in signaling success.
When a weight is close to –1, it means that increasing that feature pushes the model strongly toward predicting a lower probability of winning. A negative weight acts like a “penalty.” As that feature increases, the model becomes more confident the team will lose. A weight near –1 means the model believes that feature strongly signals weakness.
Weights near zero barely affect the prediction at all, because they don’t push the probability up or down in a meaningful way.
So in simple terms, values near 1 are strong positive influencers toward winning, values near –1 are strong negative influencers toward losing, and values near zero don’t matter much to the model.
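A standard way to make these magnitudes concrete: in logistic regression, a one-unit increase in a feature multiplies the *odds* of winning (p / (1 - p), not the probability itself) by e^w. A quick illustration with the ppg weight of 0.12:

```python
import math

w_ppg = 0.12
odds_multiplier = math.exp(w_ppg)  # ~1.127
print(odds_multiplier)
# Each extra point of per-game scoring advantage multiplies Phoenix's win odds
# by about 1.13, i.e., roughly a 13% increase in odds.
```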
7 - Mix all ingredients and get the Final Probability
# Compute the raw score z = w0 + w . x from the feature deltas
names = ["points_per_game", "points_against_per_game", "fg_pct", "ft_pct",
         "three_pt_pct", "off_rebounds", "odds_feature", "home_feature"]
values = [delta_points, delta_points_against, delta_fg_pct, delta_ft_pct,
          delta_3pt_pct, delta_off_reb, odds_feature, home_feature]
score = weights["intercept"] + sum(weights[n] * v for n, v in zip(names, values))

# Logistic function to convert the score into a win probability
def logistic(x):
    return 1 / (1 + math.exp(-x))

pho_win_prob = logistic(score)
okc_win_prob = 1 - pho_win_prob
This simple logistic function then estimates the probabilities that one team or the other will win:
pho_win_prob: 6.35%
okc_win_prob: 93.65%
So the model concluded that, based on the defined weights and the selected stats, OKC is the clear winner. But did OKC actually win? Yes, it did! Look at the game results from CBS Sports:

The Oklahoma City Thunder won, as predicted by the model.
8 - Was the model lucky?
We don't know for sure. There is no certainty this model works unless we keep running this code on more games and check whether, in the long run, it is consistently right or wrong.
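A minimal sketch of that long-run check, assuming we had logged the model's predicted probability and the actual outcome for several games. The rows below are invented purely for illustration; the Brier score they feed measures how far probabilistic predictions land from reality (0 is perfect, and always guessing 50% scores 0.25):

```python
# Hypothetical log of (predicted probability team A wins, actual outcome 1/0).
# These rows are made up for illustration only.
games = [(0.91, 1), (0.63, 1), (0.35, 0), (0.80, 0), (0.55, 1)]

# Brier score: mean squared gap between predicted probability and outcome
brier = sum((p - y) ** 2 for p, y in games) / len(games)
print(round(brier, 4))
# Lower is better; staying below 0.25 means beating a coin-flip predictor.
```

Tracking this number over dozens of games would tell us whether the hand-picked weights genuinely carry signal or whether the OKC call was just luck.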