This page shows how to combine NBA play-by-play data with SportVu data. The play-by-play data dramatically increases the usefulness of the SportVu data because it identifies which plays are makes and which are misses, as well as the type of shot (e.g., layup or dunk). I have also posted my earlier markdown on exploring the SportVu data.


Getting the SportVu data

To read the SportVu data, first download the _functions.R file from my GitHub repository for this project.

library(RCurl)
## Loading required package: bitops
library(jsonlite)
library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
source("_functions.R")

The sportvu_convert_json function takes the SportVu JSON file and converts it into a data frame. For this game, the conversion takes about 3 minutes, and the resulting data frame has about 2.6 million observations of 13 variables.

all.movements <- sportvu_convert_json("data/0021500431.json")
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
str(all.movements)
## 'data.frame':    2646562 obs. of  13 variables:
##  $ player_id : chr  "2225" "2225" "-1" "-1" ...
##  $ lastname  : chr  "Parker" "Parker" "ball" "ball" ...
##  $ firstname : chr  "Tony" "Tony" NA NA ...
##  $ jersey    : chr  "9" "9" NA NA ...
##  $ position  : chr  "G" "G" NA NA ...
##  $ team_id   : num  1.61e+09 1.61e+09 NA NA 1.61e+09 ...
##  $ x_loc     : num  51.7 51.7 52.9 52.9 60.4 ...
##  $ y_loc     : num  40.3 40.3 39.9 39.9 31.8 ...
##  $ radius    : num  0 0 2.5 2.5 0 ...
##  $ game_clock: num  716 716 716 716 716 ...
##  $ shot_clock: num  13.3 13.3 13.3 13.3 13.3 ...
##  $ quarter   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ event.id  : num  2 1 1 2 2 1 2 1 2 1 ...
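The sportvu_convert_json function lives in _functions.R and is not reproduced here. As a rough illustration of what such a conversion involves, the sketch below flattens a single "moment" into one row per tracked object. It assumes the commonly seen SportVU layout, in which each moment is a list of quarter, timestamp, game clock, shot clock, an unused field, and a list of per-object vectors of team ID, player ID, x, y, and a fifth value (the column called radius above). `flatten_moment` is a hypothetical helper, not the author's code, and the moment values are toy numbers.

```r
# Hypothetical helper (assumed SportVU layout, not the author's function):
# moment = list(quarter, timestamp, game_clock, shot_clock, NULL, positions),
# where each element of positions is c(team_id, player_id, x, y, radius).
flatten_moment <- function(moment, event_id) {
  positions <- moment[[6]]
  # Stack the per-object vectors into a numeric matrix, one row per object
  pos <- do.call(rbind, lapply(positions, as.numeric))
  data.frame(
    event.id   = event_id,
    team_id    = pos[, 1],
    player_id  = pos[, 2],
    x_loc      = pos[, 3],
    y_loc      = pos[, 4],
    radius     = pos[, 5],
    game_clock = moment[[3]],  # moment-level values are recycled to every row
    shot_clock = moment[[4]],
    quarter    = moment[[1]]
  )
}

# Toy moment: the ball (IDs -1) plus one player
moment <- list(1, 1449111600000, 716, 13.3, NULL,
               list(c(-1, -1, 52.9, 39.9, 2.5),
                    c(1610612759, 2225, 51.7, 40.3, 0)))
rows <- flatten_moment(moment, event_id = 1)
rows
```

A full converter would loop this over every moment of every event and bind the results, which is why the real conversion takes minutes for a whole game.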

View the play-by-play data

gameid <- "0021500431"
pbp <- get_pbp(gameid)  # defined in _functions.R
head(pbp)
##      GAME_ID EVENTNUM EVENTMSGTYPE EVENTMSGACTIONTYPE PERIOD WCTIMESTRING
## 1 0021500431        0           12                  0      1      8:11 PM
## 2 0021500431        1           10                  0      1      8:11 PM
## 3 0021500431        2            5                 45      1      8:11 PM
## 4 0021500431        3            2                  5      1      8:12 PM
## 5 0021500431        4            4                  0      1      8:12 PM
## 6 0021500431        5            5                 45      1      8:12 PM
##   PCTIMESTRING                          HOMEDESCRIPTION NEUTRALDESCRIPTION
## 1        12:00                                     <NA>               <NA>
## 2        12:00 Jump Ball Towns vs. Duncan: Tip to Green               <NA>
## 3        11:43                                     <NA>               <NA>
## 4        11:29                    MISS Wiggins 2' Layup               <NA>
## 5        11:28                                     <NA>               <NA>
## 6        11:27                                     <NA>               <NA>
##                                           VISITORDESCRIPTION SCORE
## 1                                                       <NA>  <NA>
## 2                                                       <NA>  <NA>
## 3  Parker Out of Bounds - Bad Pass Turnover Turnover (P1.T1)  <NA>
## 4                                      Leonard BLOCK (1 BLK)  <NA>
## 5                              Leonard REBOUND (Off:0 Def:1)  <NA>
## 6 Leonard Out of Bounds - Bad Pass Turnover Turnover (P1.T2)  <NA>
##   SCOREMARGIN PERSON1TYPE PLAYER1_ID       PLAYER1_NAME PLAYER1_TEAM_ID
## 1        <NA>           0          0               <NA>            <NA>
## 2        <NA>           4    1626157 Karl-Anthony Towns      1610612750
## 3        <NA>           5       2225        Tony Parker      1610612759
## 4        <NA>           4     203952     Andrew Wiggins      1610612750
## 5        <NA>           5     202695      Kawhi Leonard      1610612759
## 6        <NA>           5     202695      Kawhi Leonard      1610612759
##   PLAYER1_TEAM_CITY PLAYER1_TEAM_NICKNAME PLAYER1_TEAM_ABBREVIATION
## 1              <NA>                  <NA>                      <NA>
## 2         Minnesota          Timberwolves                       MIN
## 3       San Antonio                 Spurs                       SAS
## 4         Minnesota          Timberwolves                       MIN
## 5       San Antonio                 Spurs                       SAS
## 6       San Antonio                 Spurs                       SAS
##   PERSON2TYPE PLAYER2_ID PLAYER2_NAME PLAYER2_TEAM_ID PLAYER2_TEAM_CITY
## 1           0          0         <NA>            <NA>              <NA>
## 2           5       1495   Tim Duncan      1610612759       San Antonio
## 3           0          0         <NA>            <NA>              <NA>
## 4           0          0         <NA>            <NA>              <NA>
## 5           0          0         <NA>            <NA>              <NA>
## 6           0          0         <NA>            <NA>              <NA>
##   PLAYER2_TEAM_NICKNAME PLAYER2_TEAM_ABBREVIATION PERSON3TYPE PLAYER3_ID
## 1                  <NA>                      <NA>           0          0
## 2                 Spurs                       SAS           5     201980
## 3                  <NA>                      <NA>           0          0
## 4                  <NA>                      <NA>           5     202695
## 5                  <NA>                      <NA>           0          0
## 6                  <NA>                      <NA>           0          0
##    PLAYER3_NAME PLAYER3_TEAM_ID PLAYER3_TEAM_CITY PLAYER3_TEAM_NICKNAME
## 1          <NA>            <NA>              <NA>                  <NA>
## 2   Danny Green      1610612759       San Antonio                 Spurs
## 3          <NA>            <NA>              <NA>                  <NA>
## 4 Kawhi Leonard      1610612759       San Antonio                 Spurs
## 5          <NA>            <NA>              <NA>                  <NA>
## 6          <NA>            <NA>              <NA>                  <NA>
##   PLAYER3_TEAM_ABBREVIATION
## 1                      <NA>
## 2                       SAS
## 3                      <NA>
## 4                       SAS
## 5                      <NA>
## 6                      <NA>
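get_pbp is also defined in _functions.R and is not shown here. Responses from stats.nba.com commonly describe a table as a vector of column names (headers) plus a list of rows (rowSet), with JSON null for missing values. Assuming that layout, turning one result set into a data frame like the one above can be sketched as follows; `rowset_to_df` is a hypothetical helper, and the two toy rows are copied from the output above.

```r
# Hypothetical helper: build a data frame from a stats.nba.com-style result set,
# assuming `headers` is a character vector and `row_set` is a list of rows
# in which missing values arrive as NULL (parsed JSON null).
rowset_to_df <- function(headers, row_set) {
  # Replace NULLs with NA so every row has exactly one value per column
  clean <- lapply(row_set, function(r) lapply(r, function(v) if (is.null(v)) NA else v))
  # Pull column j out of every row to build each column vector
  cols <- lapply(seq_along(headers), function(j) sapply(clean, function(r) r[[j]]))
  names(cols) <- headers
  as.data.frame(cols, stringsAsFactors = FALSE)
}

headers <- c("GAME_ID", "EVENTNUM", "EVENTMSGTYPE", "HOMEDESCRIPTION")
row_set <- list(
  list("0021500431", 0, 12, NULL),
  list("0021500431", 3, 2, "MISS Wiggins 2' Layup")
)
pbp_toy <- rowset_to_df(headers, row_set)
pbp_toy
```

Keeping EVENTNUM numeric at this stage avoids any factor-to-number conversion before the join.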

Join the play-by-play data to the SportVu data

Joining the data is pretty simple because the play-by-play data and the SportVu data use common event IDs. The only issue I have found is that the SportVu data may contain more event IDs (such as the ball going out of bounds) that are not found in the play-by-play data.

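# Drop the first row (EVENTNUM 0, the start-of-period row)
pbp <- pbp[-1, ]
# Rename EVENTNUM so it matches the event.id column in the SportVu data
colnames(pbp)[colnames(pbp) == "EVENTNUM"] <- "event.id"
# Limit the fields to join to keep the overall size manageable
pbp <- pbp %>% select(event.id, EVENTMSGTYPE, EVENTMSGACTIONTYPE, SCORE)
# Convert to numeric via character, which works whether event.id is a factor or a string
pbp$event.id <- as.numeric(as.character(pbp$event.id))
# Left join: keep every SportVu row, adding play-by-play fields where event IDs match
all.movements <- merge(x = all.movements, y = pbp, by = "event.id", all.x = TRUE)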
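Because the join keeps every SportVu row (all.x = TRUE), events that exist only in the SportVu data end up with NA in the play-by-play columns. A quick way to list those event IDs and count the affected rows is sketched below on small toy stand-ins; with the real data you would substitute all.movements and pbp.

```r
# Toy stand-ins for all.movements and pbp
movements <- data.frame(event.id = c(1, 1, 2, 3, 3),
                        x_loc    = c(51.7, 52.9, 60.4, 10.2, 11.0))
plays <- data.frame(event.id = c(1, 3), EVENTMSGTYPE = c(10, 2))

joined <- merge(x = movements, y = plays, by = "event.id", all.x = TRUE)

# Event IDs present in the tracking data but absent from the play-by-play
unmatched_ids <- setdiff(unique(movements$event.id), plays$event.id)

# Rows that received no play-by-play fields from the left join
n_unmatched_rows <- sum(is.na(joined$EVENTMSGTYPE))

unmatched_ids
n_unmatched_rows
```

Rows with NA in EVENTMSGTYPE are silently dropped by filters such as which(all.movements$EVENTMSGTYPE == 1), since which() ignores NA, so they do not affect the comparisons that follow.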

Let's take a look at what the join adds.

Extract all rows for event ID 303:

id303 <- all.movements[which(all.movements$event.id == 303),]
head(id303)
##         event.id player_id lastname firstname jersey position    team_id
## 1644741      303        -1     ball      <NA>   <NA>     <NA>         NA
## 1644742      303    203937 Anderson      Kyle      1        F 1610612759
## 1644743      303    201937    Rubio     Ricky      9        G 1610612750
## 1644744      303    201988    Mills     Patty      8        G 1610612759
## 1644745      303    203952  Wiggins    Andrew     22      G-F 1610612750
## 1644746      303    203937 Anderson      Kyle      1        F 1610612759
##            x_loc    y_loc   radius game_clock shot_clock quarter
## 1644741  5.43835 24.73073 10.63683     359.75       5.49       3
## 1644742 65.31054 22.12468  0.00000     346.42      19.03       3
## 1644743 46.60167 20.00475  0.00000     376.60      22.70       3
## 1644744 38.77574 21.41917  0.00000     359.40      23.71       3
## 1644745 11.18441 34.04307  0.00000     359.40      23.69       3
## 1644746  8.62043  2.05544  0.00000     364.39       7.11       3
##         EVENTMSGTYPE EVENTMSGACTIONTYPE   SCORE
## 1644741            1                 98 67 - 48
## 1644742            1                 98 67 - 48
## 1644743            1                 98 67 - 48
## 1644744            1                 98 67 - 48
## 1644745            1                 98 67 - 48
## 1644746            1                 98 67 - 48

The key fields here are EVENTMSGTYPE and EVENTMSGACTIONTYPE. EVENTMSGTYPE records what kind of event occurred (for example, a make, a miss, or a rebound), and EVENTMSGACTIONTYPE adds detail about what happened on the play, such as the type of shot. I do not have a definitive guide to these codes, but here is a starting point:

EVENTMSGTYPE

1 - Make
2 - Miss
3 - Free Throw
4 - Rebound
5 - Out of bounds / Turnover / Steal
6 - Personal Foul
7 - Violation
8 - Substitution
9 - Timeout
10 - Jump Ball
12 - Start of period
13 - End of period

EVENTMSGACTIONTYPE

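1 - Jump Shot
2 - Lost Ball Turnover
3 - ?
4 - Traveling Turnover / Offensive Foul
5 - Layup
7 - Dunk
10 - Free Throw 1 of 1
11 - Free Throw 1 of 2
12 - Free Throw 2 of 2
40 - Out of Bounds
41 - Block / Steal
42 - Driving Layup
50 - Running Dunk
52 - Alley Oop Dunk
55 - Hook Shot
57 - Driving Hook Shot
58 - Turnaround Hook Shot
66 - Jump Bank Shot
71 - Finger Roll Layup
72 - Putback Layup
108 - Cutting Dunk Shot

EVENTMSGACTIONTYPE is interpreted relative to EVENTMSGTYPE, which is why a single number can describe a shot on a make or miss but something else on a turnover or foul (see 4 above).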
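To make the EVENTMSGTYPE codes easier to read in the merged data, one option is a named lookup vector. This is a minimal sketch: `event_labels` is a hypothetical name, it covers only codes 1 through 10, and the labels are only as reliable as the partial list of codes.

```r
# Lookup from EVENTMSGTYPE code to a readable label (codes 1-10 only)
event_labels <- c("1" = "Make", "2" = "Miss", "3" = "Free Throw", "4" = "Rebound",
                  "5" = "Turnover", "6" = "Personal Foul", "7" = "Violation",
                  "8" = "Substitution", "9" = "Timeout", "10" = "Jump Ball")

# Index by the code as a character string; codes not in the table come back as NA
codes  <- c(1, 4, 2, 11)
labels <- unname(event_labels[as.character(codes)])
labels
```

On the merged data the same idea would be unname(event_labels[as.character(all.movements$EVENTMSGTYPE)]); going through as.character works whether EVENTMSGTYPE is stored as a number or a factor.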


Comparing player distance for misses, makes, and rebounds

To show the power of the play-by-play data, let's compare how far Ginobili travels on makes, misses, and rebounds.

ginobili_make <- all.movements[which(all.movements$lastname == "Ginobili" & all.movements$EVENTMSGTYPE == 1),]
ginobili_miss <- all.movements[which(all.movements$lastname == "Ginobili" & all.movements$EVENTMSGTYPE == 2),]
ginobili_rebound <- all.movements[which(all.movements$lastname == "Ginobili" & all.movements$EVENTMSGTYPE == 4),]
#Makes
travelDist(ginobili_make$x_loc, ginobili_make$y_loc)
## [1] 621.9733
#Misses
travelDist(ginobili_miss$x_loc, ginobili_miss$y_loc)
## [1] 311.2476
#Rebounds
travelDist(ginobili_rebound$x_loc, ginobili_rebound$y_loc)
## [1] 361.7619

There are many possible explanations for these numbers (for one, each total depends on how many events of that type Ginobili was tracked in), but this should give you an idea of the power of the play-by-play data.
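travelDist is defined in _functions.R and is not reproduced here. A plausible reading is that it sums the straight-line distances between consecutive (x, y) positions; the sketch below is a hypothetical reimplementation under that assumption, not the author's code. Because merge() reorders rows by event.id, and the rows within an event need not be in time order (the game_clock values in the first rows for event 303 are not sorted), this version puts each event's rows in time order and sums event by event, so that jumps between separate events are not counted as running.

```r
# Hypothetical reimplementation: total path length of a sequence of (x, y) points,
# summing the Euclidean distance between each pair of consecutive points.
travel_dist <- function(x, y) {
  sum(sqrt(diff(x)^2 + diff(y)^2))
}

# Distance per event, with each event's rows sorted into time order first
# (game_clock counts down, so descending clock means forward in time).
travel_dist_by_event <- function(df) {
  per_event <- lapply(split(df, df$event.id), function(ev) {
    ev <- ev[order(ev$quarter, -ev$game_clock), ]
    travel_dist(ev$x_loc, ev$y_loc)
  })
  sum(unlist(per_event))
}

# Toy track: one player in two events, rows deliberately out of time order
track <- data.frame(
  event.id   = c(1, 1, 1, 2, 2),
  quarter    = c(1, 1, 1, 1, 1),
  game_clock = c(700, 702, 701, 650, 649),
  x_loc      = c(6, 0, 3, 50, 50),
  y_loc      = c(8, 0, 4, 0, 5)
)

travel_dist_by_event(track)        # ordered, per event
travel_dist(track$x_loc, track$y_loc)  # naive, in row order
```

On the toy track, the ordered per-event total is 15, while summing over the rows in their stored order counts the out-of-order step and the jump between events, giving a much larger number.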


Comparing player distance on layups

Let's look at which players run the farthest on plays coded as layups (EVENTMSGACTIONTYPE 5), whether made or missed.

player_layup <- all.movements[which(all.movements$EVENTMSGACTIONTYPE == 5),]
player.groups <- group_by(player_layup, lastname)
dist.traveled.players <- summarise(player.groups, totalDist=travelDist(x_loc, y_loc),playerid = max(player_id))
arrange(dist.traveled.players, desc(totalDist))
## Source: local data frame [25 x 3]
## 
##    lastname totalDist playerid
##       (chr)     (dbl)    (chr)
## 1      ball  211.7860       -1
## 2  Aldridge  193.2782   200746
## 3    Duncan  188.0446     1495
## 4     Dieng  163.8324   203476
## 5   Leonard  161.1590   202695
## 6     Jones  144.9321  1626145
## 7    LaVine  141.2286   203897
## 8     Towns  138.0612  1626157
## 9   Wiggins  132.3646   203952
## 10 Anderson  131.2474   203937
## ..      ...       ...      ...

Let's compare this to the players who run the farthest when a layup is made (EVENTMSGTYPE 1).

player_layup <- all.movements[which(all.movements$EVENTMSGACTIONTYPE == 5 & all.movements$EVENTMSGTYPE == 1),]
player.groups <- group_by(player_layup, lastname)
dist.traveled.players <- summarise(player.groups, totalDist=travelDist(x_loc, y_loc),playerid = max(player_id))
arrange(dist.traveled.players, desc(totalDist))
## Source: local data frame [23 x 3]
## 
##    lastname totalDist playerid
##       (chr)     (dbl)    (chr)
## 1      ball 125.30158       -1
## 2     Jones 110.31559  1626145
## 3     Dieng 106.61730   203476
## 4    LaVine 103.23012   203897
## 5  Muhammad  86.74623   203498
## 6    Duncan  83.20876     1495
## 7   Leonard  77.70864   202695
## 8      West  76.24240     2561
## 9    Parker  73.74633     2225
## 10 Ginobili  71.06211     1938
## ..      ...       ...      ...

The list changes because not every layup results in a made basket: the ball's total drops from about 212 to 125, and Aldridge, second on the first list, is no longer in the top ten. These examples illustrate the power of the play-by-play data.
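Grouping by lastname works for this game, but it would merge two players who share a surname, and the ball (player_id -1) appears at the top of both lists as if it were a player. A variant that groups by player_id and drops the ball might look like this in base R; the toy rows stand in for the layup subset of all.movements and are assumed to already be in time order.

```r
# Toy rows standing in for the layup subset of all.movements
layups <- data.frame(
  player_id = c("-1", "-1", "2225", "2225", "1938", "1938"),
  lastname  = c("ball", "ball", "Parker", "Parker", "Ginobili", "Ginobili"),
  x_loc     = c(0, 3, 10, 10, 20, 26),
  y_loc     = c(0, 4, 0, 2, 0, 8)
)

# Drop the ball, whose player_id is "-1" in the converted data
players <- layups[layups$player_id != "-1", ]

# Path length per player_id, summing distances between consecutive points
dist_by_player <- sapply(split(players, players$player_id), function(p) {
  sum(sqrt(diff(p$x_loc)^2 + diff(p$y_loc)^2))
})
dist_by_player <- sort(dist_by_player, decreasing = TRUE)
dist_by_player
```

Names can be attached afterwards by matching player_id back to lastname, which keeps the grouping key unique while still producing a readable table.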


Credits

I hope this helps people combine the SportVu data with the play-by-play data. I had some great help figuring all of this out and want to credit Justin, Darryl Blackport, and Grant Fiddyment.

For more of my explorations of NBA data, see my NBA GitHub repo. You can find more information about me, Rajiv Shah, and my other projects, or find me on Twitter.