This is a simple walkthrough for getting started with the NBA SportVU movement data. The goal is to show how to parse a game file and perform basic exploratory data analysis (EDA).

This post is inspired by the work of Savvas Tjortjoglou and Tanya Cashorali. I wanted to extend their work to analyzing an entire game.


For the play, I chose the NBA’s top-rated play from December 23rd, 2015. It comes from a game between San Antonio and Minnesota, and the play occurs with about 6 minutes left in the third quarter.


Download the data

Neil Johnson has taken the time to compile the movement data for NBA games in his GitHub repository.

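To get this game, download the archive. Quoting the URL keeps the shell from interpreting the ? character, and -O saves the file under a clean name.

wget -O 12.23.2015.SAS.at.MIN.7z "https://github.com/neilmj/BasketballData/blob/master/2016.NBA.Raw.SportVU.Game.Logs/12.23.2015.SAS.at.MIN.7z?raw=true"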

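Extract the archive (it is a 7z file, so you will need a tool such as 7-Zip) and you should end up with a file named 0021500431.json. The code below expects it in a data/ directory.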


Reading the data into R

To read this file, first download the _functions.R file from my GitHub repository for this project.

library(RCurl)
## Loading required package: bitops
library(jsonlite)
## 
## Attaching package: 'jsonlite'
## 
## The following object is masked from 'package:utils':
## 
##     View
library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(plotly)
## Loading required package: ggplot2
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:graphics':
## 
##     layout
source("_functions.R")

The sportvu_convert_json function takes the JSON file and converts it into a data frame. For this game, the conversion takes about 3 minutes. The resulting data frame has about 2.6 million observations of 13 variables.

all.movements <- sportvu_convert_json("data/0021500431.json")
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
str(all.movements)
## 'data.frame':    2646562 obs. of  13 variables:
##  $ player_id : chr  "2225" "2225" "-1" "-1" ...
##  $ lastname  : chr  "Parker" "Parker" "ball" "ball" ...
##  $ firstname : chr  "Tony" "Tony" NA NA ...
##  $ jersey    : chr  "9" "9" NA NA ...
##  $ position  : chr  "G" "G" NA NA ...
##  $ team_id   : num  1.61e+09 1.61e+09 NA NA 1.61e+09 ...
##  $ x_loc     : num  51.7 51.7 52.9 52.9 60.4 ...
##  $ y_loc     : num  40.3 40.3 39.9 39.9 31.8 ...
##  $ radius    : num  0 0 2.5 2.5 0 ...
##  $ game_clock: num  716 716 716 716 716 ...
##  $ shot_clock: num  13.3 13.3 13.3 13.3 13.3 ...
##  $ quarter   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ event.id  : num  2 1 1 2 2 1 2 1 2 1 ...
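
The conversion function itself lives in _functions.R and is not reproduced here. As a rough sketch of the kind of flattening involved, assume the commonly described SportVU layout, where each "moment" holds the quarter, a timestamp, the game clock, the shot clock, an empty slot, and one entry of team id, player id, x, y, radius per tracked object. The function name flatten_moment and the sample values below are hypothetical, not the author's code.

```r
# Hypothetical single moment in the assumed layout:
# list(quarter, timestamp_ms, game_clock, shot_clock, NULL, list of coordinate vectors)
moment <- list(
  3, 1450925000000, 377.9, 12.5, NULL,
  list(
    c(-1, -1, 52.9, 39.9, 2.5),          # ball: team_id and player_id are -1
    c(1610612759, 2225, 51.7, 40.3, 0),  # a player: team_id, player_id, x, y, radius
    c(1610612759, 1938, 60.4, 31.8, 0)
  )
)

# Turn one moment into one row per tracked object (illustrative helper)
flatten_moment <- function(m) {
  coords <- do.call(rbind, m[[6]])  # one matrix row per player or ball
  data.frame(
    team_id    = coords[, 1],
    player_id  = coords[, 2],
    x_loc      = coords[, 3],
    y_loc      = coords[, 4],
    radius     = coords[, 5],
    game_clock = m[[3]],
    shot_clock = m[[4]],
    quarter    = m[[1]]
  )
}

rows <- flatten_moment(moment)
print(rows)
```

Applying a helper like this to every moment of every event and binding the results together would give a long data frame similar in shape to the one shown above, minus the player name columns.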

Finding a specific play

The specific play we are interested in has the event ID #303. The NBA site has both the video and movement data available. The movement data shows the play:


Extract movement for one player

The SportVU data provides movement data for every player and the ball. As an example, let's look at the movement of Ginobili for this play.

##Extract all data for event ID 303
id303 <- all.movements[which(all.movements$event.id == 303),]
##Extract all data for Ginobili on event ID #303
ginobili <- all.movements[which(all.movements$lastname == "Ginobili" & all.movements$event.id == 303),]
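
The which() wrapper in the subsetting above is worth keeping. The conversion step printed an "NAs introduced by coercion" warning, so some columns may contain NA values, and a bare logical index turns every NA comparison into an all-NA row. A small sketch with made-up rows (the toy data frame is hypothetical) shows the difference:

```r
# Toy data: one row has a missing event.id
toy <- data.frame(
  lastname = c("Ginobili", "ball", "Parker"),
  event.id = c(303, NA, 303),
  stringsAsFactors = FALSE
)

# which() keeps only positions where the comparison is TRUE
with_which <- toy[which(toy$event.id == 303), ]

# A bare logical index keeps the NA comparison as a row filled with NA
without_which <- toy[toy$event.id == 303, ]

nrow(with_which)     # 2 rows
nrow(without_which)  # 3 rows, the second one is all NA
```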

This data can be visualized to show how Ginobili moves around the court. The colors represent three different time ranges of movement. The x axis is the length of the court and the y axis is its width. An NBA court is 94 feet by 50 feet. (Savvas Tjortjoglou takes the time to plot this on a basketball court background image.)

##The formula syntax (~x_loc) is what plotly 4.0 and later expect
p <- plot_ly(data = ginobili, x = ~x_loc, y = ~y_loc, type = "scatter", mode = "markers",
             color = ~cut(game_clock, breaks = 3)) %>% 
    layout(xaxis = list(range = c(0, 100)), 
           yaxis = list(range = c(0, 50))) 
p


Get distance travelled for one player

I have a simple function to get the distance a player travels:

travelDist(ginobili$x_loc, ginobili$y_loc)
## [1] 283.4333
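
The travelDist function is defined in _functions.R and is not shown in this post. A minimal sketch of how such a function could work, assuming it simply sums the straight-line distances between consecutive tracked positions, is below. The name travel_dist_sketch is hypothetical and this is not necessarily the author's implementation.

```r
# Sum of Euclidean distances between consecutive (x, y) positions
travel_dist_sketch <- function(x, y) {
  sum(sqrt(diff(x)^2 + diff(y)^2))
}

# Toy path: (0,0) -> (3,4) -> (3,10), so 5 feet plus 6 feet
toy_total <- travel_dist_sketch(c(0, 3, 3), c(0, 4, 10))
toy_total
```

Because the positions are sampled many times per second, small tracking jitter adds up in a sum like this, so totals should be read as approximate.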

Get speed of a player

Building off the distance, it is possible to calculate the speed of a player.

seconds <- max(ginobili$game_clock) - min(ginobili$game_clock)  #game_clock is in seconds
speed <- travelDist(ginobili$x_loc, ginobili$y_loc)/seconds  #in feet per second
speed
## [1] 8.355933
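
Feet per second is not the most intuitive unit, so it can help to convert. A mile is 5280 feet and an hour is 3600 seconds, which puts the average speed above at roughly 5.7 miles per hour. The helper name fps_to_mph is just for illustration.

```r
# Convert feet per second to miles per hour
fps_to_mph <- function(fps) {
  fps * 3600 / 5280
}

# Average speed computed above, in feet per second
mph <- fps_to_mph(8.355933)
mph
```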

Get distance for all the players for a specific event

The next step is generalizing this approach to all the players.

player.groups <- group_by(id303, lastname)
dist.traveled.players <- summarise(player.groups, totalDist=travelDist(x_loc, y_loc),playerid = max(player_id))
arrange(dist.traveled.players, desc(totalDist))
## Source: local data frame [11 x 3]
## 
##    lastname totalDist playerid
##       (chr)     (dbl)    (chr)
## 1      ball  387.7486       -1
## 2    LaVine  334.1868   203897
## 3      Diaw  293.1026     2564
## 4  Ginobili  283.4333     1938
## 5     Dieng  279.9276   203476
## 6  Anderson  273.2606   203937
## 7     Towns  266.6479  1626157
## 8     Mills  264.9849   201988
## 9     Rubio  231.3741   201937
## 10  Wiggins  224.7092   203952
## 11     West  223.7639     2561
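
The same split-then-summarise idea can be sketched in base R on a toy data frame, which makes it easier to see what group_by and summarise are doing. The path_length helper and the toy values are hypothetical stand-ins for travelDist and the real tracking rows.

```r
# Toy tracking rows for two made-up players
toy <- data.frame(
  lastname = c("A", "A", "A", "B", "B"),
  x_loc    = c(0, 3, 3, 0, 0),
  y_loc    = c(0, 4, 4, 0, 2),
  stringsAsFactors = FALSE
)

# Stand-in for travelDist: sum of distances between consecutive points
path_length <- function(x, y) sum(sqrt(diff(x)^2 + diff(y)^2))

# Split rows by player, compute one total per player, sort descending
per_player <- sapply(split(toy, toy$lastname),
                     function(d) path_length(d$x_loc, d$y_loc))
per_player <- sort(per_player, decreasing = TRUE)
per_player
```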

Get distance for all the players for the entire game

This part extends the measurement to an entire game for all the players. For this game, the most active players covered between 12,000 and 12,800 feet, or a little over 2 miles, which makes sense.

deduped.data <- unique( all.movements[ , 1:12 ] )  ##This takes about 30 seconds to run
player.groups <- group_by(deduped.data, lastname)
dist.traveled.players <- summarise(player.groups, totalDist=travelDist(x_loc,y_loc),playerid = max(player_id))
total <- arrange(dist.traveled.players, desc(totalDist))
total
## Source: local data frame [25 x 3]
## 
##    lastname totalDist playerid
##       (chr)     (dbl)    (chr)
## 1      ball 28331.149       -1
## 2     Towns 12756.956  1626157
## 3   Leonard 12565.014   202695
## 4   Wiggins 11924.002   203952
## 5     Dieng 10506.066   203476
## 6     Rubio 10379.569   201937
## 7  Aldridge 10232.743   200746
## 8    Parker 10168.108     2225
## 9    LaVine  9540.312   203897
## 10    Mills  8988.654   201988
## ..      ...       ...      ...
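
The unique() call above drops the event.id column before deduplicating. The str() output earlier shows the same Parker and ball rows appearing under both event 1 and event 2, which suggests that events overlap and share moments. A small sketch with made-up rows shows why the event column has to be excluded for the duplicates to collapse:

```r
# Toy rows: the same two observations recorded under two different events
toy <- data.frame(
  lastname   = c("Parker", "Parker", "ball", "ball"),
  x_loc      = c(51.7, 51.7, 52.9, 52.9),
  game_clock = c(716, 716, 716, 716),
  event.id   = c(2, 1, 1, 2),
  stringsAsFactors = FALSE
)

# With event.id included, every row looks distinct
n_with_event <- nrow(unique(toy))

# Without event.id, repeated moments collapse to one row each
deduped <- unique(toy[, c("lastname", "x_loc", "game_clock")])
n_without_event <- nrow(deduped)

c(n_with_event, n_without_event)
```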

Get the distance between a player and the ball for one event

A more interesting use of the data is to see how distances between people and the ball change over time. This code shows you how to get the distance between two parties for an event. The example here uses Ginobili and the ball.

ginobili <- all.movements[which((all.movements$lastname == "Ginobili"| all.movements$lastname == "ball") & all.movements$event.id == 303),]
#Get distance for each player/ball
distgino <- ginobili %>% filter (lastname=="Ginobili") %>% select (x_loc,y_loc) 
distball <- ginobili %>% filter (lastname=="ball") %>% select (x_loc,y_loc) 
distlength <- 1:nrow(distgino)
#Use the R function dist for calculating distance
distsdf <- unlist(lapply(distlength,function(x) {dist(rbind(distgino[x,], distball[x,]))}))
#Add the game_clock
ball_distance <- ginobili %>% filter (lastname=="ball") %>% select (game_clock) %>% mutate(distance=distsdf)
plot_ly(data = ball_distance, x = ~game_clock, y = ~distance, type = "scatter", mode = "markers")
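
Calling dist() once per row works, but the same numbers can be computed in one vectorized step, which is usually much faster on long events. The sketch below uses two tiny made-up data frames in place of distgino and distball, and it assumes, as the code above does, that row i of each data frame refers to the same instant.

```r
# Toy positions for a player and the ball at two instants
gino <- data.frame(x_loc = c(0, 3), y_loc = c(0, 4))
ball <- data.frame(x_loc = c(3, 3), y_loc = c(4, 4))

# Row-by-row approach, as in the code above
d_loop <- unlist(lapply(seq_len(nrow(gino)), function(i) {
  dist(rbind(gino[i, ], ball[i, ]))
}))

# Vectorized Euclidean distance over all rows at once
d_vec <- sqrt((gino$x_loc - ball$x_loc)^2 + (gino$y_loc - ball$y_loc)^2)

d_vec
```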


Get the distance between a player and the ball for one event (using functions)

This part uses the same logic as above, but with functions I created to make it cleaner.

#Get Clock Info
clockinfo <- get_game_clock("Ginobili",303)
#Get Distance
playerdistance <- player_dist("Ginobili","ball",303)
#Plot
plot_ly(data = clockinfo, x = ~game_clock, y = playerdistance, type = "scatter", mode = "markers")


Get the distance between all players and the ball for one event

This section generalizes the code to view the distance between all the players and the ball. The plot can be a bit messy (and does not show in RPubs), but it's an interesting way to see the interactions between players and the ball.

pickplayer <- "ball"
pickeventID <- 303

#Get all the players
players <- all.movements %>% filter(event.id==pickeventID) %>% select(lastname) %>% distinct(lastname)
#Calculate distance
bigdistance <- lapply(players$lastname, function(x) {player_dist(pickplayer, x, pickeventID)})
bigdistancedf <- as.data.frame(do.call('cbind', bigdistance))
colnames(bigdistancedf) <- players$lastname
#Get Clock Info
clockinfo <- get_game_clock(pickplayer,pickeventID)
bigdistancedf$game_clock <- clockinfo$game_clock
head(bigdistancedf)
##    Wiggins    Rubio Ginobili   LaVine    Towns    Dieng    Mills     West
## 1 31.32434 27.70318 28.05735 14.15580 11.15434 12.22589 7.365803 5.390532
## 2 31.17150 27.74658 27.89755 13.94656 10.78685 11.99696 7.008844 5.372184
## 3 31.12644 27.87612 27.90609 13.84602 10.63837 11.71629 6.895338 5.531435
## 4 31.02972 28.02709 27.71331 13.72026 10.31316 11.66066 6.520126 5.384511
## 5 30.95374 28.09493 27.72385 13.58995 10.20118 11.26154 6.476467 5.550497
## 6 31.02895 28.35244 27.78799 13.63587 10.15257 11.14932 6.415761 5.731264
##   ball     Diaw Anderson game_clock
## 1    0 5.975104 2.975292      377.9
## 2    0 6.098550 2.890022      377.9
## 3    0 6.006581 2.449993      377.9
## 4    0 6.244153 2.642072      377.9
## 5    0 6.067141 2.161015      377.9
## 6    0 6.006893 2.071653      377.9
##Plot with plotly - one trace per column, added in a loop
trace_cols <- setdiff(colnames(bigdistancedf), "game_clock")
p <- plot_ly(data = bigdistancedf, x = ~game_clock)
for (nm in trace_cols) {
  p <- add_trace(p, y = bigdistancedf[[nm]], name = nm, type = "scatter", mode = "lines")
}
p

Get a distance matrix between all the players for one eventID

The movement data also allows for the analysis of the distance between players. For example, you might be interested in the distance between LaVine and Ginobili on a certain play. This function calculates a matrix of all the distances between players and the ball.

pickeventID <- 303
players_matrix <- player_dist_matrix(pickeventID)
str(players_matrix)
## 'data.frame':    990 obs. of  111 variables:
##  $ Wiggins_Rubio    : num  6.67 6.73 6.76 6.78 6.83 ...
##  $ Wiggins_Ginobili : num  13.3 13.3 13.3 13.3 13.2 ...
##  $ Wiggins_LaVine   : num  17.2 17.3 17.3 17.3 17.4 ...
##  $ Wiggins_Towns    : num  21.1 21.5 21.7 22 22.2 ...
##  $ Wiggins_Dieng    : num  25.4 25.6 25.8 26 26.2 ...
##  $ Wiggins_Mills    : num  27.2 27.1 26.9 26.7 26.6 ...
##  $ Wiggins_West     : num  27.5 27.4 27.4 27.3 27.3 ...
##  $ Wiggins_ball     : num  31.3 31.2 31.1 31 31 ...
##  $ Wiggins_Diaw     : num  34 34 33.9 33.9 33.8 ...
##  $ Wiggins_Anderson : num  33.7 33.4 33.2 33 32.7 ...
##  $ Rubio_Wiggins    : num  6.67 6.73 6.76 6.78 6.83 ...
##  $ Rubio_Ginobili   : num  17.2 17.4 17.6 17.8 17.9 ...
##  $ Rubio_LaVine     : num  13.8 14.1 14.4 14.6 14.8 ...
##  $ Rubio_Towns      : num  19 19.5 20 20.5 21 ...
##  $ Rubio_Dieng      : num  19.9 20.3 20.6 21 21.3 ...
##  $ Rubio_Mills      : num  25 25 25 25 24.9 ...
##  $ Rubio_West       : num  24.8 24.9 25.1 25.3 25.5 ...
##  $ Rubio_ball       : num  27.7 27.7 27.9 28 28.1 ...
##  $ Rubio_Diaw       : num  29.4 29.5 29.7 29.8 29.9 ...
##  $ Rubio_Anderson   : num  29.7 29.6 29.6 29.6 29.6 ...
##  $ Ginobili_Wiggins : num  13.3 13.3 13.3 13.3 13.2 ...
##  $ Ginobili_Rubio   : num  17.2 17.4 17.6 17.8 17.9 ...
##  $ Ginobili_LaVine  : num  16.9 16.9 17 17 17 ...
##  $ Ginobili_Towns   : num  17 17.2 17.3 17.4 17.5 ...
##  $ Ginobili_Dieng   : num  27.7 27.9 28 28.1 28.2 ...
##  $ Ginobili_Mills   : num  21.6 21.7 21.7 21.6 21.6 ...
##  $ Ginobili_West    : num  22.9 22.8 22.7 22.6 22.6 ...
##  $ Ginobili_ball    : num  28.1 27.9 27.9 27.7 27.7 ...
##  $ Ginobili_Diaw    : num  32.7 32.7 32.6 32.6 32.5 ...
##  $ Ginobili_Anderson: num  30.9 30.7 30.3 30.2 29.8 ...
##  $ LaVine_Wiggins   : num  17.2 17.3 17.3 17.3 17.4 ...
##  $ LaVine_Rubio     : num  13.8 14.1 14.4 14.6 14.8 ...
##  $ LaVine_Ginobili  : num  16.9 16.9 17 17 17 ...
##  $ LaVine_Towns     : num  5.72 6 6.22 6.46 6.7 ...
##  $ LaVine_Dieng     : num  10.9 11 11 11.1 11.2 ...
##  $ LaVine_Mills     : num  11.4 11.1 10.8 10.5 10.2 ...
##  $ LaVine_West      : num  10.9 10.9 10.8 10.7 10.6 ...
##  $ LaVine_ball      : num  14.2 13.9 13.8 13.7 13.6 ...
##  $ LaVine_Diaw      : num  17 17 16.9 16.8 16.7 ...
##  $ LaVine_Anderson  : num  16.5 16.2 15.9 15.6 15.3 ...
##  $ Towns_Wiggins    : num  21.1 21.5 21.7 22 22.2 ...
##  $ Towns_Rubio      : num  19 19.5 20 20.5 21 ...
##  $ Towns_Ginobili   : num  17 17.2 17.3 17.4 17.5 ...
##  $ Towns_LaVine     : num  5.72 6 6.22 6.46 6.7 ...
##  $ Towns_Dieng      : num  13.5 13.6 13.7 13.8 14 ...
##  $ Towns_Mills      : num  6.13 5.63 5.21 4.8 4.45 ...
##  $ Towns_West       : num  6.46 6.1 5.81 5.54 5.27 ...
##  $ Towns_ball       : num  11.2 10.8 10.6 10.3 10.2 ...
##  $ Towns_Diaw       : num  15.7 15.6 15.5 15.4 15.3 ...
##  $ Towns_Anderson   : num  14 13.6 13 12.8 12.3 ...
##  $ Dieng_Wiggins    : num  25.4 25.6 25.8 26 26.2 ...
##  $ Dieng_Rubio      : num  19.9 20.3 20.6 21 21.3 ...
##  $ Dieng_Ginobili   : num  27.7 27.9 28 28.1 28.2 ...
##  $ Dieng_LaVine     : num  10.9 11 11 11.1 11.2 ...
##  $ Dieng_Towns      : num  13.5 13.6 13.7 13.8 14 ...
##  $ Dieng_Mills      : num  15.4 15 14.6 14.2 13.8 ...
##  $ Dieng_West       : num  13.5 13.5 13.5 13.4 13.4 ...
##  $ Dieng_ball       : num  12.2 12 11.7 11.7 11.3 ...
##  $ Dieng_Diaw       : num  10.67 10.41 10.16 9.88 9.57 ...
##  $ Dieng_Anderson   : num  12.5 12.2 12.1 11.6 11.4 ...
##  $ Mills_Wiggins    : num  27.2 27.1 26.9 26.7 26.6 ...
##  $ Mills_Rubio      : num  25 25 25 25 24.9 ...
##  $ Mills_Ginobili   : num  21.6 21.7 21.7 21.6 21.6 ...
##  $ Mills_LaVine     : num  11.4 11.1 10.8 10.5 10.2 ...
##  $ Mills_Towns      : num  6.13 5.63 5.21 4.8 4.45 ...
##  $ Mills_Dieng      : num  15.4 15 14.6 14.2 13.8 ...
##  $ Mills_West       : num  2.214 1.778 1.42 1.142 0.945 ...
##  $ Mills_ball       : num  7.37 7.01 6.9 6.52 6.48 ...
##  $ Mills_Diaw       : num  13.2 12.9 12.7 12.5 12.3 ...
##  $ Mills_Anderson   : num  10.31 9.87 9.3 9.16 8.63 ...
##  $ West_Wiggins     : num  27.5 27.4 27.4 27.3 27.3 ...
##  $ West_Rubio       : num  24.8 24.9 25.1 25.3 25.5 ...
##  $ West_Ginobili    : num  22.9 22.8 22.7 22.6 22.6 ...
##  $ West_LaVine      : num  10.9 10.9 10.8 10.7 10.6 ...
##  $ West_Towns       : num  6.46 6.1 5.81 5.54 5.27 ...
##  $ West_Dieng       : num  13.5 13.5 13.5 13.4 13.4 ...
##  $ West_Mills       : num  2.214 1.778 1.42 1.142 0.945 ...
##  $ West_ball        : num  5.39 5.37 5.53 5.38 5.55 ...
##  $ West_Diaw        : num  11 11.2 11.3 11.4 11.4 ...
##  $ West_Anderson    : num  8.37 8.26 7.96 8.03 7.69 ...
##  $ ball_Wiggins     : num  31.3 31.2 31.1 31 31 ...
##  $ ball_Rubio       : num  27.7 27.7 27.9 28 28.1 ...
##  $ ball_Ginobili    : num  28.1 27.9 27.9 27.7 27.7 ...
##  $ ball_LaVine      : num  14.2 13.9 13.8 13.7 13.6 ...
##  $ ball_Towns       : num  11.2 10.8 10.6 10.3 10.2 ...
##  $ ball_Dieng       : num  12.2 12 11.7 11.7 11.3 ...
##  $ ball_Mills       : num  7.37 7.01 6.9 6.52 6.48 ...
##  $ ball_West        : num  5.39 5.37 5.53 5.38 5.55 ...
##  $ ball_Diaw        : num  5.98 6.1 6.01 6.24 6.07 ...
##  $ ball_Anderson    : num  2.98 2.89 2.45 2.64 2.16 ...
##  $ Diaw_Wiggins     : num  34 34 33.9 33.9 33.8 ...
##  $ Diaw_Rubio       : num  29.4 29.5 29.7 29.8 29.9 ...
##  $ Diaw_Ginobili    : num  32.7 32.7 32.6 32.6 32.5 ...
##  $ Diaw_LaVine      : num  17 17 16.9 16.8 16.7 ...
##  $ Diaw_Towns       : num  15.7 15.6 15.5 15.4 15.3 ...
##  $ Diaw_Dieng       : num  10.67 10.41 10.16 9.88 9.57 ...
##  $ Diaw_Mills       : num  13.2 12.9 12.7 12.5 12.3 ...
##  $ Diaw_West        : num  11 11.2 11.3 11.4 11.4 ...
##  $ Diaw_ball        : num  5.98 6.1 6.01 6.24 6.07 ...
##   [list output truncated]
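
The output above contains every ordered pair, so Wiggins_Rubio and Rubio_Wiggins hold the same numbers. With 11 tracked objects (10 players plus the ball) there are 11 x 10 = 110 ordered pairs but only 55 unique ones. For a single instant, base R's dist() produces all pairwise distances at once. The sketch below uses made-up positions and is not how player_dist_matrix is implemented.

```r
# Toy positions for three tracked objects at one instant
frame <- data.frame(
  lastname = c("Wiggins", "Rubio", "ball"),
  x_loc    = c(0, 3, 0),
  y_loc    = c(0, 4, 10),
  stringsAsFactors = FALSE
)

# Symmetric matrix of pairwise Euclidean distances
m <- as.matrix(dist(frame[, c("x_loc", "y_loc")]))
dimnames(m) <- list(frame$lastname, frame$lastname)
m

# Ordered versus unique pairs for 11 tracked objects
ordered_pairs <- 11 * 10
unique_pairs  <- choose(11, 2)
c(ordered_pairs, unique_pairs)
```

Keeping only the unique pairs would halve the number of columns without losing any information.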

Credits

I hope this has been a useful introduction to working with the NBA movement data. For more of my explorations of the NBA data, you can see my NBA GitHub repo. You can find more information about me, Rajiv Shah, and my other projects, or find me on Twitter.