This is a simple walkthrough for getting started with the NBA SportVU movement data. The goal is to show how to parse a game file and perform basic exploratory data analysis (EDA).
This post is inspired by the work of Savvas Tjortjoglou and Tanya Cashorali. I wanted to extend their work to analyzing an entire game.
For the play, I chose the NBA’s top-rated play from December 23rd, 2015. It comes from a game between San Antonio and Minnesota, and the play occurs with about 6 minutes left in the third quarter.
Neil Johnson has taken the time to compile the movement data for NBA games in his GitHub repository.
To get this game, you will need to download the file.
wget -O 12.23.2015.SAS.at.MIN.7z "https://github.com/neilmj/BasketballData/blob/master/2016.NBA.Raw.SportVU.Game.Logs/12.23.2015.SAS.at.MIN.7z?raw=true"
Extract this 7z archive and you should end up with a file named 0021500431.json.
To read this file, first download the _functions.R file from my GitHub repository for this project.
library(RCurl)
## Loading required package: bitops
library(jsonlite)
##
## Attaching package: 'jsonlite'
##
## The following object is masked from 'package:utils':
##
## View
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(plotly)
## Loading required package: ggplot2
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:graphics':
##
## layout
source("_functions.R")
The sportvu_convert_json function takes the JSON file and converts it into a data frame. For this game, the conversion takes about 3 minutes. The resulting data frame has about 2.6 million observations of 13 variables.
all.movements <- sportvu_convert_json("data/0021500431.json")
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
str(all.movements)
## 'data.frame': 2646562 obs. of 13 variables:
## $ player_id : chr "2225" "2225" "-1" "-1" ...
## $ lastname : chr "Parker" "Parker" "ball" "ball" ...
## $ firstname : chr "Tony" "Tony" NA NA ...
## $ jersey : chr "9" "9" NA NA ...
## $ position : chr "G" "G" NA NA ...
## $ team_id : num 1.61e+09 1.61e+09 NA NA 1.61e+09 ...
## $ x_loc : num 51.7 51.7 52.9 52.9 60.4 ...
## $ y_loc : num 40.3 40.3 39.9 39.9 31.8 ...
## $ radius : num 0 0 2.5 2.5 0 ...
## $ game_clock: num 716 716 716 716 716 ...
## $ shot_clock: num 13.3 13.3 13.3 13.3 13.3 ...
## $ quarter : num 1 1 1 1 1 1 1 1 1 1 ...
## $ event.id : num 2 1 1 2 2 1 2 1 2 1 ...
The specific play we are interested in has the event ID #303. The NBA site has both the video and movement data available. The movement data shows the play:
The SportVU data provides movement data for every player and the ball. As an example, let's look at the movement of Ginobili for this play.
##Extract all data for event ID 303
id303 <- all.movements[which(all.movements$event.id == 303),]
##Extract all data for Ginobili on event ID #303
ginobili <- all.movements[which(all.movements$lastname == "Ginobili" & all.movements$event.id == 303),]
This data can be visualized to show how Ginobili moves around the court. The colors represent three different time ranges of movement. The x axis runs along the length of the court and the y axis along its width; an NBA court is 94 feet by 50 feet. (Savvas Tjortjoglou takes the time to plot this on a basketball court background image.)
p <- plot_ly(data = ginobili, x = x_loc, y = y_loc, mode = "markers", color=cut(ginobili$game_clock, breaks=3)) %>%
layout(xaxis = list(range = c(0, 100)),
yaxis = list(range = c(0, 50)))
p
I have a simple function to get the distance a player travels:
travelDist(ginobili$x_loc, ginobili$y_loc)
## [1] 283.4333
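The travelDist function itself lives in _functions.R and is not reproduced in this post. As a rough sketch of the idea, assuming it sums the straight-line distances between consecutive positions, a hypothetical stand-in (travel_dist_sketch is my name for illustration, not the function from the repository) could look like this:

```r
# Hypothetical stand-in for travelDist(); the real one is in _functions.R.
# Assumes x and y are positions in feet, ordered in time.
travel_dist_sketch <- function(x, y) {
  # diff() gives the step between consecutive samples
  sum(sqrt(diff(x)^2 + diff(y)^2))
}

# Toy path: (0,0) -> (3,4) -> (3,10), i.e. 5 feet plus 6 feet
travel_dist_sketch(c(0, 3, 3), c(0, 4, 10))
```

On the toy path this returns 11. Because every small step is added up, tracking jitter also counts as movement, so totals computed this way are likely an upper bound on the distance actually run.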
Building on the distance, it is possible to calculate a player's average speed over the event.
seconds = max(ginobili$game_clock) - min(ginobili$game_clock)
speed = travelDist(ginobili$x_loc, ginobili$y_loc)/seconds #in feet per second
speed
## [1] 8.355933
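Feet per second is not the most intuitive unit, so here is a small sketch that converts the value printed above to miles per hour. The constants are the standard 5,280 feet per mile and 3,600 seconds per hour; the variable names are mine.

```r
# Convert the average speed reported above from feet per second to mph.
feet_per_mile    <- 5280
seconds_per_hour <- 3600

speed_fps <- 8.355933  # value printed above
speed_mph <- speed_fps * seconds_per_hour / feet_per_mile
round(speed_mph, 1)
```

This works out to roughly 5.7 mph. Keep in mind it is an average over the whole event, so it includes any time the player spent standing still.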
The next step is generalizing this approach to all the players.
player.groups <- group_by(id303, lastname)
dist.traveled.players <- summarise(player.groups, totalDist=travelDist(x_loc, y_loc),playerid = max(player_id))
arrange(dist.traveled.players, desc(totalDist))
## Source: local data frame [11 x 3]
##
## lastname totalDist playerid
## (chr) (dbl) (chr)
## 1 ball 387.7486 -1
## 2 LaVine 334.1868 203897
## 3 Diaw 293.1026 2564
## 4 Ginobili 283.4333 1938
## 5 Dieng 279.9276 203476
## 6 Anderson 273.2606 203937
## 7 Towns 266.6479 1626157
## 8 Mills 264.9849 201988
## 9 Rubio 231.3741 201937
## 10 Wiggins 224.7092 203952
## 11 West 223.7639 2561
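This part extends the measurement to an entire game for all the players. Because the same moment can appear under more than one event ID, the code first drops the event.id column and removes duplicate rows. For this game, the most active players covered a little over 2 miles (Towns leads with about 12,757 feet, roughly 2.4 miles), which seems plausible.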
deduped.data <- unique( all.movements[ , 1:12 ] ) ##This takes about 30 seconds to run
player.groups <- group_by(deduped.data, lastname)
dist.traveled.players <- summarise(player.groups, totalDist=travelDist(x_loc,y_loc),playerid = max(player_id))
total <- arrange(dist.traveled.players, desc(totalDist))
total
## Source: local data frame [25 x 3]
##
## lastname totalDist playerid
## (chr) (dbl) (chr)
## 1 ball 28331.149 -1
## 2 Towns 12756.956 1626157
## 3 Leonard 12565.014 202695
## 4 Wiggins 11924.002 203952
## 5 Dieng 10506.066 203476
## 6 Rubio 10379.569 201937
## 7 Aldridge 10232.743 200746
## 8 Parker 10168.108 2225
## 9 LaVine 9540.312 203897
## 10 Mills 8988.654 201988
## .. ... ... ...
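The unique() call above keeps only columns 1 to 12, which leaves out event.id before deduplicating. The str() output earlier suggests why: identical rows show up under different event IDs. Here is a toy illustration with made-up values (not rows from the game file) of how dropping that column lets unique() collapse the repeats:

```r
# Toy data: the moment at game_clock 716.00 is stored under events 1 and 2
toy <- data.frame(
  lastname   = c("Parker", "Parker", "Parker", "Parker"),
  x_loc      = c(51.7, 52.0, 52.0, 52.4),
  y_loc      = c(40.3, 40.1, 40.1, 39.8),
  game_clock = c(716.04, 716.00, 716.00, 715.96),
  event.id   = c(1, 1, 2, 2)
)

# With event.id included, every row looks unique
nrow(unique(toy))

# Dropping event.id lets unique() collapse the repeated moment
deduped <- unique(toy[, names(toy) != "event.id"])
nrow(deduped)
```

The first count is 4 and the second is 3. Without this step, overlapping events would be counted more than once in the game totals.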
A more interesting use of the data is to see how distances between people and the ball change over time. This code shows you how to get the distance between two parties for an event. The example here uses Ginobili and the ball.
ginobili <- all.movements[which((all.movements$lastname == "Ginobili"| all.movements$lastname == "ball") & all.movements$event.id == 303),]
#Get distance for each player/ball
distgino <- ginobili %>% filter (lastname=="Ginobili") %>% select (x_loc,y_loc)
distball <- ginobili %>% filter (lastname=="ball") %>% select (x_loc,y_loc)
distlength <- 1:nrow(distgino)
#Use the R function dist for calculating distance
distsdf <- unlist(lapply(distlength,function(x) {dist(rbind(distgino[x,], distball[x,]))}))
#Add the game_clock
ball_distance <- ginobili %>% filter (lastname=="ball") %>% select (game_clock) %>% mutate(distance=distsdf)
plot_ly(data = ball_distance, x=game_clock, y=distsdf,mode = "markers")
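The row-by-row dist() call above can be slow for long events. As an alternative sketch, the same Euclidean distance can be computed on whole columns at once. The helper name pair_distance is hypothetical, and it assumes that row i of both data frames refers to the same instant:

```r
# Hypothetical helper: distance between two tracks sampled at the same moments.
# Assumes row i of both data frames refers to the same instant.
pair_distance <- function(a, b) {
  stopifnot(nrow(a) == nrow(b))
  sqrt((a$x_loc - b$x_loc)^2 + (a$y_loc - b$y_loc)^2)
}

# Toy tracks with made-up coordinates
player <- data.frame(x_loc = c(0, 3), y_loc = c(0, 4))
ball   <- data.frame(x_loc = c(3, 3), y_loc = c(4, 4))
pair_distance(player, ball)
```

On the toy tracks this gives 5 and 0. With the objects from the code above, pair_distance(distgino, distball) should give the same numbers as distsdf, provided both have the same number of rows.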
This part uses the same logic as above, but with functions I created to make it cleaner.
#Get Clock Info
clockinfo <- get_game_clock("Ginobili",303)
#Get Distance
playerdistance <- player_dist("Ginobili","ball",303)
#Plot
plot_ly(data = clockinfo, x=game_clock, y=playerdistance,mode = "markers")
This section generalizes the code to view the distance between all the players and the ball. The plot can be a bit messy (and does not show in RPubs), but it's an interesting way to see the interactions between players and the ball.
pickplayer <- "ball"
pickeventID <- 303
#Get all the players
players <- all.movements %>% filter(event.id==pickeventID) %>% select(lastname) %>% distinct(lastname)
#Calculate distance
bigdistance <- lapply(list(players$lastname)[[1]],function (x){player_dist(pickplayer,x,pickeventID)})
bigdistancedf <- as.data.frame(do.call('cbind',bigdistance))
colnames(bigdistancedf) <- list(players$lastname)[[1]]
#Get Clock Info
clockinfo <- get_game_clock(pickplayer,pickeventID)
bigdistancedf$game_clock <- clockinfo$game_clock
head(bigdistancedf)
## Wiggins Rubio Ginobili LaVine Towns Dieng Mills West
## 1 31.32434 27.70318 28.05735 14.15580 11.15434 12.22589 7.365803 5.390532
## 2 31.17150 27.74658 27.89755 13.94656 10.78685 11.99696 7.008844 5.372184
## 3 31.12644 27.87612 27.90609 13.84602 10.63837 11.71629 6.895338 5.531435
## 4 31.02972 28.02709 27.71331 13.72026 10.31316 11.66066 6.520126 5.384511
## 5 30.95374 28.09493 27.72385 13.58995 10.20118 11.26154 6.476467 5.550497
## 6 31.02895 28.35244 27.78799 13.63587 10.15257 11.14932 6.415761 5.731264
## ball Diaw Anderson game_clock
## 1 0 5.975104 2.975292 377.9
## 2 0 6.098550 2.890022 377.9
## 3 0 6.006581 2.449993 377.9
## 4 0 6.244153 2.642072 377.9
## 5 0 6.067141 2.161015 377.9
## 6 0 6.006893 2.071653 377.9
##Plot with plotly - not elegant but shows you one way to visualize the data
for(i in 1:(ncol(bigdistancedf)-1)){
if(i==1){
pString<-"p <- plot_ly(data = bigdistancedf, x = game_clock, y = bigdistancedf[,1], name = colnames(bigdistancedf[1]))"
} else {
pString<-paste(pString, " %>% add_trace(y =", eval(paste("bigdistancedf[,",i,"]",sep="")),", name=", eval(paste("colnames(bigdistancedf[", i,"])",sep="")), ")", sep="")
}
}
eval(parse(text=pString))
print(p)
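One way to avoid building the plot as a string is to reshape the wide table into long format first, with one row per player and moment. The sketch below uses base R's stack() on a toy table shaped like bigdistancedf (made-up values); the column names distance and player are my own choice. The plotting call is left as a comment and assumes the formula interface of plotly 4.x or later, which differs from the syntax used elsewhere in this post.

```r
# Toy wide table shaped like bigdistancedf (made-up values)
wide <- data.frame(
  Ginobili   = c(28.1, 27.9),
  LaVine     = c(14.2, 13.9),
  game_clock = c(377.9, 377.8)
)

player_cols <- setdiff(names(wide), "game_clock")

# stack() turns the player columns into (values, ind) pairs
long <- stack(wide[player_cols])
names(long) <- c("distance", "player")

# Columns are stacked one after another, so the clock repeats once per player
long$game_clock <- rep(wide$game_clock, times = length(player_cols))

# With plotly 4.x or later, one call could then draw every trace, e.g.
# plot_ly(long, x = ~game_clock, y = ~distance, color = ~player,
#         type = "scatter", mode = "markers")
long
```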
The movement data also allows for the analysis of the distance between players. For example, you might be interested in the distance between LaVine and Ginobili during a certain play. The player_dist_matrix function calculates all the pairwise distances between the players and the ball for an event.
pickeventID <- 303
players_matrix <- player_dist_matrix(pickeventID)
str(players_matrix)
## 'data.frame': 990 obs. of 111 variables:
## $ Wiggins_Rubio : num 6.67 6.73 6.76 6.78 6.83 ...
## $ Wiggins_Ginobili : num 13.3 13.3 13.3 13.3 13.2 ...
## $ Wiggins_LaVine : num 17.2 17.3 17.3 17.3 17.4 ...
## $ Wiggins_Towns : num 21.1 21.5 21.7 22 22.2 ...
## $ Wiggins_Dieng : num 25.4 25.6 25.8 26 26.2 ...
## $ Wiggins_Mills : num 27.2 27.1 26.9 26.7 26.6 ...
## $ Wiggins_West : num 27.5 27.4 27.4 27.3 27.3 ...
## $ Wiggins_ball : num 31.3 31.2 31.1 31 31 ...
## $ Wiggins_Diaw : num 34 34 33.9 33.9 33.8 ...
## $ Wiggins_Anderson : num 33.7 33.4 33.2 33 32.7 ...
## $ Rubio_Wiggins : num 6.67 6.73 6.76 6.78 6.83 ...
## $ Rubio_Ginobili : num 17.2 17.4 17.6 17.8 17.9 ...
## $ Rubio_LaVine : num 13.8 14.1 14.4 14.6 14.8 ...
## $ Rubio_Towns : num 19 19.5 20 20.5 21 ...
## $ Rubio_Dieng : num 19.9 20.3 20.6 21 21.3 ...
## $ Rubio_Mills : num 25 25 25 25 24.9 ...
## $ Rubio_West : num 24.8 24.9 25.1 25.3 25.5 ...
## $ Rubio_ball : num 27.7 27.7 27.9 28 28.1 ...
## $ Rubio_Diaw : num 29.4 29.5 29.7 29.8 29.9 ...
## $ Rubio_Anderson : num 29.7 29.6 29.6 29.6 29.6 ...
## $ Ginobili_Wiggins : num 13.3 13.3 13.3 13.3 13.2 ...
## $ Ginobili_Rubio : num 17.2 17.4 17.6 17.8 17.9 ...
## $ Ginobili_LaVine : num 16.9 16.9 17 17 17 ...
## $ Ginobili_Towns : num 17 17.2 17.3 17.4 17.5 ...
## $ Ginobili_Dieng : num 27.7 27.9 28 28.1 28.2 ...
## $ Ginobili_Mills : num 21.6 21.7 21.7 21.6 21.6 ...
## $ Ginobili_West : num 22.9 22.8 22.7 22.6 22.6 ...
## $ Ginobili_ball : num 28.1 27.9 27.9 27.7 27.7 ...
## $ Ginobili_Diaw : num 32.7 32.7 32.6 32.6 32.5 ...
## $ Ginobili_Anderson: num 30.9 30.7 30.3 30.2 29.8 ...
## $ LaVine_Wiggins : num 17.2 17.3 17.3 17.3 17.4 ...
## $ LaVine_Rubio : num 13.8 14.1 14.4 14.6 14.8 ...
## $ LaVine_Ginobili : num 16.9 16.9 17 17 17 ...
## $ LaVine_Towns : num 5.72 6 6.22 6.46 6.7 ...
## $ LaVine_Dieng : num 10.9 11 11 11.1 11.2 ...
## $ LaVine_Mills : num 11.4 11.1 10.8 10.5 10.2 ...
## $ LaVine_West : num 10.9 10.9 10.8 10.7 10.6 ...
## $ LaVine_ball : num 14.2 13.9 13.8 13.7 13.6 ...
## $ LaVine_Diaw : num 17 17 16.9 16.8 16.7 ...
## $ LaVine_Anderson : num 16.5 16.2 15.9 15.6 15.3 ...
## $ Towns_Wiggins : num 21.1 21.5 21.7 22 22.2 ...
## $ Towns_Rubio : num 19 19.5 20 20.5 21 ...
## $ Towns_Ginobili : num 17 17.2 17.3 17.4 17.5 ...
## $ Towns_LaVine : num 5.72 6 6.22 6.46 6.7 ...
## $ Towns_Dieng : num 13.5 13.6 13.7 13.8 14 ...
## $ Towns_Mills : num 6.13 5.63 5.21 4.8 4.45 ...
## $ Towns_West : num 6.46 6.1 5.81 5.54 5.27 ...
## $ Towns_ball : num 11.2 10.8 10.6 10.3 10.2 ...
## $ Towns_Diaw : num 15.7 15.6 15.5 15.4 15.3 ...
## $ Towns_Anderson : num 14 13.6 13 12.8 12.3 ...
## $ Dieng_Wiggins : num 25.4 25.6 25.8 26 26.2 ...
## $ Dieng_Rubio : num 19.9 20.3 20.6 21 21.3 ...
## $ Dieng_Ginobili : num 27.7 27.9 28 28.1 28.2 ...
## $ Dieng_LaVine : num 10.9 11 11 11.1 11.2 ...
## $ Dieng_Towns : num 13.5 13.6 13.7 13.8 14 ...
## $ Dieng_Mills : num 15.4 15 14.6 14.2 13.8 ...
## $ Dieng_West : num 13.5 13.5 13.5 13.4 13.4 ...
## $ Dieng_ball : num 12.2 12 11.7 11.7 11.3 ...
## $ Dieng_Diaw : num 10.67 10.41 10.16 9.88 9.57 ...
## $ Dieng_Anderson : num 12.5 12.2 12.1 11.6 11.4 ...
## $ Mills_Wiggins : num 27.2 27.1 26.9 26.7 26.6 ...
## $ Mills_Rubio : num 25 25 25 25 24.9 ...
## $ Mills_Ginobili : num 21.6 21.7 21.7 21.6 21.6 ...
## $ Mills_LaVine : num 11.4 11.1 10.8 10.5 10.2 ...
## $ Mills_Towns : num 6.13 5.63 5.21 4.8 4.45 ...
## $ Mills_Dieng : num 15.4 15 14.6 14.2 13.8 ...
## $ Mills_West : num 2.214 1.778 1.42 1.142 0.945 ...
## $ Mills_ball : num 7.37 7.01 6.9 6.52 6.48 ...
## $ Mills_Diaw : num 13.2 12.9 12.7 12.5 12.3 ...
## $ Mills_Anderson : num 10.31 9.87 9.3 9.16 8.63 ...
## $ West_Wiggins : num 27.5 27.4 27.4 27.3 27.3 ...
## $ West_Rubio : num 24.8 24.9 25.1 25.3 25.5 ...
## $ West_Ginobili : num 22.9 22.8 22.7 22.6 22.6 ...
## $ West_LaVine : num 10.9 10.9 10.8 10.7 10.6 ...
## $ West_Towns : num 6.46 6.1 5.81 5.54 5.27 ...
## $ West_Dieng : num 13.5 13.5 13.5 13.4 13.4 ...
## $ West_Mills : num 2.214 1.778 1.42 1.142 0.945 ...
## $ West_ball : num 5.39 5.37 5.53 5.38 5.55 ...
## $ West_Diaw : num 11 11.2 11.3 11.4 11.4 ...
## $ West_Anderson : num 8.37 8.26 7.96 8.03 7.69 ...
## $ ball_Wiggins : num 31.3 31.2 31.1 31 31 ...
## $ ball_Rubio : num 27.7 27.7 27.9 28 28.1 ...
## $ ball_Ginobili : num 28.1 27.9 27.9 27.7 27.7 ...
## $ ball_LaVine : num 14.2 13.9 13.8 13.7 13.6 ...
## $ ball_Towns : num 11.2 10.8 10.6 10.3 10.2 ...
## $ ball_Dieng : num 12.2 12 11.7 11.7 11.3 ...
## $ ball_Mills : num 7.37 7.01 6.9 6.52 6.48 ...
## $ ball_West : num 5.39 5.37 5.53 5.38 5.55 ...
## $ ball_Diaw : num 5.98 6.1 6.01 6.24 6.07 ...
## $ ball_Anderson : num 2.98 2.89 2.45 2.64 2.16 ...
## $ Diaw_Wiggins : num 34 34 33.9 33.9 33.8 ...
## $ Diaw_Rubio : num 29.4 29.5 29.7 29.8 29.9 ...
## $ Diaw_Ginobili : num 32.7 32.7 32.6 32.6 32.5 ...
## $ Diaw_LaVine : num 17 17 16.9 16.8 16.7 ...
## $ Diaw_Towns : num 15.7 15.6 15.5 15.4 15.3 ...
## $ Diaw_Dieng : num 10.67 10.41 10.16 9.88 9.57 ...
## $ Diaw_Mills : num 13.2 12.9 12.7 12.5 12.3 ...
## $ Diaw_West : num 11 11.2 11.3 11.4 11.4 ...
## $ Diaw_ball : num 5.98 6.1 6.01 6.24 6.07 ...
## [list output truncated]
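In the output above, every pair appears twice with identical values (for example Wiggins_Rubio and Rubio_Wiggins). With 10 players plus the ball there are 11 x 10 = 110 ordered pairs but only 55 distinct ones. The sketch below is not the code behind player_dist_matrix; it only shows, with made-up positions for a single moment, how base R's dist() stores each pair once and how as.matrix() mirrors it:

```r
# Made-up positions (in feet) for one moment
frame <- rbind(
  Mills = c(x = 0, y = 0),
  West  = c(x = 3, y = 4),
  ball  = c(x = 3, y = 0)
)

# dist() returns each unordered pair once; as.matrix() mirrors it
d <- as.matrix(dist(frame))
d["Mills", "West"]
d["West", "Mills"]  # same pair in the other order, same value
```

Both lookups return 5. If you only need one column per pair, keeping a single ordering (for example the lower triangle of the matrix) would halve the number of columns.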
I hope this has been a useful introduction to working with the NBA movement data. For more of my explorations of the NBA data, see my NBA GitHub repo. You can find more information about me, Rajiv Shah, and my other projects, or find me on Twitter.