This page shows how to calculate player velocity, acceleration, and jerk. For background, here is a primer on physics as well as a paper measuring acceleration using SportVU data by Philip Maymin. I am not an expert in physics, so please gently correct me if there are errors.
In this markdown, I want to show how to calculate these metrics using the SportVU data. As a starting point, you will need my previous notebooks to grab the data.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(plotly)
## Loading required package: ggplot2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:graphics':
##
## layout
library(TTR)
source("_functions.R")
## Loading required package: bitops
To demonstrate calculating velocity, I picked a play with a quick pass in the Magic-Wizards game on January 1st (event ID 422). You can see the YouTube video or the SportVU movement data (not currently available).
The first step is extracting the data for event ID 422. Please refer to my other posts for how this data is downloaded and merged. Here I import a file that was previously processed into a data frame of movement data.
all.movements <- read.csv("data/0021500490.gz")
event_df <- all.movements %>%
  dplyr::arrange(quarter, desc(game_clock), x_loc) %>%
  filter(event.id == 422)
Let's start by looking at the velocity of the ball. I will go into how velocity is calculated later. The graph here shows how velocity changes over the play. Compare it with the movement in the actual play. The results are in feet per second; 10 feet per second converts to 6.8 miles per hour.
df_ball <- event_df %>%
  filter(player_id == "-1") %>%
  filter(shot_clock != 24) # Remove some extra spillover data from the next event
# Using a function I created to get velocity
v <- velocity(df_ball$x_loc, df_ball$y_loc)
mean(v)
## [1] 12.81256
#Plotting
f <- list(
family = "Helvetica, monospace",
size = 18,
color = "#7f7f7f"
)
x <- list(
title = "Time",
titlefont = f
)
y <- list(
title = "Velocity ft/s",
titlefont = f
)
plot_ly(y=v) %>%
layout(xaxis = x, yaxis = y)
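The feet-per-second figures above can be converted to miles per hour with a one-line helper. This is a minimal sketch; `fps_to_mph` is a hypothetical name and is not one of the functions sourced from `_functions.R`.

```r
# Hypothetical helper (an assumption, not part of the original functions file):
# convert feet per second to miles per hour.
fps_to_mph <- function(fps) {
  fps * 3600 / 5280 # 3600 seconds per hour, 5280 feet per mile
}

fps_to_mph(10)       # about 6.8 mph
fps_to_mph(12.81256) # the mean ball speed reported above, in mph
```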
Let's step through the calculation of velocity. Acceleration and jerk are just higher-order applications of the diff function (take a look at my functions for the details).
## Need to calculate the difference between two points - use the R function diff
diffx <- as.vector(diff(df_ball$x_loc))
head(diffx)
## [1] -0.34738 -0.28795 -0.19737 -0.29548 -0.35086 -0.20222
diffy <- as.vector(diff(df_ball$y_loc))
## Next let's calculate the distance between each of these points
diffx2 <- diffx ^ 2
diffy2 <- diffy ^ 2
a <- diffx2 + diffy2
b <- sqrt(a)
## Then we need to divide by time - here the interval between points is 0.04 seconds
b <- b / 0.04
head(b)
## [1] 14.74226 10.21297 11.23659 10.25592 11.66777 10.18068
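The steps above can be wrapped into small functions, with acceleration and jerk obtained by differencing again. This is only a sketch under stated assumptions: the names `speed_from_xy`, `accel_from_xy`, and `jerk_from_xy` and the `dt` argument are hypothetical, and the functions in `_functions.R` may be written differently. Note that this differences the scalar speed, so "acceleration" here means the rate of change of speed and ignores changes in direction.

```r
# Hypothetical helpers; names and the dt default are assumptions.
# dt is the time between samples in seconds (0.04 in the walkthrough above).
speed_from_xy <- function(x, y, dt = 0.04) {
  # distance between consecutive points, divided by elapsed time
  sqrt(diff(x)^2 + diff(y)^2) / dt
}

accel_from_xy <- function(x, y, dt = 0.04) {
  # change in speed per unit time (one more diff, one fewer value)
  diff(speed_from_xy(x, y, dt)) / dt
}

jerk_from_xy <- function(x, y, dt = 0.04) {
  # change in acceleration per unit time
  diff(accel_from_xy(x, y, dt)) / dt
}

# Toy data: constant steps of 3 in x and 4 in y, so each step covers 5 units
toy_x <- c(0, 3, 6, 9)
toy_y <- c(0, 4, 8, 12)
speed_from_xy(toy_x, toy_y, dt = 1) # 5 5 5
accel_from_xy(toy_x, toy_y, dt = 1) # 0 0
jerk_from_xy(toy_x, toy_y, dt = 1)  # 0
```

Each level of differencing drops one value, which is why the time vector has to be trimmed by one, two, or three entries when it is paired with velocity, acceleration, or jerk.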
The velocity can also be seen as a simple time series. R has some great functions for time series, so let's start by creating a time series object in R. The plot here can be confusing because the game clock counts down, so time gets smaller as the play goes on. Read it from right to left.
timeseries <- cbind(v, df_ball$game_clock[-1]) # in creating the diff, we lose one value
timeseries <- as.data.frame(timeseries)
ball.ts <- ts(timeseries, end = df_ball$game_clock[1], start = df_ball$game_clock[136], frequency = 25)
plot_ly(x = timeseries$V2, y = v, data = timeseries) %>%
  layout(xaxis = x, yaxis = y)
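One way to avoid reading the plot backwards is to convert the countdown clock into elapsed time before plotting. The sketch below uses made-up clock readings (`toy_clock` is illustrative, not taken from the game file) to show the idea.

```r
# Toy game-clock readings in seconds remaining (illustrative values only)
toy_clock <- c(100.00, 99.96, 99.92, 99.88)

# Drop the first reading so the clock lines up with a diff()-based series
clock_v <- toy_clock[-1]

# Seconds elapsed since the first reading; increases as the play unfolds
elapsed <- toy_clock[1] - clock_v
elapsed
```

Using `elapsed` on the x axis in place of the raw game clock makes the series read left to right.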
Time series data can be noisy and have all sorts of spikes. A traditional method for dealing with this is to smooth the data with a simple moving average. I have found that averaging over 5 periods (n = 5) seems to work best. Please let me know your experience in tweaking the time series data.
## SMA() expects a single series, so pass the velocity vector rather than the two-column data frame
## Averaging over 3 points
timeseriesSMA3v <- SMA(v, n = 3)
plot_ly(y = timeseriesSMA3v) %>%
  layout(xaxis = x, yaxis = y)
## Averaging over 5 points
timeseriesSMA5v <- SMA(v, n = 5)
plot_ly(y = timeseriesSMA5v) %>%
  layout(xaxis = x, yaxis = y)
## Averaging over 8 points
timeseriesSMA8v <- SMA(v, n = 8)
plot_ly(y = timeseriesSMA8v) %>%
  layout(xaxis = x, yaxis = y)
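To make clear what the smoothing does, here is a base-R sketch of a trailing simple moving average. `trailing_mean` is a hypothetical name used only for illustration; it is meant to mirror the idea behind `SMA()`, not to reproduce TTR's implementation.

```r
# Hypothetical helper: trailing moving average over the last n values.
# stats:: is spelled out because dplyr masks filter().
trailing_mean <- function(v, n) {
  as.numeric(stats::filter(v, rep(1 / n, n), sides = 1))
}

# The first n - 1 positions have no full window, so they come back as NA
trailing_mean(c(1, 2, 3, 4, 5), n = 3) # NA NA 2 3 4
```

A larger `n` gives a smoother curve but flattens short spikes and delays the response to real changes in speed, which is the trade-off behind comparing 3, 5, and 8 points.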
## Acceleration
a <- acceleration(df_ball$x_loc, df_ball$y_loc)
timeseriesa <- cbind(a, df_ball$game_clock[-1:-2]) # drop two clock values to match the length of a
timeseriesSMAa <- SMA(a, n = 3) # smooth the acceleration series itself
plot_ly(y = timeseriesSMAa)
## Jerk
j <- jerk(df_ball$x_loc, df_ball$y_loc)
timeseriesj <- cbind(j, df_ball$game_clock[-1:-3]) # drop three clock values to match the length of j
timeseriesSMAj <- SMA(j, n = 3) # smooth the jerk series itself
plot_ly(y = timeseriesSMAj)
For more of my explorations of the NBA data, see my NBA GitHub repo. Specific posts include EDA, merging play-by-play data, and measuring player spacing using convex hulls.
I have pages providing more background on me, Rajiv Shah, and my other projects, or you can find me on Twitter.