Dan Vatterott

Data Scientist

My First Kodi Addon - PBS NewsHour (a Tutorial)

I’ve been using Kodi/XBMC since 2010. It provides a flexible and (relatively) intuitive interface for interacting with content through your TV (much like an apple TV). One of the best parts of Kodi is the addons - these are apps that you can build or download. For instance, I use the NBA League Pass addon for watching Wolves games. I’ve been looking for a reason to build my own Kodi addon for years.

Enter PBS NewsHour. If you’re not watching PBS NewsHour, I’m not sure what you’re doing with your life because it’s the shit. It rocks. PBS NewsHour disseminates all their content on youtube and their website. For the past couple years, I’ve been watching their broadcasts every morning through the Youtube addon. This works fine, but it’s clunky. I decided to stream line watching the NewsHour by building a Kodi addon for it.

I used this tutorial to build a Kodi addon that accesses the PBS NewsHour content through the youtube addon. This addon can be found on my github. The addon works pretty well, but it includes links to all NewsHour’s content, and I only want the full episodes. I am guessing I could have modified this addon to get what I wanted, but I really wanted to build my own addon from scratch.

The addon I built is available on my github. To build my addon, I used this tutorial, and some code from this github repository. Below I describe how the addon works. I only describe the file default.py because this file does the majority of the work, and I found the linked tutorials did a good job explaining the other files.

I start by importing libraries that I will use. Most these libraries are used for scraping content off the web. I then create some basic variables to describe the addon’s name (addonID), its name in kodi (base_url), the number used to refer to it (addon_handle - I am not sure how this number is used), and current arguments sent to my addon (args).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
import zlib
import json
import sys
import urlparse
import xbmc
import xbmcgui
import xbmcplugin

import urllib2
import re

addonID = 'plugin.video.pbsnewshour'

base_url = sys.argv[0]
addon_handle = int(sys.argv[1])
args = urlparse.parse_qs(sys.argv[2][1:])

The next function, getRequest, gathers html from a website (specified by the variable url). The dictionary httpHeaders tells the website a little about myself, and how I want the html. I use urllib2 to get a compressed version of the html, which is decompressed using zlib.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
# -----------  Create some functions for fetching videos ---------------
# https://github.com/learningit/Kodi-plugins-source/blob/master/script.module.t1mlib/lib/t1mlib.py
UTF8 = 'utf-8'
USERAGENT = """Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 \
            (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36"""
httpHeaders = {'User-Agent': USERAGENT,
               'Accept': "application/json, text/javascript, text/html,*/*",
               'Accept-Encoding': 'gzip,deflate,sdch',
               'Accept-Language': 'en-US,en;q=0.8'
               }


def getRequest(url, udata=None, headers=httpHeaders):
    req = urllib2.Request(url.encode(UTF8), udata, headers)
    try:
        response = urllib2.urlopen(req)
        page = response.read()
        if response.info().getheader('Content-Encoding') == 'gzip':
            page = zlib.decompress(page, zlib.MAX_WBITS + 16)
        response.close()
    except Exception:
        page = ""
        xbmc.log(msg='REQUEST ERROR', level=xbmc.LOGDEBUG)
    return(page)

The hardest part of building this addon was finding video links. I was able to find a github repo with code for identifying links to PBS’s videos, but PBS initially posts their videos on youtube. I watch PBS NewsHour the morning after it airs, so I needed a way to watch these youtube links. I started this post by saying I wanted to avoid using Kodi’s youtube addon, but I punted and decided to use the youtube addon to play these links. Below is a function for finding the youtube id of a video.

1
2
3
4
5
def deal_with_youtube(html):
    vid_num = re.compile('<span class="youtubeid">(.+?)</span>',
                         re.DOTALL).search(html)
    url = vid_num.group(1)
    return url

This next function actually fetches the videos (the hard part of building this addon). This function fetches the html of the website that has PBS’s video. It then searches the html for “coveplayerid,” which is PBS’s name for the video. I use this name to create a url that will play the video. I get the html associated with this new url, and search it for a json file that contains the video. I grab this json file, and viola I have the video’s url! In the final part of the code, I request a higher version of the video than PBS would give me by default.

If I fail to find “coveplayerid,” then I know this is a video with a youtube link, so I grab the youtube id. Some pages have a coveplayerid class, but no actual coveplayerid. I also detect these cases and find the youtube id when it occurs.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
# https://github.com/learningit/Kodi-plugins-source/blob/master/plugin.video.thinktv/resources/lib/scraper.py
# modified from link above
def getAddonVideo(url, udata=None, headers=httpHeaders):
    html = getRequest(url)

    vid_num = re.compile('<span class="coveplayerid">(.+?)</span>',
                         re.DOTALL).search(html)
    if vid_num:
        vid_num = vid_num.group(1)
        if 'youtube' in vid_num:
            return deal_with_youtube(html)
        pg = getRequest('http://player.pbs.org/viralplayer/%s/' % (vid_num))
        query = """PBS.videoData =.+?recommended_encoding.+?'url'.+?'(.+?)'"""
        urls = re.compile(query, re.DOTALL).search(pg)

        url = urls.groups()
        pg = getRequest('%s?format=json' % url)
        url = json.loads(pg)['url']
    else:  # weekend links are initially posted as youtube vids
        deal_with_youtube(html)

    url = url.replace('800k', '2500k')
    if 'hd-1080p' in url:
        url = url.split('-hls-', 1)[0]
        url = url+'-hls-6500k.m3u8'
    return url

This next function identifies full episodes that have aired in the past week. It’s the meat of the addon. The function gets the html of PBS NewsHour’s page, and finds all links in a side-bar where PBS lists their past week’s episodes. I loop through the links and create a menu item for each one. These menu items are python objects that Kodi can display to users. The items include a label/title (the name of the episode), an image, and a url that Kodi can use to find the video url.

The most important part of this listing is the url I create. This url gives Kodi all the information I just described, associates the link with an addon, and tells Kodi that the link is playable. In the final part of the function, I pass the list of links to Kodi.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
# -------------- Create list of videos --------------------
# http://kodi.wiki/view/HOW-TO:Video_addon
def list_videos(url='http://www.pbs.org/newshour/videos/'):
    html = getRequest(url)

    query = """<div class='sw-pic maxwidth'>.+?href='(.+?)'.+?src="(.+?)".+?title="(.+?)" """
    videos = re.compile(query, re.DOTALL).findall(html)

    listing = []
    for vids in videos:
        list_item = xbmcgui.ListItem(label=vids[2],
                                     thumbnailImage=vids[1])
        list_item.setInfo('video', {'title': vids[2]})
        list_item.setProperty('IsPlayable', 'true')

        url = ("%s?action=%s&title=%s&url=%s&thumbnail=%s"
               % (base_url, 'play', vids[2], vids[0], vids[1]))

        listing.append((url, list_item, False))

    # Add list to Kodi.
    xbmcplugin.addDirectoryItems(addon_handle, listing, len(listing))
    xbmcplugin.endOfDirectory(handle=addon_handle, succeeded=True)

Okay, thats the hard part. The rest of the code implements the functions I just described. The function below is executed when a user chooses to play a video. It gets the url of the video, and gives this to the xbmc function that will play the video. The only hiccup here is I check whether the link is for the standard PBS video type or not. If it is, then I give the link directly to Kodi. If it’s not, then this is a youtube link and I launch the youtube plugin with my youtube video id.

1
2
3
4
5
6
7
8
9
def play_video(path):
    path = getAddonVideo(path)
    if '00k' in path:
        play_item = xbmcgui.ListItem(path=path)
        xbmcplugin.setResolvedUrl(addon_handle, True, listitem=play_item)
    else:  # deal with youtube links
        path = 'plugin://plugin.video.youtube/?action=play_video&videoid=' + path
        play_item = xbmcgui.ListItem(path=path)
        xbmcplugin.setResolvedUrl(addon_handle, True, listitem=play_item)

This final function is launched whenever a user calls the addon or executes an action in the addon (thats why I call the function in the final line of code here). params is an empty dictionary if the addon is being opened. params being empty causes the addon to call list_videos, creating the list of episodes that PBS has aired in the past week. If the user selects one of the episodes, then router is called again, but this time the argument is the url of the selected item. This url is passed to the play_video function, which plays the video for the user!

1
2
3
4
5
6
7
8
9
10
11
12
13
def router():
    params = dict(args)

    if params:
        if params['action'][0] == 'play':
            play_video(params['url'][0])
        else:
            raise ValueError('Invalid paramstring: {0}!'.format(params))
    else:
        list_videos()


router()

That’s my addon! I hope this tutorial helps people create future Kodi addons. Definitely reach out if you have questions. Also, make sure to check out the NewsHour soon and often. It’s the bomb.

Sifting the Overflow

In January 2017, I started a fellowship at Insight Data Science. Insight is a 7 week program for helping academics transition from academia to careers in data science. In the first 4 weeks, fellows build data science products, and fellows present these products to different companies in the last 3 weeks.

At Insight, I built Sifting the Overflow, a chrome extension which you can install from the google chrome store. Sifting the Overflow identifies the most helpful parts of answers to questions about the programming language Python on StackOverflow.com. To created Sifting the Overflow, I trained a recurrent neural net (RNN) to identify “helpful” answers, and when you use the browser extension on a stackoverflow page, this RNN rates the helpfulness of each sentence of each answer. The sentences that my model believes to be helpful are highlighted so that users can quickly find the most helpful parts of these pages.

I wrote a quick post here about how I built Sifting the Overflow, so check it out if you’re interested. The code is also available on my github.

Simulating the Monty Hall Problem

I’ve been hearing about the Monty Hall problem for years and its never quite made sense to me, so I decided to program up a quick simulation.

In the Monty Hall problem, there is a car behind one of three doors. There are goats behind the other two doors. The contestant picks one of the three doors. Monty Hall (the game show host) then reveals that one of the two unchosen doors has a goat behind it. The question is whether the constestant should change the door they picked or keep their choice.

My first intuition was that it doesn’t matter whether the contestant changes their choice because its equally probable that the car is behind either of the two unopened doors, but I’ve been told this is incorrect! Instead, the contestant is more likely to win the car if they change their choice.

How can this be? Well, I decided to create a simple simulation of the Monty Hall problem in order to prove to myself that there really is an advantage to changing the chosen door and (hopefully) gain an intuition into how this works.

Below I’ve written my little simulation. A jupyter notebook with this code is available on my github.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
import random
import copy
import numpy as np

start_vect = [1,0,0] #doors

samples = 5000 #number of simulations to run

change, no_change = [],[]
for i in range(samples):

    #shuffle data
    vect = copy.copy(start_vect)
    random.shuffle(vect)

    #make choice
    choice = vect.pop(random.randint(0,2))
    no_change.append(choice) #outcome if do not change choice

    #show bad door
    try:
        bad = vect.pop(int(np.where(np.array(vect)==0)[0]))
    except:
        bad = vect.pop(0)

    change.append(vect) #outcome if change choice

Here I plot the results

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')

plt.bar([0.5,1.5],[np.mean(change),np.mean(no_change)],width=1.0)
plt.xlim((0,3))
plt.ylim((0,1))
plt.ylabel('Proportion Correct Choice')
plt.xticks((1.0,2.0),['Change Choice', 'Do not change choice'])

import scipy.stats as stats
obs = np.array([[np.sum(change), np.sum(no_change)], [samples, samples]])
print('Probability of choosing correctly if change choice: %0.2f' % np.mean(change))
print('Probability of choosing correctly if do not change choice: %0.2f' % np.mean(no_change))
print('Probability of difference arising from chance: %0.5f' % stats.chi2_contingency(obs)[1])
Probability of choosing correctly if change choice: 0.67
Probability of choosing correctly if do not change choice: 0.33
Probability of difference arising from chance: 0.00000

Clearly, the contestant should change their choice!

So now, just to make sure I am not crazy, I decided to simulate the Monty Hall problem with the contestant choosing what door to open after Monty Hall opens a door with a goat.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
change, no_change = [],[]
for i in range(samples):
    #shuffle data
    vect = copy.copy(start_vect)
    random.shuffle(vect)

    #show bad door
    bad = vect.pop(int(np.where(np.array(vect)==0)[0][0]))

    #make choice
    choice = vect.pop(random.randint(0,1))
    no_change.append(choice)

    change.append(vect)
1
2
3
4
5
6
7
8
9
10
plt.bar([0.5,1.5],[np.mean(change),np.mean(no_change)],width=1.0)
plt.xlim((0,3))
plt.ylim((0,1))
plt.ylabel('Proportion Correct Choice')
plt.xticks((1.0,2.0),['Change Choice', 'Do not change choice'])

obs = np.array([[np.sum(change), np.sum(no_change)], [samples, samples]])
print('Probability of choosing correctly if change choice: %0.2f' % np.mean(change))
print('Probability of choosing correctly if do not change choice: %0.2f' % np.mean(no_change))
print('Probability of difference arising from chance: %0.5f' % stats.chi2_contingency(obs)[1])
Probability of choosing correctly if change choice: 0.51
Probability of choosing correctly if do not change choice: 0.49
Probability of difference arising from chance: 0.57546

Now, there is clearly no difference between whether the contestant changes their choice or not.

So what is different about these two scenarios?

In the first scenario, the contestant makes a choice before Monty Hall reveals which of the two unchosen options is incorrect. Here’s the intution I’ve gained by doing this - because Monty Hall cannot reveal what is behind the chosen door, when Monty Hall reveals what is behind one of the unchosen doors, this has no impact on how likely the car is to appear behind the chosen door. Yet, the probability that the car is behind the revealed door drops to 0 (because Monty Hall shows there’s a goat behind it), and the total probability must be conserved so the second unchosen door receives any belief that the car was behind the revealed door! Thus, the unchosen and unrevealed door becomes 66% likely to contain the car! I am still not 100% convinced of this new intuition, but it seems correct given these simulations!

SFN 2016 Presentation

I recently presented at the annual meeting of the society for neuroscience, so I wanted to do a quick post describing my findings.

The reinforcement learning literature postulates that we go in and out of exploratory states in order to learn about our environments and maximize the reward we gain in these environments. For example, you might try different foods in order to find the food you most prefer. But, not all novelty seeking behavior results from reward maximization. For example, I often read new books. Maybe reading a new book triggers a reward circuit response, but it certainly doesn’t lead to immediate rewards.

In this poster we used a free viewing task to examine whether an animal would exhibit a novelty preference when it was not associated with any possible rewards. We found the animal looked at (payed attention to) novel items more often than he looked at familiar items, but this preference for paying attention to novel items fluctuated over time. Sometimes the animal had a large preference for looking at the novel items and sometimes he had no preference for novels items.

Neurons that we recorded in the dlPFC and area 7a encoded whether the animal was currently in a state where he prefered looking at novel items or not and this encoding persisted across the entire trial period. Importantly, while neurons in these areas also encoded whether the animal was currently looking at a novel item or not, this encoding was distinct from the encoding of the current preference state. These results demonstrate that the animal had simultaneous neural codes representing whether he was acutely attending to novel items and his general preference for attending to novel items or not. Importantly, these neural codes existed even though there were no explicit reward associations.

PCA Tutorial

Principal Component Analysis (PCA) is an important method for dimensionality reduction and data cleaning. I have used PCA in the past on this blog for estimating the latent variables that underlie player statistics. For example, I might have two features: average number of offensive rebounds and average number of defensive rebounds. The two features are highly correlated because a latent variable, the player’s rebounding ability, explains common variance in the two features. PCA is a method for extracting these latent variables that explain common variance across features.

In this tutorial I generate fake data in order to help gain insight into the mechanics underlying PCA.

Below I create my first feature by sampling from a normal distribution. I create a second feature by adding a noisy normal distribution to the first feature multiplied by two. Because I generated the data here, I know it’s composed to two latent variables, and PCA should be able to identify these latent variables.

I generate the data and plot it below.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
import numpy as np, matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')

np.random.seed(1) #make sure we're all working with the same numbers

X = np.random.normal(0.0,2.0,[100,1])
X = [X,X*2+np.random.normal(0.0,8.0,[100,1])]
X = np.squeeze(X)

plt.plot(X[0],X[1],'o')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Raw Data')
plt.axis([-6,6,-30,30]);

The first step before doing PCA is to normalize the data. This centers each feature (each feature will have a mean of 0) and divides data by its standard deviation (changing the standard deviation to 1). Normalizing the data puts all features on the same scale. Having features on the same scale is important because features might be more or less variable because of measurement rather than the latent variables producing the feature. For example, in basketball, points are often accumulated in sets of 2s and 3s, while rebounds are accumulated one at a time. The nature of basketball puts points and rebounds on a different scales, but this doesn’t mean that the latent variables scoring ability and rebounding ability are more or less variable.

Below I normalize and plot the data.

1
2
3
4
5
6
7
8
9
import scipy.stats as stats

X = stats.mstats.zscore(X,axis=1)

plt.plot(X[0],X[1],'o')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Standardized Data')
plt.axis([-4,4,-4,4]);

After standardizing the data, I need to find the eigenvectors and eigenvalues. The eigenvectors point in the direction of a component and eigenvalues represent the amount of variance explained by the component. Below, I plot the standardized data with the eigenvectors ploted with their eigenvalues as the vectors distance from the origin.

As you can see, the blue eigenvector is longer and points in the direction with the most variability. The purple eigenvector is shorter and points in the direction with less variability.

As expected, one component explains far more variability than the other component (becaus both my features share variance from a single latent gaussian distribution).

1
2
3
4
5
6
7
8
9
10
C = np.dot(X,np.transpose(X))/(np.shape(X)[1]-1);
[V,PC] = np.linalg.eig(C)

plt.plot(X[0],X[1],'o')
plt.plot([0,PC[0,0]*V[0]],[0,PC[1,0]*V[0]],'o-')
plt.plot([0,PC[0,1]*V[1]],[0,PC[1,1]*V[1]],'o-')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Standardized Data with Eigenvectors')
plt.axis([-4,4,-4,4]);

Next I order the eigenvectors according to the magnitude of their eigenvalues. This orders the components so that the components that explain more variability occur first. I then transform the data so that they’re axis aligned. This means the first component explain variability on the x-axis and the second component explains variance on the y-axis.

1
2
3
4
5
6
7
8
9
10
11
12
13
indices = np.argsort(-1*V)
V = V[indices]
PC = PC[indices,:]

X_rotated = np.dot(X.T,PC)

plt.plot(X_rotated.T[0],X_rotated.T[1],'o')
plt.plot([0,PC[1,0]*V[0]],[0,0],'o-')
plt.plot([0,0],[0,PC[1,1]*V[1]],'o-')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Data Projected into PC space')
plt.axis([-4,4,-4,4]);

Finally, just to make sure the PCA was done correctly, I will call PCA from the sklearn library, run it, and make sure it produces the same results as my analysis.

1
2
3
4
5
6
7
from sklearn.decomposition import PCA

pca = PCA() #create PCA object
test = pca.fit_transform(X.T) #pull out principle components

print(stats.stats.pearsonr(X_rotated.T[0],test.T[0]))
print(stats.stats.pearsonr(X_rotated.T[1],test.T[1]))
(-1.0, 0.0)
(-1.0, 0.0)

Attention in a Convolutional Neural Net

This summer I had the pleasure of attending the Brains, Minds, and Machines summer course at the Marine Biology Laboratory. While there, I saw cool research, met awesome scientists, and completed an independent project. In this blog post, I describe my project.

In 2012, Krizhevsky et al. released a convolutional neural network that completely blew away the field at the imagenet challenge. This model is called “Alexnet,” and 2012 marks the beginning of neural networks’ resurgence in the machine learning community.

Alexnet’s domination was not only exciting for the machine learning community. It was also exciting for the visual neuroscience community whose descriptions of the visual system closely matched alexnet (e.g., HMAX). Jim DiCarlo gave an awesome talk at the summer course describing his research comparing the output of neurons in the visual system and the output of “neurons” in alexnet (you can find the article here).

I find the similarities between the visual system and convolutional neural networks exciting, but check out the depictions of alexnet and the visual system above. Alexnet is depicted in the upper image. The visual system is depicted in the lower image. Comparing the two images is not fair, but the visual system is obviously vastly more complex than alexnet.

In my project, I applied a known complexity of the biological visual system to a convolutional neural network. Specifically, I incoporated visual attention into the network. Visual attention refers to our ability to focus cognitive processing onto a subset of the environment. Check out this video for an incredibly 90s demonstration of visual attention.

In this post, I demonstrate that implementing a basic version of visual attention in a convolutional neural net improves performance of the CNN, but only when classifying noisy images, and not when classifying relatively noiseless images.

Code for everything described in this post can be found on my github page. In creating this model, I cribbed code from both Jacob Gildenblat and this implementation of alexnet.

I implemented my model using the Keras library with a Theano backend, and I tested my model on the MNIST database. The MNIST database is composed of images of handwritten numbers. The task is to design a model that can accurately guess what number is written in the image. This is a relatively easy task, and the best models are over 99% accurate.

I chose MNIST because its an easy problem, which allows me to use a small network. A small network is both easy to train and easy to understand, which is good for an exploratory project like this one.

Above, I depict my model. This model has two convolutional layers. Following the convolutional layers is a feature averaging layer which borrows methods from a recent paper out of the Torralba lab and computes the average activity of units covering each location. The output of this feature averaging layer is then passed along to a fully connected layer. The fully connected layer “guesses” what the most likely digit is. My goal when I first created this network was to use this “guess” to guide where the model focused processing (i.e., attention), but I found guided models are irratic during training.

Instead, my current model directs attention to all locations that are predictive of all digits. I haven’t toyed too much with inbetween models - models that direct attention to locations that are predictive of the N most likely digits.

So what does it mean to “direct attention” in this model. Here, directing attention means that neurons covering “attended” locations are more active than neurons covering the unattended locations. I apply attention to the input of the second convolutional layer. The attentionally weighted signal passes through the second convolutional layer and passes onto the feature averaging layer. The feature averaging layer feeds to the fully connected layer, which then produces a final guess about what digit is present.

I first tested this model on the plain MNIST set. For testing, I wanted to compare my model to a model without attention. My comparison model is the same as the model with attention except that the attention directing signal is a matrix of ones - meaning that it doesn’t have any effect on the model’s activity. I use this comparison model because it has the same architecture as the model with attention.

I depict the results of my attentional and comparison models below. On the X-axis is the test phase (10k trials) following each training epoch (60k trials). On the Y-axis is percent accuracy during the test phase. I did 3 training runs with both sets of models. All models gave fairly similar results, which led to small error bars (these depict standard error). The results are … dissapointing. As you can see both the model with attention and the comparison model perform similarly. There might be an initial impact of attention, but this impact is slight.

This result was a little dissapointing (since I’m an attention researcher and consider attention an important part of cognition), but it might not be so surprising given the task. If I gave you the task of naming digits, this task would be virtually effortless; probably so effortless that you would not have to pay very much attention to the task. You could probably talk on the phone or text while doing this task. Basically, I might have failed to find an effect of attention because this task is so easy that it does not require attention.

I decided to try my network when the task was a little more difficult. To make the task more difficult, I added random noise to each image (thank you to Nancy Kanwisher for the suggestion). This trick of adding noise to images is one that’s frequently done in psychophysical attention expeirments, so it would be fitting if it worked here.

The figure above depicts model performance on noisy images. The models are the as before, but this time the model with attention is far superior to the comparison model. Good news for attention researchers! This work suggests that visual attentional mechanisms similar to those in the brain may be beneficial in convolutional neural networks, and this effect is particularly strong with the images are noisy.

This work bears superficial similarity to recent language translation and question answering models. Models like the cited one report using a biologically inspired version of attention, and I agree they do, but they do not use attention in the same way that I am here. I believe this difference demonstrates a problem with what we call “attention.” Attention is not a single cognitive process. Instead, its a family of cognitive processes that we’ve simply given the same name. Thats not to say these forms of attention are completely distinct, but they likely involve different information transformations and probably even different brain regions.

Revisting NBA Career Predictions From Rookie Performance...again

Now that the NBA season is done, we have complete data from this year’s NBA rookies. In the past I have tried to predict NBA rookies’ future performance using regression models. In this post I am again trying to predict rookies’ future performance, but now using using a classification approach. When using a classification approach, I predict whether player X will be a “great,” “average,” or “poor” player rather than predicting exactly how productive player X will be.

Much of this post re-uses code from the previous posts, so I skim over some of the repeated code.

As usual, I will post all code as a jupyter notebook on my github.

1
2
3
4
5
6
7
#import some libraries and tell ipython we want inline figures rather than interactive figures. 
import matplotlib.pyplot as plt, pandas as pd, numpy as np, matplotlib as mpl

from __future__ import print_function

%matplotlib inline
plt.style.use('ggplot') #im addicted to ggplot. so pretty.

Load the data. Reminder - this data is available on my github.

1
2
3
4
5
6
7
8
rookie_df = pd.read_pickle('nba_bballref_rookie_stats_2016_Apr_16.pkl') #here's the rookie year data

rook_games = rookie_df['Career Games']>50 #only attempting to predict players that have played at least 50 games
rook_year = rookie_df['Year']>1980 #only attempting to predict players from after 1980

#remove rookies from before 1980 and who have played less than 50 games. I also remove some features that seem irrelevant or unfair
rookie_df_games = rookie_df[rook_games & rook_year] #only players with more than 50 games. 
rookie_df_drop = rookie_df_games.drop(['Year','Career Games','Name'],1)

Load more data, and normalize the data for the PCA transformation.

1
2
3
4
5
6
7
8
from sklearn.preprocessing import StandardScaler

df = pd.read_pickle('nba_bballref_career_stats_2016_Apr_15.pkl')
df = df[df['G']>50]
df_drop = df.drop(['Year','Name','G','GS','MP','FG','FGA','FG%','3P','2P','FT','TRB','PTS','ORtg','DRtg','PER','TS%','3PAr','FTr','ORB%','DRB%','TRB%','AST%','STL%','BLK%','TOV%','USG%','OWS','DWS','WS','WS/48','OBPM','DBPM','BPM','VORP'],1)
X = df_drop.as_matrix() #take data out of dataframe
ScaleModel = StandardScaler().fit(X) #make sure each feature has 0 mean and unit variance. 
X = ScaleModel.transform(X)

In the past I used k-means to group players according to their performance (see my post on grouping players for more info). Here, I use a gaussian mixture model (GMM) to group the players. I use the GMM model because it assigns each player a “soft” label rather than a “hard” label. By soft label I mean that a player simultaneously belongs to several groups. For instance, Russell Westbrook belongs to both my “point guard” group and my “scorers” group. K-means uses hard labels where each player can only belong to one group. I think the GMM model provides a more accurate representation of players, so I’ve decided to use it in this post. Maybe in a future post I will spend more time describing it.

For anyone wondering, the GMM groupings looked pretty similar to the k-means groupings.

1
2
3
4
5
6
7
8
9
10
11
12
13
from sklearn.mixture import GMM
from sklearn.decomposition import PCA

reduced_model = PCA(n_components=5, whiten=True).fit(X)
reduced_data = reduced_model.transform(X) #transform data into the 5 PCA components space

g = GMM(n_components=6).fit(reduced_data) #6 clusters. like the k-means model
new_labels = g.predict(reduced_data)

predictions = g.predict_proba(reduced_data) #generate values describing "how much" each player belongs to each group 
for x in np.unique(new_labels):
    Label = 'Category%d' % x
    df[Label] = predictions[:,x]

In this past I have attempted to predict win shares per 48 minutes. I am using win shares as a dependent variable again, but I want to categorize players.

Below I create a histogram of players’ win shares per 48.

I split players into 4 groups which I will refer to as “bad,” “below average,” “above average,” and “great”: Poor players are the bottom 10% in win shares per 48, Below average are the 10-50th percentiles, Above average and 50-90th percentiles, Great are the top 10%. This assignment scheme is relatively arbitrary; the model performs similarly with different assignment schemes.

1
2
3
4
5
6
7
8
plt.hist(df['WS/48']);
df['perf_cat'] = 0
df.loc[df['WS/48'] < np.percentile(df['WS/48'],10),'perf_cat'] = 1 #category 1 players are bottom 10%
df.loc[(df['WS/48'] < np.percentile(df['WS/48'],50)) & (df['WS/48'] >= np.percentile(df['WS/48'],10)),'perf_cat'] = 2
df.loc[(df['WS/48'] < np.percentile(df['WS/48'],90)) & (df['WS/48'] >= np.percentile(df['WS/48'],50)),'perf_cat'] = 3
df.loc[df['WS/48'] >= np.percentile(df['WS/48'],90),'perf_cat'] = 4 #category 4 players are top 10%
perc_in_cat = [np.mean(df['perf_cat']==x) for x in np.unique(df['perf_cat'])];
perc_in_cat #print % of palyers in each category as a sanity check
[0.096314496314496317,
 0.40196560196560199,
 0.39950859950859952,
 0.10221130221130222]

My goal is to use rookie year performance to classify players into these 4 categories. I have a big matrix with lots of data about rookie year performance, but the reason that I grouped player using the GMM is because I suspect that players in the different groups have different “paths” to success. I am including the groupings in my classification model and computing interaction terms. The interaction terms will allow rookie performance to produce different predictions for the different groups.

By including interaction terms, I include quite a few predictor features. I’ve printed the number of predictor features and the number of predicted players below.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
from sklearn import preprocessing

df_drop = df[df['Year']>1980]
for x in np.unique(new_labels):
    Label = 'Category%d' % x
    rookie_df_drop[Label] = df_drop[Label] #give rookies the groupings produced by the GMM model

X = rookie_df_drop.as_matrix() #take data out of dataframe   

poly = preprocessing.PolynomialFeatures(2,interaction_only=True) #create interaction terms.
X = poly.fit_transform(X)

Career_data = df[df['Year']>1980]
Y = Career_data['perf_cat'] #get predictor data
print(np.shape(X))
print(np.shape(Y))
(1703, 1432)
(1703,)

Now that I have all the features, it’s time to try and predict which players will be poor, below average, above average, and great. To create these predictions, I will use a logistic regression model.

Because I have so many predictors, correlation between predicting features and over-fitting the data are major concerns. I use regularization and cross-validation to combat these issues.

Specifically, I am using l2 regularization and k-fold 5 cross-validation. Within the cross-validation, I am trying to estimate how much regularization is appropriate.

Some important notes - I am using “balanced” weights which tells the model that worse to incorrectly predict the poor and great players than the below average and above average players. I do this because I don’t want the model to completely ignore the less frequent classifications. Second, I use the multi_class multinomial because it limits the number of models I have to fit.

1
2
3
4
5
6
7
8
9
from sklearn import linear_model
from sklearn.metrics import accuracy_score

logreg = linear_model.LogisticRegressionCV(Cs=[0.0008], cv=5, penalty='l2',n_jobs=-1, class_weight='balanced',
                                           max_iter=15000, multi_class='multinomial')

est = logreg.fit(X, Y)
score = accuracy_score(Y,est.predict(X)) #calculate the % correct 
print(score)
0.738109219025

Okay, the model did pretty well, but lets look at where the errors are coming from. To visualize the models accuracy, I am using a confusion matrix. In a confusion matrix, every item on the diagnonal is a correctly classified item. Every item off the diagonal is incorrectly classified. The color bar’s axis is the percent correct. So the dark blue squares represent cells with more items.

It seems the model is best at predicting poor players and great players. It makes more errors when trying to predict the more average players.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(Y, est.predict(X))

def plot_confusion_matrix(cm, title='Confusion matrix', cmap=plt.cm.Blues):
    plt.imshow(cm, interpolation='nearest', cmap=cmap,vmin=0.0, vmax=1.0)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(np.unique(df['perf_cat'])))
    plt.xticks(tick_marks, np.unique(df['perf_cat']))
    plt.yticks(tick_marks, np.unique(df['perf_cat']))
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

cm_normalized = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
plot_confusion_matrix(cm_normalized, title='Normalized confusion matrix')

Lets look at what the model predicts for this year’s rookies. Below I modified two functions that I wrote for a previous post. The first function finds a particular year’s draft picks. The second function produces predictions for each draft pick.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
def gather_draftData(Year):

    import urllib2
    from bs4 import BeautifulSoup
    import pandas as pd
    import numpy as np

    draft_len = 30

    def convert_float(val):
        try:
            return float(val)
        except ValueError:
            return np.nan

    url = 'http://www.basketball-reference.com/draft/NBA_'+str(Year)+'.html'
    html = urllib2.urlopen(url)
    soup = BeautifulSoup(html,"lxml")

    draft_num = [soup.findAll('tbody')[0].findAll('tr')[i].findAll('td')[0].text for i in range(draft_len)]
    draft_nam = [soup.findAll('tbody')[0].findAll('tr')[i].findAll('td')[3].text for i in range(draft_len)]

    draft_df = pd.DataFrame([draft_num,draft_nam]).T
    draft_df.columns = ['Number','Name']
    df.index = range(np.size(df,0))
    return draft_df

def player_prediction__regressionModel(PlayerName):

    clust_df = pd.read_pickle('nba_bballref_career_stats_2016_Apr_15.pkl')
    clust_df = clust_df[clust_df['Name']==PlayerName]
    clust_df = clust_df.drop(['Year','Name','G','GS','MP','FG','FGA','FG%','3P','2P','FT','TRB','PTS','ORtg','DRtg','PER','TS%','3PAr','FTr','ORB%','DRB%','TRB%','AST%','STL%','BLK%','TOV%','USG%','OWS','DWS','WS','WS/48','OBPM','DBPM','BPM','VORP'],1)
    new_vect = ScaleModel.transform(clust_df.as_matrix().reshape(1,-1))
    reduced_data = reduced_model.transform(new_vect)
    predictions = g.predict_proba(reduced_data)
    for x in np.unique(new_labels):
        Label = 'Category%d' % x
        clust_df[Label] = predictions[:,x]

    Predrookie_df = pd.read_pickle('nba_bballref_rookie_stats_2016_Apr_16.pkl')
    Predrookie_df = Predrookie_df[Predrookie_df['Name']==PlayerName]
    Predrookie_df = Predrookie_df.drop(['Year','Career Games','Name'],1)
    for x in np.unique(new_labels):
        Label = 'Category%d' % x
        Predrookie_df[Label] = clust_df[Label] #give rookies the groupings produced by the GMM model
    predX = Predrookie_df.as_matrix() #take data out of dataframe
    predX = poly.fit_transform(predX)
    predictions2 = est.predict_proba(predX)
    return {'Name':PlayerName,'Group':predictions,'Prediction':predictions2[0]}

Below I create a plot depicting the model’s predictions. On the y-axis are the four classifications. On the x-axis are the players from the 2015 draft. Each cell in the plot is the probability of a player belonging to one of the classifications. Again, dark blue means a cell or more likely. Good news for us T-Wolves fans! The model loves KAT.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
draft_df = gather_draftData(2015)

draft_df['Name'][14] =  'Kelly Oubre Jr.' #annoying name inconsistencies 

plt.subplots(figsize=(14,6));

draft_df = draft_df.drop(25, 0) #spurs' 1st round pick has not played yet

predictions = []
for name in draft_df['Name']:
    draft_num = draft_df[draft_df['Name']==name]['Number']
    predict_dict = player_prediction__regressionModel(name)
    predictions.append(predict_dict['Prediction'])

plt.imshow(np.array(predictions).T, interpolation='nearest', cmap=plt.cm.Blues,vmin=0.0, vmax=1.0)
plt.title('Predicting Future Performance of 2015-16 Rookies')
plt.colorbar(shrink=0.25)
tick_marks = np.arange(len(np.unique(df['perf_cat'])))
plt.xticks(range(0,29),draft_df['Name'],rotation=90)
plt.yticks(range(0,4), ['Poor','Below Average','Above Average','Great'])
plt.tight_layout()
plt.ylabel('Prediction')
plt.xlabel('Draft Position');

Creating Videos of NBA Action With Sportsvu Data

All basketball teams have a camera system called SportVU installed in their arenas. These camera systems track players and the ball throughout a basketball game.

The data produced by sportsvu camera systems used to be freely available on NBA.com, but was recently removed (I have no idea why). Luckily, the data for about 600 games are available on neilmj’s github. In this post, I show how to create a video recreation of a given basketball play using the sportsvu data.

This code is also available as a jupyter notebook on my github.

1
2
3
4
5
#import some libraries
import matplotlib.pyplot as plt, pandas as pd, numpy as np, matplotlib as mpl
from __future__ import print_function

mpl.rcParams['font.family'] = ['Bitstream Vera Sans']

The data is provided as a json. Here’s how to import the python json library and load the data. I’m a T-Wolves fan, so the game I chose is a wolves game.

1
2
3
import json #import json library
json_data = open('/home/dan-laptop/github/BasketballData/2016.NBA.Raw.SportVU.Game.Logs/0021500594.json') #import the data from wherever you saved it.
data = json.load(json_data) #load the data

Let’s take a quick look at the data. It’s a dictionary with three keys: gamedate, gameid, and events. Gamedate and gameid are the date of this game and its specific id number, respectively. Events is the structure with data we’re interested in.

1
data.keys()
[u'gamedate', u'gameid', u'events']

Lets take a look at the first event. The first event has an associated eventid number. We will use these later. There’s also data for each player on the visiting and home team. We will use these later too. Finally, and most importantly, there’s the “moments.” There are 25 moments for each second of the “event” (the data is sampled at 25hz).

1
data['events'][0].keys()
[u'eventId', u'visitor', u'moments', u'home']

Here’s the first moment of the first event. The first number is the quarter. The second number is the time of the event in milliseconds. The third number is the number of seconds left in the quarter (the 1st quarter hasn’t started yet, so 12 * 60 = 720). The fourth number is the number of seconds left on the shot clock. I am not sure what fourth number (None) represents.

The final matrix is 11x5 matrix. The first row describes the ball. The first two columns are the teamID and the playerID of the ball (-1 for both because the ball does not belong to a team and is not a player). The 3rd and 4th columns are xy coordinates of the ball. The final column is the height of the ball (z coordinate).

The next 10 rows describe the 10 players on the court. The first 5 players belong to the home team and the last 5 players belong to the visiting team. Each player has his teamID, playerID, xy&z coordinates (although I don’t think players’ z coordinates ever change).

1
data['events'][0]['moments'][0]
[1,
 1452903036782,
 720.0,
 24.0,
 None,
 [[-1, -1, 44.16456, 26.34142, 5.74423],
  [1610612760, 201142, 45.46259, 32.01456, 0.0],
  [1610612760, 201566, 10.39347, 24.77219, 0.0],
  [1610612760, 201586, 25.86087, 25.55881, 0.0],
  [1610612760, 203460, 47.28525, 17.76225, 0.0],
  [1610612760, 203500, 43.68634, 26.63098, 0.0],
  [1610612750, 708, 55.6401, 25.55583, 0.0],
  [1610612750, 2419, 47.95942, 31.66328, 0.0],
  [1610612750, 201937, 67.28725, 25.10267, 0.0],
  [1610612750, 203952, 47.28525, 17.76225, 0.0],
  [1610612750, 1626157, 49.46814, 24.24193, 0.0]]]

Alright, so we have the sportsvu data, but its not clear what each event is. Luckily, the NBA also provides play by play (pbp) data. I write a function for acquiring play by play game data. This function collects (and trims) the play by play data for a given sportsvu data set.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
def acquire_gameData(data):
    import requests
    header_data = { #I pulled this header from the py goldsberry library
        'Accept-Encoding': 'gzip, deflate, sdch',
        'Accept-Language': 'en-US,en;q=0.8',
        'Upgrade-Insecure-Requests': '1',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64)'\
        ' AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.82 '\
        'Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9'\
        ',image/webp,*/*;q=0.8',
        'Cache-Control': 'max-age=0',
        'Connection': 'keep-alive'
    }
    game_url = 'http://stats.nba.com/stats/playbyplayv2?EndPeriod=0&EndRange=0&GameID='+data['gameid']+\
                '&RangeType=0&StartPeriod=0&StartRange=0' #address for querying the data
    response = requests.get(game_url,headers = header_data) #go get the data
    headers = response.json()['resultSets'][0]['headers'] #get headers of data
    gameData = response.json()['resultSets'][0]['rowSet'] #get actual data from json object
    df = pd.DataFrame(gameData, columns=headers) #turn the data into a pandas dataframe
    df = df[[df.columns[1], df.columns[2],df.columns[7],df.columns[9],df.columns[18]]] #there's a ton of data here, so I trim  it doown
    df['TEAM'] = df['PLAYER1_TEAM_ABBREVIATION']
    df = df.drop('PLAYER1_TEAM_ABBREVIATION', 1)
    return df

Below I show what the play by play data looks like. There’s a column for event number (eventnum). These event numbers match up with the event numbers from the sportsvu data, so we will use this later for seeking out specific plays in the sportsvu data. There’s a column for the event type (eventmsgtype). This column has a number describing what occured in the play. I list these number codes in the comments below.

There’s also short text descriptions of the plays in the home description and visitor description columns. Finally, I use the team column to represent the primary team involved in a play.

I stole the idea of using play by play data from Raji Shah.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
df = acquire_gameData(data)
df.head()
#EVENTMSGTYPE
#1 - Make 
#2 - Miss 
#3 - Free Throw 
#4 - Rebound 
#5 - out of bounds / Turnover / Steal 
#6 - Personal Foul 
#7 - Violation 
#8 - Substitution 
#9 - Timeout 
#10 - Jumpball 
#12 - Start Q1? 
#13 - Start Q2?
EVENTNUM EVENTMSGTYPE HOMEDESCRIPTION VISITORDESCRIPTION TEAM
0 0 12 None None None
1 1 10 Jump Ball Adams vs. Towns: Tip to Ibaka None OKC
2 2 5 Westbrook Out of Bounds Lost Ball Turnover (P1... None OKC
3 3 2 None MISS Wiggins 16' Jump Shot MIN
4 4 4 Westbrook REBOUND (Off:0 Def:1) None OKC

When viewing the videos, its nice to know what players are on the court. I like to depict this by labeling each player with their number. Here I create a dictionary that contains each player’s id number (these are assigned by nba.com) as the key and their jersey number as the associated value.

1
2
3
4
5
player_fields = data['events'][0]['home']['players'][0].keys()
home_players = pd.DataFrame(data=[i for i in data['events'][0]['home']['players']], columns=player_fields)
away_players = pd.DataFrame(data=[i for i in data['events'][0]['visitor']['players']], columns=player_fields)
players = pd.merge(home_players, away_players, how='outer')
jerseydict = dict(zip(players.playerid.values, players.jersey.values))

Alright, almost there! Below I write some functions for creating the actual video! First, there’s a short function for placing an image of the basketball court beneath our depiction of players moving around. This image is from gmf05’s github, but I will provide it on mine too.

Much of this code is either straight from gmf05’s github or slightly modified.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
# Animation function / loop
def draw_court(axis):
    import matplotlib.image as mpimg
    img = mpimg.imread('./nba_court_T.png') #read image. I got this image from gmf05's github.
    plt.imshow(img,extent=axis, zorder=0) #show the image. 

def animate(n): #matplotlib's animation function loops through a function n times that draws a different frame on each iteration
    for i,ii in enumerate(player_xy[n]): #loop through all the players
        player_circ[i].center = (ii[1], ii[2]) #change each players xy position
        player_text[i].set_text(str(jerseydict[ii[0]])) #draw the text for each player. 
        player_text[i].set_x(ii[1]) #set the text x position
        player_text[i].set_y(ii[2]) #set text y position
    ball_circ.center = (ball_xy[n,0],ball_xy[n,1]) #change ball xy position
    ball_circ.radius = 1.1 #i could change the size of the ball according to its height, but chose to keep this constant
    return tuple(player_text) + tuple(player_circ) + (ball_circ,)

def init(): #this is what matplotlib's animation will create before drawing the first frame. 
    for i in range(10): #set up players
        player_text[i].set_text('')
        ax.add_patch(player_circ[i])
    ax.add_patch(ball_circ) #create ball
    ax.axis('off') #turn off axis
    dx = 5
    plt.xlim([0-dx,100+dx]) #set axis
    plt.ylim([0-dx,50+dx])
    return tuple(player_text) + tuple(player_circ) + (ball_circ,)

The event that I want to depict is event 41. In this event, Karl Anthony Towns misses a shot, grabs his own rebounds, and puts it back in.

1
df[37:38]
EVENTNUM EVENTMSGTYPE HOMEDESCRIPTION VISITORDESCRIPTION TEAM
37 41 1 None Towns 1' Layup (2 PTS) MIN

We need to find where event 41 is in the sportsvu data structure, so I created a function for finding the location of a particular event. I then create a matrix with position data for the ball and a matrix with position data for each player for event 41.

1
2
3
4
5
6
7
8
9
10
11
12
#the order of events does not match up, so we have to use the eventIds. This loop finds the correct event for a given id#.
search_id = 41
def find_moment(search_id):
    for i,events in enumerate(data['events']):
        if events['eventId'] == str(search_id):
            finder = i
            break
    return finder

event_num = find_moment(search_id)
ball_xy = np.array([x[5][0][2:5] for x in data['events'][event_num]['moments']]) #create matrix of ball data
player_xy = np.array([np.array(x[5][1:])[:,1:4] for x in data['events'][event_num]['moments']]) #create matrix of player data

Okay. We’re actually there! Now we get to create the video. We have to create figure and axes objects for the animation to draw on. Then I place a picture of the basketball court on this plot. Finally, I create the circle and text objects that will move around throughout the video (depicting the ball and players). The location of these objects are then updated in the animation loop.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
import matplotlib.animation as animation

fig = plt.figure(figsize=(15,7.5)) #create figure object
ax = plt.gca() #create axis object

draw_court([0,100,0,50]) #draw the court
player_text = range(10) #create player text vector
player_circ = range(10) #create player circle vector
ball_circ = plt.Circle((0,0), 1.1, color=[1, 0.4, 0]) #create circle object for bal
for i in range(10): #create circle object and text object for each player
    col=['w','k'] if i<5 else ['k','w'] #color scheme
    player_circ[i] = plt.Circle((0,0), 2.2, facecolor=col[0],edgecolor='k') #player circle
    player_text[i] = ax.text(0,0,'',color=col[1],ha='center',va='center') #player jersey # (text)

ani = animation.FuncAnimation(fig, animate, frames=np.arange(0,np.size(ball_xy,0)), init_func=init, blit=True, interval=5, repeat=False,\
                             save_count=0) #function for making video
ani.save('Event_%d.mp4' % (search_id),dpi=100,fps=25) #function for saving video
plt.close('all') #close the plot

I’ve been told this video does not work for all users. I’ve also posted it on youtube.

NBA Shot Charts: Updated

For some reason I recently got it in my head that I wanted to go back and create more NBA shot charts. My previous shotcharts used colored circles to depict the frequency and effectiveness of shots at different locations. This is an extremely efficient method of representing shooting profiles, but I thought it would be fun to create shot charts that represent a player’s shooting profile continously across the court rather than in discrete hexagons.

By depicting the shooting data continously, I lose the ability to represent one dimenion - I can no longer use the size of circles to depict shot frequency at a location. Nonetheless, I thought it would be fun to create these charts.

I explain how to create them below. I’ve also included the ability to compare a player’s shooting performance to the league average.

In my previous shot charts, I query nba.com’s API when creating a players shot chart, but querying nba.com’s API for every shot taken in 2015-16 takes a little while (for computing league average), so I’ve uploaded this data to my github and call the league data as a file rather than querying nba.com API.

This code is also available as a jupyter notebook on my github.

1
2
3
#import some libraries and tell ipython we want inline figures rather than interactive figures. 
%matplotlib inline
import matplotlib.pyplot as plt, pandas as pd, numpy as np, matplotlib as mpl

Here, I create a function for querying shooting data from NBA.com’s API. This is the same function I used in my previous post regarding shot charts.

You can find a player’s ID number by going to the players nba.com page and looking at the page address. There is a python library that you can use for querying player IDs (and other data from the nba.com API), but I’ve found this library to be a little shaky.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
def aqcuire_shootingData(PlayerID,Season):
    import requests
    header_data = { #I pulled this header from the py goldsberry library
        'Accept-Encoding': 'gzip, deflate, sdch',
        'Accept-Language': 'en-US,en;q=0.8',
        'Upgrade-Insecure-Requests': '1',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64)'\
        ' AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.82 '\
        'Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9'\
        ',image/webp,*/*;q=0.8',
        'Cache-Control': 'max-age=0',
        'Connection': 'keep-alive'
    }
    shot_chart_url = 'http://stats.nba.com/stats/shotchartdetail?CFID=33&CFPARAMS='+Season+'&ContextFilter='\
                    '&ContextMeasure=FGA&DateFrom=&DateTo=&GameID=&GameSegment=&LastNGames=0&LeagueID='\
                    '00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PaceAdjust='\
                    'N&PerMode=PerGame&Period=0&PlayerID='+PlayerID+'&PlusMinus=N&Position=&Rank='\
                    'N&RookieYear=&Season='+Season+'&SeasonSegment=&SeasonType=Regular+Season&TeamID='\
                    '0&VsConference=&VsDivision=&mode=Advanced&showDetails=0&showShots=1&showZones=0'
    response = requests.get(shot_chart_url,headers = header_data)
    headers = response.json()['resultSets'][0]['headers']
    shots = response.json()['resultSets'][0]['rowSet']
    shot_df = pd.DataFrame(shots, columns=headers)
    return shot_df

Create a function for drawing the nba court. This function was taken directly from Savvas Tjortjoglou’s post on shot charts.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
def draw_court(ax=None, color='black', lw=2, outer_lines=False):
    from matplotlib.patches import Circle, Rectangle, Arc
    if ax is None:
        ax = plt.gca()
    hoop = Circle((0, 0), radius=7.5, linewidth=lw, color=color, fill=False)
    backboard = Rectangle((-30, -7.5), 60, -1, linewidth=lw, color=color)
    outer_box = Rectangle((-80, -47.5), 160, 190, linewidth=lw, color=color,
                          fill=False)
    inner_box = Rectangle((-60, -47.5), 120, 190, linewidth=lw, color=color,
                          fill=False)
    top_free_throw = Arc((0, 142.5), 120, 120, theta1=0, theta2=180,
                         linewidth=lw, color=color, fill=False)
    bottom_free_throw = Arc((0, 142.5), 120, 120, theta1=180, theta2=0,
                            linewidth=lw, color=color, linestyle='dashed')
    restricted = Arc((0, 0), 80, 80, theta1=0, theta2=180, linewidth=lw,
                     color=color)
    corner_three_a = Rectangle((-220, -47.5), 0, 140, linewidth=lw,
                               color=color)
    corner_three_b = Rectangle((220, -47.5), 0, 140, linewidth=lw, color=color)
    three_arc = Arc((0, 0), 475, 475, theta1=22, theta2=158, linewidth=lw,
                    color=color)
    center_outer_arc = Arc((0, 422.5), 120, 120, theta1=180, theta2=0,
                           linewidth=lw, color=color)
    center_inner_arc = Arc((0, 422.5), 40, 40, theta1=180, theta2=0,
                           linewidth=lw, color=color)
    court_elements = [hoop, backboard, outer_box, inner_box, top_free_throw,
                      bottom_free_throw, restricted, corner_three_a,
                      corner_three_b, three_arc, center_outer_arc,
                      center_inner_arc]
    if outer_lines:
        outer_lines = Rectangle((-250, -47.5), 500, 470, linewidth=lw,
                                color=color, fill=False)
        court_elements.append(outer_lines)

    for element in court_elements:
        ax.add_patch(element)

    ax.set_xticklabels([])
    ax.set_yticklabels([])
    ax.set_xticks([])
    ax.set_yticks([])
    return ax

Write a function for acquiring each player’s picture. This isn’t essential, but it makes things look nicer. This function takes a playerID number and the amount to zoom in on an image as the inputs. It by default places the image at the location 500,500.

1
2
3
4
5
6
7
8
9
def acquire_playerPic(PlayerID, zoom, offset=(500,500)):
    from matplotlib import  offsetbox as osb
    import urllib
    pic = urllib.urlretrieve("http://stats.nba.com/media/players/230x185/"+PlayerID+".png",PlayerID+".png")
    player_pic = plt.imread(pic[0])
    img = osb.OffsetImage(player_pic, zoom)
    #img.set_offset(offset)
    img = osb.AnnotationBbox(img, offset,xycoords='data',pad=0.0, box_alignment=(1,0), frameon=False)
    return img

Here is where things get a little complicated. Below I write a function that divides the shooting data into a 25x25 matrix. Each shot taken within the xy coordinates encompassed by a given bin counts towards the shot count in that bin. In this way, the method I am using here is very similar to my previous hexbins (circles). So the difference just comes down to I present the data rather than how I preprocess it.

This function takes a dataframe with a vector of shot locations in the X plane, a vector with shot locations in the Y plane, a vector with shot type (2 pointer or 3 pointer), and a vector with ones for made shots and zeros for missed shots. The function by default bins the data into a 25x25 matrix, but the number of bins is editable. The 25x25 bins are then expanded to encompass a 500x500 space.

The output is a dictionary containing matrices for shots made, attempted, and points scored in each bin location. The dictionary also has the player’s ID number.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
def shooting_matrices(df,bins=25):
    from math import floor

    df['SHOT_TYPE2'] = [int(x[0][0]) for x in df['SHOT_TYPE']] #create a vector with whether the shot is a 2 or 3 pointer
    points_matrix = np.zeros((bins,bins)) #create a matrix to fill with shooting data. 

    shot_attempts, xtest, ytest, p = plt.hist2d(df[df['LOC_Y']<425.1]['LOC_X'], #use histtd to bin the data. These are attempts
                                                df[df['LOC_Y']<425.1]['LOC_Y'],
                                                bins=bins,range=[[-250,250],[-25,400]]); #i limit the range of the bins because I don't care about super far away shots and I want the bins standardized across players
    plt.close()

    shot_made, xtest2, ytest2, p = plt.hist2d(df[(df['LOC_Y']<425.1) & (df['SHOT_MADE_FLAG']==1)]['LOC_X'], #again use hist 2d to bin made shots
                                    df[(df['LOC_Y']<425.1) & (df['SHOT_MADE_FLAG']==1)]['LOC_Y'],
                                    bins=bins,range=[[-250,250],[-25,400]]);
    plt.close()
    differy = np.diff(ytest)[0] #get the leading yedge
    differx = np.diff(xtest)[0] #get the leading xedge
    for i,(x,y) in enumerate(zip(df['LOC_X'],df['LOC_Y'])):
        if x >= 250 or x <= -250 or y <= -25.1 or y >= 400: continue
        points_matrix[int(floor(np.divide(x+250,differx))),int(floor(np.divide(y+25,differy)))] += np.float(df['SHOT_MADE_FLAG'][i]*df['SHOT_TYPE2'][i])
        #loop through all the shots and tally the points made in each bin location.

    shot_attempts = np.repeat(shot_attempts,500/bins,axis=0) #repeat the shot attempts matrix so that it fills all xy points
    shot_attempts = np.repeat(shot_attempts,500/bins,axis=1)
    shot_made = np.repeat(shot_made,500/bins,axis=0) #repeat shot made so that it fills all xy points (rather than just bin locations)
    shot_made = np.repeat(shot_made,500/bins,axis=1)
    points_matrix = np.repeat(points_matrix,500/bins,axis=0) #again repeat with points
    points_matrix = np.repeat(points_matrix,500/bins,axis=1)
    return {'attempted':shot_attempts,'made':shot_made,'points':points_matrix,'id':str(np.unique(df['PLAYER_ID'])[0])}

Below I load the league average data. I also have the code that I used to originally download the data and to preprocess it.

1
2
3
4
5
6
7
8
9
import pickle

#df = aqcuire_shootingData('0','2015-16') #here is how I acquired data about every shot taken in 2015-16
#df2 = pd.read_pickle('nba_shots_201516_2016_Apr_27.pkl') #here is how you can read all the league shot data
#league_shotDict = shooting_matrices(df2) #turn the shot data into the shooting matrix
#pickle.dump(league_shotDict, open('league_shotDictionary_2016.pkl', 'wb' )) #save the data

#I should make it so this is the plot size by default, but people can change it if they want. this would be slower.
league_shotDict = pickle.load(open('league_shotDictionary_2016.pkl', 'rb' )) #read in the a precreated shot chart for the entire league

I really like playing with the different color maps, so here is a new color map I created for these shot charts.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
cmap = plt.cm.CMRmap_r #start with the CMR map in reverse. 

maxer = 0.6 #max value to take in the CMR map

the_range = np.arange(0,maxer+0.1,maxer/4) #divide the map into 4 values
the_range2 = [0.0,0.25,0.5,0.75,1.0] #or use these values

mapper = [cmap(x) for x in the_range] #grab color values for this dictionary
cdict = {'red':[],'green':[],'blue':[]} #fill teh values into a color dictionary
for item,place in zip(mapper,the_range2):
    cdict['red'].append((place,item[0], item[0]))
    cdict['green'].append((place,item[1],item[1]))
    cdict['blue'].append((place,item[2],item[2]))

mymap = mpl.colors.LinearSegmentedColormap('my_colormap', cdict, 1024) #linearly interpolate between color values

Below, I write a function for creating the nba shot charts. The function takes a dictionary with martrices for shots attempted, made, and points scored. The matrices should be 500x500. By default, the shot chart depicts the number of shots taken across locations, but it can also depict the number of shots made, field goal percentage, and point scored across locations.

The function uses a gaussian kernel with standard deviation of 5 to smooth the data (make it look pretty). Again, this is editable. By default the function plots a players raw data, but it will plot how a player compares to league average if the input includes a matrix of league average data.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
def create_shotChart(shotDict,fig_type='attempted',smooth=5,league_shotDict=[],mymap=mymap):
    from scipy.ndimage.filters import gaussian_filter

    if fig_type == 'fg': #how to treat the data if depicting fg percentage
        interest_measure = shotDict['made']/shotDict['attempted']
        #interest_measure[np.isnan(interest_measure)] = np.nanmean(interest_measure)
        #interest_measure = np.nan_to_num(interest_measure) #replace places where divide by 0 with a 0
    else:
        interest_measure = shotDict[fig_type] #else take the data from dictionary. 

    if league_shotDict: #if we have league data, we have to select the relevant league data. 
        if fig_type == 'fg':
            league = league_shotDict['made']/league_shotDict['attempted']
            league = np.nan_to_num(league)
            interest_measure[np.isfinite(interest_measure)] += -league[np.isfinite(interest_measure)] #compare league data and invidual player's data
            interest_measure = np.nan_to_num(interest_measure) #replace places where divide by 0 with a 0
            maxer = 0 + 1.5*np.std(interest_measure) #min and max values for color map
            minner = 0- 1.5*np.std(interest_measure)
        else:
            player_percent = interest_measure/np.sum([x[::20] for x in player_shotDict[fig_type][::20]]) #standardize data before comparing
            league_percent = league_shotDict[fig_type]/np.sum([x[::20] for x in league_shotDict[fig_type][::20]]) #standardize league data
            interest_measure = player_percent-league_percent #compare league and individual data
            maxer = np.mean(interest_measure) + 1.5*np.std(interest_measure) #compute max and min values for color map
            minner = np.mean(interest_measure) - 1.5*np.std(interest_measure)

        cmap = 'bwr' #use bwr color map if comparing to league average
        label = ['<Avg','Avg', '>Avg'] #color map legend label

    else:
        cmap = mymap #else use my color map
        interest_measure = np.nan_to_num(interest_measure) #replace places where divide by 0 with a 0
        maxer = np.mean(interest_measure) + 1.5*np.std(interest_measure) #compute max for colormap
        minner = 0
        label = ['Less','','More'] #color map legend label

    ppr_smooth = gaussian_filter(interest_measure,smooth) #smooth the data

    fig = plt.figure(figsize=(12,7),frameon=False)#(12,7)
    ax = fig.add_axes([0.1, 0.1, 0.8, 0.8]) #where to place the plot within the figure
    draw_court(outer_lines=False) #draw court
    ax.set_xlim(-250,250)
    ax.set_ylim(400, -25)

    ax2 = fig.add_axes(ax.get_position(), frameon=False)

    colrange = mpl.colors.Normalize(vmin=minner, vmax=maxer, clip=False) #standardize color range
    ax2.imshow(ppr_smooth.T,cmap=cmap,norm=colrange,alpha=0.7,aspect='auto') #plot data
    ax2.set_xticklabels([])
    ax2.set_yticklabels([])
    ax2.set_xticks([])
    ax2.set_xlim(0, 500)
    ax2.set_ylim(500, 0)
    ax2.set_yticks([]);

    ax3 = fig.add_axes([0.92, 0.1, 0.02, 0.8]) #place colormap legend
    cb = mpl.colorbar.ColorbarBase(ax3,cmap=cmap, orientation='vertical')
    if fig_type == 'fg': #colormap label
        cb.set_label('Field Goal Percentage')
    else:
        cb.set_label('Shots '+fig_type)

    cb.set_ticks([0,0.5,1.0])
    ax3.set_yticklabels(label,rotation=45);

    zoom = np.float(12)/(12.0*2) #place player pic
    img = acquire_playerPic(player_shotDict['id'], zoom)
    ax2.add_artist(img)

    plt.show()
    return ax

Alright, thats that. Now lets create some plots. I am a t-wolves fan, so I will plot data from Karl Anthony Towns.

First, here is the default plot - attempts.

1
2
3
df = aqcuire_shootingData('1626157','2015-16')
player_shotDict = shooting_matrices(df)
create_shotChart(player_shotDict);

Here’s KAT’s shots made

1
2
3
df = aqcuire_shootingData('1626157','2015-16')
player_shotDict = shooting_matrices(df)
create_shotChart(player_shotDict,fig_type='made');

Here’s field goal percentage. I don’t like this one too much. It’s hard to use similar scales for attempts and field goal percentage even though I’m using standard deviations rather than absolute scales.

1
2
3
df = aqcuire_shootingData('1626157','2015-16')
player_shotDict = shooting_matrices(df)
create_shotChart(player_shotDict, fig_type='fg');

Here’s points across the court.

1
2
3
df = aqcuire_shootingData('1626157','2015-16')
player_shotDict = shooting_matrices(df)
create_shotChart(player_shotDict, fig_type='points');

Here’s how KAT’s attempts compare to the league average. You can see the twolve’s midrange heavy offense.

1
2
3
df = aqcuire_shootingData('1626157','2015-16')
player_shotDict = shooting_matrices(df)
create_shotChart(player_shotDict, league_shotDict=league_shotDict);

How KAT’s shots made compares to league average.

1
2
3
df = aqcuire_shootingData('1626157','2015-16')
player_shotDict = shooting_matrices(df)
create_shotChart(player_shotDict, fig_type='made',league_shotDict=league_shotDict);

How KAT’s field goal percentage compares to league average. Again, the scale on these is not too good.

1
2
3
df = aqcuire_shootingData('1626157','2015-16')
player_shotDict = shooting_matrices(df)
create_shotChart(player_shotDict, fig_type='fg',league_shotDict=league_shotDict);

And here is how KAT’s points compare to league average.

1
2
3
df = aqcuire_shootingData('1626157','2015-16')
player_shotDict = shooting_matrices(df)
create_shotChart(player_shotDict, fig_type='points',league_shotDict=league_shotDict);

An Introduction to Neural Networks: Part 2

In a previous post, I described how to do backpropogation with a 1-layer neural network. I’ve written this post assuming some familiarity with the previous post.

When first created, 1-layer neural networks brought about quite a bit of excitement, but this excitement quickly dissipated when researchers realized that 1-layer neural networks could only solve a limited set of problems.

Researchers knew that adding an extra layer to the neural networks enabled neural networks to solve much more complex problems, but they didn’t know how to train these more complex networks.

In the previous post, I described “backpropogation,” but this wasn’t the portion of backpropogation that really changed the history of neural networks. What really changed neural networks is backpropogation with an extra layer. This extra layer enabled researchers to train more complex networks. The extra layer(s) is(are) called the hidden layer(s). In this post, I will describe backpropogation with a hidden layer.

To describe backpropogation with a hidden layer, I will demonstrate how neural networks can solve the XOR problem.

In this example of the XOR problem there are four items. Each item is defined by two values. If these two values are the same, then the item belongs to one group (blue here). If the two values are different, then the item belongs to another group (red here).

Below, I have depicted the XOR problem. The goal is to find a model that can distinguish between the blue and red groups based on an item’s values.

This code is also available as a jupyter notebook on my github.

1
2
3
4
5
6
7
8
9
10
import numpy as np #import important libraries.
from matplotlib import pyplot as plt
import pandas as pd
%matplotlib inline

plt.plot([0,1],[0,1],'bo')
plt.plot([0,1],[1,0],'ro')
plt.ylabel('Value 2')
plt.xlabel('Value 1')
plt.axis([-0.5,1.5,-0.5,1.5]);

Again, each item has two values. An item’s first value is represented on the x-axis. An items second value is represented on the y-axis. The red items belong to one category and the blue items belong to another.

This is a non-linear problem because no linear function can segregate the groups. For instance, a horizontal line could segregate the upper and lower items and a vertical line could segregate the left and right items, but no single linear function can segregate the red and blue items.

We need a non-linear function to seperate the groups, and neural networks can emulate a non-linear function that segregates them.

While this problem may seem relatively simple, it gave the initial neural networks quite a hard time. In fact, this is the problem that depleted much of the original enthusiasm for neural networks.

Neural networks can easily solve this problem, but they require an extra layer. Below I depict a network with an extra layer (a 2-layer network). To depict the network, I use a repository available on my github.

1
2
3
4
5
6
7
from visualise_neural_network import NeuralNetwork

network = NeuralNetwork() #create neural network object
network.add_layer(2,['Input 1','Input 2']) #input layer with names
network.add_layer(2,['Hidden 1','Hidden 2']) #hidden layer with names
network.add_layer(1,['Output']) #output layer with name
network.draw()

Notice that this network now has 5 total neurons. The two units at the bottom are the input layer. The activity of input units is the value of the inputs (same as the inputs in my previous post). The two units in the middle are the hidden layer. The activity of hidden units are calculated in the same manner as the output units from my previous post. The unit at the top is the output layer. The activity of this unit is found in the same manner as in my previous post, but the activity of the hidden units replaces the input units.

Thus, when the neural network makes its guess, the only difference is we have to compute an extra layer’s activity.

The goal of this network is for the output unit to have an activity of 0 when presented with an item from the blue group (inputs are same) and to have an activity of 1 when presented with an item from the red group (inputs are different).

One additional aspect of neural networks that I haven’t discussed is each non-input unit can have a bias. You can think about bias as a propensity for the unit to become active or not to become active. For instance, a unit with a postitive bias is more likely to be active than a unit with no bias.

I will implement bias as an extra line feeding into each unit. The weight of this line is the bias, and the bias line is always active, meaning this bias is always present.

Below, I seed this 3-layer neural network with a random set of weights.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
np.random.seed(seed=10) #seed random number generator for reproducibility

Weights_2 = np.random.rand(1,3)-0.5*2 #connections between hidden and output
Weights_1 = np.random.rand(2,3)-0.5*2 #connections between input and hidden

Weight_Dict = {'Weights_1':Weights_1,'Weights_2':Weights_2} #place weights in a dictionary

Train_Set = [[1.0,1.0],[0.0,0.0],[0.0,1.0],[1.0,0.0]] #train set

network = NeuralNetwork()
network.add_layer(2,['Input 1','Input 2'],
                  [[round(x,2) for x in Weight_Dict['Weights_1'][0][:2]],
                   [round(x,2) for x in Weight_Dict['Weights_1'][1][:2]]])
#add input layer with names and weights leaving the input neurons
network.add_layer(2,[round(Weight_Dict['Weights_1'][0][2],2),round(Weight_Dict['Weights_1'][1][2],2)],
                  [round(x,2) for x in Weight_Dict['Weights_2'][0][:2]])
#add hidden layer with names (each units' bias) and weights leaving the hidden units
network.add_layer(1,[round(Weight_Dict['Weights_2'][0][2],2)])
#add output layer with name (the output unit's bias)
network.draw()

Above we have out network. The depiction of and are confusing. -0.8 belongs to . -0.5 belongs to .

Lets go through one example of our network receiving an input and making a guess. Lets say the input is [0 1]. This means and . The correct answer in this case is 1.

First, we have to calculate ’s input. Remember we can write input as

with the a bias we can rewrite it as

Specifically for

Remember the first term in the equation above is the bias term. Lets see what this looks like in code.

1
2
3
Input = np.array([0,1])
net_Hidden = np.dot(np.append(Input,1.0),Weights_1.T) #append the bias input
print net_Hidden
[-1.27669634 -1.07035845]

Note that by using np.dot, I can calculate both hidden unit’s input in a single line of code.

Next, we have to find the activity of units in the hidden layer.

I will translate input into activity with a logistic function, as I did in the previous post.

Lets see what this looks like in code.

1
2
3
4
5
def logistic(x): #each neuron has a logistic activation function
    return 1.0/(1+np.exp(-x))

Hidden_Units = logistic(net_Hidden)
print Hidden_Units
[ 0.2181131   0.25533492]

So far so good, the logistic function has transformed the negative inputs into values near 0.

Now we have to compute the output unit’s acitivity.

plugging in the numbers

Now the code for computing and the Output unit’s activity.

1
2
3
4
5
6
net_Output = np.dot(np.append(Hidden_Units,1.0),Weights_2.T)
print 'net_Output'
print net_Output
Output = logistic(net_Output)
print 'Output'
print Output
net_Output
[-0.66626595]
Output
[ 0.33933346]

Okay, thats the network’s guess for one input…. no where near the correct answer (1). Let’s look at what the network predicts for the other input patterns. Below I create a feedfoward, 1-layer neural network and plot the neural nets’ guesses to the four input patterns.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
def layer_InputOutput(Inputs,Weights): #find a layers input and activity
    Inputs_with_bias = np.append(Inputs,1.0) #input 1 for each unit's bias
    return logistic(np.dot(Inputs_with_bias,Weights.T))

def neural_net(Input,Weights_1,Weights_2,Training=False): #this function creates and runs the neural net    

    target = 1 #set target value
    if np.array(Input[0])==np.array([Input[1]]): target = 0 #change target value if needed

    #forward pass
    Hidden_Units = layer_InputOutput(Input,Weights_1) #find hidden unit activity
    Output = layer_InputOutput(Hidden_Units,Weights_2) #find Output layer actiity

    return {'output':Output,'target':target,'input':Input} #record trial output

Train_Set = [[1.0,1.0],[0.0,1.0],[1.0,0.0],[0.0,0.0]] #the four input patterns
tempdict = {'output':[],'target':[],'input':[]} #data dictionary
temp = [neural_net(Input,Weights_1,Weights_2) for Input in Train_Set] #get the data
[tempdict[key].append([temp[x][key] for x in range(len(temp))]) for key in tempdict] #combine all the output dictionaries

plotter = np.ones((2,2))*np.reshape(np.array(tempdict['output']),(2,2))
plt.pcolor(plotter,vmin=0,vmax=1,cmap=plt.cm.bwr)
plt.colorbar(ticks=[0,0.25,0.5,0.75,1]);
plt.xlabel('Input 1')
plt.ylabel('Input 2')
plt.xticks([0.5,1.5], ['0','1'])
plt.yticks([0.5,1.5], ['0','1']);

In the plot above, I have Input 1 on the x-axis and Input 2 on the y-axis. So if the Input is [0,0], the network produces the activity depicted in the lower left square. If the Input is [1,0], the network produces the activity depicted in the lower right square. If the network produces an output of 0, then the square will be blue. If the network produces an output of 1, then the square will be red. As you can see, the network produces all output between 0.25 and 0.5… no where near the correct answers.

So how do we update the weights in order to reduce the error between our guess and the correct answer?

First, we will do backpropogation between the output and hidden layers. This is exactly the same as backpropogation in the previous post.

In the previous post I described how our goal was to decrease error by changing the weights between units. This is the equation we used to describe changes in error with changes in the weights. The equation below expresses changes in error with changes to weights between the and the Output unit.

Now multiply this weight adjustment by the learning rate.

Finally, we apply the weight adjustment to .

Now lets do the same thing, but for both the weights and in the code.

1
2
3
4
5
6
7
8
alpha = 0.5 #learning rate
target = 1 #target outpu

error = target - Output #amount of error
delta_out = np.atleast_2d(error*(Output*(1-Output))) #first two terms of error by weight derivative

Hidden_Units = np.append(Hidden_Units,1.0) #add an input of 1 for the bias
print Weights_2 + alpha*np.outer(delta_out,Hidden_Units) #apply weight change
[[-0.21252673 -0.96033892 -0.29229558]]

The hidden layer changes things when we do backpropogation. Above, we computed the new weights using the output unit’s error. Now, we want to find how adjusting a weight changes the error, but this weight connects an input to the hidden layer rather than connecting to the output layer. This means we have to propogate the error backwards to the hidden layer.

We will describe backpropogation for the line connecting and as

Pretty similar. We just replaced Output with . The interpretation (starting with the final term and moving left) is that changing the changes ’s input. Changing ’s input changes ’s activity. Changing ’s activity changes the error. This last assertion (the first term) is where things get complicated. Lets take a closer look at this first term

Changing ’s activity changes changes the input to the Output unit. Changing the output unit’s input changes the error. hmmmm still not quite there yet. Lets look at how changes to the output unit’s input changes the error.

You can probably see where this is going. Changing the output unit’s input changes the output unit’s activity. Changing the output unit’s activity changes error. There we go.

Okay, this got a bit heavy, but here comes some good news. Compare the two terms of the equation above to the first two terms of our original backpropogation equation. They’re the same! Now lets look at (the second term from the first equation after our new backpropogation equation).

Again, I am glossing over how to derive these partial derivatives. For a more complete explantion, I recommend Chapter 8 of Rumelhart and McClelland’s PDP book. Nonetheless, this means we can take the output of our function delta_output multiplied by and we have the first term of our backpropogation equation! We want to be the weight used in the forward pass. Not the updated weight.

The second two terms from our backpropogation equation are the same as in our original backpropogation equation.

- this is specific to logistic activation functions.

and

Lets try and write this out.

It’s not short, but its doable. Let’s plug in the numbers.

Not too bad. Now lets see the code.

1
2
3
4
delta_hidden = delta_out.dot(Weights_2)*(Hidden_Units*(1-Hidden_Units)) #find delta portion of weight update

delta_hidden = np.delete(delta_hidden,2) #remove the bias input
print Weights_1 + alpha*np.outer(delta_hidden,np.append(Input,1.0)) #append bias input and multiply input by delta portion
[[-0.25119612 -0.50149299 -0.77809147]
 [-0.80193714 -0.23946929 -0.84467792]]

Alright! Lets implement all of this into a single model and train the model on the XOR problem. Below I create a neural network that includes both a forward pass and an optional backpropogation pass.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
def neural_net(Input,Weights_1,Weights_2,Training=False): #this function creates and runs the neural net    

    target = 1 #set target value
    if np.array(Input[0])==np.array([Input[1]]): target = 0 #change target value if needed

    #forward pass
    Hidden_Units = layer_InputOutput(Input,Weights_1) #find hidden unit activity
    Output = layer_InputOutput(Hidden_Units,Weights_2) #find Output layer actiity

    if Training == True:
        alpha = 0.5 #learning rate

        Weights_2 = np.atleast_2d(Weights_2) #make sure this weight vector is 2d.

        error = target - Output #error
        delta_out = np.atleast_2d(error*(Output*(1-Output))) #delta between output and hidden

        Hidden_Units = np.append(Hidden_Units,1.0) #append an input for the bias
        delta_hidden = delta_out.dot(np.atleast_2d(Weights_2))*(Hidden_Units*(1-Hidden_Units)) #delta between hidden and input

        Weights_2 += alpha*np.outer(delta_out,Hidden_Units) #update weights

        delta_hidden = np.delete(delta_hidden,2) #remove bias activity
        Weights_1 += alpha*np.outer(delta_hidden,np.append(Input,1.0))  #update weights

    if Training == False:
        return {'output':Output,'target':target,'input':Input} #record trial output
    elif Training == True:
        return {'Weights_1':Weights_1,'Weights_2':Weights_2,'target':target,'output':Output,'error':error}

Okay, thats the network. Below, I train the network until its answers are very close to the correct answer.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
from random import choice
np.random.seed(seed=10) #seed random number generator for reproducibility

Weights_2 = np.random.rand(1,3)-0.5*2 #connections between hidden and output
Weights_1 = np.random.rand(2,3)-0.5*2 #connections between input and hidden

Weight_Dict = {'Weights_1':Weights_1,'Weights_2':Weights_2}

Train_Set = [[1.0,1.0],[0.0,0.0],[0.0,1.0],[1.0,0.0]] #train set

Error = []
while True: #train the neural net
    Train_Dict = neural_net(choice(Train_Set),Weight_Dict['Weights_1'],Weight_Dict['Weights_2'],Training=True)

    Error.append(abs(Train_Dict['error']))
    if len(Error) > 6 and np.mean(Error[-10:]) < 0.025: break #tell the code to stop iterating when recent mean error is small

Lets see how error changed across training

1
2
3
4
Error_vec = np.array(Error)[:,0]
plt.plot(Error_vec)
plt.ylabel('Error')
plt.xlabel('Iteration #');

Really cool. The network start with volatile error - sometimes being nearly correct ans sometimes being completely incorrect. Then After about 5000 iterations, the network starts down the slow path of perfecting an answer scheme. Below, I create a plot depicting the networks’ activity for the different input patterns.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
Weights_1 = Weight_Dict['Weights_1']
Weights_2 = Weight_Dict['Weights_2']

Train_Set = [[1.0,1.0],[0.0,1.0],[1.0,0.0],[0.0,0.0]] #train set

tempdict = {'output':[],'target':[],'input':[]} #data dictionary
temp = [neural_net(Input,Weights_1,Weights_2) for Input in Train_Set] #get the data
[tempdict[key].append([temp[x][key] for x in range(len(temp))]) for key in tempdict] #combine all the output dictionaries

plotter = np.ones((2,2))*np.reshape(np.array(tempdict['output']),(2,2))
plt.pcolor(plotter,vmin=0,vmax=1,cmap=plt.cm.bwr)
plt.colorbar(ticks=[0,0.25,0.5,0.75,1]);
plt.xlabel('Input 1')
plt.ylabel('Input 2')
plt.xticks([0.5,1.5], ['0','1'])
plt.yticks([0.5,1.5], ['0','1']);

Again, the Input 1 value is on the x-axis and the Input 2 value is on the y-axis. As you can see, the network guesses 1 when the inputs are different and it guesses 0 when the inputs are the same. Perfect! Below I depict the network with these correct weights.

1
2
3
4
5
6
7
8
9
10
Weight_Dict = {'Weights_1':Weights_1,'Weights_2':Weights_2}

network = NeuralNetwork()
network.add_layer(2,['Input 1','Input 2'],
                  [[round(x,2) for x in Weight_Dict['Weights_1'][0][:2]],
                   [round(x,2) for x in Weight_Dict['Weights_1'][1][:2]]])
network.add_layer(2,[round(Weight_Dict['Weights_1'][0][2],2),round(Weight_Dict['Weights_1'][1][2],2)],
                  [round(x,2) for x in Weight_Dict['Weights_2'][:2][0]])
network.add_layer(1,[round(Weight_Dict['Weights_2'][0][2],2)])
network.draw()

The network finds a pretty cool solution. Both hidden units are relatively active, but one hidden unit sends a strong postitive signal and the other sends a strong negative signal. The output unit has a negative bias, so if neither input is on, it will have an activity around 0. If both Input units are on, then the hidden unit that sends a postitive signal will be inhibited, and the output unit will have activity near 0. Otherwise, the hidden unit with a positive signal gives the output unit an acitivty near 1.

This is all well and good, but if you try to train this network with random weights you might find that it produces an incorrect set of weights sometimes. This is because the network runs into a local minima. A local minima is an instance when any change in the weights would increase the error, so the network is left with a sub-optimal set of weights.

Below I hand-pick of set of weights that produce a local optima.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
Weights_2 = np.array([-4.5,5.3,-0.8]) #connections between hidden and output
Weights_1 = np.array([[-2.0,9.2,2.0],
                     [4.3,8.8,-0.1]])#connections between input and hidden

Weight_Dict = {'Weights_1':Weights_1,'Weights_2':Weights_2}

network = NeuralNetwork()
network.add_layer(2,['Input 1','Input 2'],
                  [[round(x,2) for x in Weight_Dict['Weights_1'][0][:2]],
                   [round(x,2) for x in Weight_Dict['Weights_1'][1][:2]]])
network.add_layer(2,[round(Weight_Dict['Weights_1'][0][2],2),round(Weight_Dict['Weights_1'][1][2],2)],
                  [round(x,2) for x in Weight_Dict['Weights_2'][:2]])
network.add_layer(1,[round(Weight_Dict['Weights_2'][2],2)])
network.draw()

Using these weights as the start of the training set, lets see what the network will do with training.

1
2
3
4
5
6
7
8
9
10
11
12
13
Train_Set = [[1.0,1.0],[0.0,0.0],[0.0,1.0],[1.0,0.0]] #train set

Error = []
while True:
    Train_Dict = neural_net(choice(Train_Set),Weight_Dict['Weights_1'],Weight_Dict['Weights_2'],Training=True)

    Error.append(abs(Train_Dict['error']))
    if len(Error) > 6 and np.mean(Error[-10:]) < 0.025: break

Error_vec = np.array(Error)[:]
plt.plot(Error_vec)
plt.ylabel('Error')
plt.xlabel('Iteration #');

As you can see the network never reduces error. Let’s see how the network answers to the different input patterns.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
Weights_1 = Weight_Dict['Weights_1']
Weights_2 = Weight_Dict['Weights_2']

Train_Set = [[1.0,1.0],[0.0,1.0],[1.0,0.0],[0.0,0.0]] #train set

tempdict = {'output':[],'target':[],'input':[]} #data dictionary
temp = [neural_net(Input,Weights_1,Weights_2) for Input in Train_Set] #get the data
[tempdict[key].append([temp[x][key] for x in range(len(temp))]) for key in tempdict] #combine all the output dictionaries

plotter = np.ones((2,2))*np.reshape(np.array(tempdict['output']),(2,2))
plt.pcolor(plotter,vmin=0,vmax=1,cmap=plt.cm.bwr)
plt.colorbar(ticks=[0,0.25,0.5,0.75,1]);
plt.xlabel('Input 1')
plt.ylabel('Input 2')
plt.xticks([0.5,1.5], ['0','1'])
plt.yticks([0.5,1.5], ['0','1']);

Looks like the network produces the correct answer in some cases but not others. The network is particularly confused when Inputs 2 is 0. Below I depict the weights after “training.” As you can see, they have not changed too much from where the weights started before training.

1
2
3
4
5
6
7
8
9
10
11
12
13
Weights_1 = Weight_Dict['Weights_1']
Weights_2 = Weight_Dict['Weights_2']

Weight_Dict = {'Weights_1':Weights_1,'Weights_2':Weights_2}

network = NeuralNetwork()
network.add_layer(2,['Input 1','Input 2'],
                  [[round(x,2) for x in Weight_Dict['Weights_1'][0][:2]],
                   [round(x,2) for x in Weight_Dict['Weights_1'][1][:2]]])
network.add_layer(2,[round(Weight_Dict['Weights_1'][0][2],2),round(Weight_Dict['Weights_1'][1][2],2)],
                  [round(x,2) for x in Weight_Dict['Weights_2'][:2]])
network.add_layer(1,[round(Weight_Dict['Weights_2'][2],2)])
network.draw()

This network was unable to push itself out of the local optima. While local optima are a problem, they’re are a couple things we can do to avoid them. First, we should always train a network multiple times with different random weights in order to test for local optima. If the network continually finds local optima, then we can increase the learning rate. By increasing the learning rate, the network can escape local optima in some cases. This should be done with care though as too big of a learning rate can also prevent finding the global minima.

Alright, that’s it. Obviously the neural network behind alpha go is much more complex than this one, but I would guess that while alpha go is much larger the basic computations underlying it are similar.

Hopefully these posts have given you an idea for how neural networks function and why they’re so cool!