## Representing discrete events as continuous rate data using Python

Notes on Theoretical Neuroscience

29 Sep 2019
modified: 06 Oct 2019

In the study of binary or count data, discrete events can be mapped to continuous variables for visualization, analysis, and modeling. Python’s standard suite of data science libraries provide a number of functions to effect these transformations. In this post, we apply some very simple functions from the Pandas and Numpy libraries to transform neural spike train records from discrete events signifying action potentials to firing rates in Hertz.

The transformations described below largely follow the approach outlined in Theoretical Neuroscience (Dayan & Abbott, 2001). We apply these methods to spike train recordings provided through the book’s website hosted by University College of London (http://www.gatsby.ucl.ac.uk/~dayan/book/exercises.html). The spike train data were collected by Rob de Ruyter van Steveninck and the recordings reflect action potentials from a fly H1 neuron in response to white noise stimuli.

This post demonstrates methods for transformating discrete events to continuous representations as follows:

2. Visualize the raw spike train data as discrete events using a rug plot.
3. Transformations of the spike train as firing rates (Hertz) over contiguous bins
4. Transformations of the spike train using window functions (convolution methods) using rectangular, gaussian, and causal kernels

Libraries
We’ll need to load the following python libraries:

We use the seaborn function .set() to render plots with seaborn defaults globally for better visualization. We also set the rcParams from the matplotlib library for larger plots.

We start by downloading and formatting the data using the urllib and scipy packages. With the urllib package we can open a connection to the remote web url where the data is stored. Once we have established a connection, we use the .read() method to load the data into our workspace.

The data is stored as a .mat file, Matlab’s proprietary data format. To convert this to python, simply save the file to a convenient location on your local computer. Then load the data in native python formatting using the loadmat function from the scipy.io library.

Set up
From the book website, we learn that the observation period for the neural recording spanned twenty minutes, and the sampling rate over this period was 500 Hertz. This means that each observation of the sample represents a span of two milliseconds (2 ms). We will use this information to create a dataframe index.

The data consist of a single spike train representing a 20 minute recording of neural spiking in fly H1 in response to white noise stimuli. These values along with metadata are read into our workspace as a python dictionary. The spike train observations and the white noise stimuli are values in the dictionary with keys labels as ‘rho’ and ‘stim’, respectively. We extract them into a pandas dataframe using our observation index as follows:

Our dataframe has two columns and an index representing the time from the start of the observation period in milliseconds. For faster execution, we filter the dataframe to contain only the first three seconds of the sample.

Data Transformations
Rug plot
The rug plot will produce vertical lines at times supplied to the seaborn function rugplot(). We filter the dataframe to those times when a spike event occurred, and pass the index to the rugplot() function using the .pipe() method from the pandas library. Since we will soon stack our different visualizations vertically (see below), we extend the limits of the x-axis to the full three second interval.

The rugplot represents the data as discrete events over the sample period.

Contiguous bins
We first transform the event data to Hertz using contiguous bins 100 ms in width. We again filter the data to those observations when a spike occurred and pass the index to seaborn’s distplot() function. Since the level of the estimate reflects the count of spikes within a 100 ms bin, we need to divide the count by the bin width in order to estimate firing rate in Hertz. We do this by re-labeling the y-axis to the appropriate scale.

Rectangular window
Instead of static contiguous bins that make the rate estimate highly dependent on the position of the bins, we could estimate firing rate using a sliding window across the span of our observation period. The rectangular window used with contiguous bins above is the simplest shape, but here we slide the window so that right and left boundaries move in step over the observation period. Transforming a function by sliding a window and integrating at each point is called convolution, and the shape of the window is called the kernel.

In the case of a rectangular sliding window, we represent the estimate of the firing rate at time $t$ as $\hat r(t) = \sum_{i=1}^n{w(t-t_i)}$, where $t_i$ denotes the time of a spike event and:

Computing convolution using window functions is very simple using pandas dataframe method .rolling(). The first parameter of the .rolling() method represents the width of the window; the center parameter is True if the window is centered; and min_periods parameter allows you to perform the estimation for intervals when the number of observations is smaller than the window width (ie., the window opens from one observation to the number specified by your window width near the beginning of your sample). The .rolling() method should be followed by an aggregation; in our case, we use the .sum() method to count the number of spikes at each centered window.

A window of 51 gives 25 bins to either side of center. Since each bin equals 2 ms, the total window width is 102 ms. After counting spikes over this window at each time point, we divide by 51*(2ms) to get the estimated spikes per millisecond; then we divide further by 1ms (1/1000) to get rate in Hertz.

Gaussian window
We use a gaussian kernel for smoother rate estimates. This firing rate is estimated as:

The gaussian window width should be as large as the data since the gaussian is supported from negative infinity to positive infinity. The .rolling() method with a gaussian kernel requires a standard deviation (std) argument in the .sum() method to determine the width of the window.

The rolling method with win_type = “gaussian” produces an unnormalized gaussian sum. That is, the weight of each observation falls off according to the gaussian pdf, but the area under the curve does not sum to 1. The parameter representing std should be adjusted according to the resolution of the sampling rate. We can apply the normalization after summing (in this case using the unadjusted std). The sum after normalization represents the estimated firing rate at each 2ms bin. To get the rate in Hertz, we divide further by the duration of the bin (2ms)

Causal kernel (alpha function)
Lastly, we consider a causal kernel that estimates the firing rate at each time point using only information available from samples prior to the estimated time. In this case, we first define a new function alpha_fn. The function’s call signature includes the event data to be transformed, the sampling period, and a parameter to control the window width. More precisely, the alpha function is defined as $w(\tau) = [\alpha^2 \tau \exp (-\alpha \tau)]_{+}$. Using this window, the firing rate is estimated as:

We code this function in python as:

To use this function, we call the rolling method as before, but we no longer center the kernel about the point to be estimated. Instead of using the .sum() method, we use the .apply() method with a call to our custom alpha function. As with the gaussian kernel, we will need to divide the result by the sample period in order to recover our rate in Hertz.

We can more easily compare these different methods by stacking these plots.