Information processing in neural and gene regulatory networks

Speaker: Gašper Tkačik
IST Austria
Location: A1.100 TU Delft
Date: 22-03-2017

Author: Kristian Blom

On the 22nd of March I attended a seminar given by Gašper Tkačik, a theoretical physicist who uses statistical physics and information theory to explain phenomena in the cell. The fundamental principle underlying all of Dr. Tkačik’s research is that information-processing networks have evolved or adapted to maximize the information transmitted from their inputs to their outputs, given biophysical noise and resource constraints.

Dr. Tkačik showed us multiple examples of his research during his talk. Here I’d like to focus on the one I found most interesting, which is about reading the positional code in early development. It is well known that a morphogen gradient in early development generates different cell types in a distinct spatial order. This is called the French flag model. Despite decades of biological study, the question of how much positional information an expression pattern contains had not been answered quantitatively. Dr. Tkačik therefore decided to look at the French flag model from an information-theoretic point of view and asked the following question: How much information is there in spatial patterns of gene expression? Using the gap genes in the Drosophila embryo, he measured the amount of information in bits. I will now briefly discuss how one can measure the information contained in gap genes.

Figure 1: Normalized dorsal profiles of fluorescence intensity, which we identify as Hb expression level g, from 24 embryos selected in a 38- to 48-min time interval after the beginning of nuclear cycle 14. Considering all points with g = 0.1, 0.5, or 0.9 (Left) yields conditional distributions with probability densities P(x|g) (Right). Note that these distributions are much more sharply concentrated than the uniform distribution P(x) shown in light gray. Image adapted from: Dubuis, J.O.; Tkačik, G.; Wieschaus, E.F.; Gregor, T.; Bialek, W. PNAS, 2013, 110 (41), pp 16301-16308.

We start by looking at the early stages of Drosophila development. At this stage most cells are similar in morphology, so if we neglect gene expression we have no information about the position of a cell. Mathematically, we can say that the position of the cell is drawn from a distribution of possibilities P(x). If we now take the gene expression levels into account, our uncertainty in position is reduced. Looking specifically at the expression level of the gap gene hunchback (Hb) (figure 1), one can see that a certain expression level g is not a unique indicator of the position of the cell along the anterior/posterior axis. Instead, there is a range of positions that have the value g. Let P(x|g) be the conditional probability that a cell with expression level g is located at position x.

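We define the entropy (in bits) of our two probability distributions in the standard Shannon form:

$$S[P(x)] = -\int dx\, P(x) \log_2 P(x), \qquad S[P(x|g)] = -\int dx\, P(x|g) \log_2 P(x|g)$$

The information gain due to an observation of the Hb expression level g in one cell is then given by the reduction in entropy:

$$I(g \to x) = S[P(x)] - S[P(x|g)]$$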

From this point I will leave the mathematical expressions as they are, but I challenge you to get a firm understanding of why the final expression represents the information gain. After a small adaptation of the final formula, Dr. Tkačik used the result to make a “direct” measurement of the amount of information carried by the gap genes. Using this method he found that individual genes carry almost two bits of information. Extending this result, he also found that four gap genes carry enough information to define a cell’s location with an error of ~1% along the anterior/posterior axis of the embryo. How cool is that!
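
To make the idea of information in bits a little more concrete, here is a minimal sketch of my own (not the method used in the talk). Averaging the entropy reduction over all expression levels g gives the mutual information between g and x, which can be estimated naively from a two-dimensional histogram. The sigmoidal profile, the noise levels and the function names are all hypothetical, and this simple estimate is biased for small samples.

```python
import math
import random


def mutual_information_bits(xs, gs, n_bins=20):
    """Naive histogram estimate of I(g; x) in bits; xs and gs must lie in [0, 1]."""
    def bin_index(value):
        return min(int(value * n_bins), n_bins - 1)

    n = len(xs)
    joint = {}
    px = [0.0] * n_bins
    pg = [0.0] * n_bins
    for x, g in zip(xs, gs):
        i, j = bin_index(x), bin_index(g)
        joint[(i, j)] = joint.get((i, j), 0.0) + 1.0 / n
        px[i] += 1.0 / n
        pg[j] += 1.0 / n
    # Sum of P(x, g) * log2[ P(x, g) / (P(x) P(g)) ] over the occupied bins.
    return sum(p * math.log2(p / (px[i] * pg[j])) for (i, j), p in joint.items())


def synthetic_profile(n_cells, noise, rng):
    """Hypothetical Hb-like profile: high in the anterior, low in the posterior."""
    xs = [rng.random() for _ in range(n_cells)]
    gs = []
    for x in xs:
        mean = 1.0 / (1.0 + math.exp((x - 0.5) / 0.05))
        gs.append(min(1.0, max(0.0, mean + rng.gauss(0.0, noise))))
    return xs, gs


rng = random.Random(0)
info_low_noise = mutual_information_bits(*synthetic_profile(5000, 0.05, rng))
info_high_noise = mutual_information_bits(*synthetic_profile(5000, 0.5, rng))
print(f"low noise:  {info_low_noise:.2f} bits")
print(f"high noise: {info_high_noise:.2f} bits")
```

The noisier the expression level, the less it tells you about the position of the cell, so the second number should come out smaller than the first.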

Although the talk went a bit fast, the content was really good. During the talk I was reminded of the lectures we had in evolutionary & developmental biology (evodevo), since it was in this course that I became familiar with the gap genes in Drosophila development. I therefore decided to tell one of the evodevo teachers about the content of this talk, because it might be of good use to them in the future. Although it sounds a bit cliché, afterwards I was again (it happens on a regular basis) astonished by the fact that nanobiology is a really strong field of science. What Dr. Tkačik did fits very well into our program because he used mathematics, especially information theory, to understand why those gap genes function the way they do. For me it was really a wake-up call to keep asking myself: Why? If you keep asking this again and again, I think at some point you will find yourself in the fields of mathematics and physics, where the answer will be waiting to be found.


Identification of slowly reacting variables in a dynamic system using the Wasserstein metric

Speaker: Prof. Dr. Sjoerd Verduyn Lunel
Applied Analysis
Utrecht University
Date: 2016-0
Author: Romano van Genderen

Of the seminars I attended during the Dutch National Mathematics Symposium, I chose to share the one given by Professor Verduyn Lunel with you. I chose this particular professor because I have previously attended one of his lectures on real single-variable analysis, and this topic because it has the most practical applications.

He started slowly, with the basics. He explained that a time series is a series of measurements of the same quantity, taken once every time interval t. A time series can be generated through measurement of a physical phenomenon or by a simpler model function. As an example, which was also recently mentioned in Computational Science, he used the Newton-Raphson method. This formula defines a variable x_{n+1} based on its previous value x_{n}. The same happens in another well-known formula, namely Hénon’s equation.
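
As a small illustration of how such model functions generate a time series, here is a sketch of both examples. The test function f(x) = x^2 - 2, the starting points and the Hénon parameters a = 1.4 and b = 0.3 (the classical choice) are my own assumptions, not values from the talk.

```python
def newton_raphson_series(f, df, x0, n_steps):
    """Return [x_0, ..., x_n] with x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    series = [x0]
    for _ in range(n_steps):
        x = series[-1]
        series.append(x - f(x) / df(x))
    return series


def henon_series(n_steps, a=1.4, b=0.3, x0=0.0, y0=0.0):
    """Return the x-coordinates of the Henon map x' = 1 - a*x^2 + y, y' = b*x."""
    x, y = x0, y0
    xs = [x]
    for _ in range(n_steps):
        # Both updates use the old (x, y) because of the tuple assignment.
        x, y = 1.0 - a * x * x + y, b * x
        xs.append(x)
    return xs


# Newton-Raphson applied to f(x) = x^2 - 2 settles down on the root sqrt(2).
newton = newton_raphson_series(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0, n_steps=6)
# The Henon map never settles on a single value but keeps wandering.
henon = henon_series(1000)
print(newton[-1])
print(henon[:5])
```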

Next, he explained attractors. When you have a time series, it has an attractor. This is a point or set of points A such that, once the time series gets into the vicinity of A, a set of points called V, it will never leave V. In the case of the Newton-Raphson method the attractor is the root of the function; in the case of Hénon’s equation it is a complicated 2-dimensional shape. Every time series has an attractor.

But the question he asked was: can you predict the attractor when all you have is a time series? Kennel et al. (1992) showed that this is possible by grouping specific terms in the time series into vectors and adding a lag between the terms. For example, the time series (1, 2, 3, 4, 5, 6) can be grouped like (1, 4), (2, 5), (3, 6). These are vectors in R^2, but sometimes other dimensions or lags are required.
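
The grouping in the example above can be sketched in a few lines. The function name and its parameters `dimension` and `lag` are hypothetical names of my own choosing.

```python
def delay_embed(series, dimension, lag):
    """Group a series into vectors (s_i, s_{i+lag}, ..., s_{i+(dimension-1)*lag})."""
    span = (dimension - 1) * lag  # distance between first and last term of a vector
    return [
        tuple(series[i + k * lag] for k in range(dimension))
        for i in range(len(series) - span)
    ]


vectors = delay_embed([1, 2, 3, 4, 5, 6], dimension=2, lag=3)
print(vectors)  # [(1, 4), (2, 5), (3, 6)]
```

With dimension 2 and lag 3 this reproduces the vectors from the example; other choices give vectors in higher dimensions or with a different spacing.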

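So now you have a set of N vectors {v_1, …, v_N} in R^n. Next, you put the distances between these vectors inside a matrix, where d(v_i, v_j) denotes the distance between v_i and v_j. So now you have the matrix:

$$D = \begin{pmatrix} d(v_1, v_1) & \cdots & d(v_1, v_N) \\ \vdots & \ddots & \vdots \\ d(v_N, v_1) & \cdots & d(v_N, v_N) \end{pmatrix}$$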
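
A matrix of pairwise distances like the one described above could be computed as follows. This is only a sketch: I assume the ordinary Euclidean distance here, which may not be the distance used in the talk, and the function name is hypothetical.

```python
import math


def distance_matrix(vectors):
    """Return D with D[i][j] = Euclidean distance between vectors i and j."""
    return [[math.dist(u, v) for v in vectors] for u in vectors]


D = distance_matrix([(1, 4), (2, 5), (3, 6)])
for row in D:
    print([round(value, 3) for value in row])
```

The matrix is symmetric and has zeros on its diagonal, since every vector is at distance zero from itself.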


You can plot the columns of these matrices in a vector space. This leads to a point cloud. The fact that you have transformed a time series into a point cloud allows the most innovative mathematical object in this seminar to be used, the Wasserstein metric. This metric is a specific way to assign “distance” or “difference” between two point clouds. The way to visualise the Wasserstein distance is to think of all points in a point cloud as small heaps of sand. In that case, the Wasserstein distance is the least amount of work to be done on the heaps in the configuration of the first point cloud to push them into the second configuration. If these point clouds are very similar, you need just a little work to push the heaps of sand. But if they differ a lot, a lot of work is needed. So a low Wasserstein distance means a low difference between the two point clouds. The parameters in the model equation change the Wasserstein distance. And because the Wasserstein distance is related to the attractor as shown before, the attractor shifts position when the parameters change.
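
The heaps-of-sand picture can be tried out on very small point clouds. The sketch below is my own and assumes two clouds with the same number of points, each carrying the same amount of sand; in that case an optimal way of moving the sand is a one-to-one matching of points, so one can simply try every matching. This takes factorial time, so it only works for a handful of points; for real data one would use a proper optimal-transport solver.

```python
import itertools
import math


def wasserstein_equal_clouds(cloud_a, cloud_b):
    """Wasserstein-1 distance between two equal-sized clouds of equally heavy points."""
    if not cloud_a or len(cloud_a) != len(cloud_b):
        raise ValueError("this sketch needs two non-empty clouds of equal size")
    n = len(cloud_a)
    # Try every one-to-one matching and keep the one needing the least total work.
    least_work = min(
        sum(math.dist(cloud_a[i], cloud_b[perm[i]]) for i in range(n))
        for perm in itertools.permutations(range(n))
    )
    return least_work / n  # average distance each heap has to be pushed


cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
nearby = [(x + 0.1, y) for x, y in cloud]        # same shape, shifted slightly
far_away = [(x + 3.0, y + 4.0) for x, y in cloud]  # same shape, shifted a lot

w_nearby = wasserstein_equal_clouds(cloud, nearby)
w_far = wasserstein_equal_clouds(cloud, far_away)
print(w_nearby, w_far)
```

Similar clouds give a small number and very different clouds a large one, which is exactly the behaviour described above.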


Fig 1. Image explaining the principle behind the Wasserstein metric: the gray heaps u are moved to the darker heaps w. Scott Cohen, Stanford University.

Now for all these things to come together. I showed that you can change a time series into a point cloud. You can calculate an objective distance, or measure of difference, between two point clouds using the Wasserstein metric. So you can objectively say how different two time series are. This has many practical uses, but because Prof. Verduyn Lunel was already a bit short on time, he only mentioned two.

The first is to use a specific point in an MRI as your time series. In this case, you can objectively see the difference between two points in time. Instead of letting a doctor guess whether something is severe or not, risking potential bias or human mistakes, you now have a simple number stating how much difference there is between a sick and a healthy patient. This also helps to set a simple threshold for when intervention is needed.

The second use, which was a case study Prof. Verduyn Lunel participated in at the Academic Medical Centre in Amsterdam, was to distinguish between asthma and Chronic Obstructive Pulmonary Disease (COPD). Using time samples of patients breathing into what he called “Digital Noses”, he could distinguish between patients with asthma and patients with COPD, even noticing that one patient was suffering from both, a fact that could not have been observed earlier.