Neural ODEs with Euler's Method

Figure 1: Neural ODEs. Source: David Duvenaud et al.

In classical data modelling problems, we have $N$ pairs of data $(x_{i}, y_{i})$ that are used to train a model. To make a prediction, we feed a new data point $x^{*}$ into the model to obtain $y^{*}$. We assume that there is an inherent relationship between the pairs $(x, y)$: in other words, there is an unknown function $F$ that maps between the domains $X$ and $Y$. Machine learning approaches can be used to approximate the function $F$.

Figure 2: Neural networks as function approximators. Source: Jonty Sinai

How would we like to approximate $F$?

We can try to approximate the function $F$ with curve fitting or neural networks. By using neural networks, we assume that there is a continuous and differentiable relationship between $x$ and $y$; therefore, our approximation of $F$ is differentiable. What about modeling the derivative of $F$ instead of $F$ itself? In terms of parameter efficiency, modeling the derivative can be practical.

\[F(x)=\int f(x)\, dx \textrm{, where } F^{\prime}(x)=f(x)\] \[\frac{d y}{d x} \approx f(x, y)\]

We plan to approximate the ODE that governs the function $F$, i.e. $f$. Solving the approximated ODE is then equivalent to approximating the function $F$, and numerical methods can be used to solve it. In this post, we will use a simple numerical method, Euler’s Method. The idea comes from the beautiful blog post “Understanding Neural ODEs” by Jonty Sinai.

What is Euler’s Method?

Euler’s Method simply rewrites the basic definition of the gradient at a point with a fixed step size \(\delta\).

\[\frac{d y}{d x} \approx \frac{y(x+\delta)-y(x)}{\delta}\] \[y(x+\delta) \approx y(x)+\delta \frac{d y}{d x}\]

The algorithm for Euler’s Method is as follows:

  • define \(f(x,y)\), the derivative of $F(x)$
  • input $x_{0}$ and $y_{0}$
  • input step size \(\delta\) and number of steps \(n\)
  • for \(j\) from 1 to \(n\):
    • $m=f(x_{0},y_{0})$
    • $y_{1}=y_{0}+\delta \, m$
    • $x_{1}=x_{0}+\delta$
    • $x_{0}=x_{1}$
    • $y_{0}=y_{1}$

The method iteratively uses the gradient at the current point to decide in which direction to move and by how much.


import numpy as np
import matplotlib.pyplot as plt

def euler(givenfunction,x0,y0,h,n,i):
    #one Euler step: slope at the current point, then move by h
    m=givenfunction(x0,y0)
    y1=y0+h*m
    x1=x0+h
    i=i+1
    #record the new point (xs and ys are global lists)
    xs.append(round(x1,2))
    ys.append(round(y1,5))
    if i==n:
        return (x1,y1)
    else:
        return euler(givenfunction,x1,y1,h,n,i)
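
The recursive form above mirrors the algorithm closely, but Python’s default recursion limit (roughly 1000 frames) makes it impractical for long trajectories. A minimal iterative sketch of the same update rule (the name euler_iterative and the returned lists are just for illustration):

def euler_iterative(givenfunction,x0,y0,h,n):
    #same update rule as euler() above, but with a plain loop instead of recursion
    xs_it, ys_it = [x0], [y0]
    for _ in range(n-1):
        m = givenfunction(x0,y0)   #slope at the current point
        y0 = y0 + h*m              #move along the tangent
        x0 = x0 + h
        xs_it.append(round(x0,2))
        ys_it.append(round(y0,5))
    return xs_it, ys_it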

For the given equation and initial condition

$y^{\prime}=-\sin(x)$ with $y(0)=1$

def f_prime(x,y):
    #right-hand side of the ODE: dy/dx = -sin(x)
    y_prime = -np.sin(x)
    return y_prime
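
As a quick hand check before running the code, the first two Euler steps with $\delta=0.1$ are

\[y(0.1) \approx y(0)+0.1 \cdot(-\sin 0)=1\] \[y(0.2) \approx 1+0.1 \cdot(-\sin 0.1) \approx 0.9900\]

For comparison, the exact values are $\cos(0.1) \approx 0.9950$ and $\cos(0.2) \approx 0.9801$, so the error is visible but small at this step size.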

Pick values for $\delta$ and $n$, then run the algorithm:

delta=0.1
x0=0
y0=1
n=len(np.arange(0, 5.0, delta))
xs=[x0]
ys=[y0]

_=euler(f_prime,x0,y0,delta,n,i=1)

Also, find the analytical solution to compare with Euler’s Method. Notice that we could not do this comparison in a real problem, where we do not have $F$ itself.

$y(x)=\cos(x)$

def f(x):
    y=np.cos(x)
    return y
    
x_range = np.arange(0.0, 5.0, 0.02)

plt.figure(figsize=(10,5))
plt.plot(xs,ys,'bo',label='Euler\'s method')
plt.plot(x_range, f(x_range), 'k', label='Analytical Solution')
plt.legend()
plt.show()

Figure 3: Euler's Method vs Real Trajectory.

As we see in the figure above, Euler’s Method gives a good approximation of the function $F$ by using the ODE that describes the rate of change.
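
If you want to put a number on this agreement, a quick check (assuming the xs and ys lists filled by the run above) is the maximum absolute deviation from the analytical solution; since Euler’s Method is first-order, this error shrinks roughly in proportion to $\delta$.

#maximum absolute error of the Euler trajectory vs. the analytical y(x)=cos(x)
errors = np.abs(np.array(ys) - np.cos(np.array(xs)))
print("max absolute error:", errors.max())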

Approximating the governing ODE with a neural network

We will try to approximate the derivative of the function $y=F(x)$ below, which the model treats as unknown:

\[y(x)=1+\frac{1}{2} e^{-4 x}-\frac{1}{2} e^{-2 x}\] \[\frac{d y}{d x}=f(x, y) \approx a\]

In the approximation, $a$ is the slope estimate produced by a small neural network that takes $(x, y)$ as input, and the network’s weights are the parameters to be optimized. We will again use Euler’s Method for the forward pass over our data.

  • Select a step size $\delta$.
  • Choose $k$ ordered data pairs from our dataset.
  • Create a regularly spaced computation trajectory between the first and last data point, where each interval has a length of $\delta$.
  • Compute predictions along the computation trajectory with the ODE approximation and Euler’s Method.
  • Measure the loss at the $k$ data points.
  • Perform backpropagation with your optimizer.

Now, let’s see all of the steps in code. You can go over the steps in Google Colab.


import torch
import numpy as np
import matplotlib.pyplot as plt

#function F
def Func(x):
    y = 1 + 0.5*np.exp(-4*x) - 0.5*np.exp(-2*x)
    return y


#dataset
delta=0.1
x = np.arange(-1,4,delta)
y = Func(x)

#sample sorted sequences in dataset
N = 1000
seq_len=5
data = np.empty((N,2,seq_len))
for i in range(N):
    mask = np.sort(np.random.choice(np.arange(0,len(x),1), seq_len, replace=False))
    X = x[mask]
    Y = y[mask]
    data[i]= np.stack((X,Y))
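
Before training, it can help to confirm that the sampled sequences actually lie on $F$. A minimal, optional sketch (assuming the x, y, and data arrays and the matplotlib import from above):

#plot the full curve together with a few of the sampled 5-point sequences
plt.figure(figsize=(10,5))
plt.plot(x, y, 'k', label='F(x)')
for i in range(3):
    plt.plot(data[i,0], data[i,1], 'o', label='sampled sequence %d' % i)
plt.legend()
plt.show()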

#neural network for approximating f which is derivative of F
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(2, 4) 
        self.fc2 = nn.Linear(4, 8)
        self.fc3 = nn.Linear(8, 4)
        self.fc4 = nn.Linear(4, 1)
        
    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        return x

#initialize the network
net = Net()
print(net)

#initialize loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(net.parameters())



#training
N=1000
epochs=10
bs = 1
for epoch in range(epochs):
  for index in range(N):
    optimizer.zero_grad()
    Ys=torch.zeros((1,seq_len-1))
    data_point_X = data[index,[0]]
    data_point_Y = data[index,[1]]

    #start the trajectory at the first point of the sampled sequence
    X = torch.Tensor(data_point_X[:,0])
    Y = torch.Tensor(data_point_Y[:,0])
    XY = torch.stack([X,Y]).view(bs,2).detach()

    #integrate with Euler's Method between consecutive data points,
    #using the network output as the slope estimate
    for j in range(data_point_X.shape[1]-1):
      for i in np.arange(data_point_X[:,j][0],data_point_X[:,j+1][0],delta):

        m_net_out = net(XY.detach())
        Y=(Y.detach()+delta*m_net_out[0])
        X=(X+delta)
        XY = torch.stack([X,Y]).view(bs,2)
      Ys[0,j]=Y
    #the loss is measured only at the remaining k-1 points of the sequence
    loss = criterion(Ys,torch.Tensor(data_point_Y[:,1:seq_len]))
    print("loss",loss.item())
    loss.backward()
    optimizer.step()
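
Once training finishes, an optional sanity check is to compare the network output with the true derivative, which for this toy problem is $F^{\prime}(x)=-2e^{-4x}+e^{-2x}$. A minimal sketch (the helper Func_prime is introduced here just for the comparison):

#compare the learned derivative f(x,y) with the analytical F'(x)
def Func_prime(x):
    return -2*np.exp(-4*x) + np.exp(-2*x)

net.eval()
with torch.no_grad():
    xy = torch.Tensor(np.stack((x, y), axis=1))   #shape (len(x), 2)
    learned = net(xy).numpy().reshape(-1)
print("max gap between learned and true derivative:", np.abs(learned - Func_prime(x)).max())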



#inference
x = np.arange(-1,4,delta)
y0 = Func(-1)
X = torch.Tensor([x[0]])
Y = torch.Tensor([y0])

XY = torch.stack([X,Y]).view(bs,2).detach()
Ys=torch.zeros((1,len(x)))
net.eval()
Ys[0,0]=y0
with torch.no_grad():
  for i in range(len(x)-1):
      m_net_out = net(XY)
      Y=(Y+delta*m_net_out[0])
      X=(X+delta)
      XY = torch.stack([X,Y]).view(bs,2)
      Ys[0,i+1]=Y
        

#real trajectory
x_compare = np.arange(-1,4,0.01)
y_compare = Func(x_compare) 

plt.figure(figsize=(10,7))
plt.plot(x_compare,y_compare,'k',label='Real trajectory')
plt.plot(x[0], Ys.numpy().reshape(len(x))[0], 'r*', label='Initial condition', ms=10)
plt.plot(x[1:], Ys.numpy().reshape(len(x))[1:], 'bo', label='Predictions')
plt.legend()
plt.show()

Figure 4: Neural ODE with Euler's Method vs Real Trajectory.

As we can see above, we inferred the function itself by approximating the governing ODE and solving it numerically with Euler’s method.
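
As in the pure Euler example, we can also quantify how far the predicted trajectory drifts from the real one (assuming the x, Ys, and Func objects from above):

#maximum absolute error of the neural-ODE trajectory vs. the true F(x)
traj_error = np.abs(Ys.numpy().reshape(len(x)) - Func(x)).max()
print("max absolute trajectory error:", traj_error)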

Sources:

  1. Jonty Sinai, “Understanding Neural ODEs”.
  2. Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, David Duvenaud. Neural Ordinary Differential Equations, 2018.
  3. Mikhail Surtsukov, “Neural Ordinary Differential Equations”.
Written on August 25, 2020