Welcome to the zoo!
A tentative list of topics to be covered in this course.
The course will be conducted as a mixture of regular lectures and seminar-style presentations and discussions.
Participants are expected to present relevant concepts from the suggested reading assignments and to arrange their presentations in a uniform style.
Algorithmic components:
- stochastic gradient descent algorithms
- backpropagation and automatic differentiation
- computational graphs
- search algorithms: Monte Carlo Tree Search (as used in AlphaGo)
- classical multi-level algorithms: multigrid methods
Approximation theory:
- classical multi-resolution analysis (wavelets)
- compressive sensing
The curse of dimensionality!
Random graphs and random matrices:
Optimization algorithms and theories:
Training issues
- The problem with sigmoids: the sigmoid “saturates,” approaching 1 as its input increases (ReLU, by contrast, keeps increasing). When sigmoid(x) is very close to 1, its gradient is very close to 0 and provides little information for gradient descent algorithms.
- The dying ReLU problem: ReLU neurons become inactive (they output zero regardless of the input; there are few theoretical results about this phenomenon).
- Remedy type 1: modify the network architecture, possibly replacing the activation function
- Remedy type 2: introduce additional training steps – normalization techniques and dropout
- Remedy type 3: initialization
- batch normalization (cf. Ioffe and Szegedy, 2015): a technique that inserts layers into the deep neural network that normalize each batch’s outputs to zero mean and unit variance
- weight initialization: carefully choose random initial weights with suitable variances (cf. Hanin and Rolnick, “How to Start Training: The Effect of Initialization and Architecture”)
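The saturation claim above is easy to check numerically: the sigmoid’s derivative is sigmoid(x)(1 − sigmoid(x)), which collapses toward zero for large inputs, while ReLU’s derivative stays at 1 wherever the neuron is active. A minimal NumPy sketch (the helper names are mine):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # ReLU passes the gradient through unchanged wherever x > 0
    return (np.asarray(x) > 0).astype(float)

xs = np.array([0.0, 2.0, 5.0, 10.0])
for x, sg, rg in zip(xs, sigmoid_grad(xs), relu_grad(xs)):
    print(f"x = {x:4.1f}   sigmoid' = {sg:.6f}   relu' = {rg:.0f}")
```

Already at x = 10 the sigmoid’s gradient is on the order of 1e-5, which is the “little information for gradient descent” problem in concrete form.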
Initialization for training neural networks
- For ReLU networks: Karniadakis and collaborators propose a randomized asymmetric initialization
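One standard concrete instance of “random initial weights with suitable variances” is He-style initialization for ReLU layers: weights drawn with variance 2/fan_in, so that one ReLU layer roughly preserves the second moment of the activations. The sketch below illustrates this scaling idea only (it is not the randomized asymmetric scheme cited above):

```python
import numpy as np

rng = np.random.default_rng(0)

def he_init(fan_in, fan_out):
    """Zero-mean Gaussian weights with variance 2/fan_in, chosen so that a
    ReLU layer roughly preserves the second moment of its activations."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

# Push a unit-variance batch through one ReLU layer and check E[h^2] stays ~1.
x = rng.normal(size=(1000, 256))      # batch of 1000 inputs, 256 features
W = he_init(256, 256)
h = np.maximum(x @ W, 0.0)            # ReLU(x W)
print(f"E[h^2] = {np.mean(h**2):.3f}  (close to 1, so the signal neither explodes nor vanishes)")
```

With a naive variance such as 1/fan_in the second moment would instead halve at every ReLU layer, which is exactly the depth-dependent signal decay that careful initialization is meant to avoid.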
Dynamical system:
- stability
- automatic step size control
- optimal control of dynamical systems
- decoupling algorithms
- impulse method
Optimal transport theory and algorithms:
Mean field games
Some novel applications:
- Beyond image processing and facial recognition
Deep learning for scientific computing: