Welcome to the zoo!
A tentative list of topics to be covered in this course.
The course will be conducted as a mixture of regular lectures and seminar-style presentations and discussions.
Participants are expected to present relevant concepts from the suggested reading assignments and to arrange their presentations in a uniform style.
Algorithmic components:
- stochastic gradient descent algorithms
- backpropagation and automatic differentiation
- computational graphs
- search algorithms: Monte Carlo Tree Search (as used in AlphaGo)
- classical multi-level algorithms: multigrid methods
Approximation theory:
- classical multi-resolution analysis (wavelets)
- compressive sensing
The curse of dimensionality!
Random graphs and random matrices:
Optimization algorithms and theories:
Training issues
- The problem with sigmoids: the sigmoid “saturates,” approaching 1 as its input increases (ReLU, by contrast, keeps increasing). When sigmoid(x) is very close to 1, its gradient is very close to 0 and provides little information for gradient descent algorithms.
- The dying ReLU problem: ReLU neurons become inactive (they output zero regardless of the input; there are few theoretical results about this phenomenon).
- Remedy type 1: modify the network architecture, possibly replacing the activation function
- Remedy type 2: introduce additional training steps – normalization techniques and dropout
- Remedy type 3: initialization
- batch normalization (cf. Ioffe and Szegedy, 2015): a technique that inserts layers into the deep neural network that normalize each batch’s outputs to zero mean and unit variance
- weight initialization: carefully choose random initial weights with suitable variances (cf. Hanin and Rolnick, “How to Start Training: The Effect of Initialization and Architecture”)
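The saturation claim above is easy to check numerically: the sigmoid’s derivative is sigmoid(x)(1 − sigmoid(x)), which collapses toward zero for large inputs, while ReLU’s derivative stays at 1 wherever the neuron is active. A minimal NumPy sketch (the helper names are mine):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # ReLU passes the gradient through unchanged wherever x > 0
    return (np.asarray(x) > 0).astype(float)

xs = np.array([0.0, 2.0, 5.0, 10.0])
for x, sg, rg in zip(xs, sigmoid_grad(xs), relu_grad(xs)):
    print(f"x = {x:4.1f}   sigmoid' = {sg:.6f}   relu' = {rg:.0f}")
```

Already at x = 10 the sigmoid’s gradient is on the order of 1e-5, which is the “little information for gradient descent” problem in concrete form.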
Initialization for training neural networks
- For ReLU networks: Karniadakis and collaborators propose a randomized asymmetric initialization
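One standard concrete instance of “random initial weights with suitable variances” is He-style initialization for ReLU layers: weights drawn with variance 2/fan_in, so that one ReLU layer roughly preserves the second moment of the activations. The sketch below illustrates this scaling idea only (it is not the randomized asymmetric scheme cited above):

```python
import numpy as np

rng = np.random.default_rng(0)

def he_init(fan_in, fan_out):
    """Zero-mean Gaussian weights with variance 2/fan_in, chosen so that a
    ReLU layer roughly preserves the second moment of its activations."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

# Push a unit-variance batch through one ReLU layer and check E[h^2] stays ~1.
x = rng.normal(size=(1000, 256))      # batch of 1000 inputs, 256 features
W = he_init(256, 256)
h = np.maximum(x @ W, 0.0)            # ReLU(x W)
print(f"E[h^2] = {np.mean(h**2):.3f}  (close to 1, so the signal neither explodes nor vanishes)")
```

With a naive variance such as 1/fan_in the second moment would instead halve at every ReLU layer, which is exactly the depth-dependent signal decay that careful initialization is meant to avoid.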
Dynamical system:
- stability
- automatic step size control
- optimal control of dynamical systems
- decoupling algorithms
- impulse method
Optimal transport theory and algorithms:
Mean field games
Some novel applications:
- Beyond image processing and facial recognition
Deep learning for scientific computing: