Introduction
The last decade has seen great experimental progress in machine learning, spearheaded by deep learning methods built on so-called deep neural networks. Many challenging, high-dimensional tasks that were previously beyond reach have become feasible with remarkably simple (in the algorithmic sense) techniques coupled with modern computational resources. Particularly in the fields of computer vision and natural language processing, deep learning is currently the go-to tool.
Machine learning is usually positioned under the umbrella of artificial intelligence. While artificial intelligence is a fairly nebulous and broad term, machine learning refers specifically to algorithms that improve their performance at a given task when presented with more data about the problem. Machine learning algorithms are used in a wide variety of applications, such as speech recognition and computer vision, where it is difficult or impossible to develop conventional algorithms to perform the required task. These learning algorithms generally start from a very general model that is then trained on sample data to learn how to perform a specific task. When the underlying model is a (deep) neural network, we speak of deep learning.
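To make this generic recipe concrete, here is a minimal sketch in Python (not taken from the text; the task, the linear model y = wx + b, and all names such as w, b, and lr are illustrative assumptions): a very general model is improved on sample data by gradient descent on a loss. Deep learning follows the same pattern, replacing the linear model with a composition of many simple nonlinear units and far more parameters, but the training loop looks essentially the same.

```python
import random

random.seed(0)  # reproducible sample data

# Hypothetical task: recover y = 2x + 1 from noisy samples.
data = [(x, 2.0 * x + 1.0 + random.gauss(0.0, 0.1))
        for x in (i / 10 for i in range(20))]

w, b, lr = 0.0, 0.0, 0.2  # generic initial model and a learning rate

for _ in range(300):
    # One gradient-descent step on the mean squared error over the data.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
    w -= lr * grad_w
    b -= lr * grad_b

print(f"learned w = {w:.2f}, b = {b:.2f}")  # approaches w = 2, b = 1
```

The point of the sketch is that nothing task-specific is programmed in: the same loop improves the model purely by being shown more data.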
The idea of using computational models that are inspired by the workings of biological neurons dates as far back as [1]. Over the following decades more of the ingredients that we think of as standard today were added: training the network on data was tried by [2], [3] originated the ancestor of our current convolutional neural networks, and [4] introduced backpropagation as a training mechanism for neural networks.
However, none of these efforts led to a breakthrough in the use of neural networks in practice, and such models were mainly regarded as an academic curiosity during this period. This state of affairs only changed at the start of the new millennium with the appearance of programmable GPUs (Graphics Processing Units). While initially designed with rendering 3D graphics in mind, these devices could be leveraged for other purposes, as was done by [5] for neural networks. This eventually led to breakthroughs such as the one achieved by [7] on the ImageNet [6] image classification challenge, where neural networks managed to dominate other, more traditional, techniques. These events can be thought of as the start of the modern deep learning era.
Despite more than a decade of impressive experimental results, theoretical understanding of why deep learning works as well as it does is still lacking. This presents an opportunity for both understanding and improvement, particularly for mathematicians. This course presents an introduction to neural networks from the point of view of a mathematician. We cover the basic vocabulary and functioning of neural networks in The Basics. In the chapter Deep Learning we look at deep neural networks and the associated techniques that allow them to work. Equivariance covers a novel application of geometry to neural networks: we discuss how the theory of Lie groups and homogeneous spaces can be leveraged to endow neural networks with certain structural symmetries, i.e. make them equivariant under certain geometric transformations.
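As a preview of what equivariance means formally (the definition is standard, but the symbols \(\Phi\), \(G\), and \(x\) below are our own notation, not fixed by the text above): a map \(\Phi\) is equivariant under a group \(G\) acting on its input and output spaces when applying a transformation \(g\) before \(\Phi\) gives the same result as applying it after,

\[
\Phi(g \cdot x) \;=\; g \cdot \Phi(x) \qquad \text{for all } g \in G \text{ and all inputs } x .
\]

A familiar instance is the translation equivariance of convolution: shifting an input image and then convolving yields the same result as convolving first and then shifting the output feature map.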
General references
Smets, Bart M. N. (2024). "Mathematics of Neural Networks". arXiv:2403.04807 [cs.LG].
References
1. McCulloch, W. S.; Pitts, W. (1943). "A logical calculus of the ideas immanent in nervous activity". The Bulletin of Mathematical Biophysics 5: 115–133.
2. Ivakhnenko, A. G.; Lapa, V. G. (1966). Cybernetic Predicting Devices. CCM Information Corporation.
3. Fukushima, K. (1987). "Neural network model for selective attention in visual pattern recognition and associative recall". Applied Optics 26 (23): 4985–4992.
4. LeCun, Y.; Boser, B.; Denker, J. S.; Henderson, D.; Howard, R. E.; Hubbard, W.; Jackel, L. D. (1989). "Backpropagation applied to handwritten zip code recognition". Neural Computation 1 (4): 541–551.
5. Oh, K.-S.; Jung, K. (2004). "GPU implementation of neural networks". Pattern Recognition 37 (6): 1311–1314.
6. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. (2009). "ImageNet: A large-scale hierarchical image database". 2009 IEEE Conference on Computer Vision and Pattern Recognition: 248–255.
7. Krizhevsky, A.; Sutskever, I.; Hinton, G. E. (2012). "ImageNet classification with deep convolutional neural networks". Advances in Neural Information Processing Systems 25: 1097–1105.