50 First Dates with Deep Learning


Remember the 2004 movie 50 First Dates (Adam Sandler as Henry and Drew Barrymore as Lucy), in which Lucy has amnesia and Henry must reintroduce himself every morning so she can remember him? Well, in some ways our deep learning systems suffer from that type of problem: while excellent at single tasks, their learning is not easily transferable to new or multiple tasks.

The current approach to transferring a model to new tasks (finetuning through backpropagation) is inefficient, prone to catastrophic forgetting, and may require tremendous amounts of new data. Even models designed for multi-task learning (e.g. via distillation) require significant training data for all tasks. A proposed solution is a new architecture from Google DeepMind known as Progressive Neural Networks.

Unlike finetuning, which overwrites prior knowledge after initialization, progressive networks keep the pretrained models (columns) frozen while training on new tasks. Lateral connections from these pretrained columns carry forward previously learned features, so knowledge is retained even as new layers are added to represent new tasks. And in addition to preserving previous knowledge, the networks remain capable of new learning.

Thus progressive networks can learn continuously, displaying lifelong learning by amassing previous knowledge and turning it into what humans call experience.

The results of the study conducted by DeepMind show that the system's transfer performance is competitive with previous methods such as finetuning, while it doesn't suffer from problems such as catastrophic forgetting.

The layers introduced to accomplish a new task receive input both from their own column and from the columns of previous (or parallel) tasks. This is accomplished by introducing “adapters” – non-linear lateral connectors. I view this as a level of abstraction that enables a kind of meta-cognition across the columns of multiple tasks, supporting knowledge retention and transfer to new tasks.
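To make the idea concrete, here is a minimal numpy sketch of a forward pass through a two-column progressive network. All names (`W1_h`, `A21`, `forward_task2`, etc.) are illustrative, not from DeepMind's code, and the lateral adapter is reduced to a single linear map plus non-linearity rather than the paper's full adapter module:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# Column 1: trained on task 1, then frozen (weights no longer updated).
W1_h = rng.standard_normal((8, 4)) * 0.1   # input -> column-1 hidden
W1_o = rng.standard_normal((2, 8)) * 0.1   # column-1 hidden -> output

# Column 2: added for the new task. Its hidden layer receives lateral
# input from column 1's frozen hidden activations via an adapter.
W2_h = rng.standard_normal((8, 4)) * 0.1   # input -> column-2 hidden
A21  = rng.standard_normal((8, 8)) * 0.1   # adapter: col-1 hidden -> col-2 hidden
W2_o = rng.standard_normal((2, 8)) * 0.1   # column-2 hidden -> output

def forward_task2(x):
    h1 = relu(W1_h @ x)                    # frozen task-1 features
    h2 = relu(W2_h @ x + A21 @ h1)         # new column + lateral adapter term
    return W2_o @ h2

x = rng.standard_normal(4)
y = forward_task2(x)
print(y.shape)  # (2,)
```

During training on task 2, only `W2_h`, `A21`, and `W2_o` would be updated; column 1's weights stay fixed, which is why previously learned behavior cannot be forgotten.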

In a recent paper (September 2016), DeepMind demonstrates the application of progressive networks in complex reinforcement learning domains. To take a deeper dive, you can download the paper here.