Continual Learning Requires Rethinking Learning Architectures
Learning is an active process. An animal acquires knowledge about the world and changes its behaviour as a result. By nature, this process is continual, as animals gather new information throughout their lifespans to increase their chances of survival.
In contrast, in artificial intelligence (AI), we often treat learning as a static problem. Whether it’s supervised, unsupervised, or reinforcement learning, the objective is to find one optimal solution for the problem at hand. This narrow view is so deeply ingrained that we were forced to coin a tautology: continual learning. We essentially invented a new term to describe what learning was supposed to be in the first place. This simplification enabled tremendous progress in algorithm development, but it sidestepped the fundamental problem AI was intended to solve.
While the community’s recent shift towards continual learning is a long-overdue homecoming, the methods and approaches in use are still stuck in the static learning era, just as the name suggests. For example, popular approaches such as replay augmentation and elastic weight consolidation (EWC) are used to prevent forgetting, but they acquire new information slowly and focus overly on preserving the quality of old solutions. Conversely, plasticity-injection methods reset parts of the neural network to restore its ability to learn, but they risk overwriting previous information and are slow to learn new patterns from experience data. To solve continual learning, we cannot just ‘patch’ existing algorithms; we must fundamentally rethink our learning architectures.
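To make the critique concrete, here is a minimal sketch of the EWC idea (roughly following Kirkpatrick et al., 2017): when learning a new task, the loss gains a quadratic penalty that anchors every parameter to its value after the old task, weighted by an importance estimate such as the diagonal Fisher information. The stronger the penalty, the better the old solution is protected and the slower new information is absorbed. The variable names and numbers below are illustrative, not taken from any particular implementation.

```python
import numpy as np

def ewc_penalty(params, old_params, fisher, lam=1.0):
    """Quadratic penalty anchoring parameters to the old task's solution,
    weighted per parameter by an importance estimate (e.g. diagonal Fisher)."""
    return 0.5 * lam * np.sum(fisher * (params - old_params) ** 2)

def ewc_grad(params, old_params, fisher, lam=1.0):
    """Gradient of the penalty: pulls each parameter back towards its old
    value in proportion to how important it was for the old task."""
    return lam * fisher * (params - old_params)

# Hypothetical numbers: a parameter deemed important (high Fisher value)
# is pulled back hard, so old knowledge is preserved at the cost of
# slower movement towards whatever the new data demands.
params     = np.array([0.9, 0.1])
old_params = np.array([1.0, 0.0])
fisher     = np.array([100.0, 0.01])
print(ewc_penalty(params, old_params, fisher))  # dominated by the important parameter
print(ewc_grad(params, old_params, fisher))     # [-10.0, 0.001]
```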
The Problem with Current Learning Architectures: The Stability-Plasticity Dilemma
Continual learning means constant adaptation. This seemingly simple definition, however, points to a major challenge at the heart of the field.
When we learn, we implicitly expect the knowledge to remain useful. For example, looking both ways before crossing the street is a behaviour that remains useful across nearly all contexts. This property of learning is called generalization in AI; we expect the agent to perform well in unseen situations that resemble those it has experienced before. We call these recurring patterns regularities. For the agent to adapt faster, it must retain knowledge of regularities for a long time; it must be stable.
Conversely, adaptation requires the agent to learn from new experiential data. The agent must be plastic to absorb new pieces of knowledge and new patterns about the world. Some information is transient and situational. For example, details of a particular road closure are unimportant once it is reopened. But the agent has to adapt its behaviour as long as the road is closed.
This tension between adapting persistently to new experiential data and retaining previously learnt information is the main challenge that all continual learning agents face, known as the stability-plasticity dilemma.

It is hopeless to expect current architectures, whether single large models, new optimizers, or fancy activations, to satisfy these contradictory objectives. A single parametric approximator, such as a neural network, cannot be both highly stable and highly plastic [3, 4, 5], because learning in such approximators is distributed across neurons in several layers. When we optimize for plasticity, we risk overwriting previously useful neurons, leading to catastrophic forgetting. If, instead, the updates preserve previously learnt patterns, learning from new data becomes ineffective.
Moreover, parametric approximators are slow learners; they must see any individual experience several times to learn from it, making it impossible to adapt instantly to a single rare event, such as a road closure.
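As a concrete illustration, here is a minimal, hypothetical sketch of both failure modes: a single linear model trained by gradient descent on task A and then on task B. Fitting either task takes many passes over the data (slow learning), and fitting task B drives the task-A error back up (forgetting). The tasks, model, and hyperparameters are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(w_true):
    """A toy linear-regression task: y = X @ w_true + noise."""
    X = rng.normal(size=(200, 5))
    y = X @ w_true + 0.01 * rng.normal(size=200)
    return X, y

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

task_a = make_task(rng.normal(size=5))
task_b = make_task(rng.normal(size=5))

w, lr = np.zeros(5), 0.01

# Many passes over task A are needed before the error drops (slow learning).
for _ in range(500):
    X, y = task_a
    w -= lr * (2 / len(y)) * X.T @ (X @ w - y)
err_a_before = mse(w, *task_a)

# Training the same shared parameters on task B...
for _ in range(500):
    X, y = task_b
    w -= lr * (2 / len(y)) * X.T @ (X @ w - y)

# ...drives the task-A error back up: the single approximator forgot task A
# in the process of becoming plastic enough to fit task B.
print(f"task A error before B: {err_a_before:.4f}, after B: {mse(w, *task_a):.4f}")
```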
Inspiration from Mother Nature: Complementary Learning Systems
Nature solved this problem for us eons ago. Squirrels, for example, constantly adapt their food-caching strategy to prevent their caches from being stolen.
The complementary learning systems theory suggests that the brain employs two learning systems with distinct functions [3]. The first system, the neocortex, is responsible for slowly building structured, generalized representations. Because it must extract regularities across a lifetime of data, learning in this system is slow. This slowness perhaps prevents catastrophic forgetting, but it also prevents the system from learning from rare events. The second system, the hippocampus, functions as a fast, episodic learner that rapidly learns from new, rare experiences, enabling instant adaptation.
These two systems, when combined, allow for a good trade-off between stability and plasticity, enabling biological agents to adapt instantly while retaining recurring patterns.
A Solution: Dual Learning Architectures
First, the AI community must look to neuroscience and cognitive science for inspiration. Second—and most importantly—we must embrace dual learning architectures. Within this framework, two systems must complement one another, differing in learning speeds, objectives, and representations.
Resembling the neocortex, the first learning system—the permanent learning system—should learn general, structured predictions and retain them for a long time. This system optimizes the stability component of the stability-plasticity trade-off. A parametric approximator, such as a deep neural network, fits this role perfectly.
Conversely, the second learning system—the transient learning system—must learn from individual experiences, adapting instantly to nuances that a slow parametric learner would miss. For this plasticity-centred system, a non-parametric approximator is the ideal candidate. Approaches such as k-nearest neighbours, index hash tables, or episodic memory buffers allow the agent to store and learn from specific experiences while facilitating rapid adaptation.
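As one possible instantiation, here is a minimal sketch of such a transient learner: an episodic memory buffer that stores raw (state, target) pairs as they arrive and predicts by averaging the targets of the k nearest stored states. The class name, fixed capacity, and eviction rule are illustrative assumptions rather than a specific published design.

```python
import numpy as np

class EpisodicMemory:
    """Non-parametric transient learner: stores recent experiences verbatim
    and predicts with a k-nearest-neighbours average over stored states."""

    def __init__(self, capacity=10_000, k=5):
        self.capacity, self.k = capacity, k
        self.states, self.targets = [], []

    def add(self, state, target):
        # A single write is enough to "learn" the experience: there are no
        # gradient steps, so adaptation to a new, rare event is instantaneous.
        self.states.append(np.asarray(state, dtype=float))
        self.targets.append(float(target))
        if len(self.states) > self.capacity:      # evict the oldest experience
            self.states.pop(0)
            self.targets.pop(0)

    def predict(self, state, default=0.0):
        if not self.states:
            return default
        dists = np.linalg.norm(np.stack(self.states) - np.asarray(state, dtype=float), axis=1)
        nearest = np.argsort(dists)[: self.k]
        return float(np.mean(np.asarray(self.targets)[nearest]))
```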
One way to accomplish this is to let information flow into the transient system first, before consolidating it into the permanent system. This strategy ensures that when a rare event occurs, the highly plastic transient system absorbs it instantly, adapting the agent’s behaviour. Moreover, by shielding the permanent system from the abrupt changes that raw online experience demands, we ensure that previously learnt knowledge stays intact while new knowledge is acquired slowly.
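The sketch below illustrates one way this flow could look, building on the EpisodicMemory above and pairing it with a deliberately simple linear permanent learner. The memory stores the residuals the permanent system gets wrong, predictions combine both systems, and an occasional consolidation phase slowly distils the memory into the permanent parameters before clearing it. The interfaces, the additive combination, and the consolidation schedule are all illustrative assumptions.

```python
import numpy as np

class PermanentLearner:
    """Slow, parametric learner (a plain linear model, for brevity)."""
    def __init__(self, dim, lr=1e-3):
        self.w, self.lr = np.zeros(dim), lr

    def predict(self, state):
        return float(np.dot(self.w, state))

    def update(self, state, target):
        self.w += self.lr * (target - self.predict(state)) * np.asarray(state, dtype=float)


class DualLearner:
    """Experience flows into the transient memory first; the permanent system
    changes only during occasional, slow consolidation phases."""
    def __init__(self, dim, consolidate_every=1_000):
        self.permanent = PermanentLearner(dim)
        self.transient = EpisodicMemory()      # the non-parametric sketch above
        self.consolidate_every, self.steps = consolidate_every, 0

    def predict(self, state):
        # Permanent prediction plus a transient correction learnt from recent,
        # possibly rare, experiences.
        return self.permanent.predict(state) + self.transient.predict(state)

    def observe(self, state, target):
        # 1) Instant adaptation: memorize the part the permanent system misses.
        self.transient.add(state, target - self.permanent.predict(state))
        self.steps += 1
        # 2) Occasional consolidation: slowly distil the memory's content into
        #    the permanent parameters, shielding them from abrupt change.
        if self.steps % self.consolidate_every == 0:
            batch = [(s, self.permanent.predict(s) + resid)
                     for s, resid in zip(self.transient.states, self.transient.targets)]
            for s, reconstructed_target in batch:
                self.permanent.update(s, reconstructed_target)
            # The absorbed corrections now live in the permanent weights, so the
            # memory can be emptied while remaining fully plastic for new events.
            self.transient.states.clear()
            self.transient.targets.clear()
```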
This strategy, when applied to estimating value functions, empirically improved the stability-plasticity trade-off in continual reinforcement learning, leading to better performance across several benchmarks [1, 2]. For instance, agents using two value functions showed both higher retention of previously made predictions and faster adaptation to new tasks than agents using a single value function. This evidence supports the hypothesis that two complementary learning systems are well suited to continual learning in AI, just as they are in biological systems.
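As a rough illustration of how the same split can apply to value estimation, here is a simplified tabular sketch (not the exact algorithms of [1, 2]): the permanent and transient value tables are summed to produce the agent’s estimate, only the fast transient table absorbs TD errors online, and consolidation gradually shifts transient values into the permanent table without changing the combined estimate. The step size and consolidation rate are arbitrary illustrative choices.

```python
from collections import defaultdict

class DualValueFunction:
    """Permanent (slow) plus transient (fast) tabular value estimates; the
    agent acts on their sum. A simplified sketch, not the method of [1, 2]."""

    def __init__(self, alpha=0.5, gamma=0.99):
        self.v_perm = defaultdict(float)    # stable, changed only by consolidation
        self.v_trans = defaultdict(float)   # plastic, updated on every transition
        self.alpha, self.gamma = alpha, gamma

    def value(self, s):
        return self.v_perm[s] + self.v_trans[s]

    def td_update(self, s, r, s_next):
        # The TD error uses the combined estimate, but only the fast transient
        # table absorbs it online, so a rare event (say, a road closure) changes
        # behaviour immediately without disturbing permanent knowledge.
        td_error = r + self.gamma * self.value(s_next) - self.value(s)
        self.v_trans[s] += self.alpha * td_error

    def consolidate(self, rate=0.1):
        # Gradually shift transient estimates into the permanent table. The
        # combined value is unchanged; the knowledge simply becomes durable.
        for s in list(self.v_trans):
            self.v_perm[s] += rate * self.v_trans[s]
            self.v_trans[s] *= 1.0 - rate
```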
If the AI community is serious about continual learning, we must abandon the exclusive pursuit of monolithic models. The dual learning architecture offers a promising path to balancing stability and plasticity, and it is where the solution to continual learning lies.
Acknowledgements: This post is inspired by numerous discussions with my Ph.D. supervisor, Doina Precup. And thanks to Khurram Javed for his valuable comments on the initial draft.

