House · MILA Montreal
The Montreal Institute for Learning Algorithms—the laboratory where Bengio held the line in francophone Canada for thirty years. Three of the heaviest weapons of the deep-learning era—the neural language model, the attention mechanism, and the generative adversarial network—were forged here.
Bengio Returns to Montreal
In 1993, Yoshua Bengio (1964–) joined the Department of Computer Science and Operations Research (DIRO) at the Université de Montréal as an assistant professor. Born in Paris, raised in Montreal, he had just finished postdoctoral years at MIT and Bell Labs—the latter being where Yann LeCun (1960–) had built the LeNet convolutional network. Bengio's faith in neural networks carried over from that Bell Labs collaboration.
Back in Montreal, he built a small group named LISA (Laboratoire d'Informatique des Systèmes Adaptatifs, the Adaptive Systems Informatics Laboratory). The name would expand, move, and reorganize over twenty years until it became today's MILA. But from the 1990s into the early 2000s, LISA was one of the few labs anywhere still doing neural-network research seriously. The Université de Montréal is a francophone research university, separated from anglophone academic centers by a layer of cultural distance—and that distance, in years when the mainstream sneered at neural networks, let LISA quietly keep doing its work.
Neural Language Models and Early Work
In 2003, Bengio with Réjean Ducharme, Pascal Vincent, and Christian Jauvin published A Neural Probabilistic Language Model in the Journal of Machine Learning Research. This paper is now widely treated as the source of the "word embedding" idea. In an era when statistical NLP was still dominated by n-grams, the paper used a shared low-dimensional distributed representation to learn word semantics and language-model probabilities at the same time. A decade later, Mikolov's word2vec would essentially be its engineering simplification.
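The core mechanism described above, one embedding table shared by every context position and feeding a next-word distribution, can be sketched in a few lines. This is a minimal, untrained forward pass under stated assumptions, not the paper's implementation: the class name `TinyNeuralLM`, its parameter names, and the sizes are hypothetical, and the bias terms and optional direct input-to-output connections of the published model are left out.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TinyNeuralLM:
    """Hypothetical toy model: shared embeddings -> tanh hidden layer -> softmax."""

    def __init__(self, vocab_size, embed_dim, context_size, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.context_size = context_size
        # One row per word; the same table is used for every context position.
        self.embeddings = make_matrix(vocab_size, embed_dim, rng)
        self.hidden_weights = make_matrix(hidden_dim, context_size * embed_dim, rng)
        self.output_weights = make_matrix(vocab_size, hidden_dim, rng)

    def next_word_probs(self, context_ids):
        """Return P(next word | context) over the whole vocabulary."""
        if len(context_ids) != self.context_size:
            raise ValueError("expected %d context words" % self.context_size)
        # Look up and concatenate the embeddings of the context words.
        features = [x for word_id in context_ids for x in self.embeddings[word_id]]
        hidden = [math.tanh(h) for h in matvec(self.hidden_weights, features)]
        return softmax(matvec(self.output_weights, hidden))


model = TinyNeuralLM(vocab_size=6, embed_dim=3, context_size=2, hidden_dim=4)
probs = model.next_word_probs([1, 4])
print(len(probs), round(sum(probs), 6))
```

Because the embedding rows are ordinary parameters, training the language model by gradient descent would also move words that appear in similar contexts toward similar vectors, which is the sense in which semantics and probabilities are learned together.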
LISA went on to produce a string of foundational works: Pascal Vincent et al.'s 2008 Denoising Autoencoder; Hugo Larochelle's research on layer-wise pre-training of deep networks; Aaron Courville's work on probabilistic deep models. Bengio's 2009 long-form survey Learning Deep Architectures for AI in Foundations and Trends in Machine Learning was at the time the most systematic textbook-style summary of deep learning.
By around 2010, LISA's research had broadened from neural language models to representation learning, probabilistic deep networks, and the marriage of graphical models with neural networks. Bengio, Geoffrey Hinton (1947–), and LeCun formed a lasting friendship as "the Canadian three of deep learning." They met regularly under the CIFAR NCAP program, and that close cooperation was crucial to the renaissance of deep learning after 2006.
The GAN Flash
In 2014, in a nondescript bar in Montreal, Ian Goodfellow (1987–) was arguing with friends about a problem in generative modeling. He was then a doctoral student of Bengio and Aaron Courville, working on probabilistic generative models. That night the idea came: let two neural networks compete with each other. One generates fake samples, the other distinguishes real from fake, and the two play toward a Nash equilibrium.
He went home and stayed up the night writing the code, getting the first version to run. Shortly afterward he, Bengio, Courville, and others submitted Generative Adversarial Nets to NeurIPS 2014; the preprint appeared on arXiv in June 2014. GAN became one of the most influential ideas in generative modeling in the 2010s: StyleGAN, CycleGAN, BigGAN, and almost the entire image-generation field before the rise of diffusion models lived under the GAN paradigm.
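The two-network game can be stated compactly as a minimax objective. The form below follows the 2014 paper; the symbols are introduced here and appear nowhere else in this article: G is the generator, D the discriminator, p_data the data distribution, and p_z a noise prior from which the generator's inputs are drawn.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)} \bigl[ \log D(x) \bigr]
  + \mathbb{E}_{z \sim p_{z}(z)} \bigl[ \log \bigl( 1 - D(G(z)) \bigr) \bigr]
```

D is trained to raise V by scoring real samples near 1 and generated samples near 0, while G is trained to lower it. At the equilibrium of the idealized game, the generator's distribution matches the data distribution and D outputs 1/2 everywhere.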
In 2016, Goodfellow, Bengio, and Courville co-authored Deep Learning (the "flower book"), the most influential graduate textbook of the deep-learning era. By 2025 it had been translated into nearly twenty languages and adopted in universities worldwide. Most of the writing was done at MILA.
Attention—Another Underrated Light
In September 2014, another of Bengio's PhD students, Dzmitry Bahdanau, with Kyunghyun Cho (then a postdoc in Montreal, later a professor at NYU) and Bengio, published Neural Machine Translation by Jointly Learning to Align and Translate. The paper introduced into neural machine translation a joint mechanism of "alignment plus translation"—later abbreviated as "attention."
This was the first landmark appearance of the attention mechanism in deep learning. Three years later, Vaswani and colleagues at Google proposed the Transformer and put attention on the throne; five years after that, ChatGPT made the word a household term. Many do not realize that the line begins in this 2014 Montreal paper for machine translation. Bengio has said in many forums that the work he believes is "most underrated" in his own research history is this one.
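The alignment step can be illustrated with a small additive-attention sketch: each source position receives a score from a tiny feed-forward comparison with the current decoder state, the scores are normalized with a softmax, and the context vector is the weighted sum of the source states. This is an illustrative toy under stated assumptions, not the paper's code: the function `additive_attention` and the parameter names `W_dec`, `W_enc`, and `v` are hypothetical, and the vectors are hand-written rather than produced by a trained recurrent network.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(decoder_state, encoder_states, W_dec, W_enc, v):
    """Score each encoder state against the decoder state, then average.

    score_j = v . tanh(W_dec @ decoder_state + W_enc @ encoder_states[j])
    Returns (attention weights, context vector).
    """
    proj_dec = matvec(W_dec, decoder_state)
    scores = []
    for state in encoder_states:
        proj_enc = matvec(W_enc, state)
        scores.append(
            sum(v_k * math.tanh(a + b) for v_k, a, b in zip(v, proj_dec, proj_enc))
        )
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted sum of the encoder states.
    context = [
        sum(w * state[d] for w, state in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context


# Toy usage: three source positions, two-dimensional states.
source_states = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
weights, context = additive_attention(
    [0.5, -0.5], source_states, identity, identity, [1.0, 1.0]
)
print(weights, context)
```

The design point is that the decoder no longer has to squeeze the whole source sentence into one fixed vector; at every output step it recomputes the weights and so looks back at different source positions.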
Mid-2010s Montreal produced a string of works that shaped where deep learning went next: Junyoung Chung et al.'s empirical study of GRUs; Cho et al.'s RNN encoder–decoder architecture; Vincent Dumoulin's research on generative modeling and style transfer; Aaron Courville's exploration of multimodal deep learning. Together these put Montreal among the three great centers of deep learning by the mid-2010s, alongside Toronto and the Google Brain/DeepMind labs.
MILA Stands Up, and Industry Arrives
By the mid-2010s, LISA had outgrown itself. In 2017 the Université de Montréal, McGill University, HEC Montréal, and Polytechnique Montréal merged Bengio's LISA with several related groups to formally found MILA (the Montreal Institute for Learning Algorithms / Institut québécois d'intelligence artificielle). MILA is an independent non-profit research institute; Bengio became its scientific director.
Canada's Pan-Canadian AI Strategy committed 125 million Canadian dollars in total to MILA, Vector, and Amii, with additional funding from the Quebec provincial government. From there MILA entered a period of large-scale expansion: by 2024 it had more than 130 faculty, over 1,300 students and researchers, and was one of the largest academic deep-learning research centers in the world.
Around MILA, Montreal grew a dense AI startup ecosystem. In 2016 Bengio co-founded Element AI, which ServiceNow acquired for about 230 million dollars in 2020. Hugo Larochelle has long held a position at Google Brain Montreal and is a key bridge between MILA and Google. Aaron Courville, Pascal Vincent, and many others hold dual academic-industry appointments. Microsoft, Google, Meta, ServiceNow, Samsung, and others have all set up research branches around MILA.
Turing Award and an Ethical Turn
On March 27, 2019, the ACM announced that the 2018 Turing Award would go to Yoshua Bengio, Geoffrey Hinton, and Yann LeCun "for conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing." It was the highest recognition computer science could offer to deep learning.
After the prize, Bengio's research direction took a visible turn. On one front, he pressed harder on more fundamental directions—causal inference, "System 2" deep learning, and world models—arguing that current deep learning is only "System 1," fast, perceptual, pattern-matching, and that real intelligence still requires slow thinking, reasoning, and causality. On another, he poured energy into the ethics and social impact of AI.
In December 2018, Bengio led the publication, with scholars from Toronto, Montreal, and Europe, of the Montreal Declaration for Responsible Development of Artificial Intelligence. With its ten core principles (well-being, respect for autonomy, protection of privacy and intimacy, solidarity, democratic participation, equity, diversity inclusion, prudence, responsibility, and sustainable development), it became one of the most-cited normative documents in AI ethics.
In 2023, Bengio joined Stuart Russell and others in signing the open letter calling for "a pause on giant AI experiments," and joined Hinton and others in signing a public statement on the risk of extinction from AI. Together with Hinton, he turned from a pure founder of deep learning into one of the most credentialed whistle-blowers on AI risk. MILA in turn became a central position for AI safety and responsible AI research in Canada.
MILA as an Institution
By 2026, MILA's place in the global AI map is uniquely its own: it is one of the few top AI centers that still calls itself a "laboratory" rather than an "industry research arm"; its work matches Google DeepMind and FAIR in citations and influence; it keeps a deliberate distance from the dominant anglophone capital narrative, more willing to bring francophone humanism and European ethics into AI discussions.
In its student lineage, MILA has produced a long line of researchers who shape contemporary AI: Ian Goodfellow (1987–) (GAN; later Apple, DeepMind); Bahdanau (attention; Element AI / ServiceNow); Cho (NYU; neural machine translation); Larochelle (Google Brain Montreal); Pascal Vincent (Meta FAIR); David Krueger (Cambridge; AI safety); Yann Dauphin (Meta); Sherjil Ozair (DeepMind); Caglar Gulcehre (DeepMind); Bengio's brother Samy Bengio (Apple, head of AI/ML research).
Montreal's story mirrors Toronto's: Toronto has Hinton, Montreal has Bengio; Toronto incubated AlexNet and Cohere, Montreal incubated GAN and Element AI; Toronto consolidated its forces at the Vector Institute, Montreal at MILA. Together with Edmonton's Amii, the three cities form Canada's "AI iron triangle"—a sample of how a national-scale, long-line bet on a "cold-bench school" can yield world-class returns.
Historian's Note
MILA is, for AI, a model of "small country, deep furrow." Montreal is not a global financial capital; Quebec accounts for less than 20% of Canada's GDP. And yet, over thirty years, Bengio and his colleagues turned this place into a deep-learning summit on par with Silicon Valley, Cambridge, and Beijing. Two ingredients matter. First, betting on directions the mainstream mocks and waiting patiently: the neural language model of the early 2000s went almost unnoticed; the 2014 attention mechanism was buried in a small machine-translation crowd; today they are two cornerstones of large models. Second, refusing academic success as the end goal: around the time he won the Turing Award, Bengio also helped write the Montreal Declaration, lending academic authority to a public agenda. A laboratory that, even at its zenith, still keeps reflection and restraint is a rare thing in any era. Montreal's gift to AI is not only GAN and attention but a posture: doing research while examining research. That may be the most necessary house rule for AI in the twenty-first century.
References
- Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). "A Neural Probabilistic Language Model." JMLR, 3: 1137–1155.
- Bengio, Y. (2009). "Learning Deep Architectures for AI." Foundations and Trends in Machine Learning, 2(1): 1–127.
- Vincent, P., Larochelle, H., Bengio, Y., & Manzagol, P.-A. (2008). "Extracting and Composing Robust Features with Denoising Autoencoders." ICML 2008.
- Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). "Generative Adversarial Nets." NeurIPS 2014.
- Bahdanau, D., Cho, K., & Bengio, Y. (2015). "Neural Machine Translation by Jointly Learning to Align and Translate." ICLR 2015 (arXiv:1409.0473, 2014).
- Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). "Learning Phrase Representations Using RNN Encoder–Decoder for Statistical Machine Translation." EMNLP 2014.
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
- ACM (2018). "ACM A.M. Turing Award Citation: Yoshua Bengio, Geoffrey Hinton, Yann LeCun." https://amturing.acm.org/
- Montreal Declaration for Responsible Development of Artificial Intelligence (2018). https://www.montrealdeclaration-responsibleai.com/
- MILA. Annual Reports 2017–2024. https://mila.quebec/
- CIFAR. Pan-Canadian Artificial Intelligence Strategy. https://cifar.ca/
- Sejnowski, T. J. (2018). The Deep Learning Revolution. MIT Press.