Biography · Andrew Ng
He did not invent the large model. He turned a high wall into a public staircase.

A Boy Across Three Maps
Andrew Ng (1976–) was born in London to parents from Hong Kong; his mother came from a family of doctors. He grew up between Hong Kong and Singapore before moving to North America. As a boy at Raffles Institution in Singapore he was already fixated on algorithms and geometry.
In 1992 he went to the United States to study at Carnegie Mellon University (CMU), one of America's holy places of AI, carrying the tradition laid down by Allen Newell (1927–1992) and Herbert Simon (1916–2001). He read computer science, statistics, and economics, and graduated with Highest Distinction in 1997. He moved on to MIT for a master's, completing a thesis on reinforcement learning in 1998.
For his doctorate he chose the West Coast and the University of California, Berkeley, working under Michael I. Jordan (1956–), the standard-bearer of Bayesian networks and graphical models. In 2002 he received his PhD with a dissertation titled Shaping and Policy Search in Reinforcement Learning. The dissertation's signature case was an autonomous helicopter, which he taught by reinforcement learning to fly inverted and to roll — footage of the small helicopter looping above the Stanford campus would be replayed many times over.
That year he was twenty-six. He left Berkeley, crossed the Bay, and joined Stanford for good.
Stanford's Lectern
From 2002 onward Ng held a post in Stanford's Department of Computer Science and threw himself into the machine-learning course CS229. The course was at first only for graduate students, drawing perhaps a hundred each year. He spoke clearly, his blackboard work was clean, and his derivations were patient. Support vector machines, maximum likelihood, EM, probabilistic graphical models: he could turn the most mathematical material into something that read like a story.
From 2003 the Stanford School of Engineering experimented with public lectures. In 2008 the videos of CS229 were freely posted on YouTube and on Stanford Engineering's SEE platform. For the first time, the machine-learning course of a top university opened to the entire internet. Engineers in India, China, and Eastern Europe stayed up until dawn watching the progress bar, turning eighty-minute English lectures into notes that they then translated into their own languages.
Ng saw that the demand from far away was greater than he had imagined.
In the autumn of 2011, with Daphne Koller, a Stanford colleague who, like Jordan, was a leader in probabilistic graphical models, he ran an experiment: put three Stanford courses, Machine Learning, Databases, and Artificial Intelligence, on a simple website, free, open, with assignments and grading. More than 100,000 people enrolled. A word coined only a few years earlier entered the mainstream: MOOC (Massive Open Online Course).
In early 2012 the two co-founded Coursera. Within half a year, registered users worldwide passed a million. The boundary of education had been redrawn.
Google Brain and the Cat
In the late 2000s Ng began to turn his attention to neural networks, then long out of fashion. Geoffrey Hinton (1947–) at Toronto, Yann LeCun (1960–) at NYU, and Yoshua Bengio (1964–) at Montreal were still at it, but industry rarely answered.
In 2009 Ng and his student Rajat Raina published Large-scale Deep Unsupervised Learning using Graphics Processors, one of the earliest papers to systematically demonstrate that GPUs could substantially accelerate the training of deep networks. It came three years before AlexNet, and already wrote out the script of "compute plus deep network."
In 2011 Ng walked into Google X. He proposed to Jeff Dean, then a Google Fellow, and to the researcher Greg Corrado, that they build a distributed deep-learning system spanning thousands of machines and tens of thousands of CPU cores, to see what neural networks could learn given enough compute. The internal codename was Google Brain.
In June 2012 they announced the result: a nine-layer sparse autoencoder trained on 1,000 machines and 16,000 CPU cores with ten million images randomly clipped from YouTube. After three days, a single neuron in the network had spontaneously learned to fire strongly for the face of a cat. The unsupervised "cat" became front-page news. It did not solve intelligence, but it showed the world that a sufficiently large network, sufficient data, and sufficient compute could, out of chaos, grow concepts.
Google Brain went on to give birth to TensorFlow, the Transformer, and an entire infrastructure, and in 2023 merged with DeepMind into Google DeepMind.
Beijing, Baidu, and Apollo
In May 2014 Ng announced that he would join Baidu as Chief Scientist and head of Baidu Research. China's internet giants were placing their first large-scale bets on AI. He set up Baidu Research America in Sunnyvale, built up a speech and deep-learning team in Beijing, and led the architecture of Baidu Brain.
He pushed two lines that had lasting effect: first, the end-to-end speech-recognition system Deep Speech (2014) and Deep Speech 2 (2015), replacing the traditional speech pipeline with a single deep network; second, the autonomous-driving platform Apollo, with Baidu open-sourcing the entire driving software stack in 2017. Apollo's open strategy would later become an important technical foundation for the Chinese automotive industry.
In March 2017 Ng left Baidu. He published an open letter titled "Opening a new chapter of my work in AI" — he intended to give the next stretch of his life back to education and entrepreneurship.
Ten Thousand Kinds of Student
After Baidu he did three things, returning from different directions to a single theme: AI must no longer belong to the few.
First, deeplearning.ai. In August 2017 he launched the Deep Learning Specialization on Coursera: five courses running from neural-network basics to convolutional and sequence models. Within a year more than 250,000 had enrolled; in the years that followed, enrollment passed several million. The companion booklet Machine Learning Yearning was offered free as a pocket book for newcomers.
Second, Landing AI, founded in December 2017 to bring deep learning into manufacturing — visual defect inspection, industrial quality control. Its target was not Silicon Valley giants but factory floors in places like Foxconn.
Third, AI Fund, founded in 2018 as an early-stage fund for AI application teams. He believed the next wave of value would come not from larger models but from embedding models in the small details of every industry.
After GPT-4 came out in 2023, he launched ChatGPT Prompt Engineering for Developers on deeplearning.ai within weeks, in cooperation with OpenAI, then led by Sam Altman (1985–). The free course attracted more than 400,000 learners in two weeks. Once again, he had got the latest technology to the public ahead of everyone else.
The Teacher as Amplifier
Ng is not the author of Attention Is All You Need, not the designer of AlphaGo, not the creator of the GPT line. In every wave he was an early participant, but rarely the inventor at the peak.
But open the CV of any deep-learning practitioner, and you are likely to read: "I started with Andrew Ng's CS229 / Deep Learning Specialization." A 2020 Kaggle survey of practitioners placed his courses among the top entry routes for data scientists worldwide. Among China's first generation of large-model engineers, many grew up reading his Stanford blackboard.
He turned a study behind high walls into a course for the public. The leverage of one teacher multiplied by ten million students runs deeper than any single paper.
Selected Works
| Year | Work | Significance |
|---|---|---|
| 2002 | Shaping and Policy Search in Reinforcement Learning (PhD thesis) | Reinforcement learning applied to autonomous helicopters |
| 2009 | "Large-scale Deep Unsupervised Learning using Graphics Processors" (with Rajat Raina), ICML | Systematic demonstration of GPU-accelerated deep-network training |
| 2011 | Stanford CS229 / Coursera Machine Learning | The machine-learning starting point for millions worldwide |
| 2012 | "Building High-level Features Using Large Scale Unsupervised Learning" (the Google Brain "cat"), ICML | Milestone of large-scale distributed unsupervised learning |
| 2014 | Deep Speech (with the Baidu team) | End-to-end deep neural-network speech recognition |
| 2017 | Deep Learning Specialization (deeplearning.ai) | The standard worldwide introduction to deep learning |
| 2018 | Machine Learning Yearning | Free practical handbook for machine learning |
Historian's Note
Andrew Ng is not the inventor of the LLM, nor one of the authors of Attention Is All You Need. In every revolution he was an early participant rather than the figure at the highest point. But he did something else. He opened a course at an aristocratic university to engineers on the other side of the earth, who watched the progress bar until dawn. He wrote out the script of "GPU-trained neural networks" three years ahead of time. He lowered the entrance to deep learning from a doctoral dissertation to a video course one could finish in a week. At Baidu, in manufacturing, and in venture capital, he showed again and again that the application layer is where AI truly lands. Papers decay in citation; courses do not. If Geoffrey Hinton, Yann LeCun, and Yoshua Bengio are the patriarchs of the temple of deep learning, Ng is the abbot who opens that temple to the faithful from the four directions. An age needs its inventors, but it also needs its teachers; the teacher's influence often lasts longer, for it extends through ten million students into countless future inventions.
References
- Ng, A. Y. (2002). Shaping and Policy Search in Reinforcement Learning. Ph.D. Dissertation, UC Berkeley.
- Raina, R., Madhavan, A., & Ng, A. Y. (2009). "Large-scale Deep Unsupervised Learning using Graphics Processors." Proceedings of ICML.
- Le, Q. V., Ranzato, M., Monga, R., Devin, M., Chen, K., Corrado, G., Dean, J., & Ng, A. Y. (2012). "Building High-level Features Using Large Scale Unsupervised Learning." Proceedings of ICML.
- Hannun, A., Case, C., Casper, J., Catanzaro, B., Diamos, G., Elsen, E., et al. (2014). "Deep Speech: Scaling up End-to-end Speech Recognition." arXiv:1412.5567.
- Ng, A. (2017). "Opening a new chapter of my work in AI." Personal Letter, March 2017. https://medium.com/@andrewng
- Ng, A. (2018). Machine Learning Yearning. deeplearning.ai. https://www.deeplearning.ai/machine-learning-yearning/
- Coursera (2012). "Stanford University, Daphne Koller, Andrew Ng Launch Coursera." Press release, April 2012.
- Markoff, J. (2012). "How Many Computers to Identify a Cat? 16,000." The New York Times, June 25, 2012.
- Metz, C. (2017). "Andrew Ng, AI Pioneer, Leaves Baidu." Wired, March 22, 2017.