On December 12, 2024, the third installment of our AGI Icons series was held in Vancouver. The event featured Jeff Dean, Chief Scientist of Google DeepMind and Google Research, and Jonathan Siddharth, Co-Founder and CEO of Turing.
Turing’s AGI Icons series is designed to bring together AI’s leading minds and host deep dialogues on the biggest barriers and achievements driving innovation. The conversation with Dean and Siddharth provided an exclusive look into Google’s latest model, Gemini 2.0 Flash, and the ways it's poised to accelerate AGI advancements.
Siddharth dove into the heart of the conversation: Dean’s journey with neural networks that culminated in Gemini 2.0.
“You’ve had a fascinating history with neural networks that goes quite far back. Can you talk us through what got you [started] on your journey?”
Jeff Dean’s fascination with neural networks began during his undergraduate studies at the University of Minnesota in 1990.
“I was intrigued because they seemed like this really nice general learning mechanism that could solve problems that we couldn't solve any other way. I felt like if we could just get more computation, we could train bigger and bigger neural networks.”
For his undergraduate thesis, Dean explored parallel training strategies on a 32-processor hypercube machine.
“I was naïve,” Dean admitted. “I thought training neural networks on 32 processors would revolutionize everything. What we really needed was a million times more compute power.”
Though the technology of the era was insufficient, these experiments laid the foundation for his future work in model and data parallelism.
Although he admits his efforts “weren’t very practical” for real-world applications at the time, he kept the concepts “in the back of my head.”
Fast-forward to 2011, when Dean was at Google. “I bumped into Andrew Ng because he had signed on as a one-day-a-week consultant.”
When asked what he was doing there, Ng replied, “I don’t know yet, but my students at Stanford are starting to play with neural networks and getting good results.” This immediately sparked Dean’s interest.
“That was when I started the Google Brain effort. One of the first things we did was build a parallel and distributed system for training neural networks.”
Partnering with Andrew Ng, Dean focused Google Brain on large-scale neural network training using the company’s infrastructure, a turning point in AI history.
Dean recounted how Google Brain tackled neural networks at scale, despite the limitations of hardware at the time.
“We didn’t have GPUs in our data centers, so we had lots and lots of CPUs,” Dean explained. “We used 2,000 computers, 16,000 cores, to train interesting unsupervised computer vision models, speech recognition acoustic models, word embedding systems, and eventually long short-term memory (LSTM) networks for sequence-to-sequence models.”
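To make the scale problem concrete, here is a minimal sketch of the data-parallel idea behind that kind of system: each worker computes gradients on its own shard of a minibatch, and the shards’ gradients are averaged into a single update to shared parameters. The toy linear model, sizes, and synchronous averaging below are purely illustrative; Google’s production system used an asynchronous parameter-server design at vastly larger scale.

```python
import numpy as np

# Data-parallel SGD in miniature: every "worker" gets a shard of the minibatch,
# computes a local gradient, and the gradients are averaged before one update
# to the shared parameters. (Illustrative only; not Google's implementation.)

rng = np.random.default_rng(0)

# Toy linear-regression problem standing in for a real model.
true_w = np.array([2.0, -3.0, 0.5])
X = rng.normal(size=(4096, 3))
y = X @ true_w + 0.01 * rng.normal(size=4096)

w = np.zeros(3)        # shared parameters (think "parameter server")
num_workers = 8
lr = 0.1

for step in range(200):
    batch = rng.choice(len(X), size=256, replace=False)
    shards = np.array_split(batch, num_workers)

    # Each worker computes the gradient of mean squared error on its shard.
    grads = []
    for shard in shards:
        Xs, ys = X[shard], y[shard]
        grads.append(2 * Xs.T @ (Xs @ w - ys) / len(shard))

    # Average the workers' gradients and apply a single shared update.
    w -= lr * np.mean(grads, axis=0)

print(w)  # converges toward true_w
```

The appeal of the approach is exactly the one Dean describes: adding workers lets you push more data through the same model without changing the learning algorithm itself.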
The foundational principle was simple: “We want to train very large models using lots and lots of compute and more data.”
This mantra resonated with Siddharth, whose mission is to solve the next bottleneck in AI advancement – moving beyond compute with expert data and human intelligence. “Bigger model, more data, better results.”
Dean agreed. That mantra became the bedrock of modern AI advancements.
“I feel like if the scaling laws could be represented by a person, it would probably be you,” Siddharth joked. “You saw the power of neural networks early, the power of scaling up compute early, distributed training, DistBelief…”
Dean laughed, recalling the origins of DistBelief, Google’s internal distributed training system. “Yeah, and it was also a little bit of a double meaning because some people I talked to within Google were very skeptical that neural networks would work.”
“So I said, ‘Ah, we’ll call it DistBelief, and we’ll show them.’”
Siddharth nodded, reminiscing about the era when neural networks weren’t always the go-to choice.
“When I was taking Andrew Ng’s class at Stanford in the mid-2000s, the belief was that for text classification, you probably wanted to use a support vector machine.” Siddharth continued, “Neural networks were—well, there was that phase, right?”
The discussion then shifted to DeepMind and how their work complemented Google Brain’s efforts and culminated in the creation of Gemini.
Initially, the two teams operated independently. Dean explained, “Within the Brain team, we focused on large-scale training, scaling things up, attacking a bunch of practical problems in vision, speech recognition, and language. DeepMind had focused a bit more on much smaller-scale models, reinforcement learning as a way of learning to do things that you couldn’t easily do with supervised learning.”
As their research agendas converged, merging efforts became the logical step. Dean’s leadership helped unify these teams into Google DeepMind, leveraging their combined strengths to develop state-of-the-art multimodal models.
"Instead of multiple independent efforts where we fragmented our compute and ideas, we decided to work on one unified model that’s multimodal. By combining our efforts, we’ve achieved something greater than the sum of its parts."
This unified approach became Gemini, Google’s flagship multimodal AI model. Siddharth then turned the conversation to the Gemini 2.0 announcement, asking, “We heard about Gemini 2.0 Flash. Is it number one on SWE-bench Verified now?”
“Yeah, I believe it is,” Dean nodded. “We have this expression around our coffee machine in the micro kitchen at work: Such a good model.”
“So what’s new in Gemini 2.0 Flash?” Siddharth probed on behalf of the audience.
“One is we’re announcing the 2.0 series of Gemini and coming out with the Flash model that people can use today,” he said. “The Gemini 2.0 Flash model outperforms on a bunch of academic benchmarks, with the latency and speed characteristics of 1.5 Flash but improved quality.”
Dean then highlighted several innovations that set Gemini 2.0 apart.
In essence: “These models can take in what you’re sensing and augment your knowledge in a natural and fluid way,” Dean explained.
The conversation then shifted to broader AI challenges and opportunities and the key frontiers Dean sees ahead, a theme he would return to at the close of the evening.
One of those frontiers is architecture itself. Dean underscored the need for modular and sparse model designs, inspired by the brain’s organic and efficient structure. Ultra-sparse models, which activate only the parameters needed for a given input during inference, represent a path to much greater efficiency.
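As a rough illustration of what “activating only the necessary parameters” can look like, here is a toy mixture-of-experts-style layer: a router scores every expert, but only the top-k are actually run for a given input. The router, expert count, and dimensions are invented for this sketch and say nothing about how Gemini or any Google model is built.

```python
import numpy as np

# Sparse (mixture-of-experts-style) routing in miniature: the router scores all
# experts, but only the top-k run, so most parameters stay inactive per input.
# All names and sizes here are illustrative assumptions.

rng = np.random.default_rng(0)

d_model, num_experts, top_k = 16, 8, 2
router_w = rng.normal(size=(d_model, num_experts)) * 0.1
experts = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(num_experts)]

def sparse_layer(x):
    """Route the input to its top-k experts and mix their outputs."""
    scores = x @ router_w                    # one routing score per expert
    chosen = np.argsort(scores)[-top_k:]     # indices of the k best experts
    gates = np.exp(scores[chosen])
    gates /= gates.sum()                     # softmax over the chosen experts

    out = np.zeros_like(x)
    for gate, idx in zip(gates, chosen):
        out += gate * (x @ experts[idx])     # only k of the experts ever run
    return out

x = rng.normal(size=d_model)
print(sparse_layer(x).shape)  # (16,)
```

Because only k experts run per input, total parameter count can grow far faster than the compute spent on any single example, which is the efficiency argument behind ultra-sparse designs.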
Siddharth and Dean also discussed the role of software engineering in advancing AI. “As AI gets better at coding, what’s your advice for engineers?” Siddharth asked.
Dean’s response was reassuring: "Some people are worried that AI is going to write software and there won’t be as much need for software engineers. I don’t subscribe to that at all. If anything, AI will enable engineers to be much more productive and spend time solving higher-order problems."
The conversation moved to the topic of ethical AI, focusing on the value of human-generated data and the need for fairness, transparency, and proper recognition of contributors.
Audience members raised questions about the implications of using human data to train LLMs and the challenge of ensuring equitable value distribution. Dean emphasized the importance of balancing innovation with responsibility:
“We need community-driven solutions to ensure fairness,” he said, pointing to Google’s opt-out policy as a step in the right direction while stressing that the industry must do more to recognize and reward the people behind data contributions.
One audience member asked: "There must be a lot of data folks working hard to prepare the data for models like Gemini. How do we ensure they get the recognition they deserve in this large community?"
Dean’s response underscored the collective nature of building cutting-edge AI models like Gemini: "The Gemini models to date are the efforts of an amazing team at Google from all across Google DeepMind, Google Research, and our infrastructure teams. Some work on data, others on model architecture, and many others on infrastructure software or post-training processes—all of which are crucial."
He added: "It’s important to recognize that creating these models takes a collective effort, and we also rely on external partners like Turing for data curation and labeling."
Siddharth emphasized this point further, noting, "Data is the foundation of everything we do, and it’s important to ensure its value is shared equitably."
In a surprise announcement, Siddharth unveiled the Jeff Dean Award, which will recognize engineering excellence among Turing’s global community of developers.
“This award will spotlight those pushing the boundaries of what’s possible,” Siddharth explained. The honor celebrates engineers who have made significant contributions to advancing AI and software development.
Dean expressed his gratitude, emphasizing the importance of fostering a worldwide community of innovators: “It’s an honor to be part of a community that’s driving meaningful change. I love the ethos of bringing people together all over the world to work on awesome stuff.”
Siddharth then asked, “You have such a unique vantage point to see where AI is headed. Where do you see the AI trajectory heading in the future?”
Dean’s vision for the future of AI was ambitious yet pragmatic, painting a picture of systems that are increasingly capable, efficient, and collaborative.
“You’ll see a lot more interleaved processing,” Dean began, “where models think about what to do next, try different approaches, and backtrack when something doesn’t work. Achieving robust multi-step reasoning is one of the big challenges ahead. Right now, models might break down a task into five steps and succeed 75% of the time. But what we really want is for systems to break a task into 50 steps and succeed 90% of the time.”
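A quick back-of-the-envelope calculation shows why the jump Dean describes is so demanding. Assuming, simplistically, that every step of a task succeeds independently with the same probability, hitting a target overall success rate over n steps requires a per-step reliability equal to the n-th root of that target:

```python
# Per-step reliability implied by Dean's numbers, under the simplifying
# assumption that the n steps succeed independently with equal probability.
for n_steps, overall in [(5, 0.75), (50, 0.90)]:
    per_step = overall ** (1 / n_steps)
    print(f"{n_steps:>2} steps at {overall:.0%} overall -> "
          f"~{per_step:.1%} needed per step")

# Output:
#  5 steps at 75% overall -> ~94.4% needed per step
# 50 steps at 90% overall -> ~99.8% needed per step
```

In other words, going from five fairly reliable steps to fifty means each individual step has to become nearly perfect, which is why robust multi-step reasoning is such a hard target.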
He emphasized that building this level of reliability will unlock entirely new possibilities. “The ability to handle truly complex, multi-step tasks with consistent success—that’s when these systems will feel transformative in their problem-solving abilities.”
Dean concluded with a call for continued collaboration between engineers, researchers, and the broader AI community. “We’re just scratching the surface of what these systems can do,” he said. “The future lies in enabling people to accomplish complex tasks with ease and creativity. AI will augment human intelligence, helping us solve problems we’ve only dreamed of.”
The conversation encapsulated the ethos of Turing’s AGI Icons series—bringing together thought leaders to illuminate the challenges and opportunities shaping AI.
As Siddharth aptly summarized: “This is just the beginning. The best is yet to come.”
To gain additional AGI insights, check out our previous AGI Icons series recaps with OpenAI CEO Sam Altman and Quora CEO Adam D’Angelo.
Talk to one of our solutions architects and start innovating with AI-powered talent.