As the field of natural language processing (NLP) advances, significant strides have been made in the development of AI models. Tech giants like OpenAI, Google, and Microsoft have created advanced large language models (LLMs). Trained on large datasets, these models can generate meaningful text from user inputs, for example, summarizing long documents or answering customers’ questions.
Popular LLMs include the GPT series (GPT-1 through GPT-3), Chinchilla, BERT, PaLM, and Gopher, each with its own strengths and weaknesses. In this article, we will discuss Chinchilla AI, developed by Google’s DeepMind.
Before the development of Chinchilla AI, big tech companies were building ever-larger models that were inefficient and demanded enormous computational power. In 2020, Kaplan and colleagues at OpenAI found that, for a fixed compute budget, most of the budget should be allocated to increasing the number of parameters, following a power-law relationship. GPT-3, which is far larger than GPT-2 and performs better, is a case in point.
DeepMind’s Chinchilla model uses the same compute budget as Gopher but only 70 billion parameters (a quarter of Gopher’s 280 billion), trained on four times more data. Its performance is better than that of GPT-3 and Gopher. Let’s explore it in more detail.
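To see what "compute-optimal" means in practice, here is a minimal sketch of the rule of thumb the Chinchilla result is often summarized by: training compute scales as roughly 6 × parameters × tokens, and parameters and data should be scaled together, at roughly 20 tokens per parameter. The constants below are common approximations, not the paper's exact fitted values.

```python
# A minimal sketch of the Chinchilla compute-optimal rule of thumb.
# Assumptions (approximations, not the paper's exact fits):
# training FLOPs C ~ 6 * N * D, and the compute-optimal data-to-model
# ratio is roughly D ~ 20 * N tokens per parameter.

def compute_optimal_allocation(flops_budget: float) -> tuple[float, float]:
    """Return (parameters, tokens) that roughly exhaust a FLOPs budget.

    Solving C = 6 * N * D with D = 20 * N gives C = 120 * N**2,
    so N = sqrt(C / 120) and D = 20 * N.
    """
    n_params = (flops_budget / 120) ** 0.5
    n_tokens = 20 * n_params
    return n_params, n_tokens

if __name__ == "__main__":
    # Roughly Gopher's training budget: 280B params * 300B tokens * 6 FLOPs.
    gopher_budget = 6 * 280e9 * 300e9  # ~5.0e23 FLOPs
    n, d = compute_optimal_allocation(gopher_budget)
    print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
    # Prints roughly 6.5e10 params and 1.3e12 tokens: close to
    # Chinchilla's 70B parameters and 1.4T training tokens.
```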
The architecture of the Chinchilla language model is the same as Gopher’s, with a few exceptions, the most notable being the tokenizer.
Chinchilla uses SentencePiece, which builds subword units, such as byte-pair encoding (BPE) or a unigram language model, and can be trained directly from raw sentences. With SentencePiece, it becomes possible to create an end-to-end system that eliminates the need for language-specific preprocessing or postprocessing steps.
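As a rough illustration, the following sketch trains a SentencePiece model directly on raw text. The corpus file, vocabulary size, and model type here are illustrative choices, not Chinchilla’s actual configuration.

```python
# A minimal sketch of SentencePiece training on raw text; the corpus file,
# vocabulary size, and BPE model type are illustrative assumptions, not the
# settings DeepMind used for Chinchilla.
import sentencepiece as spm

# Train directly from raw sentences: no language-specific pre-tokenization.
spm.SentencePieceTrainer.train(
    input="corpus.txt",        # one raw sentence per line (hypothetical file)
    model_prefix="tokenizer",  # writes tokenizer.model and tokenizer.vocab
    vocab_size=32000,
    model_type="bpe",          # SentencePiece also supports "unigram"
)

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
print(sp.encode("Chinchilla is compute-optimal.", out_type=str))  # subword pieces
print(sp.encode("Chinchilla is compute-optimal.", out_type=int))  # token ids
```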
Let’s look at the areas where Chinchilla AI has performed better than existing models.
Chinchilla outperforms Gopher on all evaluation subsets of The Pile; on the Wikitext103 benchmark, it reaches a perplexity of 7.16 compared to Gopher’s 7.75. However, it is important to exercise caution when comparing the two on language modeling benchmarks: Chinchilla is trained on four times more data than Gopher, which raises the possibility of train/test set leakage artificially inflating the results.
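For readers unfamiliar with the metric, perplexity is the exponential of the average per-token negative log-likelihood, so lower is better. A minimal sketch, with made-up token probabilities:

```python
# A minimal sketch of how perplexity is computed: the exponential of the
# average per-token negative log-likelihood. The probabilities below are
# made up for illustration.
import math

def perplexity(token_probs: list[float]) -> float:
    """Perplexity = exp(-(1/T) * sum(log p_t)) over T tokens."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# Probabilities a model assigned to each observed token in a held-out text.
probs = [0.21, 0.05, 0.40, 0.12, 0.33]
print(f"perplexity = {perplexity(probs):.2f}")
# A perplexity of 7.16 means the model is, on average, about as uncertain
# as a uniform choice over ~7 tokens at each step.
```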
The Massive Multitask Language Understanding (MMLU) benchmark comprises exam-style questions covering a wide range of academic subjects. Notably, despite being much smaller, Chinchilla significantly outperforms Gopher, achieving an average accuracy of 67.6%, 7.6 percentage points above Gopher.
Interestingly, Chinchilla even surpasses the expert forecast for June 2023, which projected an accuracy of 63.4%. It achieves accuracy rates exceeding 90% on four individual tasks: high_school_gov_and_politics, international_law, sociology, and us_foreign_policy.
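For context, multiple-choice benchmarks like MMLU are typically scored by having the model assign a likelihood to each candidate answer and picking the highest-scoring one. The sketch below assumes a hypothetical option_logprob function standing in for a real model call.

```python
# A hedged sketch of how exam-style multiple-choice benchmarks such as MMLU
# are commonly scored for LLMs: each candidate answer is scored by the
# model's likelihood, and the highest-scoring option is the prediction.
# `option_logprob` is a hypothetical stand-in for a real model call.
from typing import Callable

def predict_choice(
    question: str,
    options: list[str],
    option_logprob: Callable[[str, str], float],
) -> int:
    """Return the index of the option the model scores highest."""
    scores = [option_logprob(question, opt) for opt in options]
    return max(range(len(options)), key=lambda i: scores[i])

if __name__ == "__main__":
    # Toy stand-in scorer: rank options by word overlap with the question.
    def toy_logprob(question: str, option: str) -> float:
        return len(set(question.lower().split()) & set(option.lower().split()))

    q = "Which planet is known as the red planet?"
    opts = ["Venus", "The red planet Mars", "Jupiter", "Mercury"]
    print(opts[predict_choice(q, opts, toy_logprob)])  # -> "The red planet Mars"
```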
When evaluated on the LAMBADA dataset for final-word prediction, Chinchilla achieves an accuracy of 77.4%, surpassing both Gopher’s 74.5% and MT-NLG 530B’s 76.6%. Additionally, Chinchilla significantly outperforms Gopher on the RACE-h and RACE-m reading comprehension datasets.
The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark specifically designed to assess the capabilities of large language models and extrapolate their potential future performance.
In an analysis conducted on the same set of BIG-bench tasks, Chinchilla demonstrates superior performance compared to Gopher on the majority of tasks, similar to observations in the MMLU benchmark.
On average, Chinchilla outperforms Gopher by 10.7 percentage points, achieving an accuracy of 65.1% versus Gopher’s 54.4%. Of the 62 tasks considered, Chinchilla performs worse than Gopher on only four: crash_blossom, dark_humor_detection, mathematical_induction, and logical_args.
On the Natural Questions dataset, Chinchilla achieves new state-of-the-art (SOTA) accuracies for closed-book settings, with 31.5% accuracy for the 5-shot scenario and 35.5% accuracy for the 64-shot scenario. In comparison, Gopher achieves accuracies of 21% and 28% respectively for the same scenarios.
On the TriviaQA dataset, results are reported for both the filtered set, previously used in retrieval and open-book approaches, and the unfiltered set, used in evaluations of LLMs. In both cases, Chinchilla outperforms Gopher by a substantial margin, demonstrating its strength in closed-book question answering.
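The "5-shot" and "64-shot" settings above refer to how many solved question-answer pairs are placed in the prompt before the test question. A minimal sketch of building such a prompt, with hypothetical demonstration pairs:

```python
# A minimal sketch of k-shot prompting for closed-book QA: k solved
# question-answer pairs are prepended to the test question, and the model
# must answer from its weights alone (no retrieval). The demonstration
# pairs below are hypothetical.

def build_k_shot_prompt(demos: list[tuple[str, str]], question: str, k: int) -> str:
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in demos[:k])
    return f"{shots}\n\nQ: {question}\nA:"

demos = [
    ("What is the capital of France?", "Paris"),
    ("Who wrote Hamlet?", "William Shakespeare"),
    ("What gas do plants absorb from the air?", "Carbon dioxide"),
    ("How many continents are there?", "Seven"),
    ("What is the largest planet in our solar system?", "Jupiter"),
]
print(build_k_shot_prompt(demos, "Who painted the Mona Lisa?", k=5))
```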
It is believed that large language models, including Chinchilla, reflect the contemporary and historical discourse found in their training datasets about various groups, including gender groups. The Winogender test assesses a model’s ability to correctly determine whether a pronoun refers to the occupation word or the other participant in a sentence. An unbiased model would accurately predict the referent regardless of the gender associated with the pronoun.
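To make the setup concrete, the sketch below shows Winogender-style items and a scoring rule: a model resolves an item correctly if it assigns the highest score to the true referent, whichever pronoun appears. The sentences are illustrative, written in the style of Winogender, and referent_logprob is a hypothetical stand-in for a real model call.

```python
# A sketch of Winogender-style scoring. Each item asks whether a pronoun
# refers to the occupation or the other participant; an unbiased model
# resolves it correctly regardless of the pronoun's gender. The sentences
# are illustrative, and `referent_logprob` is a hypothetical stand-in for
# a real model call.
from typing import Callable

items = [
    # (sentence, pronoun, candidate referents, correct referent)
    # The payer is the customer, whichever pronoun is used.
    ("The technician told the customer that she could pay with cash.",
     "she", ["technician", "customer"], "customer"),
    ("The technician told the customer that he could pay with cash.",
     "he", ["technician", "customer"], "customer"),
]

def resolves_correctly(sentence: str, pronoun: str, candidates: list[str],
                       gold: str,
                       referent_logprob: Callable[[str, str, str], float]) -> bool:
    """True if the model scores the gold referent highest for the pronoun."""
    best = max(candidates, key=lambda c: referent_logprob(sentence, pronoun, c))
    return best == gold
```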
On this test, Chinchilla resolves pronouns correctly more often than Gopher across all groups. The improvement is smaller for male pronouns, at 3.2%, compared to increases of 8.3% and 9.2% for female and neutral pronouns, respectively.
Additionally, on "gotcha" examples, where the correct pronoun resolution contradicts gender stereotypes based on labor statistics, Chinchilla consistently resolves pronouns more accurately than Gopher.
Breaking the examples down by gender and "gotcha" status, the largest improvement is observed for female "gotcha" examples, at 10%. This suggests that while Chinchilla overcomes gender stereotypes on more coreference examples than Gopher, the rate of improvement varies across pronouns.
These findings highlight that the advantages conferred by using a more compute-optimal model may lead to uneven improvements in resolving gender-related pronouns.
In this article, we discussed how advancements in natural language processing (NLP) have led to sophisticated LLMs like Chinchilla AI, a substantially smaller model that delivers better accuracy and performance.
Chinchilla has 70 billion parameters and was trained on four times more data than Gopher, which explains its outstanding performance. It outperformed other models on language modeling, the MMLU benchmark, reading comprehension, BIG-bench, and closed-book question answering, and showed improved pronoun resolution on gender bias tests. However, while impressive, the model is currently not open to the public.