Companies keep building ever-larger language models, yet the systems still suffer from the same flaws: they can generate toxic, biased, and inaccurate text. Critics of this scaling race argue that the software does not understand language and merely reproduces patterns observed in its training data. In their view, more time and effort should go into inventing new algorithms that are smaller and require less compute, rather than simply enlarging existing architectures.
Nevertheless, a text processing and generation system developed by Google researchers, based on a Transformer with 540 billion parameters, shows that the performance of language models can still improve with increasing size.
"We evaluated the Pathways Language Model (PaLM) on hundreds of language understanding and generation tasks. It achieves state-of-the-art few-shot performance on most of them," said experts from Google Research.
PaLM outperformed OpenAI's GPT-3, the Megatron-Turing NLG system from Nvidia and Microsoft, and DeepMind's Chinchilla and Gopher language models across a wide range of tasks, from question answering and reading comprehension to common-sense reasoning. PaLM was trained on 6,144 chips across two Cloud TPU v4 Pods, by far the largest configuration Google has used for training.
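The scale involved can be made concrete with a quick back-of-envelope calculation (my own sketch, not from the article): just storing 540 billion parameters exceeds the memory of any single accelerator, which is part of why training is spread across thousands of chips.

```python
# Rough back-of-envelope estimate (an illustrative assumption, not Google's
# published breakdown): memory needed just to hold PaLM's weights.
PARAMS = 540e9       # reported parameter count
BYTES_PER_PARAM = 2  # assuming bfloat16 storage
CHIPS = 6144         # reported TPU v4 chip count

weights_tb = PARAMS * BYTES_PER_PARAM / 1e12   # total weight memory, TB
per_chip_gb = PARAMS * BYTES_PER_PARAM / CHIPS / 1e9  # share per chip, GB

print(f"weights alone: ~{weights_tb:.2f} TB")   # ~1.08 TB
print(f"per chip: ~{per_chip_gb:.2f} GB")       # ~0.18 GB
```

Note that this covers weights only; training also requires gradients, optimizer state, and activations, so the real per-chip memory demand is several times higher.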
Despite PaLM's capabilities, the system still generates offensive and false text and reflects biases in its training data: for example, it more often associates Muslims with stereotypes of violence or terrorism. Like other language models, PaLM was trained on text scraped from the Internet, and roughly 50% of its training data comes from social media conversations.
According to the researchers, PaLM "demonstrates breakthrough capabilities on many very difficult tasks." The system can explain jokes, solve multi-step arithmetic problems, and fix broken code.
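Multi-step arithmetic results like these typically rely on few-shot prompting, where the prompt includes worked examples whose answers spell out intermediate reasoning steps. A minimal sketch of how such a prompt might be assembled (the example problem and wording are illustrative assumptions, not Google's actual evaluation setup):

```python
# Illustrative few-shot prompt construction for multi-step arithmetic.
# The worked example below is a hypothetical demonstration, not taken
# from PaLM's evaluation data.
EXAMPLES = [
    ("Roger has 5 tennis balls. He buys 2 cans with 3 balls each. "
     "How many balls does he have now?",
     "Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
     "5 + 6 = 11. The answer is 11."),
]

def build_prompt(question: str) -> str:
    """Prepend worked examples, reasoning steps included, to a new question."""
    parts = [f"Q: {q}\nA: {a}" for q, a in EXAMPLES]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

prompt = build_prompt(
    "A baker makes 4 trays of 12 rolls and sells 30. How many rolls are left?"
)
print(prompt)
```

The trailing "A:" invites the model to continue with its own step-by-step reasoning, mimicking the format of the worked examples.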
For now, PaLM is a research system. Google developed the model as a proof of concept for scaling language models with the Pathways architecture. The goal is to experiment with new techniques and one day build a unified AI system that can generalize across thousands or millions of tasks and learn from different types of data.