Is it possible to converse with whales?
There’s an ambitious project that aims to use artificial intelligence to interpret the vocalizations of sperm whales, and perhaps even to hold conversations with them. “I don’t know much about whales. I’ve never seen one in my life,” says Michael Bronstein, an Israeli computer scientist teaching at Imperial College London. He may not seem the ideal candidate for a project on whale communication, but his expertise in machine learning could be key to an ambitious endeavor formally launched in March 2020: an interdisciplinary group of scientists is attempting to use AI to decode the language of these marine mammals. If the CETI project (Cetacean Translation Initiative) succeeds, it would be the first time we truly understand what animals are saying, and we might even be able to talk back.
The journey began at Harvard University’s Radcliffe Fellowship in Cambridge, Massachusetts, where an international group of scientists spent a year together. Radcliffe Fellowships promise an opportunity to step away from daily life. One day, the Israeli computer scientist and cryptography expert Shafi Goldwasser, newly appointed director of UC Berkeley’s Simons Institute for the Theory of Computing, visited the office of David Gruber, a marine biologist at the City University of New York. There she heard a series of clicking sounds that reminded her of a malfunctioning electronic circuit, or of Morse code. It turned out to be the sound sperm whales use to communicate with each other. “I said, ‘Maybe we should do a project to translate whale sounds so humans can understand them,’” Goldwasser recalls. “I really said something stupid. I didn’t think he would take me seriously.”
However, the fellowship provided a platform for taking radical ideas seriously. Bronstein had been following the latest developments in natural language processing (NLP), a subfield of AI, and was convinced that sperm whale “codas,” the animals’ short vocalizations, have a structure that lends itself to this kind of analysis. Luckily, Gruber knew a biologist, Shane Gero, who had been recording sperm whale codas around the Caribbean island of Dominica since 2005. Bronstein applied several machine-learning algorithms to the data. “At least for relatively simple tasks, it seemed to work very well,” he says. But this was only a proof of concept. To go deeper, the algorithms would need more context and more data: millions of whale codas.
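What such “relatively simple tasks” might look like is easy to sketch. One common way to represent a coda is as the vector of intervals between its clicks; once codas are vectors, off-the-shelf algorithms can group recurring rhythm types. The sketch below uses synthetic data and a feature choice that are assumptions for illustration, not CETI’s actual pipeline:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# A coda is a short burst of clicks; represent each one by its
# inter-click intervals (ICIs) in seconds. Here we fabricate two
# coda "types" with distinct rhythms plus a little timing jitter.
type_a = rng.normal(loc=[0.2, 0.2, 0.2, 0.4], scale=0.02, size=(50, 4))
type_b = rng.normal(loc=[0.1, 0.3, 0.1, 0.3], scale=0.02, size=(50, 4))
codas = np.vstack([type_a, type_b])

# Cluster the rhythm vectors; on real recordings, clusters like these
# would correspond to recurring coda types in the repertoire.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(codas)
print(labels[:3], labels[-3:])  # the two fabricated types separate cleanly
```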
But do animals even have language? The question has long been debated among scientists, and language is often regarded as one of the last bastions of human exclusivity. The Austrian biologist Konrad Lorenz, a pioneer of animal behavior research, wrote about his own communication with animals in “King Solomon’s Ring,” published in 1949. “Animals do not possess language in the true sense,” Lorenz wrote.
German marine biologist Karsten Brensing, the author of several books on animal communication, counters: “I rather think we still don’t understand animal language well enough.” Brensing is convinced that many animal vocalizations can indeed be called language, provided certain criteria are met. It is not enough that dogs, say, bark differently in different situations. “First and foremost, language has meaning. That is, certain vocalizations have a fixed meaning that does not change.” The Siberian jay, for example, is known to have a vocabulary of about 25 calls, some of which carry a fixed meaning.
The second criterion is grammar: rules for how utterances are put together. For a long time, scientists were convinced that animal communication had no sentence structure. But in 2016, Japanese researchers published a study in Nature Communications on the calls of the Japanese tit. In certain situations, these birds combine two different calls to warn one another of an approaching predator. When the researchers played the two calls in reverse order, the birds’ response dropped off sharply. “This is grammar,” says Brensing.
The third criterion: vocalizations that are entirely innate do not count as language. Lorenz believed that animals are born with a fixed repertoire of expressions and learn little over the course of their lives. “Unlike human spoken language, animal expressions of emotion, such as the Eurasian jackdaw’s ‘kia’ or ‘kiaw’ calls, are limited to unconscious expression,” he wrote.
Since then, however, some animal species have been shown to be capable of vocal learning: acquiring new vocabulary, developing dialects, and recognizing one another by name. Some birds mimic cellphone ringtones. Dolphins develop signature whistles, which they use like names to identify themselves.
Sperm whales dive deep into the ocean and communicate over long distances using a system of clicks. Photo: Amanda Cotton/Project CETI
Sperm whale clicks are ideal candidates for attempts at decoding, and not only because, unlike the continuous sounds of other whale species, they can easily be converted into ones and zeros. The animals dive into the deepest ocean depths and communicate over vast distances, so they cannot use body language or facial expressions, which are crucial communication tools for other animals. “It is realistic to assume that whale communication is primarily acoustic,” says Bronstein. Sperm whales also have the largest brains in the animal kingdom, six times the size of ours. When such animals converse with one another at length, one might imagine they have something to say. Are they trading tips on the best fishing grounds? Do whale mothers compare notes on raising their young, as human mothers do? CETI’s researchers believe it is worth trying to find out.
Learning an unknown language would be easier with something like the famous Rosetta Stone, discovered in 1799. That slab carries the same text in three scripts and provided the key to deciphering Egyptian hieroglyphs. Of course, no such tool exists for the animal world. There is no human-whale dictionary, and no book setting out the grammar of sperm whale language.
There are ways around this, however. Children obviously learn their native language without such tools, simply by observing the language spoken around them. Researchers have concluded that this kind of learning is fundamentally statistical: a child remembers that the word “dog” tends to be uttered when that furry animal enters the room, that certain words often appear with certain others, and that some sequences of words are more likely than others. Over the past decade, machine-learning methods have mimicked this kind of learning. Researchers fed vast amounts of language data into huge neural networks which, knowing nothing about the content of the language, could discover its structure from statistical observation alone.
A famous example is the language model GPT-3, developed by OpenAI. Given the beginning of a sentence, GPT-3 completes it, much like the predictive text on a smartphone, only far more sophisticated. By statistically processing enormous amounts of text from the internet, the model learns not only which words frequently occur together but also the rules for putting sentences together. It generates grammatically correct sentences and can produce text of remarkably high quality. It can write fake news articles on a given topic, summarize complicated legal documents in simple terms, and even translate between two languages.
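In miniature, that statistical principle can be shown with a toy next-word predictor. The sketch below is only an illustration of learning from co-occurrence counts; GPT-3 itself uses a large neural network, not a lookup table, and the tiny corpus here is invented:

```python
from collections import Counter, defaultdict

# A tiny invented corpus: the learner sees only word sequences,
# never meanings.
corpus = "the dog barked . the dog barked . the cat chased the dog .".split()

# Statistical learning: count which word tends to follow which (bigrams).
following = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    following[prev][word] += 1

# "Completion" in miniature: given a prompt, repeatedly append the most
# likely next word, the way a phone's predictive text does.
words = ["the", "dog"]
while words[-1] != "." and len(words) < 10:
    words.append(following[words[-1]].most_common(1)[0][0])

print(" ".join(words))  # -> "the dog barked ."
```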
But all of this requires vast amounts of data. GPT-3’s neural network has roughly 175 billion parameters and was trained on hundreds of billions of words of text. The codas collected by Shane Gero’s Dominica Sperm Whale Project, by contrast, number fewer than 100,000. And we still don’t know what even counts as a “word” in sperm whale language.
If Bronstein’s idea works, it is entirely realistic to build a system analogous to human language models that generates grammatically correct whale utterances. The next step would be an interactive chatbot that tries to strike up conversations with free-living whales. Whether the animals would accept such a chatbot as a conversational partner, of course, no one can say. “Perhaps the whales would just reply, ‘Stop talking nonsense!’” Bronstein muses.
Researchers hope that artificial intelligence will give them the key to deciphering sperm whale communication. Illustration: Project CETI
But even if the idea works, the fundamental flaw of all language models is that they understand nothing about the content of what they say. It would be ironic if researchers built a bot that conversed fluently with whales while humans could not understand a word of the exchange. That is why the researchers want to annotate the audio recordings from the start with data about whale behavior: where the animals were, who was speaking to whom, and what reaction followed. The challenge is finding a way to automate at least part of these millions of annotations.
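What one of those annotations might contain can be pictured as a simple record. The fields below are hypothetical, chosen only to illustrate the kind of behavioral context described above; they are not CETI’s actual data model:

```python
from dataclasses import dataclass

@dataclass
class CodaAnnotation:
    """One annotated coda recording (illustrative schema only)."""
    recording_id: str
    whale_id: str            # which individual produced the coda
    addressees: list[str]    # who the coda seems directed at
    latitude: float          # where the whales were
    longitude: float
    depth_m: float
    behavior_before: str     # e.g. "surfacing", "socializing"
    reaction: str            # what the coda elicited

example = CodaAnnotation(
    recording_id="dominica-0042",   # invented identifier
    whale_id="whale-A",
    addressees=["calf-A1"],
    latitude=15.3, longitude=-61.4, depth_m=12.0,
    behavior_before="surfacing",
    reaction="calf approaches",
)
print(example.whale_id, "->", example.addressees)
```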
A great deal of technology also remains to be developed, including sensors that record individual whales and systems that track their locations; these are needed to attribute specific sounds to specific animals. The CETI project applied for and received five years of funding from TED’s Audacious Project. Many organizations are involved, including the National Geographic Society and MIT’s Computer Science and Artificial Intelligence Laboratory.
CETI’s researchers are not the first to think of applying machine learning to animal language. Aza Raskin, a former physicist, designer, entrepreneur, and technology critic, had a similar idea back in 2013, when he first heard about the complex vocalizations of gelada monkeys in Africa: could natural language processing (NLP) techniques developed for human languages be applied to animal vocalizations? He founded the Earth Species Project to find out. At the time the technology was in its infancy, and it took another four years before a working self-supervised method for automatic translation between languages emerged. The technique, word embedding, places all the words of a language in a multidimensional galaxy, where closely associated words sit near one another and their connections are represented by lines. “King,” for example, relates to “man” as “queen” relates to “woman.”
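The geometry behind that example can be demonstrated with hand-made vectors. In the sketch below the two-dimensional “embeddings” are invented for illustration; real embeddings are learned from text and have hundreds of dimensions:

```python
import numpy as np

# Invented 2-D vectors: one axis loosely encodes "royalty",
# the other "male/female".
vec = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# The classic analogy: king - man + woman lands nearest to queen.
target = vec["king"] - vec["man"] + vec["woman"]
best = max((w for w in vec if w != "king"), key=lambda w: cosine(vec[w], target))
print(best)  # -> queen
```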
It turned out that the resulting maps of two human languages can be brought into alignment. Today this technique can translate written text between two human languages, and in the near future it may be applied to voice recordings without any text at all.
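The alignment itself can be sketched too. If each language’s embeddings form a point cloud, one classical approach (orthogonal Procrustes) finds the rotation that best superimposes one cloud on the other. Below, the second “language” is secretly a rotated copy of the first, so the recovered rotation aligns them exactly; the data are synthetic, and real systems must also manage without the paired anchor words assumed here:

```python
import numpy as np

rng = np.random.default_rng(1)

# Embeddings for five shared concepts in "language X"...
X = rng.normal(size=(5, 3))

# ...and in "language Y": here simply X under an unknown rotation,
# which is the structure the alignment has to recover.
theta = 0.7
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
Y = X @ R_true

# Orthogonal Procrustes: the rotation W minimizing ||X @ W - Y|| has a
# closed-form solution via the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

print(np.allclose(X @ W, Y))  # -> True: the two "maps" now coincide
```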
But can the maps of human and animal languages be overlaid? Raskin believes it is possible, at least in principle. “Mammals, in particular, should share some common experiences: the need to breathe, to eat, to mourn when a child dies.” At the same time, he expects many regions where the maps simply don’t fit. “There may be parts that translate directly into human experience and parts that can’t be translated at all, and it’s uncertain which would be more captivating.” If animals could speak for themselves and humans could truly listen, Raskin says, it would be “a moment of great cultural change.”
This sperm whale mother and calf are undeniably communicating; what researchers want to know is what they are saying to each other. Photo: Amanda Cotton/Project CETI
Admittedly, such expectations run somewhat ahead of the research, and some scientists doubt that CETI’s data collection will yield anything truly interesting. The linguist Steven Pinker, author of “The Language Instinct,” is decidedly skeptical of the project. “I’m curious about what they’ll find,” he writes in an email. But he doubts that sperm whale codas carry rich content and structure. “Sperm whale codas seem to be distinctive vocalizations limited to identifying the caller, perhaps along with some emotional content. If whales could convey complex messages, why don’t we see them using this ability to do complex things together, as humans do?”
Diana Reiss, a researcher at Hunter College, CUNY, disagrees. “If you looked at you and me right now, we’re having very meaningful communication without either of us doing very much,” she says in a video interview. In the same way, she believes, we simply don’t yet understand much of what whales are saying to one another. “We’re pretty ignorant at this point,” she says.
Reiss has studied dolphins for years and communicates with them using a simple underwater keyboard. She co-founded the “Interspecies Internet” group to explore effective communication methods with animals. Among her co-founders are musician Peter Gabriel, Internet pioneer Vinton Cerf, and Neil Gershenfeld, director of MIT’s Center for Bits and Atoms. Reiss welcomes CETI’s ambition, especially its interdisciplinary approach.
Some CETI researchers concede that the search for meaning in whale codas may turn up nothing fascinating. “One of our biggest risks is that the whales could turn out to be incredibly boring,” says project lead Gruber. “But we don’t think that’s the case. In my experience as a biologist, when you observe something closely, you never find less than you expected.”
The CETI project’s name evokes SETI, the search for extraterrestrial intelligence, which has scanned the skies for radio signals from alien civilizations since the 1960s without finding a single message. Until signs of extraterrestrial intelligence turn up, Bronstein feels, we could test our decoding skills on signals we can detect right here on Earth: instead of pointing antennas into space, we can listen in on an alien culture underwater. “It’s extremely arrogant to think that the only intelligent, sentient beings on Earth are Homo sapiens,” Bronstein says. “Discovering that there’s an entire civilization of animals right under our noses might be a catalyst for changing how we treat our environment and for showing more respect for the living world.”