
AsianScientist (Feb. 24, 2025) – Humans are great at applying what they learn to new situations. For example, if toddlers learn to identify the color red after being shown a red ball, flower, or car, they will likely be able to correctly identify the color of a strawberry even if they are seeing one for the first time.
A key part of this skill is compositionality—the ability to break things down into reusable parts.
In robotics, researchers are trying to figure out how machines can develop this ability—learning language and physical actions together, even when they have seen only a limited set of examples.
To study this, researchers at the Cognitive Neurorobotics Research Unit at the Okinawa Institute of Science and Technology (OIST), Japan, created a brain-inspired AI model that combines vision, movement and language. They wanted to teach a robot how to understand and use language by connecting words with actions. The results were published in Science Robotics.
The scientists designed a system where a robot learned to move or stack colored blocks based on verbal instructions. The goal was to see if the robot could generalize and understand new commands it hadn’t been explicitly trained on.
To train the robot, the researchers gave it a set of tasks such as “grasp red” or “put blue on green,” which it learned by observing its own movements, together with the accompanying visual and physical sensations, and linking them with the words in each instruction. This was done using a brain-inspired artificial intelligence (AI) model that helped the robot predict and plan its actions.
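The paper’s actual architecture is more elaborate, but the sketch below gives a simplified, hypothetical illustration (in PyTorch, not the authors’ code) of how camera input, bodily sensations and a verbal instruction might be fused by a recurrent network to predict the robot’s next movement. All module names, dimensions and data are placeholders.

```python
# Hypothetical sketch (not the authors' code): a recurrent network that fuses
# vision, proprioception and a language instruction to predict the next
# motor command, trained on examples like "grasp red" or "put blue on green".
import torch
import torch.nn as nn

class VisionLanguageActionModel(nn.Module):
    def __init__(self, vision_dim=64, proprio_dim=8, vocab_size=20,
                 embed_dim=16, hidden_dim=128, action_dim=8):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_dim)   # word IDs -> vectors
        self.instr_enc = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.fuse = nn.GRUCell(vision_dim + proprio_dim + hidden_dim, hidden_dim)
        self.action_head = nn.Linear(hidden_dim, action_dim)    # next joint targets

    def forward(self, vision_seq, proprio_seq, instruction_ids):
        # Encode the instruction ("put blue on green") into a single vector.
        _, instr_h = self.instr_enc(self.word_embed(instruction_ids))
        instr_vec = instr_h[-1]
        h = torch.zeros(vision_seq.size(0), self.fuse.hidden_size)
        actions = []
        # At each time step, combine what the robot sees and feels with the
        # instruction, update the recurrent state, and predict the next action.
        for t in range(vision_seq.size(1)):
            x = torch.cat([vision_seq[:, t], proprio_seq[:, t], instr_vec], dim=-1)
            h = self.fuse(x, h)
            actions.append(self.action_head(h))
        return torch.stack(actions, dim=1)

# Training would minimise the gap between predicted and demonstrated actions
# (toy tensors below stand in for real sensor recordings).
model = VisionLanguageActionModel()
vision = torch.randn(2, 10, 64)             # visual features, 10 time steps
proprio = torch.randn(2, 10, 8)             # joint angles
instruction = torch.randint(0, 20, (2, 3))  # e.g. ["put", "blue", "green"]
target_actions = torch.randn(2, 10, 8)
loss = nn.functional.mse_loss(model(vision, proprio, instruction), target_actions)
loss.backward()
```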
The researchers also tested whether the robot could follow new instructions it hadn’t seen before, and checked whether it could understand instructions by observing actions rather than performing them itself.
The researchers also conducted an ablation study, removing specific AI components, such as visual attention and working memory, to see whether they were essential for learning.
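The sketch below illustrates the logic of such an experiment; it is a hypothetical scaffold, not the authors’ protocol. Some word-action combinations are held out of training so that success requires recombining familiar words in unfamiliar pairings, and each model variant disables one component. The training-and-scoring step is only a stub.

```python
# Hypothetical sketch (not the authors' protocol): an ablation study combined
# with a compositional train/test split. Each variant disables one component,
# is trained on a subset of word-action combinations, and is scored on
# combinations it never saw during training.
import random
from itertools import product

random.seed(0)

colours = ["red", "green", "blue", "yellow"]
verbs = ["grasp", "put on"]
all_tasks = [f"{v} {c}" for v, c in product(verbs, colours)]

# Hold out some combinations entirely, so the robot must recombine familiar
# words ("grasp", "yellow") in a pairing it has never practised.
held_out = {"grasp yellow", "put on red"}
train_tasks = [t for t in all_tasks if t not in held_out]

variants = {
    "full model":          {"visual_attention": True,  "working_memory": True},
    "no visual attention": {"visual_attention": False, "working_memory": True},
    "no working memory":   {"visual_attention": True,  "working_memory": False},
}

def train_and_evaluate(config, train, test):
    """Placeholder for training the robot model under `config` on `train`
    and measuring its success rate on the unseen `test` tasks.
    Here it just returns a random number so the script runs end to end."""
    return random.random()

for name, config in variants.items():
    score = train_and_evaluate(config, train_tasks, sorted(held_out))
    print(f"{name}: success on held-out tasks = {score:.2f}")
```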
They observed that the robot performed better when trained on a larger variety of word-action combinations. It learned to generalize: it could understand new instructions that were not in its training set. Further tests showed that visual attention and working memory are crucial for accurate learning; when they were removed, the robot struggled to generalize and to complete tasks correctly. These findings help scientists better understand how humans and AI can learn through a mix of language and physical experience.
“We are continuing our work to enhance the capabilities of this model and are using it to explore various domains of developmental neuroscience. We are excited to see what future insights into cognitive development and language learning processes we can uncover,” said Jun Tani, head of the Cognitive Neurorobotics Research Unit, OIST, and senior author of the paper.
“By observing how the model learns to combine language and action, we gain insights into the fundamental processes that underlie human cognition,” said Prasanna Vijayaraghavan, first author of the study. “It has already taught us a lot about compositionality in language acquisition, and it showcases potential for more efficient, transparent, and safe models.”
This research helps develop robots that can better understand and respond to human instructions in real-world settings. Future improvements could make robots more interactive and capable of learning language in a more human-like way.
—
Source: Okinawa Institute of Science and Technology; Image: Creative Commons Attribution 4.0 International License (CC BY 4.0). Image background modified to adjust to website dimensions.
The article can be found at Development of compositionality through interactive learning of language and action of robots.
Disclaimer: This article does not necessarily reflect the views of AsianScientist or its staff.