Ego4D, do you see what I see?

What if artificial intelligence could look at the world and understand it like we do?

Scientists are teaching AI to perceive the environment from a first-person perspective, as we see it through our own eyes, which will make AI even more useful, especially when combined with wearable cameras. Most computer vision systems currently employ imagery captured from a third-person perspective, but we humans experience the world as the center of the action. This “egocentric” perception is fundamentally different, and computer vision systems struggle to understand it.

KAUST researchers are part of a collaboration among 13 universities and labs in nine countries to form Ego4D, a Facebook-funded project aimed at solving research challenges in egocentric perception. The project has five benchmarks, covering tasks related to memory, forecasting, hand and object manipulation, audiovisual records and social interaction.

The KAUST team contributed roughly 450 hours of first-person video, part of more than 3,000 hours of anonymized video created by more than 700 participants who used wearable cameras to record what they saw in their everyday lives. The result is a publicly available dataset that is more than 20 times larger than the biggest resource of similar imagery.

More than 700 participants wore cameras to record what they saw each day to create more than 3,000 hours of video, which is now a publicly available resource.
© 2021 KAUST; Anastasia Serin

“Given our expertise in human-activity understanding in long-form video, my team specifically targeted the episodic memory benchmark, which focuses on finding moments, objects or answers to language queries occurring in the past,” says Bernard Ghanem, lead researcher in visual computing at KAUST. “In other words, we developed and evaluated baseline methods to take an egocentric video and find the moment or language query you are interested in.”
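
Ego4D’s published baselines are learned models, but the core retrieval idea behind the episodic memory task can be sketched simply. Below is a minimal Python sketch, assuming per-second clip embeddings and a query embedding have already been produced by some joint video-text encoder; the find_moment helper, the embedding shapes and the random stand-in data are hypothetical illustrations, not Ego4D code.

    import numpy as np

    def find_moment(clip_embs, query_emb, window=8):
        """Return (start, end) clip indices of the window best matching the query.

        clip_embs: (T, D) array, one embedding per one-second clip.
        query_emb: (D,) embedding of the language query.
        window:    moment length in clips (a fixed-size simplification).
        """
        # Cosine similarity between the query and every clip.
        clips = clip_embs / np.linalg.norm(clip_embs, axis=1, keepdims=True)
        query = query_emb / np.linalg.norm(query_emb)
        scores = clips @ query  # shape (T,)

        # Score each candidate window by its mean clip similarity.
        window_scores = np.convolve(scores, np.ones(window) / window, mode="valid")
        start = int(np.argmax(window_scores))
        return start, start + window

    # Toy usage with random stand-in embeddings.
    rng = np.random.default_rng(0)
    video = rng.normal(size=(600, 512))  # ten minutes of per-second clip embeddings
    query = rng.normal(size=512)         # e.g. a query like "where did I leave my keys?"
    print(find_moment(video, query))     # -> (start_second, end_second)

A real baseline would replace the random embeddings with learned video and language encoders and score variable-length candidate moments rather than a single fixed window.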

Just as speech recognition made virtual assistants much more useful, teaching AI to use egocentric video will yield more powerful assistive tools.

“Ego4D enables AI to gain knowledge rooted in the physical and social world, gleaned through the first-person perspective of the people who live in it,” says Kristen Grauman, lead research scientist at Facebook. “Not only will AI start to understand the world around it better, it could one day be personalized at an individual level — it could know your favorite coffee mug or guide your itinerary for your next family trip. And we’re actively working on assistant-inspired research prototypes that could do just that.”

Chen Zhao with the wearable camera used to record everyday activities.
© 2021 KAUST; Anastasia Serin