Abstract: Zero-shot image captioning can harness the knowledge of pre-trained visual language models (VLMs) and language models (LMs) to generate captions for target domain images without paired ...
Neuroscientists have been trying to understand how the brain processes visual information for over a century. The development of computational models inspired by the brain's layered organization, also ...
Abstract: Visual target navigation is a critical capability for autonomous robots operating in unknown environments, particularly in human-robot interaction scenarios. While classical and ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results