Speak While Watching: Unleashing TRUE Real-Time Video Understanding Capability of Multimodal Large Language Models
Published in arXiv, 2026