OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction • Paper • 2503.03734 • Published Mar 5