vikhyatk
/

moondream2


vikhyatk committed 19 days ago

Commit 28a4bef (verified) · 1 Parent(s): 200690c

Update README.md

Files changed (1)

README.md +12 -3


@@ -50,9 +50,18 @@ print(f"Found {len(points)} person(s)")
 ### Changelog
-**2025-06-21**
-(release notes coming soon)
+**2025-06-21** ([full release notes](https://moondream.ai/blog/moondream-2025-06-21-release))
+* **Grounded Reasoning**
+  Introduces a new step-by-step reasoning mode that explicitly grounds reasoning in spatial positions within the image before answering, leading to more precise visual interpretation (e.g., chart median calculations, accurate counting). Enable with `reasoning=True` in the `query` skill to trade off speed vs. accuracy.
+* **Sharper Object Detection**
+  Uses reinforcement learning on higher-quality bounding-box annotations to reduce object clumping and improve fine-grained detections (e.g., distinguishing “blue bottle” vs. “bottle”).
+* **Faster Text Generation**
+  Yields 20–40 % faster response generation via a new “superword” tokenizer and lightweight tokenizer transfer hypernetwork, which reduces the number of tokens emitted without loss in accuracy and eases future multilingual extensions.
+* **Improved UI Understanding**
+  Boosts ScreenSpot (UI element localization) performance from an F1\@0.5 of 60.3 to 80.4, making Moondream more effective for UI-focused applications.
+* **Reinforcement Learning Enhancements**
+  RL fine-tuning applied across 55 vision-language tasks to reinforce grounded reasoning and detection capabilities, with a roadmap to expand to \~120 tasks in the next update.
 **2025-04-15** ([full release notes](https://moondream.ai/blog/moondream-2025-04-14-release))
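
The `reasoning=True` flag described in the 2025-06-21 entry could be exercised roughly as follows. This is a minimal sketch, not the project's documented usage: it assumes the model is loaded through `transformers` with `trust_remote_code=True`, that `query` returns a dict with an `"answer"` key (as the README's existing examples suggest), and that a `2025-06-21` revision tag exists. The helpers `ask` and `load_model` are illustrative names, not part of the Moondream API.

```python
def ask(model, image, question, reasoning=False):
    """Call the model's `query` skill and return only the answer text.

    `reasoning=True` requests the grounded reasoning mode from the changelog,
    trading speed for accuracy. `ask` is a hypothetical convenience wrapper.
    """
    result = model.query(image, question, reasoning=reasoning)
    return result["answer"]


def load_model(revision="2025-06-21"):
    """Load Moondream from the Hub (downloads weights on first use).

    The revision tag is an assumption based on the changelog date.
    """
    # Imported lazily so `ask` can be used or tested without transformers.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        "vikhyatk/moondream2",
        revision=revision,
        trust_remote_code=True,
    )
```

With the weights available, a call such as `ask(load_model(), image, "What is the median value in this chart?", reasoning=True)` (where `image` is a PIL image) would run the slower, grounded mode; leaving `reasoning` at `False` keeps the faster default.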
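
For readers unfamiliar with the ScreenSpot figure quoted above, the sketch below shows one common way an F1 score at an IoU threshold of 0.5 is computed for predicted boxes. It assumes that "F1@0.5" means F1 with a 0.5 IoU matching threshold and uses greedy one-to-one matching; the actual Moondream evaluation protocol is not given in the changelog and may differ. The function names and box format are illustrative.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def f1_at_iou(predicted, truth, threshold=0.5):
    """F1 score where a prediction counts as correct if it overlaps an
    unmatched ground-truth box with IoU >= threshold (greedy matching)."""
    unmatched = list(truth)
    true_pos = 0
    for box in predicted:
        # Pick the best-overlapping ground-truth box that is still unmatched.
        best = max(unmatched, key=lambda t: iou(box, t), default=None)
        if best is not None and iou(box, best) >= threshold:
            unmatched.remove(best)
            true_pos += 1
    false_pos = len(predicted) - true_pos
    false_neg = len(truth) - true_pos
    denom = 2 * true_pos + false_pos + false_neg
    return 2 * true_pos / denom if denom else 0.0
```

Under this definition, one correct box, one spurious box, and one missed box give an F1 of 0.5, so a move from 60.3 to 80.4 reflects fewer missed and fewer spurious UI elements at the same overlap threshold.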