Llasa Collection TTS foundation model compatible with Llama framework (160k hours tokenized speech data released) • 11 items • Updated Feb 21 • 18
Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning Paper • 2407.15815 • Published Jul 22, 2024 • 14
Wavelets Are All You Need for Autoregressive Image Generation Paper • 2406.19997 • Published Jun 28, 2024 • 32
internlm/internlm-xcomposer2d5-7b Visual Question Answering • Updated Jul 22, 2024 • 1.56k • 203
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output Paper • 2407.03320 • Published Jul 3, 2024 • 96
Running on Zero 754 754 Florence 2 📉 Analyze images to generate captions, detect objects, or perform OCR
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs Paper • 2406.16860 • Published Jun 24, 2024 • 61
view article Article Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models Jun 24, 2024 • 192
LiveMind: Low-latency Large Language Models with Simultaneous Inference Paper • 2406.14319 • Published Jun 20, 2024 • 14