Jarod Rieth

JRizzled

AI & ML interests

None yet

Organizations

None yet

JRizzled's activity

liked a model about 11 hours ago

Shakker-Labs/FLUX.1-dev-LoRA-AntiBlur

Text-to-Image • Updated Sep 13, 2024 • 52.5k • 219

liked a Space about 11 hours ago

1.42k

Finegrain Image Enhancer

🖼

Clarity AI Upscaler Reproduction

reacted to hesamation's post with ❤️ about 22 hours ago

Post

2449

The best researchers from DeepSeek, OpenAI, Microsoft, and ByteDance explored RL and reasoning in LLMs.

Here are some of their key findings:

1/ RL can further improve distilled models. These models are essentially SFT fine-tuned with the data generated by larger models, and the SFT+RL combo does not disappoint.

This is verified in the DeepSeek-R1 paper.

2/ Both GRPO and PPO suffer from length bias: they encourage longer responses. This can be tackled by adding an explicit reward term based on the length of the answer.

3/ Most reasoning research focuses on code and math, but training models on logic puzzles improves them on mathematical tasks too.

This shows that reasoning learned through RL generalizes beyond domain-specific knowledge.

Previous research also shows RL can be a great generalizer.

4/ Reasoning might not be induced only by RL; it might already be hidden in the base models due to the pre-training and CoT data they were trained on.

So while RL does wake up the reasoning beast, maybe it's not the only solution (other methods, such as distillation, may work too).

5/ Back to the length bias: reasoning models tend to generate longer responses for wrong answers. RL might be the culprit.

RL favours longer answers when the reward is negative, because a longer response dilutes the penalty per individual token and lowers the loss.

This might explain the "aha" moments!

6/ OpenAI's competitive programming paper showed an interesting finding:

o3 can learn its own test-time strategies (like writing an inefficient but correct solution to verify the answer of an optimized solution).

RL helps LLMs develop their own reasoning & verification methods.

The recent article by @rasbt helped me a lot in getting a broad view of recent research on reasoning models.

He also lists more influential papers on this topic. It's a must-read if you're interested.

Check it out 👇
https://magazine.sebastianraschka.com/p/the-state-of-llm-reasoning-model-training
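
Points 2/ and 5/ of the post can be made concrete with a small numeric sketch. It assumes a single sequence-level reward that is averaged over the tokens of a response, which is only one possible form of length normalization. The function names and the `target_len` and `alpha` parameters are hypothetical and are not taken from any of the papers the post mentions.

```python
def per_token_signal(reward: float, num_tokens: int) -> float:
    """Spread one sequence-level reward evenly over every token (assumed setup)."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return reward / num_tokens


def length_aware_reward(reward: float, num_tokens: int,
                        target_len: int = 200, alpha: float = 0.001) -> float:
    """Hypothetical shaping: subtract alpha for each token beyond target_len."""
    overflow = max(0, num_tokens - target_len)
    return reward - alpha * overflow


# A wrong answer (reward -1) is penalized less per token when it is longer.
short_wrong = per_token_signal(-1.0, 100)
long_wrong = per_token_signal(-1.0, 1000)

# With an explicit length term, the longer wrong answer gets a lower
# sequence-level reward than the shorter one.
short_shaped = length_aware_reward(-1.0, 100)
long_shaped = length_aware_reward(-1.0, 1000)

print(short_wrong, long_wrong, short_shaped, long_shaped)
```

Under these assumptions the per-token penalty shrinks tenfold when the wrong answer grows from 100 to 1000 tokens, while the shaped reward moves in the opposite direction. The shaping only changes the sequence-level reward; whether it fully removes the bias depends on how the loss is normalized.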

liked a model 1 day ago

suayptalha/minGRU-Sentiment-Analysis

Text Classification • Updated Dec 28, 2024 • 8 • 4

liked a Space 1 day ago

Chat with Bitnet-b1.58-2B-4T

👾

Chat with Microsoft's 1.58bit Bitnet model!

liked 2 datasets 1 day ago

facebook/PE-Video

Viewer • Updated 4 days ago • 118k • 3.96k • 15

Conard/fortune-telling

Viewer • Updated Feb 17 • 207 • 4.07k • 130

liked a Space 1 day ago

Zen Style Shape

🔥

Structure-Preserving Style Transfer with Canny, Depth & Flux

reacted to Kseniase's post with 👍 1 day ago

Post

5189

11 new types of RAG

RAG is evolving fast, keeping pace with cutting-edge AI trends. Today it becomes more agentic and smarter at navigating complex structures like hypergraphs.

Here are the 11 latest RAG types:

1. InstructRAG -> InstructRAG: Leveraging Retrieval-Augmented Generation on Instruction Graphs for LLM-Based Task Planning (2504.13032)
Combines RAG with a multi-agent framework, using a graph-based structure, an RL agent to expand task coverage, and a meta-learning agent for better generalization

2. CoRAG (Collaborative RAG) -> CoRAG: Collaborative Retrieval-Augmented Generation (2504.01883)
A collaborative framework that extends RAG to settings where clients train a shared model using a joint passage store

3. ReaRAG -> ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation (2503.21729)
It uses a Thought-Action-Observation loop to decide at each step whether to retrieve information or finalize an answer, reducing unnecessary reasoning and errors

4. MCTS-RAG -> MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search (2503.20757)
Combines RAG with Monte Carlo Tree Search (MCTS) to help small LMs handle complex, knowledge-heavy tasks

5. Typed-RAG -> Typed-RAG: Type-aware Multi-Aspect Decomposition for Non-Factoid Question Answering (2503.15879)
Improves answers to open-ended questions by identifying the question type (a debate, personal experience, or comparison) and breaking the question down into simpler parts

6. MADAM-RAG -> Retrieval-Augmented Generation with Conflicting Evidence (2504.13079)
A multi-agent system where models debate answers over multiple rounds and an aggregator filters noise and misinformation

7. HM-RAG -> HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation (2504.12330)
A hierarchical multi-agent RAG framework that uses 3 agents: one to split queries, one to retrieve across multiple data types (text, graphs and web), and one to merge and refine answers

8. CDF-RAG -> CDF-RAG: Causal Dynamic Feedback for Adaptive Retrieval-Augmented Generation (2504.12560)
Works with causal graphs and enables multi-hop causal reasoning, refining queries. It validates responses against causal pathways
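
The Thought-Action-Observation loop described for ReaRAG (item 3) can be sketched roughly as follows. This is a toy illustration, not the paper's implementation: `tao_loop`, `toy_decide`, `toy_retrieve`, the `KB` dictionary, and the `max_steps` budget are all hypothetical stand-ins for an LLM policy and a real retriever.

```python
from typing import Callable, Dict, List, Tuple

# (action, argument): ("search", query) or ("finish", answer)
Step = Tuple[str, str]


def tao_loop(question: str,
             decide: Callable[[str, List[str]], Step],
             retrieve: Callable[[str], str],
             max_steps: int = 5) -> str:
    """At each step, either retrieve more information or finalize an answer."""
    observations: List[str] = []
    for _ in range(max_steps):
        action, arg = decide(question, observations)  # Thought + Action
        if action == "finish":
            return arg
        observations.append(retrieve(arg))            # Observation
    return "no answer within step budget"


# Toy knowledge base standing in for a passage store (assumption).
KB: Dict[str, str] = {"capital of France": "Paris"}


def toy_retrieve(query: str) -> str:
    return KB.get(query, "")


def toy_decide(question: str, observations: List[str]) -> Step:
    # Finish as soon as the last retrieval returned something useful.
    if observations and observations[-1]:
        return ("finish", observations[-1])
    return ("search", question)


answer = tao_loop("capital of France", toy_decide, toy_retrieve)
unknown = tao_loop("capital of Atlantis", toy_decide, toy_retrieve)
print(answer, "|", unknown)
```

The step budget is one simple way to bound the loop when retrieval never returns anything useful; a real system would let the model itself decide when further retrieval is no longer worthwhile.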

To explore what Causal AI is, read our article: https://www.turingpost.com/p/causalai

Subscribe to the Turing Post: https://www.turingpost.com/subscribe

Read further 👇

liked a model 2 days ago

arnir0/Tiny-LLM

Text Generation • Updated Nov 3, 2024 • 38.2k • 10

liked 3 models 3 days ago

liked a Space 3 days ago

RF-DETR

🔥

SOTA real-time object detection model

reacted to BrigitteTousi's post with 🤗 4 days ago

Post

2961

AI agents are transforming how we interact with technology, but how sustainable are they? 🌍

Design choices — like model size and structure — can massively impact energy use and cost. ⚡💰 The key takeaway: smaller, task-specific models can be far more efficient than large, general-purpose ones.

🔑 Open-source models offer greater transparency, allowing us to track energy consumption and make more informed decisions on deployment. 🌱 Open-source = more efficient, eco-friendly, and accountable AI.

Read our latest, led by @sasha with assists from myself + @yjernite 🤗
https://huggingface.co/blog/sasha/ai-agent-sustainability