Ondrej Salamon

onslm

AI & ML interests

MAGIC to me!!!

Recent Activity

liked a dataset 1 day ago
uonlp/CulturaX
updated a collection 1 day ago
Papers

Organizations

None yet

onslm's activity

reacted to davanstrien's post with 👍 3 days ago
I've created a v1 dataset (davanstrien/reasoning-required) and model (davanstrien/ModernBERT-based-Reasoning-Required) to help curate "wild text" data for generating reasoning examples beyond the usual code/math/science domains.

- I developed a "Reasoning Required" dataset with a 0-4 scoring system for reasoning complexity
- I used educational content from HuggingFaceFW/fineweb-edu, adding annotations for domains, reasoning types, and example questions

My approach enables a more efficient workflow: filter text with small models first, then use LLMs only on high-value content.

This significantly reduces computation costs while expanding the domain coverage of reasoning datasets.
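
The filter-first workflow described in the post can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: `filter_for_llm`, `toy_score`, and the `min_score` threshold of 3 are hypothetical names and values chosen for the example, and `toy_score` is a crude keyword heuristic standing in for a small classifier that outputs a 0-4 reasoning score.

```python
from typing import Callable, Iterable, List, Tuple

Scored = Tuple[str, float]


def filter_for_llm(
    texts: Iterable[str],
    score_fn: Callable[[str], float],
    min_score: float = 3.0,  # assumed cutoff on the 0-4 scale
) -> Tuple[List[Scored], List[Scored]]:
    """Split texts into (kept, dropped) using a cheap scoring function.

    Only the kept texts would be passed on to an expensive LLM step.
    """
    kept: List[Scored] = []
    dropped: List[Scored] = []
    for text in texts:
        score = score_fn(text)
        (kept if score >= min_score else dropped).append((text, score))
    return kept, dropped


def toy_score(text: str) -> float:
    """Hypothetical stand-in for a small model: counts reasoning cue words,
    capped at 4 to mimic a 0-4 scale."""
    cues = ("because", "therefore", "however", "implies")
    hits = sum(text.lower().count(cue) for cue in cues)
    return float(min(4, hits))


docs = [
    "Open daily from 9 to 5.",
    "Prices rose because supply fell; therefore demand shifted, "
    "however wages lagged, which implies lower real income.",
]
kept, dropped = filter_for_llm(docs, toy_score, min_score=3.0)
for text, score in kept:
    print(f"send to LLM (score {score}): {text[:40]}...")
```

In practice `score_fn` would wrap a call to a small classifier such as the one named in the post. The scorer is kept pluggable here so the filtering logic can be run and tested without downloading a model.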