We just published a dataset using a new (for us) preference modality: direct ranking based on aesthetic preference. We ranked a couple of thousand images from most to least preferred, all sampled from the Open Image Preferences v1 dataset by the amazing @data-is-better-together team.
We just published a dataset using a new (for us) preference modality: direct ranking based on aesthetic preference. We ranked a couple of thousand images from most to least preferred, all sampled from the Open Image Preferences v1 dataset by the amazing @data-is-better-together team.
π₯ Yesterday was a fire day! We dropped two brand-new datasets capturing Human Preferences for text-to-video and text-to-image generations powered by our own crowdsourcing tool!
Whether you're working on model evaluation, alignment, or fine-tuning, this is for you.
π₯ Yesterday was a fire day! We dropped two brand-new datasets capturing Human Preferences for text-to-video and text-to-image generations powered by our own crowdsourcing tool!
Whether you're working on model evaluation, alignment, or fine-tuning, this is for you.
We benchmarked @xai-org 's Aurora model, as far as we know the first public evaluation of the model at scale.
We collected 401k human annotations in over the past ~2 days for this, we have uploaded all of the annotation data here on huggingface with a fully permissive license Rapidata/xAI_Aurora_t2i_human_preferences
1 reply
Β·
reacted to their post with ππ§ πβ€οΈ17 days ago
Runway Gen-3 Alpha: The Style and Coherence Champion
Runway's latest video generation model, Gen-3 Alpha, is something special. It ranks #3 overall on our text-to-video human preference benchmark, but in terms of style and coherence, it outperforms even OpenAI Sora.
However, it struggles with alignment, making it less predictable for controlled outputs.
We've released a new dataset with human evaluations of Runway Gen-3 Alpha: Rapidata's text-2-video human preferences dataset. If you're working on video generation and want to see how your model compares to the biggest players, we can benchmark it for you.
This dataset was collected in roughly 4 hours using the Rapidata Python API, showcasing how quickly large-scale annotations can be performed with the right tooling!
All that at less than the cost of a single hour of a typical ML engineer in Zurich!
The new dataset of ~22,000 human annotations evaluating AI-generated videos based on different dimensions, such as Prompt-Video Alignment, Word for Word Prompt Alignment, Style, Speed of Time flow and Quality of Physics.