Multimodal Pre-training
Exploring pre-training paradigms of large models across modalities towards Artificial General Intelligence (AGI).
Paper • 2405.16528 • Published • 3
Note Efficient Training | Check TSAIL (https://ml.cs.tsinghua.edu.cn/) and GaLore (https://github.com/jiaweizzhao/GaLore) for more.
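GaLore cuts optimizer memory by keeping Adam statistics in a low-rank projection of the gradients. A minimal sketch of wiring it into a training setup, following the usage pattern shown in the GaLore repository README; the toy model and the rank, update_proj_gap, and scale values here are illustrative assumptions, not tuned settings:

```python
import torch
from galore_torch import GaLoreAdamW  # pip install galore-torch

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
)

# Apply the low-rank gradient projection only to 2-D weight matrices;
# biases and other parameters keep plain AdamW statistics.
galore_params = [p for p in model.parameters() if p.dim() == 2]
other_params = [p for p in model.parameters() if p.dim() != 2]

optimizer = GaLoreAdamW(
    [
        {"params": other_params},
        {"params": galore_params,
         "rank": 128,             # size of the gradient subspace
         "update_proj_gap": 200,  # recompute the projection every 200 steps
         "scale": 0.25,           # scaling of the low-rank update
         "proj_type": "std"},
    ],
    lr=1e-3,
)
```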
Scaling Vision Pre-Training to 4K Resolution
Paper • 2503.19903 • Published • 42
Note Scalability | Scalability research paradigms for large models. Research on the scalability of large models typically involves several common paradigms. Scaling laws: studying how model performance improves as parameters, data, and compute increase, often by fitting empirical power-law relationships; this helps predict the returns of training larger models. Key papers: Kaplan et al., "Scaling Laws for Neural Language Models" (2020); Henighan et al., "Scaling Laws for Autoregressive Generative Modeling" (2020).
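Such power laws are usually fit in log-log space, where they become linear. A minimal sketch of recovering L(N) = a·N^(−α) from measured losses; the data here is synthetic and the constants are made up for illustration:

```python
import numpy as np

# Synthetic "measured" losses at several model sizes N, generated
# from an assumed power law L(N) = a * N**(-alpha) plus noise.
N = np.array([1e6, 1e7, 1e8, 1e9, 1e10])
a_true, alpha_true = 400.0, 0.076  # illustrative values only
L = a_true * N ** (-alpha_true) * np.exp(np.random.normal(0, 0.01, N.shape))

# A pure power law is linear in log-log space:
# log L = log a - alpha * log N, so fit a degree-1 polynomial.
slope, intercept = np.polyfit(np.log(N), np.log(L), deg=1)
alpha_hat, a_hat = -slope, np.exp(intercept)

print(f"alpha ~ {alpha_hat:.4f}, a ~ {a_hat:.1f}")
# Extrapolate the predicted loss of a 10x larger model.
print(f"predicted L(1e11) ~ {a_hat * 1e11 ** (-alpha_hat):.3f}")
```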
mohdmus99/slurm_commands
Viewer • Updated • 73 • 11
Note DeepOps | SLURM skills range from basic job submission to expert-level cluster management. Level 1 covers essential commands like sbatch, srun, and squeue, enabling users to run simple jobs. Level 2 focuses on efficient resource requests (--mem, --cpus-per-task), logging, and job history (sacct). Level 3 introduces advanced scheduling techniques such as job dependencies (--dependency), priority management, partitions, and job arrays. Level 4 involves expert skills like diagnosing scheduler behavior.
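A minimal batch script exercising the Level 1-2 commands and flags above; the partition name, script names, and resource numbers are placeholders for whatever your cluster and project actually use:

```bash
#!/bin/bash
#SBATCH --job-name=pretrain-demo
#SBATCH --partition=gpu          # placeholder: your cluster's partition
#SBATCH --gres=gpu:1             # one GPU
#SBATCH --cpus-per-task=8        # CPU cores for the data loader
#SBATCH --mem=32G                # host RAM for the job
#SBATCH --time=02:00:00          # wall-clock limit
#SBATCH --output=%x-%j.out       # log file: jobname-jobid.out

srun python train.py --config config.yaml   # placeholder entry point
```

Submit with `sbatch train.sbatch`, watch the queue with `squeue -u $USER`, and inspect the finished job with `sacct -j <jobid> --format=JobID,Elapsed,MaxRSS`. A Level 3-style dependency chain is just `sbatch --dependency=afterok:<jobid> next.sbatch`.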
lldacing/flash-attention-windows-wheel
Updated • 238
Note Speed-up attention | Other work includes xFormers and SageAttention (https://github.com/thu-ml/SageAttention). NB: building flash-attention from source requires the CUDA compiler (nvcc); these wheels avoid that.
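A minimal sketch of calling the fused kernel through the flash_attn package, with a fallback to PyTorch's built-in scaled_dot_product_attention when the wheel is unavailable; shapes follow flash_attn's (batch, seqlen, heads, head_dim) convention, and the sizes are arbitrary:

```python
import torch

batch, seqlen, heads, head_dim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, heads, head_dim, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

try:
    # Fused FlashAttention kernel: O(seqlen) memory, never materializes
    # the full attention matrix.
    from flash_attn import flash_attn_func
    out = flash_attn_func(q, k, v, causal=True)
except ImportError:
    # Fallback: PyTorch dispatches to its own fused backends when available.
    # F.scaled_dot_product_attention expects (batch, heads, seqlen, head_dim).
    out = torch.nn.functional.scaled_dot_product_attention(
        q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), is_causal=True
    ).transpose(1, 2)

print(out.shape)  # torch.Size([2, 1024, 8, 64])
```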
Embodied-CoT/ecot-openvla-7b-oxe
Robotics • 8B • Updated • 824 • 2
Note Minkowski-style sparse tensors, to speed up sparse voxel operations in 3D space. | Recommended: pruning of Large Reconstruction Model (LRM)-style or NeRF-style work, TensoRF, and distillation of 3D Gaussian fields. A pretrained LRM can be leveraged for mesh generation. Some game-engine companies: Unity, Roblox (https://github.com/Roblox/cube/tree/main/cube3d). Recent impressive works and pre-trained backbones: Hunyuan3D, TRELLIS, InstantMesh, Zero123, ... Also check Dream-style works (e.g., DreamFusion) for more on generation.
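The Minkowski-style idea is to store only occupied voxels as a coordinate list plus a feature matrix instead of a dense grid. A minimal sketch of that layout using plain torch sparse COO tensors; libraries like MinkowskiEngine provide real sparse convolutions on top of it, and the resolution and occupancy numbers here are arbitrary:

```python
import torch

res, n_occupied, feat_dim = 128, 2000, 16  # 128^3 grid, ~0.1% occupied

# Minkowski-style layout: an (N, 3) integer coordinate list ...
coords = torch.randint(0, res, (n_occupied, 3))
# ... paired with an (N, C) feature matrix, one row per occupied voxel.
feats = torch.randn(n_occupied, feat_dim)

# The same data viewed as a COO sparse tensor over the (x, y, z) grid.
sparse = torch.sparse_coo_tensor(coords.t(), feats, size=(res, res, res, feat_dim))

dense_bytes = res**3 * feat_dim * 4                  # float32 dense grid
sparse_bytes = coords.numel() * 8 + feats.numel() * 4
print(f"dense: {dense_bytes/1e6:.0f} MB, sparse: {sparse_bytes/1e6:.2f} MB")

# Ops touch only occupied sites, e.g. a per-voxel linear layer:
proj = torch.nn.Linear(feat_dim, feat_dim)
feats = proj(feats)  # (N, C) -> (N, C), independent of grid resolution
```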
BartenderXD/Hunyuan3D
Updated • 11
BartenderXD/MVRLT
Viewer • Updated • 128 • 9
Scaling Language-Free Visual Representation Learning
Paper • 2504.01017 • Published • 32
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 255
SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training
Paper • 2505.11594 • Published • 76
ranjaykrishna/visual_genome
Updated • 432 • 76
BLINK-Benchmark/BLINK
Viewer • Updated • 3.81k • 4.34k • 28
gorilla-llm/Berkeley-Function-Calling-Leaderboard
Preview • Updated • 1.38k • 85
LightSwitch: Multi-view Relighting with Material-guided Diffusion
Paper • 2508.06494 • Published • 3
nvidia/Cosmos-Reason1-7B
Image-Text-to-Text • 8B • Updated • 261k • 138