Experimenting primarily with 7B-12B parameter text completion models. Not all models here are intended for direct use; some are meant instead for research and/or educational purposes.
I have recently been looking at the paper "Why Warmup the Learning Rate? Underlying Mechanisms and Improvements" by Dayal Singh Kalra and Maissam Barkeshli (https://arxiv.org/abs/2406.09405), and was struck by how "warmup" is analogous to simulated annealing. Taking the physical analogy further, warmup acts as a stochastic process that knocks the system out of its current local minimum, allowing an easier transition toward newer minima. It works because it reduces "fit" and therefore "friction".
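To make the schedule itself concrete, here is a minimal sketch of a linear warmup (not code from the paper); the peak learning rate and warmup length are arbitrary placeholders.

```python
def warmup_lr(step: int, peak_lr: float = 3e-4, warmup_steps: int = 1000) -> float:
    """Linear warmup: ramp the learning rate from near zero up to peak_lr.

    In the annealing reading above, the ramp carries the optimizer into a
    high-"temperature" (large-step, noisy) regime that can knock it out of
    whatever minimum early training has settled into.
    """
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr  # a decay phase would typically follow in practice
```

With these placeholder values, warmup_lr(0) returns 3e-7 and warmup_lr(999) returns the full 3e-4.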
Specifically, the duplication of layers in frankenmerges serves a purpose similar to what occurs in their recurrent-depth architecture. Successful frankenmerges that operate without additional fine-tuning are able to recover, or "heal", from any damage caused by abrupt transitions between layer blocks. Replicated layer blocks that remain operational can provide functional benefits grounded in latent reasoning. Frankenmerges can also result in hybrid reasoning by splicing together the latent reasoning of different models, as in the sketch below.
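As a rough illustration of that splicing (the model names, split points, and overlap below are hypothetical, not any particular released merge), a frankenmerge of two fine-tunes of the same base might stack overlapping layer ranges:

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical frankenmerge: lower layers from fine-tune A, upper layers from
# fine-tune B, with an overlapping block so the result is deeper than either.
model_a = AutoModelForCausalLM.from_pretrained("org/finetune-A", torch_dtype=torch.bfloat16)
model_b = AutoModelForCausalLM.from_pretrained("org/finetune-B", torch_dtype=torch.bfloat16)

# e.g. two 32-layer models -> a 40-layer hybrid (24 + 16, with the 16-23 range
# represented once by each parent).
hybrid_layers = torch.nn.ModuleList(
    list(model_a.model.layers[:24]) + list(model_b.model.layers[16:])
)

# Reuse model A as the container for embeddings, final norm, and lm_head;
# per-layer cache indices would still need renumbering before generation.
model_a.model.layers = hybrid_layers
model_a.config.num_hidden_layers = len(hybrid_layers)
```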
Back in April 2024, I was able to duplicate a few layers in the Llama 3 8B model, turning it into a 9B model, without significantly harming benchmarks despite any transition damage; the result was released as grimjim/llama-3-experiment-v1-9B. My informal experimentation suggested that latent reasoning circuits could occupy contiguous stacks of 2-4 layers, though the result was highly sensitive to the choice of transition location between layers.
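For reference, the kind of in-place layer duplication described above can be sketched with the Hugging Face transformers API as follows; the duplicated range is a placeholder, not the split actually used in grimjim/llama-3-experiment-v1-9B.

```python
import copy
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16
)

layers = model.model.layers          # nn.ModuleList of decoder layers
dup_start, dup_end = 12, 16          # duplicate layers [12, 16) as an example

# Build the new stack: the original layers, with a copy of the chosen block
# spliced in immediately after the block itself.
new_layers = torch.nn.ModuleList()
for i, layer in enumerate(layers):
    new_layers.append(layer)
    if i == dup_end - 1:
        for j in range(dup_start, dup_end):
            new_layers.append(copy.deepcopy(layers[j]))

model.model.layers = new_layers
model.config.num_hidden_layers = len(new_layers)

# Recent transformers versions tag each attention module with a layer_idx
# used for KV caching; renumber so generation indexes the cache correctly.
for idx, layer in enumerate(model.model.layers):
    if hasattr(layer.self_attn, "layer_idx"):
        layer.self_attn.layer_idx = idx
```

The choice of dup_start and dup_end is exactly the "transition location" sensitivity noted above: moving the duplicated block by a layer or two can change results markedly.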