@onekq on Hugging Face: "This is bitter lesson 2.0…"

onekq

posted an update 4 days ago

This is bitter lesson 2.0
https://storage.googleapis.com/deepmind-media/Era-of-Experience%20/The%20Era%20of%20Experience%20Paper.pdf

If this reads as too lofty to you, consider some low-hanging fruit. The "experiences" here are the reward signals we send to LLMs, e.g. human scores in RLHF, proof verification in AlphaProof, or test results for code generation.
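To make the last example concrete, here is a minimal sketch of turning test results into a scalar reward. It is not taken from the paper; the function name `unit_test_reward` and the pass-fraction scoring are assumptions for illustration only, and the `exec` call is unsandboxed, so it should never be run on untrusted model output as-is.

```python
from typing import Any, Dict, List, Tuple


def unit_test_reward(
    candidate_src: str,
    func_name: str,
    cases: List[Tuple[tuple, Any]],
) -> float:
    """Return the fraction of test cases the generated code passes (0.0 to 1.0).

    Hypothetical helper: `cases` is a list of (args, expected_result) pairs.
    """
    namespace: Dict[str, Any] = {}
    try:
        # Illustration only: real systems run generated code in a sandbox.
        exec(candidate_src, namespace)
        func = namespace[func_name]
    except Exception:
        # Code that does not compile or does not define the function earns nothing.
        return 0.0

    if not cases:
        return 0.0

    passed = 0
    for args, expected in cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            # A crashing test case counts as a failure.
            pass
    return passed / len(cases)


# Usage: score two candidate completions for the same task.
cases = [((1, 2), 3), ((0, 0), 0), ((2, 2), 4)]
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"
good_reward = unit_test_reward(good, "add", cases)
bad_reward = unit_test_reward(bad, "add", cases)
```

A fine-tuning loop would then feed such a score back as the reward for the completion that produced the code, in the same role a human rating plays in RLHF.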

RFT (reinforcement fine-tuning) will become mainstream and, IMO, will make LLMs behave more like agents.

agentlans

2 days ago

I'm skeptical about the big conclusions in the paper, especially about human society.

AI agent experience is fundamentally different from human experience. Yes, you can give AI multimodal inputs. You can let AI roam around and explore freely, but AI agents aren't limited by their biology and physiology.

They don't get thirsty or hungry.
They don't feel emotions or pain.
They don't grow old, get sick, or die.
They don't reproduce and evolve as a species.

And this gem from the paper:

"Perhaps even more importantly, the agent could recognise when its behaviour is triggering human concern, dissatisfaction, or distress, and adaptively modify its behaviour to avoid these negative consequences."

So the AI agent must somehow learn empathy from reward signal data, even though it has no human values or experiences.

onekq

about 15 hours ago

This requires philosophical minds. I am quite sure the authors themselves, as technologists, didn't think about these questions when they wrote it.
