https://storage.googleapis.com/deepmind-media/Era-of-Experience%20/The%20Era%20of%20Experience%20Paper.pdf
If this reads too lofty to you, consider some low-hanging fruits. Experiences here are reward signals we send to LLMs, e.g. human score in RLHF, verification in AlphaProof, or test results for code generation.
RFT (reinforced finetuning) will become main stream, and IMO make LLMs behave more like agents.