
| Person | Opinion | Affiliation |
|---|---|---|
| Finbarr Timbers | We'll achieve superhuman performance on specific tasks with verifiable rewards. I see no evidence for general transfer, but it seems extremely plausible. | Ex-DeepMind; Artfintel |
| Gwern Branwen | "Everyone neglects to ask, what are we scaling?" Depends on what data they scale up on. The more you scale up on a few domains like coding, the less I expect transfer, as they become ultra-specialized. | Independent Researcher |
| Jacob Buckman | Generalization can't really be predicted like that except empirically. All I know is that as you add more compute and data, you go from minimal transfer to some transfer to broad transfer. I have no clue where on that spectrum we will stand when we run out of compute or data. | Founder of Manifest AI |
| Karthik Narasimhan | I expect some generalization with domain-specific retraining. | Co-author of GPT paper |
| Near | I think the "spikiness" of intelligence will continue to be notable (models which are extremely good at some things yet quite 'dumb' at others), but it is easy to improve generalization in the areas we care about, since it just requires some data/RL fun. | Independent |
| Nathan Lambert | New models trained to reason heavily about every subject will come to have better average performance than standard autoregression. In domains with explicit verifiers, this performance will be superhuman; in domains without, reasoning will still enable better performance, but maybe not more economical performance. | Post-Training Lead at Allen AI |
| Pengfei Liu | Increased compute and inference time will drive reasoning capabilities to expert-level performance where rich feedback loops exist. However, the development of general reasoning will be gated by two factors: the availability of problems requiring genuine deep thinking, and access to high-quality expert cognitive-process data or well-defined reward signals. | Shanghai Jiao Tong University |
| Ross Taylor | I think general reasoning will come fairly quickly. Right now it's easier to scale in domains where problems are easy to verify with an external signal. The generalisation will come if models themselves become good verifiers across domains. | Led reasoning at Meta AI |
| Shannon Sands | There's at least some generalisation to other tasks like logic puzzles, but it might require more domain-specific training to improve on many more out-of-domain tasks. | Nous Research |
| Steve Newman | This is a trillion-dollar question. If I had to guess: we'll see some transfer of reasoning skills across domains, but (on anything resembling current architectures) some specialized training will be needed in each domain. We'll learn a lot one way or another this year. | Co-founder of Google Docs; AI Soup |
| Tamay Besiroglu | I think minimal transfer is wrong because reasoning is a very general skill that you can apply to perform a wide range of actions. Planning, for instance, is something that requires good reasoning. | Co-founder of Epoch AI |
| Teortaxes | I think there will be a period of strong 'natively verifiable reasoning overhang', which translates to more general verifiers using models' strong coding ability and general knowledge+tools; then they grok more general regularities of sound reasoning, and the next generation can natively generate good reasoning data for all domains. | Independent |
| Xeophon | We will see some generalization into other domains where the model was not explicitly trained. For example, R1 writes better and more creative stories than V3, the model it is based on. To push this further, models need to be trained on more data in other domains. | Independent |
| Chris Barber | Synthesis: the expert takes point to generalization for all logically bound domains where we can construct verifiers for now, and trend in the direction of broad transfer in the future. More notes from experts: @chrisbarber. | Creator of this survey; Independent |
Tamay suggested a good follow-up would be to "elicit ideas for experiments that they would expect to turn out one way conditional on 'Weak transfer' and another if 'Strong transfer' is correct" – let me know if you have ideas.
To answer, tag or DM me at @chrisbarber or email me at [email protected]
Thank you to Amir Haghighat, Arun Rao, Ash Bhat, Avery Lamp, Charlie Songhurst, Connor Mann, Daniel Kang, Dhruv Singh, Eric Jang, Ethan Beal-Brown, Finbarr Timbers, Flo Crivello, Griffin Choe, Gwern, Herrick Fang, Jacob Buckman, James Betker, Jay Hack, Josh Singer, Julian Michael, Katja Grace, Karthik Narasimhan, Logan Graham, Matt Figdore, Mike Choi, Nathan Lambert, Nicholas Carlini, Nitish Kulkarni, Pengfei Liu, Rick Barber, Robert Nishihara, Robert Wachen, Rohit Krishnan, Ron Bhattacharyay, Ross Taylor, Shannon Sands, Spencer Greenberg, Steve Newman, Tamay Besiroglu, Teknium, Teortaxes, Tim Shi, Tim Wee, Tyler Cowen, and Xeophon.