Could Advanced AI Accelerate the Pace of AI Progress? Interviews with AI Researchers

Released on 1st March 2025

Jared Leibowich

Nikola Jurkovic

Tom Davidson

Executive Summary

We interviewed five AI researchers from leading AI companies about a scenario where AI systems fully automate AI capabilities research. To ground the setting, we stipulated that each employee is replaced by 30 digital copies with the same skill set and the ability to think 30 times faster than a human. This represents a 900-fold increase in the cognitive labor that AI companies direct towards advancing AI capabilities.

Our key takeaways are:

Compute for experiments will likely be a bottleneck. Finding better algorithms often requires running computationally expensive ML experiments. Even with abundant cognitive labor, AI progress would be constrained by the time needed for these experiments to run.
AI cognitive labor could probably extract significantly more research insights out of limited compute. AI could increase efficiency by improving experimental design, high-level research directions, and project prioritization. Other strategies include running experiments at smaller scales, generating higher-quality synthetic data, or prioritizing algorithmic improvements that do not require large experiments (e.g., scaffolding).
The overall pace of AI progress might be between 2 and 20 times faster in the discussed scenario. However, the researchers emphasized their high degree of uncertainty.
Abundant AI labor might significantly improve experiment design and implementation. This includes eliminating subtle bugs, stopping experiments early, constantly monitoring and analyzing experiments, and making every experiment as informative and efficient as the best experiments are today. Some researchers thought this effect would be small, but others thought it could be very significant.
The gains to smaller experiments would likely be larger than the gains to larger experiments. The cost of smaller experiments tends to be dominated by the time to code them, and most bugs are removed at small scales before larger runs are attempted.
There is uncertainty about the extent to which multiple small-scale experiments could effectively replace large-scale ones. In some cases, new algorithms are effective at large scales but not at small scales.

Potential Bottlenecks	Potential Efficiency Gains
Compute limitations for large experiments	Better experiment design and execution
Real-world data collection constraints	Elimination of subtle bugs
Time needed for experiments to run	Improved resource allocation
	Faster analysis of experimental results
	Rapid iteration on research ideas

Table 1: A summary of our main takeaways.

Introduction

Recent advances in large language models (LLMs) have enabled novel ways for AI to accelerate the work of AI researchers. Coding copilots enable quick generation of short snippets of code, augmenting the work of software engineers (Ziegler et al., 2022). Currently, AI systems can complete some tasks related to AI R&D, but they have not surpassed expert human-level capabilities on 8-hour AI R&D coding tasks (Wijk et al., 2024).

Capabilities related to accelerating AI R&D are an important consideration for accurately modeling AI capabilities progress. Some have suggested that AI automating a large fraction of AI research could result in rapid increases in AI capabilities (Davidson, 2023), potentially leading to an intelligence explosion (OpenAI, 2023). Such scenarios are highlighted as potentially dangerous in the safety frameworks of OpenAI (OpenAI, 2023), Google DeepMind (Google DeepMind, 2024), and Anthropic (Anthropic, 2024).

We interviewed five AI researchers on the effects of fully automating AI R&D and the bottlenecks that could prevent rapid capability gains.

Existing AI R&D Speedups. Existing AI systems can speed up developers through code completion (Cui et al., 2024), provide ratings for reinforcement learning (Bai et al., 2022), generate pretraining data (Gunasekar et al., 2023), and optimize LLM prompts (Pryzant et al., 2023). More recently, AI agents have gained the ability to perform software engineering tasks such as solving GitHub issues (Jimenez et al., 2024). OpenAI employees have claimed that the o1 model has authored pull requests in the OpenAI codebase (Sequoia Capital, 2024) and Google CEO Sundar Pichai recently claimed that more than a quarter of Google’s new code is generated by AI (Morris, 2024). Additionally, AI systems have become more integrated into the chip supply chain (Goldie et al., 2024).

Interviewing AI Experts about AI R&D. Surveys of AI experts show that a large proportion of experts predict AI will likely be able to automate AI research this century (Grace et al., 2024). Recently, Owen published findings from interviews with AI experts about AI R&D automation (Owen, 2024). Their research focused on characterizing the tasks involved in AI R&D, evaluating AI systems for these capabilities, and predicting when AI R&D might be automated. By contrast, we consider a hypothetical future scenario in which AI R&D has already been fully automated, and explore what the consequences might be.

Bottlenecks to AI progress. Previous work has identified compute as one of the major sources of AI progress (Ho et al., 2024). Other research has attempted to quantify the importance and likelihood of different bottlenecks to AI scaling until 2030 (Sevilla et al., 2024).

Methods

We conducted interviews with five AI researchers (current or past employees of leading AI companies) in March 2024, asking them to engage in the following hypothetical scenario:

Imagine that the compute available for experiments and training is increasing at the rate it has been increasing over the last several years, but there is now a major update: 30 AI-powered copies of each person are at your company, and each copy can "think" 30 times as fast as a human. They can do the same tasks as a human worker with access to a computer and the internet, including navigating operating systems, writing code, writing papers, communicating with other workers, but 30 times faster. Therefore, the company is fully automated.

In the scenario, there is enough compute to run these AI copies in addition to the normal compute the company can access.

Structured interviews were conducted to gauge researchers’ perspectives on the potential acceleration of AI progress in this scenario and to identify the key factors influencing their viewpoints.

Researchers were first asked to consider a significant AI capabilities project they had previously undertaken that lasted at least a month. They discussed how much more quickly it would have been completed in the hypothetical scenario of abundant cognitive labor, and whether its quality could have been improved. They were prompted to discuss what bottlenecks might have prevented the project from going even faster.

The researchers then discussed the implications of the scenario for the overall pace of capabilities advancement, rather than for a single project. Concretely, the researchers considered the amount of progress that would normally take ten years (at the recent pace of progress), and tried to estimate how much more quickly that progress would occur in the scenario. They did the same exercise for three years' and 30 years' worth of progress.

The researchers were also asked about how various efficiency gains might get around the compute bottleneck and thus speed up the pace of advancement.

Our full interview script can be found in Appendix B.

Results

Compute as the Main Bottleneck

Most researchers identified the runtime of ML experiments as the primary constraint on accelerating AI capabilities progress in this scenario. One researcher claimed that experimentation is already bottlenecked on compute because there is a backlog of tasks to run, so abundant cognitive labor would not result in enormous speedups. Another researcher highlighted many places for improvement, such as mixed-precision training or tweaks to architecture, data curation, optimizers, and hyperparameters, but said that compute would still be needed for testing these.

The researchers thought that experiment design, coding of the experiments, theoretical analysis, and analysis of experimental results could be sped up and/or improved in the scenario. Additionally, because AI copies could work without taking breaks, there would be better decisions about what experiments to prioritize, more idea generation, and higher quantity and quality of synthetic data. Combining these effects could lead to better utilization of existing compute.
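The interaction between abundant labor and fixed experiment runtime can be illustrated with an Amdahl's-law-style calculation. The sketch below is not a model used in the interviews. It assumes a project splits cleanly into a labor-bound share that is accelerated and a compute-bound share that is not, and the labor shares it tries are hypothetical values chosen only for illustration.

```python
def overall_speedup(labor_fraction: float, labor_speedup: float) -> float:
    """Amdahl-style speed-up when only the labor-bound share of a project accelerates.

    labor_fraction: share of wall-clock time currently spent on cognitive labor
                    (hypothetical input, not a figure from the interviews).
    labor_speedup:  factor by which that share is accelerated.
    """
    if not 0.0 <= labor_fraction <= 1.0:
        raise ValueError("labor_fraction must be between 0 and 1")
    if labor_speedup <= 0:
        raise ValueError("labor_speedup must be positive")
    compute_fraction = 1.0 - labor_fraction  # experiment runtime, assumed unchanged
    return 1.0 / (compute_fraction + labor_fraction / labor_speedup)


# Scenario from the Methods section: 30 copies per employee, each 30x faster.
# Treating this as a clean 900x labor speed-up assumes perfect parallelization,
# which is an optimistic simplification.
LABOR_SPEEDUP = 30 * 30

for fraction in (0.5, 0.8, 0.95):  # illustrative values only
    ceiling = 1.0 / (1.0 - fraction)  # limit as labor_speedup grows without bound
    speedup = overall_speedup(fraction, LABOR_SPEEDUP)
    print(f"labor share {fraction:.0%}: {speedup:.1f}x (ceiling {ceiling:.0f}x)")
```

Under these assumptions the overall speed-up can never exceed 1 / (1 - labor_fraction), however much labor is added. This is one way to read the researchers' point that experiment runtime caps the gains unless the experiments themselves become more compute-efficient.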

Overall Pace of AI Capabilities Advancement

Researchers estimated that the overall pace of AI progress might be between 2 and 20 times faster in this scenario (over the discussed timeframes), although they emphasized their high degree of uncertainty. See more details in Appendix C. One researcher hypothesized that after exploiting readily available advances, further algorithmic improvements would yield diminishing returns without commensurate increases in computational power. However, the researcher mentioned that there could still be improvements to data, architectures, optimizers, loss functions, and different aspects of training. Another researcher thought that progress would speed up because there would probably be a paradigm shift, similar to the discovery of transformers, over the next 3-10 years, which would encourage exploratory work. Abundant cognitive labor would significantly accelerate the speed of this research.

Examining Sources of Efficiency Gains

Correcting Bugs

Researchers emphasized that it is important to clarify what counts as a bug. On an expanded definition, bugs could include situations such as being confused about a question, then implementing an experiment, and finally looking at the results and realizing that the question was misguided. One researcher claimed there would be a significant efficiency gain if every experiment avoided subtle bugs of this kind. When solely focusing on coding errors, certain experiments, like reinforcement learning, typically have more bugs. These areas would benefit disproportionately once bugs are not an issue. However, although addressing bugs is likely to broadly accelerate coding, certain areas would not be faster, such as analyzing the experiments, running the experiments, and writing code before debugging. Significant time is not lost to bugs in large-scale experiments because they are often first run at smaller scales.

Better Experiment Design

Some researchers thought the potential gains from better experimental design would probably be small, while others thought they might be very large.

Small Gains

One researcher said that the question is hard to interpret because the phrase “deciding what experiments to run” can have multiple interpretations. The researcher emphasized that selecting optimal experiments is a crucial skill, one that many humans already excel at. They also claimed that proficiency in experiment selection has an upper limit, constrained by the inherent noise in experimental data. The researcher argued that analyzing past experiments provides limited guidance for future research directions.

Large Gains

A notable suggestion was that AI models in this scenario could comprehensively review the entire research literature, a task beyond human capacity. Moreover, these models could potentially facilitate rapid idea exchange across the entire research community.

Possibly Substantial Gains

One researcher hypothesized that substantial speed-ups would occur when AI matches human capabilities in higher levels of prioritization and research planning. Another researcher claimed that experiments are probably on a power law distribution of value, and the proposed scenario might sample disproportionately from the higher end of the power law.
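The power-law intuition can be made concrete with a toy simulation. The sketch below assumes experiment values follow a Pareto distribution and that a fixed compute budget allows only a subset of candidate experiments to be run. The distribution, its shape parameter, the budget, and the noise model are all hypothetical choices and do not come from the interviews.

```python
import random
import statistics


def mean_value_of_selection(num_ideas: int = 10_000, budget: int = 100,
                            alpha: float = 1.5, noise: float = 0.0,
                            seed: int = 0) -> tuple[float, float]:
    """Compare random experiment selection with selection by a noisy value estimate.

    Returns (mean value of randomly chosen experiments,
             mean value of experiments chosen by estimated value).
    All parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    # True value of each candidate experiment, drawn from a heavy-tailed power law.
    values = [rng.paretovariate(alpha) for _ in range(num_ideas)]
    # The selector's estimate: true value times log-normal noise.
    # noise=0.0 corresponds to perfect foresight about experiment value.
    scores = [v * rng.lognormvariate(0.0, noise) for v in values]

    random_choice = rng.sample(values, budget)
    ranked = sorted(zip(scores, values), reverse=True)
    informed_choice = [value for _, value in ranked[:budget]]
    return statistics.mean(random_choice), statistics.mean(informed_choice)


for noise_level in (0.0, 1.0, 3.0):
    baseline, informed = mean_value_of_selection(noise=noise_level)
    print(f"noise={noise_level}: random {baseline:.2f}, informed {informed:.2f}")
```

With perfect foresight the informed selection draws entirely from the top of the distribution. As the noise in the value estimate grows, the advantage shrinks, which mirrors the "Small Gains" argument that noise in experimental data limits how good experiment selection can become.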

Running Many Small Experiments

Researchers noted that small-scale experiments are already standard practice in model training. These smaller experiments are typically constrained by coding time, both for the experiment itself and for implementing new features. Thus, if coding were significantly expedited, many more small experiments could inexpensively test research ideas.

A researcher explained that the value of many small experiments would be limited if the results did not translate to larger scales. The researcher emphasized that some techniques may be ineffective at smaller scales but effective when attempted at larger scales with more compute (e.g. Generative Adversarial Networks). For example, Reinforcement Learning from Human Feedback may not improve GPT-2 or GPT-1, even though it works with GPT-3 or GPT-4. This could be partially mitigated by carefully choosing the scale to obtain the maximum amount of information at the smallest scale possible.

Better Allocation of Human and Compute Resources

Opinions diverged on the potential for improved resource allocation. One perspective suggested that the impact would be minimal, given that GPU usage is already highly optimized, particularly for large-scale experiments. This view emphasized the significant effort already invested in resource prioritization.

However, one researcher thought that the efficiency gains from better organizational scaling would be large. They claimed that significant alignment and communication bottlenecks exist in human companies. For example, parts of the company may engage in activities of questionable value (principal-agent problems), or executives may pose informational bottlenecks. If a company composed of mostly AI workers could solve these problems, there could be substantial gains in efficiency. This might be achieved by scaling up the number of employees that supervise and coordinate other employees, or it could result from coordination benefits exclusive to AIs, such as having many employees share the same weights. However, the researcher hypothesized that most progress stems from exceptional experiments that are being conducted regardless of principal-agent problems.

Tasks with Minimal Compute Bottlenecks

Scaffolding and Fine-tuning

One researcher emphasized that prompting is quite important, so abundant cognitive labor could find efficient prompting and scaffolding methods at scale, which would particularly help research and software engineering. The researcher mentioned that fine-tuning uses quite a small fraction of compute, so AI-powered researchers might create many more specialized fine-tuned models because fine-tuning is more talent-limited than compute-limited.

Recursive Data Generation

When discussing post-training enhancements and fine-tuning, one researcher said that the optimal use of most workers would be to replace pre-training datasets with their cognitive outputs. The models would be scaffolded into a bureaucracy whose collective capabilities are better than those of any individual model, resulting in very high-quality training data. The company could take a very large number of digital minds, direct them at any problem, ask them to make scientific progress on it, and then train their next iteration of systems to emulate their collective outputs, creating a positive feedback loop.

Additional Thoughts

We asked if researchers had final thoughts about the scenario that was proposed to them.

One researcher hypothesized that because compute would be a bottleneck to AI progress, abundant cognitive labor would likely be used to acquire more compute, and AI companies would likely deploy some of their AI agents on economically valuable tasks to acquire more compute.

Another researcher emphasized the importance of real-world data when thinking about abundant cognitive labor. For example, the field of robotics is severely limited by the ability to obtain real-world data. Many AI applications, such as protein folding and control for fusion reactors, need real-world data to validate progress. This real-world data would be extremely slow relative to the amount of available cognitive labor. Thus, the correct way to spend cognition might be to improve the ability to get rapid physical feedback. The researcher explained that this approach is less about AI research and more about developing advanced physical laboratories capable of efficiently harnessing vast cognitive resources. These facilities may differ significantly from conventional labs with human staff.

Multiple researchers responded that it is essential to map out the external landscape in the proposed scenario. In the real world, the AI company would not be in a vacuum, and it would not be limited to using human-level AI for research purposes. Thus, the interactions between the AI company and the rest of society complicate the scenario.

Limitations

Small and biased sample of interviewees. We only interviewed five AI researchers who we contacted through our existing professional networks. We do not consider their views to perfectly represent the broader AI research community. We think our interviewees’ views are most representative of researchers who expect AI progress to be relatively quick over the next decade.

Limitations with the hypothetical scenario. These include:

Compute will not be constant over time: While the scenario assumed that the amount of compute will continue to increase at a similar rate, researchers suggested that large increases in available cognitive labor would likely be used to increase the amount of compute available and improve chip design. Furthermore, a researcher mentioned that these AI-powered researchers could use their abundant cognitive labor to accumulate a large amount of money and power in the world, which they could use to buy chip companies.
AIs will have different skill profiles to humans: One researcher said it is highly unlikely that AI systems would be similar to humans in all ways except for the fact that they are 30 times faster. Rather, these AIs would probably be much better than humans at some things and worse at others.
Narrow focus on a single factor: The study primarily examines the impact of abundant cognitive labor on the pace of algorithmic progress in AI capabilities research. However, there are many other important factors to consider for AI R&D automation, such as the extent of deployment, risks, regulations, and concentration of power. Also, the scenario assumes that AI companies would primarily use their AIs to do AI capabilities research, which may not be the case.
The company might not be fully automated: There are tasks such as maintaining and expanding physical infrastructure, as well as interpersonal tasks such as meeting with stakeholders that might not be instantly automated despite abundant cognitive labor.

Conclusion

The AI researchers focused on compute as the main bottleneck for AI progress but expressed a wide range of perspectives on its expected impact, potential solutions, and other factors impeding progress. The interviews emphasized that the dynamics surrounding AI progress could shift significantly by the time AI reaches human-level AI R&D capabilities, potentially leading to a large increase in the overall speed of AI progress.

Acknowledgements

We thank Romeo Dean, Michael Chen, Eli Lifland, and Al Xin for helpful feedback.

References

Anthropic. (2024, October 15). Responsible Scaling Policy. https://assets.anthropic.com/m/24a47b00f10301cd/original/Anthropic-Responsible-Scaling-Policy-2024-10-15.pdf

Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., Chen, A., Goldie, A., Mirhoseini, A., McKinnon, C., Chen, C., Olsson, C., Olah, C., Hernandez, D., Drain, D., Ganguli, D., Li, D., Tran-Johnson, E., Perez, E., … Kaplan, J. (2022). Constitutional AI: Harmlessness from AI Feedback (No. arXiv:2212.08073). arXiv. https://doi.org/10.48550/arXiv.2212.08073

Cui, K. Z., Demirer, M., Jaffe, S., Musolff, L., Peng, S., & Salz, T. (2024). The Productivity Effects of Generative AI: Evidence from a Field Experiment with GitHub Copilot. An MIT Exploration of Generative AI. https://doi.org/10.21428/e4baedd9.3ad85f1c

Davidson, T. (2023, June 27). What a Compute-Centric Framework Says About Takeoff Speeds. Open Philanthropy. https://www.openphilanthropy.org/research/what-a-compute-centric-framework-says-about-takeoff-speeds/

Goldie, A., Mirhoseini, A., Yazgan, M., Jiang, J. W., Songhori, E., Wang, S., Lee, Y.-J., Johnson, E., Pathak, O., Nova, A., Pak, J., Tong, A., Srinivasa, K., Hang, W., Tuncer, E., Le, Q. V., Laudon, J., Ho, R., Carpenter, R., & Dean, J. (2024). Addendum: A graph placement methodology for fast chip design. Nature, 634(8034), E10–E11. https://doi.org/10.1038/s41586-024-08032-5

Google DeepMind. (2024). Frontier Safety Framework. https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/introducing-the-frontier-safety-framework/fsf-technical-report.pdf

Grace, K., Stewart, H., Sandkühler, J. F., Thomas, S., Weinstein-Raun, B., & Brauner, J. (2024). Thousands of AI Authors on the Future of AI (No. arXiv:2401.02843). arXiv. https://doi.org/10.48550/arXiv.2401.02843

Gunasekar, S., Zhang, Y., Aneja, J., Mendes, C. C. T., Giorno, A. D., Gopi, S., Javaheripi, M., Kauffmann, P., Rosa, G. de, Saarikivi, O., Salim, A., Shah, S., Behl, H. S., Wang, X., Bubeck, S., Eldan, R., Kalai, A. T., Lee, Y. T., & Li, Y. (2023). Textbooks Are All You Need (No. arXiv:2306.11644). arXiv. https://doi.org/10.48550/arXiv.2306.11644

Ho, A., Besiroglu, T., Erdil, E., Owen, D., Rahman, R., Guo, Z. C., Atkinson, D., Thompson, N., & Sevilla, J. (2024). Algorithmic progress in language models (No. arXiv:2403.05812). arXiv. https://doi.org/10.48550/arXiv.2403.05812

Jimenez, C. E., Yang, J., Wettig, A., Yao, S., Pei, K., Press, O., & Narasimhan, K. (2024). SWE-bench: Can Language Models Resolve Real-World GitHub Issues? (No. arXiv:2310.06770). arXiv. https://doi.org/10.48550/arXiv.2310.06770

Morris, S. (2024, October 30). Google’s strong earnings boosted by cloud computing gains. Financial Times. https://www.ft.com/content/7f96d467-f59e-4350-8c0b-31011734c24e

OpenAI. (2023, December 18). Preparedness Framework (Beta). https://cdn.openai.com/openai-preparedness-framework-beta.pdf

Owen, D. (2024, August 27). Interviewing AI researchers on automation of AI R&D. Epoch AI. https://epochai.org/blog/interviewing-ai-researchers-on-automation-of-ai-rnd

Pryzant, R., Iter, D., Li, J., Lee, Y. T., Zhu, C., & Zeng, M. (2023). Automatic Prompt Optimization with “Gradient Descent” and Beam Search (No. arXiv:2305.03495). arXiv. https://doi.org/10.48550/arXiv.2305.03495

Sequoia Capital. (2024). Noam Brown and Team on Teaching LLMs to Reason. Sequoia Capital. https://www.sequoiacap.com/podcast/training-data-noam-brown/

Sevilla, J., Besiroglu, T., Cottier, B., You, J., Roldán, E., Villalobos, P., & Erdil, E. (2024, August 20). Can AI Scaling Continue Through 2030? Epoch AI. https://epochai.org/blog/can-ai-scaling-continue-through-2030

Wijk, H., Lin, T., Becker, J., Jawhar, S., Parikh, N., Broadley, T., Chan, L., Chen, M., Clymer, J., Dhyani, J., Ericheva, E., Garcia, K., Goodrich, B., Jurkovic, N., Kinniment, M., Lajko, A., Nix, S., Sato, L., Saunders, W., … Barnes, E. (2024). RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts (No. arXiv:2411.15114). arXiv. https://doi.org/10.48550/arXiv.2411.15114

Ziegler, A., Kalliamvakou, E., Li, X. A., Rice, A., Rifkin, D., Simister, S., Sittampalam, G., & Aftandilian, E. (2022). Productivity assessment of neural code completion. Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming, 21–29. https://doi.org/10.1145/3520312.3534864

Appendix A: Interview Quotes

For readability, filler words were removed from the quotes.

On better designed experiments because the AI copies are spending more time thinking about how to run the best experiments. “My gut reaction is that doing this close to optimally isn’t that much better than what competent researchers already do.”

Experiments being run at smaller scales wherever possible because the AI copies are spending time figuring out when this could be done. “The AI now just has much better information and maybe better-calibrated intuitions than the human would have done about how to decide what to try at large scale.”

Improved ways to find new architectures. “The AIs do have more intuition than these dumb NAS search algorithms; you do start to get more value out of it, but it’s still somewhat limited. And then obviously when they get as good as humans, the sort of higher-level prioritization and research planning and so on, you probably start to get more substantial speed-ups.”

Ways that a large number of AI researchers could still add very large amounts of value besides just speeding up the project. “I do expect that the code would be very high quality with unit tests, for example, and very well documented, with readme files and everything. And they could maybe write a research log of all the experiments that have been tried: why some experiments were run, what was learned from the experiments, and so on.”

Scaling issues while there is abundant cognitive labor. “It might not be the case that you get linear scaling, where more researchers means more research projects. Maybe it’s just really hard to divide up the work between them or they just create more work for each other, like they’re all having to review each other’s pull requests for code, and just like in real-world organizations, this whole pace of development actually slows down the more engineers you add to a project.”

On improving efficiency through real-world data. “I feel like the correct way to spend the cognition is going to be improving your ability to get that feedback really fast.”

On most researchers mentioning the compute bottleneck. “I’m also glad to hear that people think [compute would be a] bottleneck because the vision of a software-only singularity just sounds terrifying.”

Appendix B: Interview Script

INTRO
Thanks for meeting with us today.

I want to emphasize that all the information from my interviews with researchers will be anonymized, and it’s important to me that I don’t ask for or divulge any sensitive information, so I’m not going to ask for any specific details about what you’re working on. As far as what I’m going to do with the report once it’s done, I plan on making a post about it on Lesswrong. I’m happy to let you see the report before I publish it, so you can be assured that I’m not sharing any sensitive details. Does that sound good?

[Wait for interviewee response]

As I mentioned in the email, for my fellowship, we’re exploring what might happen in a scenario where AI can fully automate AI capabilities research. We want to know how much abundant AI cognitive labor could speed up AI capabilities progress. So, for the people I’m interviewing, I have this thought experiment that I mentioned in the email, and I’ll just repeat now to jog your memory: Imagine that compute available for experiments and training is basically increasing at the rate it’s been increasing over the last several years, but now, there is one big difference: there are 30 AI-powered copies of each person at your company working 30 times as fast. This means, your whole company is basically fully automated, and as far as level of "intelligence" for these 30 copies of each person, let’s assume that their cognitive abilities are equal to each worker they’re replacing, with the only differences being that there are 30 times as many of them, and that each copy works 30 times as fast. They can do anything a human worker could do with access to a computer and the internet, including navigating operating systems, writing code, writing papers, communicating with other workers, just 30 times faster.

So there’s just... ABUNDANT cognitive labor...

I’m really curious about the bottlenecks that might still arise in this new scenario for the speed of AI capabilities progress. So, for my thought exercise, choose a big project you previously worked on, ideally related to improving AI capabilities. Let’s say one that took at least a month to do. And now, starting today, we’re in this new reality I spoke of. How long do you think that same project would take versus how long it originally took? And what bottlenecks arose in this scenario that prevented it from going even faster?

[INTERVIEWEE ANSWER] [OPTIONAL FOLLOW-UP QUESTIONS]

[For each bottleneck they mention ask “Are there ways you could use abundant cognitive labor to get around it?”]

[At the end of their response: “Any other bottlenecks you haven’t mentioned?”]

QUESTION #2

Thanks for your response. So, for my next question, I’m curious about, even with the bottlenecks, were there ways that your army of AI researchers could still add very large amounts of value besides just speeding up the project? Like doing it to a higher quality or something like that.

[INTERVIEWEE ANSWER] [OPTIONAL FOLLOW-UP QUESTIONS]

QUESTION #3

We just discussed one specific project. Now let’s look at things from a very high level, what do you think would be the overall pace of frontier lab AI capabilities progress, compared to the current pace?

To be more concrete:

Without this abundant cognitive labor, capabilities would improve by a certain amount in the next 3 years, 10 years, and 30 years.
How long do you think it would take AI capabilities to improve by these same amounts in this thought experiment? Remember, the compute available is the same in both scenarios, it’s just that in the second scenario there’s abundant AI cognitive labor.

[At the end of their response: “Any other bottlenecks you haven’t mentioned?”]

[For each bottleneck they mention ask “Are there ways you could use abundant cognitive labor to get around it?”]

[INTERVIEWEE ANSWER] [OPTIONAL FOLLOW-UP QUESTIONS]

QUESTION #4 [Ask in question 3 if they mention the compute bottleneck]

I’m quickly going to list a bunch of other potential sources of efficiency gains for getting around the compute bottleneck, one-by-one, and every time I list one, let me know if you have any thoughts on this as a potential efficiency gain - how big of a gain, etc. Or if you have no thoughts, feel free to pass. Here they are:

[Only mention topics they have not brought up earlier in interview]

No experiments ever have bugs...

[INTERVIEWEE ANSWER]

Better designed experiments because the AI copies are spending more time thinking about how to run the best experiments...

[INTERVIEWEE ANSWER]

Experiments being run at smaller scales wherever possible because the AI copies are spending time figuring out when this could be done, so they could then extrapolate experiments from small to large-scale...

[INTERVIEWEE ANSWER]

Resources being prioritized better between different workers and different teams... Both idle GPUs being better prioritized between teams, as well as on a human resources level, workers being shuffled between projects in a more efficient manner...

[INTERVIEWEE ANSWER] [OPTIONAL FOLLOW-UP QUESTIONS]

And finally, are there any tasks that you could imagine contributing to massive gains in capabilities very quickly without getting bottlenecked? How much would capabilities improve before these gains run out (e.g. “as much progress as would normally take 5 years”)?

[And if they don’t suggest it, mention post-training enhancements (including fine-tuning) as a possibility]

[INTERVIEWEE ANSWER] [OPTIONAL FOLLOW-UP QUESTIONS]

QUESTION #5

Do you have any more thoughts before we end the interview?

[INTERVIEWEE ANSWER]

Thanks again for your time. I really appreciate it. Have a wonderful rest of your day!

[Potential clarifying questions]

Q: What about the compute for all these new workers though?
A: They will be allowed to have their own compute; you don’t need to hold that part constant.
Q: Do all 30 workers for each copy of a person have to do the same job, or could I move them around to other tasks?
A: You can move them around to other tasks.

Compilation of Quantitative Estimates

In this section, we compile numerical estimates given by each researcher.
Unless otherwise specified, speed-ups from specific factors refer to the experimental parts of a project taking less time to complete. For instance, a 2x speedup from "No experiments have bugs" would mean that the experiments are completed twice as fast in terms of time in the real world, but not necessarily that the overall project is completed twice as fast.
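The distinction between speeding up the experimental parts and speeding up the whole project can be made concrete with a small calculation. The sketch below is illustrative only: it assumes a project splits cleanly into an experimental share and a non-experimental share whose duration is unchanged, and the 60% experimental share used in the example is a hypothetical figure, not one reported by any researcher.

```python
def overall_speedup(experiment_fraction: float, experiment_speedup: float) -> float:
    """Overall project speed-up when only the experimental share gets faster.

    experiment_fraction: share of project time spent on experiments (0 to 1).
                         This is an assumed input, not a figure from the interviews.
    experiment_speedup:  factor by which the experimental parts speed up
                         (2.0 means experiments finish in half the time).
    """
    if not 0.0 <= experiment_fraction <= 1.0:
        raise ValueError("experiment_fraction must be between 0 and 1")
    if experiment_speedup <= 0:
        raise ValueError("experiment_speedup must be positive")
    # New duration as a fraction of the old one: the untouched share
    # plus the experimental share shrunk by the speed-up factor.
    new_duration = (1.0 - experiment_fraction) + experiment_fraction / experiment_speedup
    return 1.0 / new_duration


# Hypothetical example: experiments take 60% of the project and become 2x faster.
print(f"{overall_speedup(0.6, 2.0):.2f}x")  # prints "1.43x"
```

Under these assumptions, a 2x speed-up to experiments yields well under a 2x speed-up to the project, which is why the per-factor figures below should not be read as project-level gains unless stated.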

Researcher #1:

Speed-up of individual project: The thought experiment would cause a major AI project to go 50% faster

Pace of overall research over various timelines:

3 years of AI capabilities progress without automation happens in 2 years with automation
10 years of AI capabilities progress without automation happens in 4 or 5 years with automation
30 years of AI capabilities progress without automation happens in 10 years with automation

No experiments have bugs: 30% speed-up

Better designed experiments: 15% speed-up

Experiments done at smaller scales whenever possible: 5% speed-up

Resources being better prioritized: 20% speed-up to the entire project

Researcher #2:

Speed-up of individual project: The thought experiment would cause a 2x to 10x speed-up in a major AI project

Pace of overall research over various timelines:

5 years of AI capabilities progress without automation happens in 18 months with automation

No experiments ever have bugs: 1.5x to 2x speed-up

Better designed experiments: “Less than fixing all bugs, but still a boost”

Experiments done at smaller scales whenever possible: “I’d rank this above fixing all bugs”

Resources being better prioritized: “About as significant as no bugs”

Other gains:

Mixed-precision training: 2x speed-up

Tweaks to architecture: 2x speed-up

Tweaks to data curation: 2x speed-up

Tweaks to optimizers: 2x speed-up

Researcher #3:

Speed-up of individual project: They thought a conservative estimate was that the thought experiment would cause an 8x to 11x speed-up (a project that took 8 months in real life would have taken 3 or 4 weeks in the thought experiment), but they said there is also a chance that the speed-up would be much larger (the project taking only 2 to 7 days instead).

Pace of overall research over various timelines:

60 years of AI capabilities progress without automation happens in 3 years with automation

No experiments ever have bugs: 2x to 5x speed-up

Better designed experiments: 2x to 4x speed-up

Experiments done at smaller scales whenever possible: 10x speed-up

Resources being better prioritized: 10x to 100x speed-up to the entire project, but very wide distribution of possibilities

Researcher #4:

Speed-up of individual project: 10x speed-up

Pace of overall research over various timelines:

3 years of AI capabilities progress without automation happens in 3/10 of a year with automation
10 years of AI capabilities progress without automation happens in 3 or 4 years with automation
Researcher noted that they had “very wide bars” with their estimates

No experiments ever have bugs: 2x to 5x speed-up

Better designed experiments: 2x to 5x speed-up

Experiments done at smaller scales whenever possible: “massively speeds things up”

Resources being better prioritized: “Doesn’t matter very much”

Researcher #5:

Speed-up of individual project: Did not give a specific estimate

Pace of overall research over various timelines:

3 years of AI capabilities progress without automation happens in 1 year with automation
10 years of AI capabilities progress without automation happens in 1 or 2 years with automation

No experiments ever have bugs: 5% speed-up

Better designed experiments: 10% to 15% speed-up

Experiments done at smaller scales whenever possible: 15% to 20% speed-up

Resources being better prioritized: 5% speed-up to the entire project
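The timeline estimates above can be converted into average speed-up multipliers by dividing the years of progress without automation by the years needed with automation. The following is a minimal sketch of that arithmetic, using only the timeline figures compiled in this section. The multipliers are simple ratios averaged over each period, not estimates the researchers stated directly, and the function and variable names are illustrative.

```python
# (researcher, years of progress without automation, (fewest, most) years with automation)
# Figures are taken from the compilation above; "3/10 of a year" is written as 0.3.
TIMELINES = [
    ("Researcher #1", 3, (2, 2)),
    ("Researcher #1", 10, (4, 5)),
    ("Researcher #1", 30, (10, 10)),
    ("Researcher #2", 5, (1.5, 1.5)),
    ("Researcher #3", 60, (3, 3)),
    ("Researcher #4", 3, (0.3, 0.3)),
    ("Researcher #4", 10, (3, 4)),
    ("Researcher #5", 3, (1, 1)),
    ("Researcher #5", 10, (1, 2)),
]


def implied_speedup(baseline_years, automated_years):
    """Return the (low, high) average multiplier implied by a timeline estimate."""
    fewest, most = automated_years
    # Taking longer with automation means a smaller multiplier, so the
    # low end of the range uses the larger number of years.
    return baseline_years / most, baseline_years / fewest


for name, baseline, automated in TIMELINES:
    low, high = implied_speedup(baseline, automated)
    span = f"{low:.1f}x" if low == high else f"{low:.1f}x to {high:.1f}x"
    print(f"{name}: {baseline} years of progress -> {span}")
```

Computed this way, the implied multipliers range from 1.5x (Researcher #1, 3-year horizon) to 20x (Researcher #3).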
