jacquesthibs's Quick takes

jacquesthibs

This is a special post for quick takes by jacquesthibs. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

Sorted by

New & upvoted

Click to highlight new quick takes since: Today at 4:27 PM

jacquesthibsMay 26 20231

AI safety

I recently sent in some grant proposals to continue working on my independent alignment research. It gives an overview of what I'd like to work on for this next year (and more really). If you want to have a look at the full doc, send me a DM. If you'd like to help out through funding or contributing to the projects, please let me know.

Here's the summary introduction:

12-month salary for building a language model system for accelerating alignment research and upskilling (additional funding will be used to create an organization), and studying how to supervise AIs that are improving AIs to ensure stable alignment.

Summary

Agenda 1: Build an Alignment Research Assistant using a suite of LLMs managing various parts of the research process. Aims to 10-100x productivity in AI alignment research. Could use additional funding to hire an engineer and builder, which could evolve into an AI Safety organization focused on this agenda. Recent talk giving a partial overview of the agenda.
Agenda 2: Supervising AIs Improving AIs (through self-training or training other AIs). Publish a paper and create an automated pipeline for discovering noteworthy changes in behaviour between the precursor and the fine-tuned models. Short Twitter thread explanation.
Other: create a mosaic of alignment questions we can chip away at, better understand agency in the current paradigm, outreach, and mentoring.

As part of my Accelerating Alignment agenda, I aim to create the best Alignment Research Assistant using a suite of language models (LLMs) to help researchers (like myself) quickly produce better alignment research through an LLM system. The system will be designed to serve as the foundation for the ambitious goal of increasing alignment productivity by 10-100x during crunch time (in the year leading up to existentially dangerous AGI). The goal is to significantly augment current alignment researchers while also providing a system for new researchers to quickly get up to speed on alignment research or promising parts they haven’t engaged with much.

For Supervising AIs Improving AIs, this research agenda focuses on ensuring stable alignment when AIs self-train or train new AIs and studies how AIs may drift through iterative training. We aim to develop methods to ensure automated science processes remain safe and controllable. This form of AI improvement focuses more on data-driven improvements than architectural or scale-driven ones.

I’m seeking funding to continue my work as an independent alignment researcher and intend to work on what I’ve just described. However, to best achieve the project’s goal, I would want additional funding to scale up the efforts for Accelerating Alignment to develop a better system faster with the help of engineers so that I can focus on the meta-level and vision for that agenda. This would allow me to spread myself less thin and focus on my comparative advantages. If you would like to hop on a call to discuss this funding proposal in more detail, please message me. I am open to refocusing the proposal or extending the funding.

jacquesthibsMay 12 20231

AI safetyShow more

I gave talk about my Accelerating Alignment with LLMs agenda about 1 month ago (which is basically a decade in AI tools time). Part of the agenda covered (publicly) here.

I will maybe write an actual post about the agenda soon, but would love to have some people who are willing to look over it. If you are interested, send me a message. I am currently applying for grants and exploring the possibility of building an org focused on speeding up this agenda and avoid spreading myself too thin.

jacquesthibsDec 7 20221

Near-Term AI capabilities probably bring low-hanging fruits for global poverty/health

I'm an alignment researcher, but I still think we should be vigilant about how models like GPT-N could potentially be used to make the world a better place. I like the work that Ought is doing with respect to the academic field (and, hopefully, alignment soon as well). However, my guess is that there are low-hanging fruits popping up because of this new technology, and the non-profit sector has yet to catch up.

This shortform is a Call To Action for any EA entrepreneur, you could potentially boost efficiency of the non-profit sector with the use of these tools. Of course, be careful since GPT-3 will hallucinate sometimes. But putting it in a larger system with checks and balances could 1) make non-profits save time and money 2) make previously inefficient or non-viable non-profits become a top charity.

I could be wrong about this, but my expectation is that there will be a lag between the time people can use GPT effectively for the non-profit sector and when they actually do.