Chris Leong

Organiser @ AI Safety Australia and NZ
6361 karma · Sydney NSW, Australia



Currently doing local AI safety movement building in Australia and NZ.


I think many people are overestimating the reputational risks here.

Firstly, cancel culture is past its peak. Secondly, for better or worse, the Overton window is larger than it was previously (and I expect this process to continue). Thirdly, many of the folk who play the 'guilt by association' game already hate us and already have enough ammunition that we aren't going to change their minds. Fourthly, the folk who play that game most strongly mostly wouldn't make good community members anyway. Fifthly, the more you bend in response to reputational attacks, or to ward them off, the more people see you as a juicy target.

For these reasons, I don't think we should prioritise worries about reputational risks nearly as much as you do (in fact, posts like this seem to create more reputational risk than they mitigate, by implicitly accepting the frame that EA, Lightcone and Manifold shouldn't be regarded as separate entities, but should all be mashed together).

I strongly believe that we should allow each community to pursue its own path. Effective Altruism cares primarily about impact, rationalism primarily about strong epistemics and Manifold about accurate prediction markets. This will naturally lead to divergent preferences about who is acceptable to platform; and I'd much rather embrace the divergence than engage in in-fighting over which community gets to set the norms.

Even though there is some overlap between the communities (myself included), I really think we should push back against conflating them. We should also push to further distinguish Lightcone itself from those who hire the Lightcone venue. Collapsing these associations doesn't benefit our reputation.

Interesting project.

Definitely seems like someone should be experimenting with this.

Seems doable most of the time in the best case, but the failure rate will likely be high enough that people wouldn't want to use it for a while.

I think SoGood ran similar things.

In any case, I think it's clear that AI Safety is no longer 'neglected' within EA, and possibly outside of it.

Maybe a better question is: how neglected is this within society at large? Technical AI safety research is a lot less neglected than before, but governance work is still looking extremely neglected AND we appear to be in a critical policy window.

It's really hard to know without knowing how much a nanny costs, your financial situation, and how much you'd value being able to look after your child yourself.

If you'd be fine with a nanny looking after your child, then it is likely worthwhile spending a significant amount of money in order to discover whether you would have a strong fit for alignment research sooner.

I would also suggest that switching out of AI completely was likely a mistake. I'm not suggesting that you should have continued advancing fundamental AI capabilities, but the vast majority of jobs in AI relate to building AI applications rather than advancing fundamental capabilities. Those jobs won't have a significant effect on shortening timelines, but will allow you to further develop your skills in AI.

Another thing to consider: if at some point you decide that you are unlikely to break into technical AI safety research, it may be worthwhile to look at contributing in an auxiliary manner, e.g. through mentorship, teaching or movement-building.

I think you're underrating the risk of capabilities acceleration.

Interesting research. One thing I'd take into account is that talent need is a somewhat limited proxy for impact. I expect that there would be decreasing marginal returns as you add more people to the same research direction. So, for example, if you already have 100 people doing interpretability research, I expect that they'd already be picking most of the low-hanging fruit, especially if you're adding more iterators. However, this might be worthwhile anyway if you believe that we're in a short-timeline world and that one of the most important things is producing usable research fast.

I'll post some extracts from the commitments made at the Seoul Summit. I can't promise that this will be a particularly good summary; I was originally just writing this for myself, but maybe it's helpful until someone publishes something more polished:

Frontier AI Safety Commitments, AI Seoul Summit 2024

The major AI companies have agreed to the Frontier AI Safety Commitments. In particular, they will publish a safety framework focused on severe risks, and they commit to: "internal and external red-teaming of frontier AI models and systems for severe and novel threats; to work toward information sharing; to invest in cybersecurity and insider threat safeguards to protect proprietary and unreleased model weights; to incentivize third-party discovery and reporting of issues and vulnerabilities; to develop and deploy mechanisms that enable users to understand if audio or visual content is AI-generated; to publicly report model or system capabilities, limitations, and domains of appropriate and inappropriate use; to prioritize research on societal risks posed by frontier AI models and systems; and to develop and deploy frontier AI models and systems to help address the world's greatest challenges"

"Risk assessments should consider model capabilities and the context in which they are developed and deployed" - I'd argue that the context in which it is deployed should account take into account whether it is open or closed source/weights as open-source/weights can be subsequently modified.

"They should also be accompanied by an explanation of how thresholds were decided upon, and by specific examples of situations where the models or systems would pose intolerable risk." - always great to make policy concrete"

"In the extreme, organisations commit not to develop or deploy a model or system at all, if mitigations cannot be applied to keep risks below the thresholds." - Very important that, when this is applied, the ability to iterate on open-source/weight models is taken into account

Seoul Declaration for safe, innovative and inclusive AI by participants attending the Leaders' Session

Signed by Australia, Canada, the European Union, France, Germany, Italy, Japan, the Republic of Korea, the Republic of Singapore, the United Kingdom, and the United States of America.

"We support existing and ongoing efforts of the participants to this Declaration to create or expand AI safety institutes, research programmes and/or other relevant institutions including supervisory bodies, and we strive to promote cooperation on safety research and to share best practices by nurturing networks between these organizations" - guess we should now go full-throttle and push for the creation of national AI Safety institutes

"We recognise the importance of interoperability between AI governance frameworks" - useful for arguing we should copy things that have been implemented overseas.

"We recognize the particular responsibility of organizations developing and deploying frontier AI, and, in this regard, note the Frontier AI Safety Commitments." - Important as Frontier AI needs to be treated as different from regular AI.

Seoul Statement of Intent toward International Cooperation on AI Safety Science

Signed by the same countries.

"We commend the collective work to create or expand public and/or government-backed institutions, including AI Safety Institutes, that facilitate AI safety research, testing, and/or developing guidance to advance AI safety for commercially and publicly available AI systems" - similar to what we listed above, but more specifically focused on AI Safety Institutes which is a great.

"We acknowledge the need for a reliable, interdisciplinary, and reproducible body of evidence to inform policy efforts related to AI safety" - Really good! We don't just want AIS Institutes to run current evaluation techniques on a bunch of models, but to be actively contributing to the development of AI safety as a science.

"We articulate our shared ambition to develop an international network among key partners to accelerate the advancement of the science of AI safety" - very important for them to share research among each other

Seoul Ministerial Statement for advancing AI safety, innovation and inclusivity

Signed by: Australia, Canada, Chile, France, Germany, India, Indonesia, Israel, Italy, Japan, Kenya, Mexico, the Netherlands, Nigeria, New Zealand, the Philippines, the Republic of Korea, Rwanda, the Kingdom of Saudi Arabia, the Republic of Singapore, Spain, Switzerland, Türkiye, Ukraine, the United Arab Emirates, the United Kingdom, the United States of America, and the representative of the European Union

"It is imperative to guard against the full spectrum of AI risks, including risks posed by the deployment and use of current and frontier AI models or systems and those that may be designed, developed, deployed and used in future" - considering future risks is a very basic, but core principle

"Interpretability and explainability" - Happy to interpretability explicitly listed

"Identifying thresholds at which the risks posed by the design, development, deployment and use of frontier AI models or systems would be severe without appropriate mitigations" - important work, but could backfire if done poorly

"Criteria for assessing the risks posed by frontier AI models or systems may include consideration of capabilities, limitations and propensities, implemented safeguards, including robustness against malicious adversarial attacks and manipulation, foreseeable uses and misuses, deployment contexts, including the broader system into which an AI model may be integrated, reach, and other relevant risk factors." - sensible, we need to ensure that the risks of open-sourcing and open-weight models are considered in terms of the 'deployment context' and 'foreseeable uses and misuses'

"Assessing the risk posed by the design, development, deployment and use of frontier AI models or systems may involve defining and measuring model or system capabilities that could pose severe risks," - very pleased to see a focus beyond just deployment

"We further recognise that such severe risks could be posed by the potential model or system capability or propensity to evade human oversight, including through safeguard circumvention, manipulation and deception, or autonomous replication and adaptation conducted without explicit human approval or permission. We note the importance of gathering further empirical data with regard to the risks from frontier AI models or systems with highly advanced agentic capabilities, at the same time as we acknowledge the necessity of preventing the misuse or misalignment of such models or systems, including by working with organisations developing and deploying frontier AI to implement appropriate safeguards, such as the capacity for meaningful human oversight" - this is massive. There was a real risk that these issues were going to be ignored, but this is now seeming less likely.

"We affirm the unique role of AI safety institutes and other relevant institutions to enhance international cooperation on AI risk management and increase global understanding in the realm of AI safety and security." - "Unique role", this is even better!

"We acknowledge the need to advance the science of AI safety and gather more empirical data with regard to certain risks, at the same time as we recognise the need to translate our collective understanding into empirically grounded, proactive measures with regard to capabilities that could result in severe risks. We plan to collaborate with the private sector, civil society and academia, to identify thresholds at which the level of risk posed by the design, development, deployment and use of frontier AI models or systems would be severe absent appropriate mitigations, and to define frontier AI model or system capabilities that could pose severe risks, with the ambition of developing proposals for consideration in advance of the AI Action Summit in France" - even better than above b/c it commits to a specific action and timeline
