
Summary

  • There are many resources for people interested in empirical AI alignment, but far fewer for beginners on the theoretical side.
  • I enjoyed proof-based math classes, so I decided to take a course on basic machine learning math from the University of Illinois Urbana-Champaign. This course might be a good resource for people unsure how to start pursuing theoretical research.
  • Overall, I found it to be a great introduction to theoretical concepts, but it involves very little programming, which matters both for keeping career back-up options open and for anyone interested in empirical alignment research.

Background

There are many high-quality resources for people looking to explore empirical alignment, but little equivalent guidance for theoretical alignment. In particular, many of the resources commonly recommended to beginners in the alignment space (such as the machine learning course on the MIT Open Learning Library) are oriented toward people interested in pursuing empirical research. (That is not to say that empirical experience is useless for people interested in theoretical work; there’s a good chance that experience in both domains could help someone make important discoveries.) Still, I found little guidance or direction for people who want to pursue the more mathematical, theoretical side of machine learning research.

I’m writing this post to summarize my thoughts on the course as well as my next steps in pursuing AI alignment. I attempt to provide some recommendations for learning more about theoretical alignment research: how to determine whether you’d be a good fit for this sort of work, how to follow up on your interest, and how to weigh your other options for maximizing your impact. Whilst reading this, please keep in mind that I am not very experienced in the alignment space; I try to make sure that everything I write is correct, but I may have stated some inaccuracies, and much of what I write is my best guess. Of course, this is just my experience: how useful the course will be varies significantly depending on your overall plans and goals as well as your individual efforts and aptitudes. I am writing with high school students, university students, and early-career people in mind, so this post might be less helpful if you’re more advanced.

Why I wanted to try out alignment research

I wanted to try alignment research because I believe that working to prevent catastrophic outcomes from AI can be highly impactful. My first exposure to ideas about AI risk was The Alignment Problem by Brian Christian, and I was tentatively convinced that agents could be “trained” to pursue goals different from the ones their designers actually intended, à la goal misgeneralization. Systems already exhibit misaligned behavior, I thought, and if this pattern persists as systems become more complex and are used for increasingly critical tasks, I couldn’t even begin to fathom the pitfalls that may lie ahead for society. That said, I had uncertainties. For example, I still think it’s probably true that alignment will need to catch up to AI capabilities as systems get more complex, since, to put it informally, becoming “truly smart” would require these same systems to develop a better understanding of human needs. My current view is that the danger lies primarily before the achievement of AGI; in other words, I am mainly concerned about the gap between “AI systems that give the appearance of being advanced (for example, some people trust LLM output to the extent that they don’t verify its accuracy)” and “AI systems that are truly capable of being aligned with human needs and can perform tasks beyond human capability”.

Exploring your fit for proof-based math

This course (like theoretical alignment research in general) is mainly focused on proving properties of systems. Before taking it, you’ll need some prerequisite knowledge of probability and linear algebra, which takes significant time to acquire.[1] The pointers below are aimed mostly at people with little math experience; they should help you determine whether it’s worth putting in the time to get to the point where you’d be prepared to take this course.

If you’re not familiar with proof-based math, consider taking an introductory course on mathematical proof writing. For a 30-minute trial, read this intro to proof writing and see if it interests you. If your only exposure to math was high school math, I would strongly encourage you to try proof-based math, because it is a much more creative process than elementary math courses. Almost every time I first look at a homework assignment in a math class (including the math-of-ML one), I have no idea how to start any of the problems. After a few minutes of thought, I might write down initial guesses at an approach. It is typically only later, after I’ve taken some time to think about something else, that the pieces begin to fall into place and I can formalize my intuitions into actual arguments. This process has frustrating moments (such as realizing that an intuition didn’t work out the way I expected), but there are also times when I feel immense satisfaction at arriving at a complete proof.
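To give a concrete taste of the style (this is my own illustrative example, a classic first proof, not something from the course), here is a short proof that √2 is irrational:

```latex
% A classic first proof, included only to illustrate the flavour of
% proof-based math; it is my own example, not taken from the course.
\begin{proof}
Suppose, for contradiction, that $\sqrt{2} = p/q$ for integers $p, q$
with no common factor. Squaring gives $p^2 = 2q^2$, so $p^2$ is even,
hence $p$ is even; write $p = 2k$. Substituting gives $4k^2 = 2q^2$,
so $q^2 = 2k^2$ is even, hence $q$ is also even. But then $p$ and $q$
share the factor $2$, contradicting our assumption. Therefore
$\sqrt{2}$ is irrational.
\end{proof}
```

The skill the course exercises is producing arguments like this one, but about learning algorithms rather than numbers.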

People who have pursued math degrees at university probably already have some idea of how much they enjoy the abstract reasoning and creativity associated with higher-level mathematical research. How much you enjoyed the kind of thinking those classes required can indicate how well suited you would be to theoretical alignment research. In addition, most applied math students already have a background in linear algebra and probability, so they’ll probably have a decent sense of whether they’d enjoy this course.

My experience with the course

I took the Mathematics of Machine Learning course offered through the University of Illinois Urbana-Champaign’s NetMath programme in summer 2023, with support from the Long-Term Future Fund. (The course lecture videos are available for free.) The format consists of watching ~40 recorded lectures and completing homework assignments, exams, and relatively simple Python-based projects. Overall, I believe the course does a good job of building familiarity with major concepts in theoretical ML, but I am somewhat hesitant to recommend it to first-time entrants into the alignment space, as I think other starting points might do a better job of teaching important skills.

Pro: The coursework is mostly done on your own schedule. Because the lecture videos can be watched at your own pace and there are no fixed due dates for homework assignments and projects, the course affords a lot of flexibility. Exams do have to be scheduled and taken with virtual proctoring, but they can still be arranged around your availability.

Pro: This course helped me learn about common themes in research papers. For example, to gauge how much I had benefited from the course, I took 40 minutes to skim a research paper on the convergence of reinforcement learning, in which the authors introduce a new algorithm that is better at finding global optima than previous approaches. The authors discussed how the algorithm relates to agnostic PAC-learning and to other mathematical concepts in machine learning. I found that, perhaps psychologically, already having some of this context from the math-of-ML class was helpful: it gave me confidence that I would likely be able to understand the material, since it connects to things I’d already learnt. Nevertheless, this class is only an introduction. There will be many concepts in research papers that readers simply have to look into themselves, because no prior coursework introduced them. (I think this is largely a consequence of fields growing larger and larger, so that people have to learn more to get up to speed.)
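For readers who haven’t met the term: agnostic PAC-learning is a standard formal model of learning from samples. Roughly, following the usual textbook definition (e.g. in Shalev-Shwartz and Ben-David’s Understanding Machine Learning; I’m paraphrasing from memory, so consult the book for the authoritative version):

```latex
% Standard definition of agnostic PAC learnability (textbook material,
% paraphrased from memory; not specific to the paper discussed above).
A hypothesis class $\mathcal{H}$ is \emph{agnostically PAC learnable} if
there exist a function $m_{\mathcal{H}} : (0,1)^2 \to \mathbb{N}$ and a
learning algorithm $A$ such that, for every $\epsilon, \delta \in (0,1)$
and every distribution $\mathcal{D}$ over $\mathcal{X} \times \mathcal{Y}$,
running $A$ on $m \ge m_{\mathcal{H}}(\epsilon, \delta)$ i.i.d.\ samples
from $\mathcal{D}$ returns, with probability at least $1 - \delta$, a
hypothesis $h$ satisfying
\[
  L_{\mathcal{D}}(h) \le \min_{h' \in \mathcal{H}} L_{\mathcal{D}}(h') + \epsilon .
\]
```

Here $L_{\mathcal{D}}(h)$ is the expected loss of $h$ on a fresh example drawn from $\mathcal{D}$; “agnostic” means no hypothesis in $\mathcal{H}$ is assumed to achieve zero loss.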

However, I think that new entrants to the alignment field may be better off first learning programming, less advanced math (specifically probability and linear algebra), and deep learning. One student in the course had graduated with a bachelor’s degree in linguistics in 2012 and had worked as a developer of NLP algorithms for a governmental agency. He told me that he wanted to take the course to deepen his understanding of machine learning, but he found the material too difficult and dropped out. This illustrates my concern: some people who take the math-of-machine-learning class and find it difficult might conclude that they are a bad fit for AI alignment and, accordingly, decide not to work on AI alignment at all. Someone I spoke to while writing this post agreed that starting with programming can let people contribute to research agendas more quickly than starting with theory. All in all, it might make sense to first learn the basics of AI safety technical research, which would already put you in a strong position to contribute to empirical research, and then consider learning the theoretical aspects.

Con: I think rapid feedback loops and collaboration are valuable, and the course is lacking in this regard. When I was in high school, I naively believed that math was an activity people do alone, in solitude. But in the math department of my own university, I’d often meet professors in office hours, or simply run into classmates by chance and quickly discuss approaches to problems on the latest homework assignment. In hindsight, those casual encounters, and being able to talk about ideas in a relaxed environment, were probably valuable for learning difficult concepts. In contrast, because the math-of-machine-learning class was held online, my primary means of getting help were emailing the instructor, arranging virtual meetings with him, or posting on discussion forums. On the surface these may appear similar to the resources available in an in-person class, but they lack the spontaneity of in-person study. (The instructor’s response time to emails was quite variable, and since he preferred communicating over email, we only met virtually once, at the beginning of the course.) So this is a disadvantage of taking the class remotely. Of course, different people have different learning styles, and I can imagine the remote format making the class more accessible to students from a variety of backgrounds.

Possible pro: I’ve learnt more about how to figure things out in general. The course content is quite challenging, and because I didn’t have quick access to assistance, I soon began looking things up online more proactively. Reading and watching other explanations of concepts helped me understand them more thoroughly. And having to write emails about the things I was confused by forced me to order my thoughts more logically than if I were simply thinking to myself or talking out loud.

Con: The course costs $1272 (for undergraduate students) or $1530 (for graduate students). Eligibility for university credit may be a reason to pay. If you don’t need university credit, alternatives like reading the course textbook, watching the lecture videos, and seeking tutoring on your own might be a better deal.

Possible con: The course does not place much emphasis on programming. There are two short (1-2 hour) projects: one uses scikit-learn for tabular data analysis, and one uses tf.keras to build a simple neural network (see the sketch below for a sense of the scale). It’s hard for me to say what the best next steps are, but it seems plausible that these projects give you just enough experience to begin contributing to open source projects, provided you have sufficient mentorship and the tenacity to dive deep into code bases and documentation. Because deep learning requires some background in machine learning, the deep learning course seems like a natural next step after this math-of-machine-learning course. There are also more code-centred introductions to deep learning, like the fast.ai course. Either way, having at least some experience with the programming involved in ML engineering seems valuable regardless of the type of research you wish to pursue.
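To give a sense of how small these projects are, here is a minimal sketch in their spirit (the dataset and model choices are my own illustrative assumptions, not the actual assignments):

```python
# A minimal sketch in the spirit of the course projects (illustrative only;
# the actual assignments used different data and specifications).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import tensorflow as tf

# Tabular data analysis with scikit-learn.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("logistic regression accuracy:", clf.score(X_test, y_test))

# A simple fully connected neural network with tf.keras.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=50, verbose=0)
_, acc = model.evaluate(X_test, y_test, verbose=0)
print("keras accuracy:", acc)
```

Each project is roughly at this level: load data, fit a model, report a metric. Useful as a first exposure, but well short of the engineering practice that empirical alignment work demands.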

ML skills are useful even outside AI safety research, though theory might be less transferable. Whilst I believe impact in theoretical alignment research is likely to be heavy-tailed, I still think there’s a strong argument for testing your fit for it even if you have doubts about having the “requisite intelligence”. Specifically, machine learning skills are highly likely to be valuable in areas beyond making direct research contributions. I am not certain, but it seems plausible that people who pursue theoretical or conceptual alignment research are at a disadvantage, relative to those in more empirical areas, when seeking employment outside the alignment space: many companies and start-ups want people who can design ML systems, but demand for ML engineers appears much greater than demand for theoretical researchers. There’s also a good chance that machine learning experience would let someone get a job that allows them to “earn to give”, further increasing their impact. (But if working on ML capabilities increases existential risk, that harm might outweigh the positive impact of earning to give, so I think some caution is warranted.)

Learning tip: Spaced repetition and active recall were helpful. After the first exam, I started using Anki for spaced repetition to improve my recall of important concepts, especially the precise wording of definitions. Whilst I’ve used Anki to study Japanese in the past, this was the first time I’d used it for math. People often trivialize the role that memory plays in learning, but I think there’s a good argument that being able to retrieve existing knowledge easily helps you understand new mathematical ideas and tackle difficult problems. I would often read a proof in the textbook and not understand it very deeply on a first pass; attempting to recall the main points of the proof from memory had the somewhat counterintuitive effect of consolidating my understanding of it. I think recall plays a valuable role not just as a way to test one’s understanding, but also as a way to build it.

My next plans, looking back

I haven’t been actively learning about machine learning since November 2023: whilst I ultimately achieved an “A” in the course, I also found it quite difficult, so I updated towards thinking I’m not a good fit for theoretical alignment. In the interim, I’ve been reading about AI alignment research topics and EA without engaging deeply. I want to explore other areas, like contributing to AI policy advocacy or working on biosecurity, but a few weeks ago I decided to further test my fit for empirical alignment research. To that end, since late May 2024 I’ve been working through the MLAB curriculum in my free time. I also want to start giving effectively once my financial position is sturdier, especially if I continue my current work (which, in my opinion, is not directly impactful).

There are research topics that I want to pursue on my own or, preferably, with mentorship. As an example, the no-free-lunch theorem informally says that for every learning algorithm there are probability distributions on which it performs poorly. I’m curious how much this actually matters, given that real-life phenomena often follow certain patterns, so it may not be important for an algorithm to work well on all possible distributions.
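For reference, one standard formal statement of the theorem (this follows the version in Shalev-Shwartz and Ben-David, Theorem 5.1, reproduced from memory, so check the textbook before relying on the constants):

```latex
% No-free-lunch theorem for binary classification, following
% Shalev-Shwartz and Ben-David (Theorem 5.1); stated from memory.
Let $A$ be any learning algorithm for binary classification with respect
to the $0$--$1$ loss over a domain $\mathcal{X}$, and let the training-set
size satisfy $m < |\mathcal{X}|/2$. Then there exists a distribution
$\mathcal{D}$ over $\mathcal{X} \times \{0,1\}$ such that some function
$f : \mathcal{X} \to \{0,1\}$ has $L_{\mathcal{D}}(f) = 0$, yet with
probability at least $1/7$ over the sample $S \sim \mathcal{D}^m$,
\[
  L_{\mathcal{D}}(A(S)) \ge 1/8 .
\]
```

Note that the adversarial distribution $\mathcal{D}$ is chosen in response to the algorithm $A$; real-world data is not chosen adversarially, which is one reason the practical force of the theorem is unclear to me.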

On reflection, I’m glad that I took the math-of-machine-learning course, but I do wish I had spent more time pursuing research and other opportunities whilst at university. For example, I wish I had pursued research opportunities with professors and PhD candidates at my university. In addition, whilst I have interned at an organization focused on promoting the U.S.-Japan relationship and at a progressive education organization (where I still volunteer from time to time), it may have been beneficial to pursue internships and post-graduation careers in software engineering, insofar as that would have helped me develop skills valuable for empirical alignment research. (The recommendations in the previous two sentences are also given by Charlie Rogers-Smith.) I only started working full time in March 2024, so I think I still have a lot of potential to explore this early in my career.

Questions and comments

I am happy to answer any questions. I’d also welcome advice or comments on my plans, so please feel free to post them as a comment to this post. If you prefer to contact me directly, you can reach me at jsfi2010@hotmail.com.

Acknowledgements

I am very grateful to the Long-Term Future Fund for funding this project. In addition, I’d like to express my appreciation to Aidan O’Gara for reviewing this article. I'm also grateful for the advice I've received from a number of people (both inside and outside of the EA community) on my plans.

  1. ^

    There is no formal requirement to have taken prerequisite coursework. You could take the course without any math background, but I think you’d have to catch up on the prerequisites very quickly.
