Is helpful/friendly :-) Loves to learn. Wants to solve neglected problems. See website for current progress.
I'm very interested in talking to biosecurity experts about neglected issues with: microneedle array patches, self-spreading animal/human vaccines, paper-based (or other cheap) microfluidic diagnostics, and/or massively-scalable medical countermeasure production via genetic engineering.
Also, interested in talking to experts on early childhood education and/or positive education!
Reach out if you have questions about:
I'll respond on LinkedIn the fastest :-)
It takes courage to share such detailed stories of goals not going right! Good on you for having the courage to do so :-)
It seems that two kinds of improvements within EA might be helpful to reduce the probability of other folks having similar experiences.
Proactively, we could adjust the incentives promoted (especially by high-visibility organisations like 80,000 Hours). Specifically, I think it would be helpful to:
To be more specific about what I mean by making content focus on "one of many impactful paths," here are examples of content rewrites for 80,000 Hours' career reviews:
Original: "The highest-impact career for you is the one that allows you to make the biggest contribution to solving one of the world’s most pressing problems."
Rewrite: The highest-impact career for you depends on your unique skills and motivations. Out of the careers that suit you, which ones increase your contributions to solving one of the world's most pressing problems?
Original: "Below we list some other career paths that we don’t recommend as often or as highly as those above, but which can still often be top options for people we advise."
Rewrite: Below, we list some career paths that we recommend less frequently than those above. However, they might specifically be a good fit for your unique preferences.
Original: "The lists are based on 10 years of research and experience advising people, and represent the careers it seems to us will be most impactful over the long run if you get started on them now — though of course we can’t be sure what the future holds."
Rewrite: None, the ending clause on uncertainty is good :-)
Reactively, various efforts have been trying to improve mental health support within EA. I look forward to seeing continued progress in creating easily-accessible collections of resources!
Thank you for your thoughtful questions!
RE: "I guess the goal is to be able to run models on devices controlled by untrusted users, without allowing the user direct access to the weights?"
You're correct in understanding that these techniques are useful for preventing models from being used in unintended ways where models are running on untrusted devices! However, I think of the goal a bit more broadly; the goal is to add another layer of defence behind a cybersecure API (or another trusted execution environment) to prevent a model from being stolen and used in unintended ways.
These methods can be applied when model parameters are distributed on different devices (ex: on a self-driving car that downloads model parameters for low-latency inference time). But they can also be applied when a model is deployed on an API hosted on a trusted server (ex: to reduce the damage caused by a breach).
RE: "without allowing the user direct access to the weights? Because if the user had access to the weights, they could take them as a starting point for fine tuning?"
The four papers I presented don't focus on allowing authorised parties to use AI models without accessing their weights. However, (Shevlane, 2022) recommends exactly this: implementing secure APIs instead of directly distributing model parameters whenever possible.
Instead, the papers I presented focused on preventing unauthorised parties from being able to use AI models that they illegitimately acquired. The content about fine-tuning was referring to tests to see if unauthorised parties could fine-tune stolen models back to original performance if they also stole some of the original data used to train the model.
RE: "As far as I can tell, the key problem with all of the methods you cover is that, at some point you have have to have the decrypted weights in the memory of an untrusted device." and "The DeepLock paper gestures at the possibility of putting the keys in a TPM. I don't understand their scheduling solution or TPMs well enough to know if that's feasible, but I'm intuitively suspicious"
Your technical hypotheses are correct about when a model's unencrypted parameters are stored in memory. I agree that the authors generally give vague explanations of how to keep the models' keys secure.
Personally, I saw the presented techniques as mainly reducing the easiest opportunities for misuse (ex: a sufficiently well-funded actor like a state or large company could plausibly bypass these techniques, whereas a rogue hacker group may lack the knowledge or resources to do so). This is a useful (but not complete) start, since it means that fewer parties, with more predictable incentives, need to be regulated regarding their use of AI. That is preferable to the difficulty of regulating the use of a model like LLaMA (or a more advanced one) after it is publicly leaked.
RE: Given this, I don't really understand how any of these papers improve over the solution of "just encrypt the weights when not in use"? I feel like there must be something I'm missing here.
You can think of the DeepLock paper as "just encrypt the weights when not in use." Then, the AdvParams paper becomes: "be intelligent about which parameters you encrypt so that you don't have to encrypt/decrypt every single parameter out of millions to billions."
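To make the "only encrypt some parameters" idea concrete, here is a minimal sketch. It is not the AdvParams method: the magnitude-based selection rule, the SHA-256 keystream, and all names (`select_indices`, `xor_selected`, `KEY`) are my own stand-ins for illustration, and the keystream is a toy, not production cryptography.

```python
import hashlib
import struct

def keystream(key: bytes, n_bytes: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode (illustration only)."""
    out = b""
    counter = 0
    while len(out) < n_bytes:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n_bytes]

def select_indices(weights, k):
    """Stand-in selection rule (an assumption): the k largest-magnitude weights."""
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return sorted(ranked[:k])

def pack(weights):
    """Store each weight as an 8-byte big-endian double."""
    return [struct.pack(">d", w) for w in weights]

def unpack(blobs):
    return [struct.unpack(">d", b)[0] for b in blobs]

def xor_selected(blobs, indices, key):
    """XOR only the selected blobs with the keystream.
    Applying it twice with the same key restores the original bytes."""
    stream = keystream(key, 8 * len(indices))
    out = list(blobs)
    for n, i in enumerate(indices):
        chunk = stream[8 * n: 8 * n + 8]
        out[i] = bytes(a ^ b for a, b in zip(blobs[i], chunk))
    return out

KEY = b"hypothetical-secret-key"
weights = [0.01, -2.5, 0.3, 4.0, -0.02, 1.2]
chosen = select_indices(weights, k=2)           # only 2 of 6 weights are touched
plain_blobs = pack(weights)
locked_blobs = xor_selected(plain_blobs, chosen, KEY)
restored = unpack(xor_selected(locked_blobs, chosen, KEY))
```

The point is only the cost structure: locking and unlocking touch `k` parameters rather than all of them, and the unselected parameters are stored unchanged.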
In contrast, the preprocessed input paper has nothing to do with encrypting weights. Its aim is to make the possession of the parameters useless (whether encrypted or not), unless you can preprocess your input in the right way with the secret key.
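A toy sketch of that idea, under heavy assumptions: the "model" is a single linear layer, the secret preprocessing is a key-seeded permutation of input features, and permuting the weights stands in for training on preprocessed inputs. None of this is taken from the paper; the names (`keyed_permutation`, `preprocess`, `SECRET_KEY`) are hypothetical.

```python
import random

def keyed_permutation(key: int, n: int) -> list:
    """Derive a permutation of range(n) from the secret key."""
    rng = random.Random(key)
    perm = list(range(n))
    rng.shuffle(perm)
    return perm

def preprocess(x, key):
    """Reorder the input features with the key-derived permutation."""
    perm = keyed_permutation(key, len(x))
    return [x[i] for i in perm]

def linear_model(weights, x):
    return sum(w * v for w, v in zip(weights, x))

SECRET_KEY = 1234
plain_weights = [0.5, -1.0, 2.0, 0.25, -0.75, 1.5, 3.0, -2.0]

# Stand-in for "training on preprocessed inputs": the deployed weights
# only line up with inputs that were permuted with the same key.
perm = keyed_permutation(SECRET_KEY, len(plain_weights))
deployed_weights = [plain_weights[i] for i in perm]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
expected = linear_model(plain_weights, x)
with_key = linear_model(deployed_weights, preprocess(x, SECRET_KEY))
without_key = linear_model(deployed_weights, x)  # raw input, no key

print(expected, with_key, without_key)
```

With the key, the deployed model reproduces the intended output; someone holding only `deployed_weights` and feeding raw inputs generally gets a different answer. A real network is not this easy to reason about, so treat this as intuition only.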
The hardware accelerated retraining paper is similar in that the model's parameters are intended to be useless (encrypted or not) without the secret key and the hardware scheduling algorithm that determines which neurons get associated with which key. Here, the key is needed to flip the signs of the right weighted inputs at inference time.
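The sign-flipping part can be sketched for a single neuron as follows. This is a simplification under my own assumptions: the hardware scheduling that maps neurons to keys is left out entirely, the key bits are derived with SHA-256 purely for illustration, and `key_bits`, `flip_signs`, and `neuron` are hypothetical names.

```python
import hashlib

def key_bits(key: bytes, n: int) -> list:
    """Expand the key into n bits (1 means 'flip this sign')."""
    bits = []
    counter = 0
    while len(bits) < n:
        digest = hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        for byte in digest:
            for shift in range(8):
                bits.append((byte >> shift) & 1)
        counter += 1
    return bits[:n]

def flip_signs(values, key):
    """Negate the values whose key bit is 1. Applying it twice is a no-op."""
    bits = key_bits(key, len(values))
    return [-v if b else v for v, b in zip(values, bits)]

def neuron(weights, inputs, key=None):
    """Weighted sum; with a key, flip the keyed weighted inputs first."""
    products = [w * x for w, x in zip(weights, inputs)]
    if key is not None:
        products = flip_signs(products, key)
    return sum(products)

KEY = b"hypothetical-secret-key"
true_weights = [0.5, -1.25, 2.0, 0.75]
# The distributed model stores weights with key-selected signs flipped.
locked_weights = flip_signs(true_weights, KEY)

inputs = [1.0, 2.0, -3.0, 4.0]
reference = neuron(true_weights, inputs)
unlocked = neuron(locked_weights, inputs, KEY)   # key restores the signs
no_key = neuron(locked_weights, inputs)          # stolen weights, no key
```

Here `unlocked` matches `reference` because flipping a sign twice cancels out, while `no_key` uses whatever scrambled signs were stored.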
RE: Trusted Multiparty Computing
Yes, your analogy is insightful: you can think of the model weights as data contributed by the developer and the inference data as being contributed by the end user. I certainly agree with (Shevlane, 2022) that we should aim for these kinds of trusted execution environments whenever possible.
However, this may not be possible for all use cases. (I've just been listing the one example with a self-driving car that doesn't have local trusted computing hardware for cost-efficiency purposes, but cannot use servers with these devices for latency reasons. There are lots of other examples in the real world, however.) The other thing to note is that different solutions can be used in combination as "layers of defence." (Ex: encrypt parameter snapshots from training that aren't actively being used, while deploying the most updated parameter snapshot with trusted hardware - assuming this is possible for the use case being considered.)
RE: Model Stealing and Side-Channel Attacks
Yes, the current techniques have important limitations that still need to be fixed (including these attacks and just basic fine-tuning, as I showed with some of the techniques above). There's a long way to go in deploying AI algorithms securely :-) In some ways, we're solving this problem at an unprecedented scale after generalised models like ChatGPT became useful to many actors, without the need for any fine-tuning. That said, (Shevlane, 2022) argues that the Google Cloud Computer Vision platform faced a similar problem previously.
I'm not sure this is a good idea.
Why potentially reduce the effectiveness of those future interventions by launching this campaign?
If you're interested in supporting education, scholarships to next generation education companies might be worth supporting (example - disclaimer, I've gone through the program of this particular company).
Regarding investments in environmental causes, more neglected causes are more valuable to invest in. For instance, consider supporting NOVEL carbon capture companies (i.e. not tree planting).
Given the high-tech industry in Canada, it might be relatively advantageous to support neglected research priorities.
If you're donating to humanitarian causes, you'd have the greatest impact per dollar by directing resources to Indigenous communities. Interventions related to eCBT (mental health apps) for Indigenous youth might be especially promising to fund.
Context: I work as a remote developer in a government department.
Practices that help: