Madhav Malhotra

Teaching Assistant @ Centre for AI Safety

571 karma · Joined Jul 2021 · Pursuing an undergraduate degree · Working (0-5 years) · Toronto, ON, Canada

madhavmalhotra.com


Interests:

Biosecurity, AI risk, Psychotherapy

Bio


Is helpful/friendly :-) Loves to learn. Wants to solve neglected problems. See website for current progress.

How others can help me

I'm very interested in talking to biosecurity experts about neglected issues with: microneedle array patches, self-spreading animal/human vaccines, paper-based (or other cheap) microfluidic diagnostics, and/or massively-scalable medical countermeasure production via genetic engineering.

Also, interested in talking to experts on early childhood education and/or positive education!

How I can help others

Reach out if you have questions about:

How to have good conversations
How to help people
How to make / get started on goals
How to take care of mental health
How to present well
How to be more creative
How to use problem-solving frameworks
What work is needed on biodiversity loss, carbon capture, plastic pollution, mental health in developing countries, electronics production, renewable electricity, and biosecurity.
Questions about editing podcasts, websites, or CAD designs.
Fun facts about history, biology, or psychology :D Or podcast recommendations in the fields :D

I'll respond to LinkedIn the fastest :-)

Posts
21


Madhav Malhotra's Quick takes

Madhav Malhotra

· 3y ago · 1m read

Know a grad student studying AI's economic impacts?

Madhav Malhotra

· 2y ago · 1m read

Preventing AI Misuse: State of the Art Research and its Flaws

Madhav Malhotra

· 3y ago · 14m read

Summary: The Case for Halting AI Development - Max Tegmark on the Lex Fridman Podcast

Madhav Malhotra

· 3y ago · 5m read

AI, Cybersecurity, and Malware: A Shallow Report [Technical]

Madhav Malhotra

· 3y ago · 10m read

AI, Cybersecurity, and Malware: A Shallow Report [General]

Madhav Malhotra

· 3y ago · 9m read

Young People of EA - Database of Friendly Contacts

Madhav Malhotra

· 3y ago · 1m read

Effective Giving Recommendations in India?

Madhav Malhotra

· 3y ago · 1m read

Intro to Cyberbiosecurity - Kathryn Millet | Biosecure

Madhav Malhotra

· 3y ago · 2m read

Summary of "Technology Favours Tyranny" by Yuval Noah Harari

Madhav Malhotra

· 3y ago · 2m read

Comments
103

What are work practices that you’ve adopted that you now think are underrated?

Answer by Madhav Malhotra · Apr 28, 2023 · 12

Context: I work as a remote developer in a government department.

Practices that help:

Show up at least 3 minutes early to every meeting. Change your clocks to run 3 minutes ahead if you can't discipline yourself to do it. Shows commitment.
- On a related note, take personal time to reflect before a meeting. Think of questions you want to ask or what you want to achieve, even if you're not hosting the meeting and you just do it for 5 minutes.
- Try scheduling a calendar reminder with an intention before the meeting. Ex: Say back what others said before you speak (active listening). Ex: Go out of your way to help. Ex: Red team ideas.
Create a physical calendar and cross off days until the end of a project. Creates urgency.
Move email communication to some organised form/tracker. Ex: When I have a bunch of bugs/features to write code for, I'll ask people to put their comments in one centralised spreadsheet instead of keeping track of email threads.
Host events to build personal connections. Ex: Games lunches, making cards for someone who just had a baby, etc. Takes virtual relationships a lot further.
Ask for recurring feedback. Ex: in a weekly meeting. Forces people to actually reflect on how you've been doing instead of giving superficial answers impromptu. Also, normalises negative feedback as well as positive.
- If you do get superficial responses: "X looks awesome!" - ask followups like: "Could you give me an example of what went well so that I know what to keep doing?"

Story of a career/mental health failure

Madhav Malhotra · 3y · 14

It takes courage to share such detailed stories of goals not going right! Good on you for having the courage to do so :-)

It seems that two kinds of improvements within EA might be helpful to reduce the probability of other folks having similar experiences.

Proactively, we could adjust the incentives promoted (especially by high-visibility organisations like 80K hours). Specifically, I think it would be helpful to:

Recommend that early-career folks try out university programs with internships/coops in the field they think they'd enjoy. This would help error-correct earlier rather than later.
Adjust the articles on high-visibility sites to focus less on finding the "most" impactful career path, but instead one of many impactful career paths. I especially say this because sites like 80K hours have gotten a lot more general traffic ever since they vastly increased marketing. When you're reaching a broader target audience (especially for the first time), it's not as essential to urgently direct someone to the exact right career path. It might be a more reasonable goal to get them thinking about a few options. Then, those who want to refine their plan can be directed to more specialised resources within EA (ex: biosecurity -> reading list).

To be more specific about what I mean by making content focus on "one of many impactful paths," here are examples of content rewrites on 80K Hours' career reviews:

Original: "The highest-impact career for you is the one that allows you to make the biggest contribution to solving one of the world’s most pressing problems."
Rewrite: The highest-impact career for you depends on your unique skills and motivations. Out of the careers that suit you, which ones increase your contributions to solving one of the world's most pressing problems?

Original: "Below we list some other career paths that we don’t recommend as often or as highly as those above, but which can still often be top options for people we advise."
Rewrite: Below, we list some career paths that we recommend less frequently than those above. However, they might specifically be a good fit for your unique preferences.

Original: "The lists are based on 10 years of research and experience advising people, and represent the careers it seems to us will be most impactful over the long run if you get started on them now — though of course we can’t be sure what the future holds."

Rewrite: None, the ending clause on uncertainty is good :-)

Reactively, various efforts have been trying to improve mental health support within EA. I look forward to seeing continued progress in creating easily-accessible collections of resources!

Preventing AI Misuse: State of the Art Research and its Flaws

Madhav Malhotra · 3y · 1

Thank you for your thoughtful questions!

RE: "I guess the goal is to be able to run models on devices controlled by untrusted users, without allowing the user direct access to the weights?"

You're correct in understanding that these techniques are useful for preventing models from being used in unintended ways where models are running on untrusted devices! However, I think of the goal a bit more broadly; the goal is to add another layer of defence behind a cybersecure API (or another trusted execution environment) to prevent a model from being stolen and used in unintended ways.

These methods can be applied when model parameters are distributed on different devices (ex: on a self-driving car that downloads model parameters for low-latency inference time). But they can also be applied when a model is deployed on an API hosted on a trusted server (ex: to reduce the damage caused by a breach).

RE: "without allowing the user direct access to the weights? Because if the user had access to the weights, they could take them as a starting point for fine tuning?"

The four papers I presented don't focus on allowing authorised parties to use AI models without accessing their weights. However, (Shevlane, 2022) recommends exactly this: implementing secure APIs instead of directly distributing model parameters whenever possible.

Instead, the papers I presented focused on preventing unauthorised parties from being able to use AI models that they illegitimately acquired. The content about fine-tuning was referring to tests to see if unauthorised parties could fine-tune stolen models back to original performance if they also stole some of the original data used to train the model.

RE: "As far as I can tell, the key problem with all of the methods you cover is that, at some point you have have to have the decrypted weights in the memory of an untrusted device." and "The DeepLock paper gestures at the possibility of putting the keys in a TPM. I don't understand their scheduling solution or TPMs well enough to know if that's feasible, but I'm intuitively suspicious"

You're correct in your technical hypotheses about when a model's unencrypted parameters are stored in memory. I agree, the authors generally give vague explanations for how to keep the keys of the models secure.

Personally, I saw the presented techniques as mainly reducing the easiest opportunities for misuse (ex: a sufficiently well-funded actor like a state or large company could plausibly bypass these techniques, whereas a rogue hacker group may lack the knowledge or resources to do so). This is a useful (but not complete) start, since it means that fewer parties with more predictable incentives can be regulated regarding their use of AI. This is relatively preferred compared to the difficulty of regulating the use of a model like LLaMA (or more advanced) after it is publicly leaked.

RE: Given this, I don't really understand how any of these papers improve over the solution of "just encrypt the weights when not in use"? I feel like there must be something I'm missing here.

You can think of the DeepLock paper as "just encrypt the weights when not in use." Then, the AdvParams paper becomes: "be intelligent about which parameters you encrypt so that you don't have to encrypt/decrypt every single parameter out of millions-billions"
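As a toy illustration of the "encrypt only some parameters" idea (not the method of either paper), the sketch below XORs a hand-picked subset of float32 weights with a key-derived byte stream and leaves the rest untouched. The keystream construction, the `selected` indices, and all function names are assumptions made up for this example; a real system would use a vetted cipher and a principled way of choosing which parameters to protect.

```python
import hashlib
import struct

def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. Not a vetted cipher."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def pack(weights):
    """Serialise weights as little-endian float32 bytes."""
    return bytearray(struct.pack(f"<{len(weights)}f", *weights))

def unpack(blob):
    """Inverse of pack: float32 bytes back to a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", bytes(blob)))

def xor_selected(blob, indices, key):
    """XOR only the chosen parameters with the keystream.
    Calling it again with the same key and indices undoes it."""
    stream = keystream(key, 4 * len(indices))
    out = bytearray(blob)
    for slot, i in enumerate(indices):
        for b in range(4):
            out[4 * i + b] ^= stream[4 * slot + b]
    return out

weights = [0.5, -1.25, 2.0, 0.75, -3.0, 1.5]
selected = [1, 4]      # hypothetical choice of parameters worth protecting
key = b"secret key"    # hypothetical key

plain = pack(weights)
locked = xor_selected(plain, selected, key)      # what would be stored or shipped
unlocked = xor_selected(locked, selected, key)   # what the key holder recovers
```

Only 2 of the 6 parameters are touched here, which is the cost saving the comparison above is pointing at: the work scales with the number of selected parameters rather than with the size of the model.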

In contrast, the preprocessed input paper has nothing to do with encrypting weights. Its aim is to make the possession of the parameters useless (whether encrypted or not), unless you can preprocess your input in the right way with the secret key.
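A minimal sketch of that idea, assuming the secret preprocessing is a key-seeded permutation of the input features (an assumption for illustration, not necessarily what the paper does). The "model" is a plain dot product whose deployed weights were fitted to permuted inputs; every name and number below is hypothetical.

```python
import random

def key_permutation(key: int, n: int):
    """Derive a reordering of n positions from the key."""
    order = list(range(n))
    random.Random(key).shuffle(order)
    return order

def preprocess(x, key):
    """Reorder the features of x according to the key."""
    return [x[i] for i in key_permutation(key, len(x))]

def model(x, weights):
    """Stand-in for a trained model: a single dot product."""
    return sum(w * v for w, v in zip(weights, x))

SECRET_KEY = 1234  # hypothetical key held by the model owner

# Weights as they would look for unpermuted inputs, and the deployed copy,
# which only makes sense for inputs preprocessed with SECRET_KEY.
true_weights = [1, 2, 3, 4, 5, 6, 7, 8]
deployed_weights = preprocess(true_weights, SECRET_KEY)

x = [1, 10, 100, 1000, 10**4, 10**5, 10**6, 10**7]
expected = model(x, true_weights)
with_key = model(preprocess(x, SECRET_KEY), deployed_weights)  # equals expected
wrong_key = model(preprocess(x, 9999), deployed_weights)       # generally does not
```

Someone who steals `deployed_weights` in the clear still gets the intended output only if they can reproduce the key-dependent preprocessing, so nothing in this scheme relies on keeping the weights encrypted.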

The hardware accelerated retraining paper is similar in that the model's parameters are intended to be useless (encrypted or not) without the secret key and the hardware scheduling algorithm that determines which neurons get associated with which key. Here, the key is needed to flip the signs of the right weighted inputs at inference time.
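The sign-flipping step can be sketched for a single neuron as follows. This is a toy under stated assumptions: the key is a list of bits, one per input, and the hardware scheduling that maps neurons to keys is not modelled at all. The names and values are made up for the example.

```python
def neuron(x, stored_weights, key_bits):
    """Weighted sum in which the key decides which products get their sign flipped."""
    total = 0.0
    for xi, wi, bit in zip(x, stored_weights, key_bits):
        product = wi * xi
        total += -product if bit else product
    return total

# Stored weights only give the intended behaviour together with the right key:
# positions where the key bit is 1 are stored with the opposite sign.
stored_weights = [-0.5, -1.0, -2.0, 0.25]
secret_key = [1, 0, 1, 0]   # hypothetical key
no_key = [0, 0, 0, 0]       # an attacker who has only the weights

x = [1.0, 2.0, 3.0, 4.0]
print(neuron(x, stored_weights, secret_key))  # 5.5  (intended output)
print(neuron(x, stored_weights, no_key))      # -7.5 (stolen weights used as-is)
```

As with the preprocessing approach, the stored parameters can sit in memory unencrypted; what is being protected is the key that makes them compute the intended function.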

RE: Trusted Multiparty Computing

Yes, your analogy is insightful: the model weights can be thought of as data contributed by the developer, and the inference data as data contributed by the end user. I certainly agree with (Shevlane, 2022) that we should aim for these kinds of trusted execution environments whenever possible.

However, this may not be possible for all use cases. (I've just been listing the one example with a self-driving car that doesn't have local trusted computing hardware for cost-efficiency purposes, but cannot use servers with these devices for latency reasons. There are lots of other examples in the real world, however.) The other thing to note is that different solutions can be used in combination as "layers of defence." (Ex: encrypt parameter snapshots from training that aren't actively being used, while deploying the most updated parameter snapshot with trusted hardware - assuming this is possible for the use case being considered.)

RE: Model Stealing and Side-Channel Attacks

Yes, the current techniques have important limitations that still need to be fixed (including these attacks and just basic fine-tuning, as I showed with some of the techniques above). There's a long way to go in deploying AI algorithms securely :-) In some ways, we're solving this problem at an unprecedented scale after generalised models like ChatGPT became useful to many actors, without the need for any fine-tuning. That said, (Shevlane, 2022) argues that the Google Cloud Computer Vision platform previously faced a similar problem.

Young People of EA - Database of Friendly Contacts

Madhav Malhotra · 3y · 2

Hi!

As I mentioned in the post, I planned to delete the database a month after posting, for privacy reasons. My apologies for the inconvenience :/

Young People of EA - Database of Friendly Contacts

Madhav Malhotra · 3y · 2

This is certainly a useful resource for those who live in areas without effective altruism groups around them! Thank you for sharing :-)

EA Infosec: skill up in or make a transition to infosec via this book club

Madhav Malhotra · 3y · 2

Could you please share more details on which parts of the curriculum would be inaccessible to recent graduates? From the outline of the book alone, it's hard to estimate the level of technical depth needed.

80,000 Hours has been putting much more resources into growing our audience

Madhav Malhotra · 3y · 1

I'd look forward to seeing you post the results of the in-depth survey on the forum :-)

Help GiveDirectly beat "teach a man to fish"

Madhav Malhotra · 3y · 31

I'm not sure this is a good idea.

It seems possible that the individual interventions you're linking to research on are not representative of every possible intervention about skill development.
Also, it seems possible that future interventions may combine building human capital and economic capital to enable recipients to make changes in their lives, i.e. skill-building + direct cash transfers.
Also, it's generally uncertain whether GiveDirectly will continue to be the most effective or endorsed donation recommendation. I say this given changes in how we measure wellbeing (admittedly, a topic with frequent updates to opinions and mistake corrections being made).

Why potentially reduce the effectiveness of those future interventions by launching this campaign?

80,000 Hours has been putting much more resources into growing our audience

Madhav Malhotra · 3y · 12

I'm surprised to see how the book giveaway is more expensive than the costs of actually placing the ads to get eyes on the sites! Why did you decide to give away a physical book? What do you think the cost-effectiveness of that is compared to ebooks or not having a giveaway?

Cause prioritization in Canada.

Answer by Madhav Malhotra · Feb 27, 2023 · 1

If you're interested in supporting education, scholarships to next generation education companies might be worth supporting (example - disclaimer, I've gone through the program of this particular company).

Regarding investments in environmental causes, more neglected causes are more valuable to invest in. For instance, supporting novel carbon capture companies (i.e. not tree planting).

Given the high-tech industry in Canada, it might be relatively advantageous to support neglected research priorities.

For instance, you might be able to fund organisations like iGEM or the National Research Council to support biosecurity work on broad-spectrum antivirals, germicidal UV lights, shotgun genetic sequencing at airports, etc. Feel free to search the forum for simple explanations about these concepts.
Similarly, you might be able to fund research grants to work on AI safety topics including interpretability, robustness, and anomaly detection research at the Vector Institute.

If you're donating to humanitarian causes, you'd have the greatest impact per dollar by directing resources to Indigenous communities. Interventions related to eCBT (mental health apps) for Indigenous youth might be especially promising to fund.
