NAO Updates, Fall 2024

Jeff Kaufman 🔸

This is a linkpost for https://naobservatory.org/blog/updates-fall-2024

It's been a busy season at the Nucleic Acid Observatory, and we have a lot to share since our last update. As always, if anything here is particularly interesting or if you’re working on similar problems, please reach out!

Wastewater Sequencing

We performed an initial analysis of untargeted sequencing data from aggregated airplane lavatory waste and municipal treatment plant influent that we collected and processed during our Fall 2023 partnership with CDC’s Traveler-based Genomic Surveillance program and Ginkgo Biosecurity. We've now analyzed viral abundance and diversity in sequencing data across multiple sample types and wastewater-processing and sequencing protocols. Next steps include further investigating how protocol and sample type affect specific viruses and bacteria, as well as understanding the pathogen temporal dynamics seen in airport versus treatment-plant samples.

We have continued to work with Pardis Sabeti's Lab at the Broad Institute to develop an optimized protocol for preparing RNA sequencing libraries from nucleic acids extracted from wastewater. By the end of the year, we plan to use this protocol to sequence the samples collected during our Fall 2023 effort described above.

We have also been collaborating with Jason Rothman, formerly of Katrine Whiteson's lab at the University of California, Irvine, and now at his own lab at the University of California, Riverside, to sequence and analyze Southern California wastewater. The sequencing is complete, with a total of 45B read pairs. We're aiming to make the raw sequencing data public with a data paper later this fall.

We've scaled up our collaboration with Marc Johnson's group at the University of Missouri. Marc's group is now running a full flow cell every other week. While the MU Genomics Core had been sequencing these on a NovaSeq 6000 with the S4 flow cell, they've moved to the NovaSeq X+ with the 10B flow cell, which lowers sequencing costs by about 30% due to cheaper reagents. These runs include samples from sewersheds in multiple major metropolitan areas, among them four Chicago-area sewersheds through a new collaboration with Rachel Poretsky at the University of Illinois Chicago. We're still looking for additional partnerships: if you have a relationship with a treatment plant in a city with a large amount of international travel, we can potentially sequence your influent and share the data with you.

Pooled Individual Sequencing

We are continuing our exploration of pooled individual sequencing, where we go to busy public places and collect nasal swab samples. We've been out to sample eight times, averaging 33 samples per hour at compensation levels of $5/person and $2/person, and plan to evaluate other compensation options.

We recently received permission to sample on the MIT campus, including inside buildings, and have applied for permission to sample inside Boston's public transit stations. This is important for our effort because we expect outdoor sampling in the winter to have a lower participation rate, in addition to being less pleasant for the experimenters.

We're developing methods for extracting the nucleic acids from these samples and running Nanopore sequencing. We're iterating on approaches to maximize the number of viral reads, which primarily means minimizing the fraction of human genome reads while maximizing overall yield.

Other Sampling Strategies

While we are primarily working with wastewater and nasal swabs, we think other sampling approaches are still worth exploring. We think air and blood samples are especially promising:

We prepared a preprint on indoor air sampling that we're submitting for publication. We presented it at BSL4Znet 2024, and will be presenting a poster on it at CBD S&T 2024 as well.
We've published two blog posts on blood, covering blood as a sample type and opportunities for sampling from the US blood and plasma donation system.

Nucleic Acid Tracers

We now have permission to use our nucleic acid tracers for an experiment where we deposit them in a toilet and measure concentration at a wastewater treatment plant, which we call a "deposition experiment". This is a great outcome from a multi-year process involving collaboration with multiple regulatory bodies. This fall we will be characterizing sequencing efficacy via spike-ins to estimate the amount we would need to deposit to be detectable in our ongoing surveillance, and if efficacy is sufficiently high we hope to run a deposition experiment.

Analysis of Sequencing Data

We've now moved our primary analysis over to our completely rewritten Nextflow-based metagenomic sequencing pipeline. The new pipeline is much more scalable, using AWS Batch to distribute processing across many machines. We've also made good progress in measuring and optimizing costs, to the point where we can now analyze a billion read pairs for under $10. This is under 1% of the cost of producing this data: the cost in flow cells alone is around $1k per billion read pairs. With costs at an acceptable level, we've now run all our internal sequencing data through the new pipeline.
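As a quick check on the cost comparison above, here is the arithmetic as a small Python sketch. The function name is ours, chosen for illustration, and the inputs are the rounded figures from the text ($10 as the upper bound on analysis cost, roughly $1k in flow cells per billion read pairs), so the result is an upper bound rather than a measured value.

```python
def analysis_cost_fraction(analysis_usd_per_billion: float,
                           sequencing_usd_per_billion: float) -> float:
    """Return analysis cost as a fraction of sequencing cost (same units for both)."""
    if sequencing_usd_per_billion <= 0:
        raise ValueError("sequencing cost must be positive")
    return analysis_usd_per_billion / sequencing_usd_per_billion


# Rounded figures from the text: under $10 of analysis versus
# around $1,000 of flow cell cost, both per billion read pairs.
upper_bound = analysis_cost_fraction(10.0, 1000.0)
print(f"analysis is at most {upper_bound:.1%} of flow cell cost")
```

Since flow cells are only part of the cost of producing the data, the true fraction would be lower still.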

Our genetic engineering detection pipeline is now operational, after months of work on reducing false positives. In addition to the genetically engineered viral vector we described in our initial announcement, we have since detected two additional HIV-based viral vectors. We ran a preliminary positive control experiment where our collaborators at MU spiked engineered lentiviral particles into wastewater influent, and are currently writing up results to share. We'll be speaking about this work at the AI-powered Diagnostics session of CBD S&T 2024.

We are also exploring other approaches to detecting novel pathogens in metagenomic data. This summer we resumed work developing a method for reference-free detection based on flagging and assembling exponentially increasing sequences in wastewater data. We plan to investigate additional strategies this fall.
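To give a sense of what "flagging exponentially increasing sequences" can mean, here is a minimal Python sketch. It is not our actual method: it assumes each candidate sequence (for example a k-mer) already has a count per sampling date, fits a straight line to the log of those counts, and flags sequences whose fitted growth rate exceeds a threshold. The function names, the pseudocount, the `min_slope` threshold, and the example data are all hypothetical, and a real implementation would also need to normalize for sequencing depth and account for noise.

```python
import math


def log_linear_slope(days, counts, pseudocount=1.0):
    """Least-squares slope of log(count + pseudocount) against time in days."""
    if len(days) != len(counts) or len(days) < 2:
        raise ValueError("need at least two matched observations")
    ys = [math.log(c + pseudocount) for c in counts]
    mean_x = sum(days) / len(days)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, ys))
    return sxy / sxx


def flag_growing(series_by_sequence, days, min_slope=0.05):
    """Return the IDs whose fitted per-day growth rate is at least min_slope."""
    return sorted(
        seq_id
        for seq_id, counts in series_by_sequence.items()
        if log_linear_slope(days, counts) >= min_slope
    )


# Hypothetical weekly counts for two made-up sequences.
days = [0, 7, 14, 21, 28]
series = {
    "kmer_flat": [10, 12, 9, 11, 10],
    "kmer_growing": [1, 2, 4, 8, 16],
}
print(flag_growing(series, days))  # ['kmer_growing']
```

In a setup like this, the flagged sequences would then be candidates for assembly and closer inspection.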

We've recently shared an updated version of our preprint on the relationship between relative abundance in published metagenomic sequencing data and incidence or prevalence in public health data. Among other changes, the preprint now uses our new pipeline for analysis. We also applied these methods to the unpublished sequencing data collected by our collaborators at MU and UCI to generate an estimate for the sequencing depth necessary to detect influenza A and B.
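For intuition about how relative abundance translates into sequencing depth, here is a back-of-the-envelope Python sketch. It is a deliberate simplification and not the model used in the preprint: it assumes the expected number of reads from a pathogen is simply total read pairs times its relative abundance, and asks how many read pairs are needed to reach a target expected count. The function name and the example values are hypothetical, not estimates from our data.

```python
import math


def read_pairs_needed(relative_abundance: float, target_reads: int) -> int:
    """Read pairs needed for the expected pathogen read count to reach target_reads.

    Assumes expected reads = total read pairs * relative abundance,
    which ignores sampling noise and variation between samples.
    """
    if not 0 < relative_abundance <= 1:
        raise ValueError("relative abundance must be in (0, 1]")
    if target_reads < 1:
        raise ValueError("target_reads must be at least 1")
    return math.ceil(target_reads / relative_abundance)


# Hypothetical example: a pathogen at one read in a hundred million,
# with a target of 100 expected reads.
print(read_pairs_needed(1e-8, 100))
```

Under this simple model the required depth scales inversely with relative abundance, so a pathogen that is ten times rarer needs ten times the sequencing.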

Our collaborators Willie Neiswanger and Oliver Liu at the University of Southern California have a paper accepted at an upcoming NeurIPS workshop. This paper, A Foundation Model for Metagenomic Sequences, describes their work training a metagenomic foundation model on the sequencing data our MU and UCI collaborators collected. Their goal is to apply this model to pathogen-agnostic detection.

We have heard from a range of groups that they are potentially interested in drawing on our expertise in pathogen-agnostic detection. If you're interested in our analysis services, please let us know.

Organizational Updates

After comparing many candidate representations of our work in scoping and building a Nucleic Acid Observatory, we now have a logo!

In preparation for scaling up our wet lab operations, we've secured additional wet lab space outside of MIT, at BioLabs Tufts Boston. This is a shared lab facility in downtown Boston, and our first day with the additional space is today, October 17th.

We recently hired two new Research Scientists, Vanessa Smilansky (at MIT) and Evan Fields (at SecureBio). Vanessa has broad research experience in pathogen detection and nucleic acid sequencing. At the University of Exeter, she developed targeted genomics methods for surveillance of amphibian-infecting protists. Now, she will be joining our Near-Term First group to work on untargeted metagenomic methods for surveillance of human-infecting viruses. Evan joins us from Zoba, where he led data science and software teams optimizing shared mobility, and will be working in our Robust Detection group. Outside of work he’s an avid baker.

The NAO is a collaboration between the Sculpting Evolution group at MIT and SecureBio, and the latter recently said goodbye to Operations Manager Tiff Tzeng. She leaves to help start a new AI Safety organization which will publicly develop tools for the responsible deployment of artificial intelligence. With her departure, SecureBio is hiring a Director of Operations (job posting).

SummaryBot, Oct 18 2024

Executive summary: The Nucleic Acid Observatory (NAO) reports progress in wastewater sequencing, pooled individual sampling, nucleic acid tracers, and data analysis techniques for pathogen detection and surveillance.

Key points:

Expanded wastewater sequencing efforts with multiple collaborations, including analysis of airplane lavatory waste and municipal treatment plant samples.
Scaled up pooled individual sequencing via nasal swabs, with plans to sample indoors at MIT and Boston transit stations.
Received approval for nucleic acid tracer deposition experiments in wastewater systems.
Improved metagenomic sequencing pipeline and genetic engineering detection capabilities, reducing costs and false positives.
Organizational updates include new logo, additional lab space, and hiring of two new Research Scientists.
Ongoing development of novel pathogen detection methods, including reference-free detection and a metagenomic foundation model.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
