Projects

Research directions

Designing functional proteins

Engineering proteins with enhanced native functions or entirely novel activities is a major challenge in computational protein design. However, current generative models exhibit limitations in producing functionally diverse variants. Thus, new methods are needed that can (i) extract the right patterns from evolutionary data, (ii) incorporate structural constraints, and (iii) leverage experimental data to enrich the functional generative landscape. My doctoral research focuses on developing computational approaches towards these goals.

Collaborators: Thomas Hopf, Marks Lab, Sander Lab.

Status: In Progress

EVEdesign

EVEdesign, led by Thomas Hopf, is an open-source initiative to democratise AI-driven protein design. It enables end-to-end sequence design, from entering the target protein through analysing the generated library to ordering codon-optimised DNA sequences. EVEdesign also aims to be a flexible, model-agnostic tool that supports both single-mutation scans and higher-order mutant generation with custom sampling techniques. Mutation sites are readily mapped onto available structures, and mutant information is extracted and visualised in sequence space, all within the same analysis window. I am building our first complex protein design pipeline, which will be included in the pre-print.

Collaborators: Thomas Hopf, Marks Lab, Steinegger Lab, D’Oelnitz Lab, Sander Lab, NIST.

Status: In Progress

Links: EVEdesign

Other initiatives, collaborations and projects

Sanger Machine Learning Bookclub

I co-founded and currently lead the Sanger Machine Learning Bookclub (SMLB), a community of nearly 100 people that brings together “BioML” enthusiasts across the Wellcome Sanger Institute and the European Bioinformatics Institute (EMBL-EBI). At the SMLB we focus on the theory of Machine Learning methods that are advancing biological and biotechnological research. It was created to share knowledge, resources and research in an engaging way, and to bring curious scientists together. We run weekly sessions in one of the following forms: workshops, study sessions, Journal Club-style sessions and research talks. A record of all sessions is kept at the official SMLB Notion site.

Dates: 04/2025 - Present

Links: SMLB site

Information-theoretical tools for single-cell RNA-seq data analysis

Elucidating structure from single-cell RNA-seq data is a major challenge in modern biological research. For my final project of the MRes in Bioinformatics and Theoretical Systems Biology, I improved existing and developed novel information-theoretical tools to uncover real relationships between genes as part of the single-cell RNA-seq data analysis pipeline. This work introduced theoretically tractable alternatives to available methods. Further developments from the Theoretical Systems Biology group and part of my work are now being prepared for publication.

Collaborators: Theoretical Systems Biology group, Briscoe Lab

Predicting DNA function when grafted across species

For one of my rotations in the PhD programme, I was part of a larger genome engineering project led by the Parts Lab focused on inter-species chromosome transplantation. I developed an approach that involved fine-tuning DNA language models to elucidate the extent to which the effects of chromosome grafting could be learned. I summarised my findings in a poster which earned the Best Poster Award within the PhD cohort. Manuscript in preparation.

Collaborators: Parts Lab

AI-readiness standards for space biology

I am part of an international initiative led by scientists from the European Space Agency (ESA) and the National Aeronautics and Space Administration (NASA) that aims to introduce AI-readiness standards and guidelines for space biology data, with ramifications for other biological data types. Manuscript in preparation.

Collaborators: NASA, ESA