Joseph Tsui

Voices from the Epiverse Community of Open-Source Innovators

Building Together is a series of stories about data.org’s global collaborative, Epiverse, showcasing diverse perspectives on building a trustworthy data ecosystem for anticipating, identifying, and preventing future public health crises. Anna Carnegie, Epiverse community manager at data.org, spoke to DPhil student Joseph Tsui during a workshop we convened in December 2022 entitled ‘100 days and 100 lines of code,’ which brought together field epidemiologists, software engineers, and academic analytics groups to ask “What should the first 100 lines of code written during a new epidemic look like?”

Tell us about your role

I’m a second-year DPhil student in the Department of Biology at the University of Oxford. My research mostly focuses on using phylogenetics and phylodynamics to understand infectious disease transmission, and as part of that work I have been developing pipelines for genomic analysis. I’m particularly interested in scaling up these pipelines for real-time, large-scale analyses.

In your view, what kinds of tools and resources are currently missing from the field?

The ecosystem of tools and packages for streamlining genomic analysis is still insufficient, especially when dealing with very large data sets. Development in this area has been limited. However, this seems to be changing rapidly as genomic surveillance has become a crucial aspect of combatting the COVID-19 pandemic.

What specific insights or ideas did you gain over the past two and a half days?

One major takeaway for me is the realization that, despite the many packages available for cleaning and processing data before analysis, there is still a lack of standardized procedures; the work is still ad hoc. Packages may exist for certain cleaning procedures, but very often one has to spend considerable time and energy locating them and going through the documentation to confirm they are actually appropriate for the specific problem. Although many people are working on this issue, there is still much to be done. I find this a fascinating area to work on, and attending this workshop has been really beneficial for understanding the work underway.