Sowing the seeds for COG-UK

Historical foundations: Pathogen genomic surveillance

Set up at speed, the national genome sequencing effort launched by COG-UK could not have happened without decades of investment made before the pandemic. This is highlighted by Sir Jeremy Farrar, the director of the Wellcome Trust. He says, 'The critical point here is you cannot build expertise, infrastructure and networks in a crisis. You have intellectual networks, academic, public health networks; critically, people in an infrastructure that is there and functional before a crisis. And when a crisis hits, you can use that infrastructure. You cannot build an army when a war breaks out' (Farrar transcript).

UK: The cradle of genomics

Farrar argues that one of the reasons 'why COG-UK was so successful and world-leading is because there was very strong genomics in the UK before COVID' (Farrar transcript). The UK's strength in this area is rooted in the methods pioneered by the British biochemist Fred Sanger in Cambridge from the 1940s, firstly for sequencing proteins and then DNA, for which he was awarded two Nobel Prizes, respectively, in 1958 and 1980 (Marks 2015). Originally a slow and highly manual process, his DNA sequencing technique helped lay the foundation for the Human Genome Project, an international collaboration launched in 1990 to decipher the complete DNA sequence of the human genome. Publicly funded by US and UK government bodies, the collaboration involved multiple researchers from 20 separate universities and research centres across the United States, United Kingdom, France, Germany, Japan and China. Costing approximately $3 billion, the project took 13 years to complete (NIH).

Completed in 2003, the Human Genome Project was a major landmark in scientific history. It rested on the development of highly advanced sequencing techniques and new computational approaches. The full sequence is around 3 billion base pairs long, and one third of it was completed by the Wellcome Sanger Institute, which was the biggest single contribution by an institution in the collaboration (Wellcome Sanger Institute 2004). Originally named the Sanger Centre, the Institute was founded in 1992 in Hinxton near Cambridge with funding from the Wellcome Trust and UK Medical Research Council, to support the large-scale sequencing needed for the Human Genome Project (Wellcome Sanger Institute).

Following the Human Genome Project, the Sanger Institute expanded its mission to investigate the role of genetics in health and disease. This included using genomics to address emerging global health threats, such as the rising problem of bacteria resistant to antibiotics and the spread of multidrug-resistant malaria. Awarded core funding every five years from Wellcome, the Institute had just submitted its Quinquennium funding request in December 2019. As part of the bid for this funding, to run from October 2021, its malaria researchers had developed a strategy to set up large-scale sequencing that could be combined with epidemiology and other types of data to follow a particular pathogen 'over space and time in a systematic way' so that it could be used to inform public health measures. According to Professor Kwiatkowski, who headed up the malaria programme, this was based on the fact that he and his team had reached a point where their technologies, analytical methods and global partnerships were sufficiently mature to do 'large scale, longitudinal genomic surveillance.' This meant that when the pandemic arrived, they were already primed to perform such work. As he says, 'It's what we were gearing up for' (Kwiatkowski transcript).

The Sanger Institute was not only important in terms of the high-throughput genomics it could bring to bear on different projects. It was also an important site for training, with many of its alumni going on to set up genomic research hubs in other parts of the country. As a result a number of strong genomic academic centres have now been built in places like Edinburgh, Birmingham, Manchester, Oxford and Cambridge. This meant the UK had a number of well-equipped centres to undertake sequencing of COVID-19 samples around the country once the pandemic arrived (Kwiatkowski transcript).

Figure 3.1: Photograph of the Wellcome Sanger Institute and Hinxton Hall, Cambridge, UK. Credit: Magnus Manske. Wikimedia.

High-throughput sequencing technologies

Beyond the Sanger Institute, a number of UK scientists have made important contributions to the advancement of genomic technology. These include Professors Shankar Balasubramanian and David Klenerman who in the mid-1990s developed a new technique in the Chemistry Department at the University of Cambridge. Subsequently called 'sequencing by synthesis technology', their technique helped 'improve the speed and cost of decoding DNA by a factor of 100,000-fold' and provided the basis for the setting up of a new company called Solexa in 1998. In 2006 Solexa launched its first sequencer and very quickly became a $200 million company, giving scientists the ability to sequence 1 gigabase (Gb) of data in a single run. Importantly, it provided the means to sequence a human genome in three months for $100,000. In 2007, Solexa, which had earlier merged with Lynx Therapeutics, a Californian-based genomics analytics company, was acquired for $650 million by Illumina (Cambridge Enterprise; Illumina).

Balasubramanian and Klenerman were not the only UK scientists helping to pioneer new sequencing techniques. So too was Professor Hagan Bayley, a chemist at the University of Oxford. In 2005 he founded a spin-out company called Oxford Nanopore Technologies (ONT) to develop a technique called nanopore sequencing. First released in 2014, nanopore sequencing marked the culmination of nearly three decades of research by different US and UK scientists. Incorporated into pocket-size devices, nanopore sequencing radically transformed the process of sequencing. Importantly, its portability opened up the possibility of carrying out sequencing in remote areas with limited laboratory facilities. It also had the advantage that it did not require the DNA in a sample to first be amplified and could read a sequence in real time (Marks 2021).

The path to the ARTIC network

Nanopore sequencing quickly caught the attention of UK scientists looking to roll out genomic sequencing in the field to monitor and track epidemic outbreaks in resource-limited settings. One of the first people to pilot it for this use was Professor Nick Loman from the University of Birmingham. Between March and October 2015 he and his colleague Dr Joshua Quick demonstrated the power of nanopore sequencing in West Africa for determining the transmission pathway of the Ebola virus, which had caused over 11,000 deaths since the first case had been diagnosed in December 2013. Critically, the technique helped pinpoint two major virus lineages that were responsible for the continuing outbreak which regularly travelled across the border between Guinea and Sierra Leone. Armed with this information, public health workers were able to improve diagnostic tools on the ground and allocate resources to bring the epidemic under control (Quick, Loman, Duraffour).

Following the Ebola virus outbreak, in 2016 Loman and a number of collaborators in the UK and Brazil successfully deployed nanopore sequencers in what became known as the ZiBRA project, to help track and curb the spread of the Zika virus. The rapid spread of the mosquito-borne Zika virus across South and Central America and the Caribbean had been declared a Public Health Emergency of International Concern in February 2016 after many pregnant women infected with the virus went on to give birth to babies with microcephaly, a condition which causes them to have much smaller heads than expected. Using a mobile laboratory equipped with nanopore sequencers, the ZiBRA project sequenced RNA from the Zika virus isolated from 1349 patient samples and more than 650 mosquitoes across a wide geographical region in Brazil (Faria; Goes de Jesus; Quick, Grubaugh, Pullan).

Figure 3.2: Professor Nick Loman sequencing the Zika virus with a MinION nanopore sequencer in Brazil (credit: Ricardo Funar).

Loman and his team were not the only ones exploring ways to quickly sequence viruses in the field to help manage epidemics. So too were Professor Ian Goodfellow, a virologist based at Cambridge University, and Professor Andrew Rambaut, a bioinformatician with expertise in viral evolution at Edinburgh University. Like Loman, they had both worked on the Ebola virus in West Africa but from different angles. Although working separately with different goals, they found a way of working together with their complementary skills. According to Goodfellow, 'they're probably the best bunch of people I've ever worked with really, they're really nice people'. He says, unlike many other academics who can sometimes be protective of their data, they were united by a desire to share their data so that everybody could see it. In this respect they were helped by a software programme that Rambaut had written called BEAST, standing for Bayesian Evolutionary Analysis by Sampling Trees. This provided a critical part of the analytical tools used to understand virus evolution. Rambaut's work was key to allowing other people to confidentially share data which could then be analysed in the context of other data. Goodfellow explains that this is important because until then 'everybody was doing our little bits, but it wasn't until you really put the whole thing together, you got a big picture of what's going on' (Goodfellow transcript). In 2017 the data was combined into a paper which provided a much more 'detailed phylogenetic history of the movement' of the Ebola virus 'within and between the three most affected countries' which had not been possible to determine before (Dudas).

Figure 3.3: Professor Ian Goodfellow in Sierra Leone when dealing with Ebola. Credit: Ian Goodfellow. Goodfellow worked with an international team to establish the first genetic sequencing facility in Sierra Leone for the rapid diagnosis of Ebola. Together they processed over 25,000 diagnostic samples, reducing diagnosis time from 7 days to 4–6 hours. This information fed directly into the World Health Organization (WHO) response strategy. Of the 25,000 samples tested in the diagnostic lab, they processed ~1200 samples for sequencing, which generated >600 high quality genome sequences (University of Cambridge Aug 2021; Goodfellow email).

While successful, the Ebola work had not been straightforward because a lot of the time they were learning on the job and did not have all the tools they needed. But it provided them with a powerful case to apply for a five-year collaborative grant from the Wellcome Trust 'to develop an end-to-end system for processing samples from viral outbreaks to generate real-time epidemiological information that is interpretable and actionable by public health bodies' (Figure 3.4). One of their key motivations was to find a way to make it easier to carry out sequencing in resource-limited environments and to then be able to generate actionable information in a useful timeframe. Goodfellow comments that there is little point in generating sequences 6 months after an epidemic has finished if your main motivation is to help control the epidemic itself and inform the public health response. What was essential to this process was developing tools and a protocol that could be used for a diverse range of samples. This was important because, as Goodfellow points out, 'each sample you get can be very different. You can have a high viral titre, a low viral titre, a complex sample with lots of different bits of nucleic acid in it, or a very clean sample that's only got mostly virus. All those different things pose different problems when you try to sequence pathogens'. He says the challenge becomes even harder when working in resource-limited environments and the work is being done by staff who do not have 10+ years' experience in molecular biology. What they needed were robust protocols that could be picked up by individuals quickly (Goodfellow transcript; Goodfellow email).

On the back of this work with Ebola the three scientists felt that they could collectively build a strong case to develop the concept even further, so they put together an application to the Wellcome Trust. As part of the application process, Rambaut, Loman and Goodfellow had to undergo a panel interview. Goodfellow comments, 'It was a bit like a boy band with the three of us standing there in front of this panel interview answering questions. It was difficult, but I think they could see the value in it. Importantly, there were people on the funding committee that assessed it that really did see the future potential of investing in this area' (Goodfellow transcript; Goodfellow email). In the end the Wellcome Trust awarded £1,721,712 to the project in May 2017 (Figure 3.4).

Figure 3.4: Wellcome Trust Collaborative Science grant awarded to Rambaut and his collaborators which laid the foundation for setting up the ARTIC Network. Credit: Wellcome Trust grants awarded. The following institutions are listed for the applicants: University of Edinburgh (Rambaut), University of Birmingham (Loman and Quick), University of Cambridge (Goodfellow), University of Oxford (Fraser), University of Leuven (Lemey), University of California, Los Angeles (Suchard), Fred Hutchinson Cancer Research Centre, Seattle (Bedford).

Described by Goodfellow as 'a great investment', the Wellcome funding paved the way to the setting up of the ARTIC Network. An international partnership, the Network set out to develop a set of primers, laboratory protocols and bioinformatic tools to make it possible to sequence different virus samples to facilitate a quick response to outbreaks. The protocol relies on the direct amplification of the target from clinical samples using a technique known as the polymerase chain reaction (PCR). This is carried out using several (multiplexed) primers, short nucleic acid sequences, typically 20 to 25 bases long, that provide the starting point for DNA synthesis in PCR. The aim is to amplify DNA, or complementary DNA (DNA synthesised from a single-stranded RNA template) if working with an RNA virus. The amplified fragments, known as amplicons, then go on to be sequenced (ARTIC Network).

Figure 3.5: The ARTIC Network is made up of scientists from the universities of Edinburgh, Birmingham, Cambridge, Oxford, KU Leuven, UCLA and the Fred Hutchinson Cancer Centre. It also has a number of collaborators from Nigeria, Ghana, WHO and the National Institutes of Health in the USA.

The protocol relies heavily on a piece of software developed by Dr Joshua Quick at the University of Birmingham which he created while working with Loman to tackle the Zika virus. Called the 'Primal Scheme', this software makes it possible to automatically design a panel of primers from a specific genome for multiplex PCR (Quick, Grubaugh, Pullan). Quick explains, 'Basically, what the software does is it ingests that genome and then produces the primers which are highly specific. Their sequences are exact matches to that virus genome sequence'. Having the ability to generate primers fast meant genome sequencing could be rapidly rolled out in different outbreak scenarios (Quick transcript).
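The idea of generating a panel of primers from a genome can be illustrated with a minimal sketch. It is not the Primal Scheme software itself, which applies many more checks when choosing primers. It simply tiles a toy genome with overlapping amplicons, takes the first bases of each amplicon as a forward primer and the reverse complement of its last bases as a reverse primer, and places neighbouring amplicons in alternate pools so that overlapping products are not amplified in the same reaction. The function names, the amplicon length, the overlap and the primer length are all assumptions chosen for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def tile_amplicons(genome, amplicon_len=400, overlap=50, primer_len=22):
    """Tile a genome with overlapping amplicons (illustrative values only).

    Each amplicon gets a forward primer (an exact match to its first bases)
    and a reverse primer (the reverse complement of its last bases).
    Neighbouring amplicons alternate between pool 1 and pool 2.
    """
    scheme = []
    step = amplicon_len - overlap
    start = 0
    index = 1
    while start + amplicon_len <= len(genome):
        region = genome[start:start + amplicon_len]
        scheme.append({
            "name": f"amplicon_{index}",
            "start": start,
            "end": start + amplicon_len,
            "forward": region[:primer_len],
            "reverse": reverse_complement(region[-primer_len:]),
            "pool": 1 if index % 2 else 2,
        })
        start += step
        index += 1
    return scheme


# Toy genome for demonstration; a real viral genome would be read from a file.
toy_genome = "ACGT" * 300
for amplicon in tile_amplicons(toy_genome):
    print(amplicon["name"], amplicon["start"], amplicon["end"], amplicon["pool"])
```

In this simplified version any bases beyond the last full-length amplicon are left uncovered, and no attempt is made to check that the primers would work well together in a single reaction.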

Figure 3.6: Photograph of Dr Joshua Quick with picture of the primal scheme he created. Credit: ARTIC Network.

Adapting the ARTIC protocol for the SARS-CoV-2 virus

Having already been tried out in the case of the Zika virus, the ARTIC protocol meant that the viral genome sequencing community was already well prepared when the news broke of the SARS-CoV-2 virus, initially called nCOV-2019, in Wuhan. Quick's interview captures just how fast they were able to move. He remembers, 'the first SARS-CoV-2 genome was published on the 17th of January [2020] and we published our scheme on the 22nd of January. It only took us five days to produce an almost fully functional genome sequencing protocol.' For Quick, the beauty of the protocol was that it had been used so many times before that he and his colleagues 'had confidence that it would work' and that it did not need much testing before it was released. Posted on '', the protocol received very little attention in the first month, but became very popular after February (Quick transcript).

Figure 3.7: First ARTIC protocol using first genome sequence of SARS-CoV-2 virus from Wuhan. Credit: Quick presentation to COG-UK Together event.

Figure 3.8: Tweet put out by Quick on 24 Jan 2020 announcing availability of primer pools for sequencing SARS-CoV-2 virus.

Asked a lot about the protocol in the early days, Quick soon found himself sending out many primers. He made these up himself using synthetic oligos that he bought. Coming as 'one sequence per tube', Quick had to put the right oligos together to generate the primers.

Once this was done, he had to label and send out the packages, all of which was time-consuming. Aiming to help others get started and save them time, Quick began this process before lockdown and before COG-UK got fully moving. The packages went to both UK and international locations. Fortunately, sending out the primers was relatively easy because the oligonucleotides remain stable at room temperature if put in the right buffer. That meant Quick did not need 'to worry about dry ice shipping' (Quick transcript).

Some of the first primers Quick sent went out to Denmark on March 16th 2020. Soon after this he also shipped batches to Germany, Mexico, Uruguay, Portugal, Texas, Serbia and Sheffield. Describing this period as 'crazy' he remembers, 'Over the course of a couple of months we sent out primers to 44 different countries all over the world.' All of the interest came from just putting the protocol 'on a blog post'. Quick believes that one of the reasons why ARTIC became the 'canonical genome sequencing protocol' was 'because we had an early mover advantage. It's like anything in science: no one wants to reinvent the wheel, especially not when they're under time pressure. Everyone knew that ARTIC had a track record for doing viral genome surveillance so they were pretty happy to accept that this was going to work' (Quick transcript).

Quick was particularly surprised to see the protocol adopted by laboratories in the UK. This was because it had essentially been designed to work with laboratory equipment able to fit inside a suitcase, which was another important development strand of the ARTIC Network. Just how strange Quick found the situation is summed up in his following words: 'normally we pack the lab up, make the primer scheme, then pack the reagent stuff and go do some fieldwork training, sequencing, whatever, in another country. But this was basically doing it in our own lab, which was definitely weird' (Quick transcript).

While originally designed for nanopore sequencing, the ARTIC protocol had the attraction that it could be easily adapted for other sequencing platforms. This is because the first half of the protocol concerns PCR which is a 'universal' method. By the end of the PCR process, scientists are able to obtain 'an amplicon pool which contains all the genetic information' that can be sequenced on any platform. Quick remembers having people using the protocol 'on all platforms, including Nanopore, PacBio, Illumina, IonTorrent' (Quick transcript).

Looking back, Quick reflects that having the ARTIC protocol in place was what helped the UK to roll out the first national surveillance programme. As he says, 'Building the infrastructure for doing a national genomic surveillance scheme is hard enough, but it's better if you already have your genome sequencing protocol. So that was why the UK was the first basically. That was done with a great bit of prescient grant funding from the Wellcome Trust.' He continues, 'Through that grant we had established the personnel and had a lot of the methods and equipment already working. The only piece that we needed was that genome from Fudan University and the rest of it just clicked into place. That's why it only took five days' (Quick transcript).

One of the reasons Quick says he and his colleagues developed the ARTIC protocol for the SARS-CoV-2 virus so fast was because 'under the ARTIC framework we felt we had a responsibility to do this. We already had the funding and felt a moral obligation to do this, even before we knew that there was going to be a pandemic. We would have done it anyway. Most of the outbreaks we've worked on before have been fairly localised outbreaks, some of them have been epidemics. We would have still done this if it had only affected one city in China because it was worth the time that it took to do it. Obviously, we didn't know that it was going to become so big' (Quick transcript).

Having sent off the first ARTIC primers for the SARS-CoV-2 virus, Quick then joined forces with Dr John Tyson at the University of British Columbia in Vancouver to improve the primer set once it became clear that a lot of people were going to use it. Their motivation for doing this was in part driven by the fact the first primer set had gone out with 'fairly minimal testing'. Called 'V1 primers' these had been 'designed off the Wuhan HU1 reference' genome. One of the issues was that it had around 100 amplicons in it, some of which were weaker than others which meant there was sometimes some drop-out in the sequencing. He and Tyson were keen to improve the primer set so that 'they were all working and working evenly'. They were also conscious that the virus would probably mutate at some point which could 'erode the effectiveness of the primer set over time'. The updated primer set incorporated 22 additional primers to improve genome coverage. Known as version 3, this primer set became the standard primer set for several months because as Quick points out 'we were then pretty much locked down for the next 18 months so mutations were rare' (Quick transcript).

Figure 3.9: Tweet put out by Quick in Sept 2020 announcing the publication of the LoCost protocol and another by Fatiha Benslimane, a biomedical research scientist at Qatar University. Benslimane's tweet illustrates both the global reach of the ARTIC protocol and how much the new version helped improve the workflow involved in sequencing the SARS-CoV-2 virus.

Alongside Tyson, Quick also collaborated with Drs David Stoddart and Phil James, two ONT scientists, to streamline the library preparation in the protocol. A key part of this process involved sticking barcodes to the end of the amplicon combinations. Consisting of 24 base sequences, the barcodes are an important tool for reading the sequence of each sample. Their starting point was a library preparation protocol Quick had helped formulate for the Zika virus. One of the problems with this method was that it involved a number of clean-up steps which were time-consuming. Their aim was to develop a workflow that 'could be completed in a day, or ideally half a day'. This took six months to achieve. Called 'LoCost', the protocol was officially released on 25th August 2020. Its development was helped by the discovery, made by Stoddart and James, that the post-PCR SPRI clean-up could be replaced with a dilution in water. The protocol not only reduced the time required for library preparation but also reduced the reagent cost to £10 per sample which, as the scientists working on it pointed out, made it 'practical for individual labs to sequence thousands of SARS-CoV-2 genomes to support national and international genomic epidemiology efforts' (Quick transcript; Stoddart and James transcript; Tyson).
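The role of the barcodes can be illustrated with a short sketch of how reads from several pooled samples might be sorted back into per-sample groups once sequencing is finished. This is a simplified illustration and not the software used in the LoCost workflow. The barcode sequences and sample names are made up, and only exact matches at the start of a read are accepted, whereas real tools tolerate sequencing errors.

```python
def demultiplex(reads, barcodes):
    """Sort reads into per-sample bins using a barcode at the start of each read.

    `barcodes` maps a sample name to its barcode sequence. Reads that do not
    begin with any known barcode are placed in an 'unassigned' bin.
    """
    bins = {sample: [] for sample in barcodes}
    bins["unassigned"] = []
    for read in reads:
        for sample, barcode in barcodes.items():
            if read.startswith(barcode):
                # Trim the barcode so that only the amplicon sequence remains.
                bins[sample].append(read[len(barcode):])
                break
        else:
            bins["unassigned"].append(read)
    return bins


# Made-up barcodes and reads for illustration only.
example_barcodes = {"sample_A": "ACGT" * 6, "sample_B": "TTGCAA" * 4}
example_reads = [
    "ACGT" * 6 + "GGGCCC",
    "TTGCAA" * 4 + "ATAT",
    "CCCCCCCC",
]
for sample, sample_reads in demultiplex(example_reads, example_barcodes).items():
    print(sample, sample_reads)
```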

Figure 3.10: Number of monthly views of ARTIC protocol between Dec 2019 and Feb 2021. Credit: Quick presentation to COG-UK Together event.


Another component that was crucial to the setting-up of COG-UK was the availability of an online bioinformatics platform launched in July 2016 for academics with £8.4 million secured from the UK's Medical Research Council in 2014 (Pallen; UKRI). Professor Sharon Peacock described this platform, known as the Cloud Infrastructure for Microbial Bioinformatics, or CLIMB, as pivotal to the setting up of COG-UK. As she says, 'That's what stored the data. It was also where the analytical tools were placed' (Peacock transcript).

Figure 3.11: Overview of the different components in CLIMB. A) Sites where the computational hardware is based. B) High-level overview of the system and how the different software components connect to one another. C) Compute hardware at each of the four sites. D) Hardware comprising the Ceph storage system at each site. E) Type and role of network hardware used at each site. Credit: Figure 1 in Connor.

CLIMB was designed to address the challenges posed by the explosion of microbiological genome sequence data generated through the introduction of rapid and affordable high-throughput sequencing (Pallen). Led by Professor Mark Pallen, then at the University of Warwick, who was involved in genomic epidemiology of bacterial pathogens of humans and animals, the platform was created through a cross-institutional collaboration. This collaboration included Professor Thomas Connor at Cardiff University who helped set up the Pathogen Genomics Unit at Public Health Wales, Loman at the University of Birmingham, and Professor Samuel Sheppard, the director of bioinformatics at the University of Bath. Another key partner was Simon Elwood Thompson, an expert in health informatics originally based at Swansea University who subsequently moved to the University of Birmingham.

Using a free open-source cloud-computing system, CLIMB was intended to head off the problem microbiology groups faced in setting up and maintaining their own dedicated bioinformatics hardware and software to handle the deluge of data. The collaborators realised that having individual groups purchase and install their own infrastructure was not only sub-optimal in terms of the cost and time involved, but would also reduce the ability to share data and the capacity for reproducibility. Accessed through the Internet, CLIMB aimed to provide UK microbiologists with a single environment that gave them the right computing power, storage and analysis tools to make the sharing and analysis of data simpler and faster. Each of the four universities involved in the collaboration installed the same computing equipment to work as an integrated system.

Figure 3.12: Diagram showing how cloud computing works. The photo on the right shows Mark Pallen in front of a data centre Loman and Thompson set up in Birmingham. Credit: Pallen. First conceived of in 1963, cloud computing began to take off seriously in the late 1990s. At its basic level cloud computing relies on high-memory remote servers to store and access data which cuts out the need for local hard drives. It allows multiple people to use one computer at the same time via the Internet from different locations. Before the emergence of cloud computing, organisations had to purchase and maintain their own servers to meet their needs (Varghese).

By the time COVID-19 emerged, CLIMB was supporting over 900 users and over 300 research groups spanning 85 research institutions all the way from Edinburgh to Exeter and from Belfast to Norwich as well as government bodies like Public Health England, Public Health Wales and the Animal and Plant Health Agency. The pre-existing CLIMB infrastructure and expertise quickly pivoted to helping COG-UK deliver rapid sequencing data for the SARS-CoV-2 virus genome. A specific CLIMB-COVID team was set up to facilitate the process. This team included Dr Sam Nicholls, Radoslaw Poplawski, and Simon Thompson from the University of Birmingham, and Dr Matt Bull and Dr Christine Kitchen from Cardiff University.

Efforts in the area were helped by the fact that in March 2020, the MRC, now part of UKRI, provided another five years of funding, worth £1,994,477, to help build out the CLIMB infrastructure further. The new iteration aimed to improve and expand the resource to support research into a range of issues including antimicrobial resistance, emerging infectious diseases and global health. The project moved from its original base at Warwick University to the Quadram Institute in Norwich, and a number of new partners also came on board to facilitate the next iteration of the project. These included the University of Leicester and the MRC Unit The Gambia at the London School of Hygiene & Tropical Medicine (LSHTM) (CLIMB Big Data; COG-UK Nov 2020).

The AMPHEUS platform

Both ARTIC and CLIMB were central pillars to the foundation of COG-UK. But there were also a number of other initiatives that helped seed its roll-out. One of these was a project called 'AMPHEUS' which stands for Analytical Microbiology for Precision Health and Epidemiology - a Unified Solution. This was jointly led by Professor Christophe Fraser and Dr David Bonsall at Oxford University and Professor Helen Ayles, director of research at Zambart, an independent research NGO in Zambia closely affiliated to the University of Zambia and LSHTM (AMPHEUS).

Awarded funding from the Bill and Melinda Gates Foundation just before the launch of COG-UK, the collaborators aimed to build a scalable and portable laboratory system to track and tackle infectious pathogens in low-income settings. Their vision for this was based on a high-throughput genome sequencing system integrated with diagnostics which they had previously developed in Zambia to support real-time molecular surveillance of HIV (human immunodeficiency virus) to improve prevention and treatment of the disease, which is endemic in the region. Transmitted through sexual contact, intravenous drug use and from mother to child in pregnancy, HIV can lead to AIDS (acquired immunodeficiency syndrome) if left untreated.

One of their motivations for launching the AMPHEUS project was to find a way to monitor transmission patterns and the spread of drug resistance. This was seen as important to achieving a sustained reduction in the incidence of HIV. To do this they needed to find a way to sequence HIV genomes. Because the amplicon method, like the one used in the ARTIC protocol, can be sub-optimal for getting the complete genome of HIV, they adapted another virus-enriched sequencing method, called veSEQ. It had previously been demonstrated to be efficient at sequencing Hepatitis C Virus, a virus which is even more diverse than HIV. One of the advantages of the method was that it could be easily adapted to study other RNA viruses or a panel of viruses. Combined with a computational pipeline tailored specifically to HIV, the new method proved highly effective at recovering complete HIV genomes and identifying minor variants from plasma samples and was cost-effective, both of which had previously been technical challenges for sequencing HIV genomes at scale (Bonsall).


Schemes to roll out genomic sequencing were not confined to viruses. Another area where it was being developed was malaria, which is transmitted from person to person by mosquitoes infected with single-celled Plasmodium parasites. Those most vulnerable to the disease are children under the age of five, patients with HIV/AIDS and people with low immunity. Once common across half of the world, the disease saw global mortality decline by 90% over the course of the twentieth century through the drainage of swamplands, the use of insecticides to destroy mosquitoes and the use of drugs like artemisinin.

While largely eliminated from rich countries, malaria continues to be a leading cause of death and disease in many poor tropical and subtropical parts of the world, with the highest prevalence in Africa. In 2021, WHO estimated that out of 247 million cases of malaria worldwide, 95% of the cases were in Africa. Africa also accounted for 96% of malaria deaths in the world, with 80% of these being among children under the age of 5 (Roser, Ritchie; CDC).

One of the major obstacles to bringing malaria under control is the fact that parasites and mosquitoes can easily adapt genetically to evade preventive measures like insecticides and anti-malarial drugs. While humans exposed to malaria also acquire genetic changes that help protect them against the disease, these changes can sometimes be harmful and cause disorders like sickle-cell anaemia. In 2003 the Bill & Melinda Gates Foundation and UK Medical Research Council provided funding that helped pave the way for the setting up of the Malaria Genomic Epidemiology Network (MalariaGEN). This Network was officially founded with Wellcome Trust funding in 2005 with the help of web-based software designed to integrate clinical and genetic data collected by different groups working on malaria (MalariaGen).

Figure 3.13: Photograph of Professor Dominic Kwiatkowski at the Royal Society Admissions day in London, July 2018. Credit: Duncan Hull, Wikipedia. Following his training as a paediatrician at Guy's Hospital, Kwiatkowski first developed a research interest in malaria when he spent many years in West Africa, where he witnessed the high levels of infant mortality from the mosquito-borne disease.

Over the last 30 years Kwiatkowski has been looking for ways to translate advances in genomic science into clinical and epidemiological applications, with a focus on reducing the burden of infectious diseases in resource-poor settings. In 2000 he took up an appointment at the Wellcome Trust Centre for Human Genetics in Oxford. Five years later he took up a joint appointment with the Sanger Institute, from where he helped found MalariaGEN. Prior to COVID-19 his group had built up extensive expertise in processing and sequencing large numbers of malaria parasite samples and mosquitoes from many different countries. This proved invaluable to the Sanger Institute's contributions to COG-UK. Kwiatkowski is now Honorary Faculty at the Sanger Institute (Kwiatkowski transcript).

Now including over 100 research groups in more than 40 malaria-endemic countries, MalariaGEN is an international scientific network that brings together cutting-edge genomic sequencing technologies with genomic research to better understand the epidemiology and evolution of malaria. One of the early strands of MalariaGEN's work was to establish principles for equitable sharing of data between low-, middle- and high-income countries, linked to capacity building in data analysis for researchers in malaria-endemic countries.

Genomic surveillance for antimicrobial resistance

As well as playing an important role in coordinating genomic work on malaria, the Sanger Institute also initially housed the Centre for Genomic Pathogen Surveillance (CGPS), which emerged from a joint project with Imperial College in 2015. Led by Professor David Aanensen, the aim of this project was to develop the capacity for genome surveillance of antimicrobial resistance (AMR), which ranked as one of the most serious global public health threats at that point. In 2017 CGPS work was strengthened by an award of £6.8 million from the National Institute for Health and Care Research, which enabled the establishment of the Global Research Unit of AMR at the University of Oxford. This was formed with partners in The Philippines, Colombia, Nigeria and India (NIHR). The Centre subsequently moved to the University of Oxford in 2021 when Aanensen took up a position there.

Directed primarily towards low- and middle-income countries, most of the Unit's efforts were focused on engineering web tools and software to enhance local capacity for genomic surveillance and to enable the data to be fed into national and international databases for spotting the emergence of resistance. Among the tools developed, the two that proved most useful for COVID-19 were Microreact and Pathogenwatch. These provided a simple way of rapidly uploading and visualising different clustering patterns both geographically and over time (Underwood transcript; Wellcome Sanger Institute 2017).

Hospital acquired infections

Prior to COVID-19 a lot of the expertise in pathogen genomic surveillance resided in the development of schemes to help low- and middle-income countries, which carry a disproportionately heavy burden of infectious diseases. But closer to home, efforts had also been mounted to address the issue of infections acquired in hospital, also known as nosocomial infections. A major problem worldwide, hospital infections are commonly harder to treat and are also associated with longer hospital stays and greater disability. Caused by a variety of pathogens including bacteria, viruses and fungi, hospital infections can be kept under control through strict infection control. In recent years this process has been aided by the use of genomic sequencing surveillance.

One of the first to demonstrate the power of pathogen genomic surveillance for nosocomial infections was Professor Sharon Peacock, a microbiologist based at both the University of Cambridge and the Sanger Institute. In 2011 she and her team had the opportunity to assist in the investigation of an outbreak of MRSA (Methicillin-resistant Staphylococcus aureus) in a Special Care Baby Unit (SCBU) at the Rosie Hospital in Cambridge. They did this by sequencing MRSA isolated from infants in the SCBU over a period of months, MRSA isolated from screening swabs taken from staff members, and MRSA from clinical specimens collected from patients (including parents) seen in outpatient and emergency departments as well as GP surgeries. This revealed a network of transmission links between cases, which prompted infection control action that brought the outbreak to an end. Following this work, Peacock and researchers from Thailand and Australia also demonstrated the power of whole genome sequencing (WGS) for tracking and identifying MRSA in two intensive care units in a hospital in northeast Thailand (Marks 2015).

Figure 3.14: Photograph of Professor Sharon Peacock. Credit: Nick Saffell, Cambridge University. Leaving school at the age of 16 to start work in a shop, Peacock attained the necessary qualifications to enter medical school as a mature student by attending evening classes and a technical college while working as a nurse. The first in her family to go to university, Peacock completed her medical training in 1988 and then went on to do a doctorate in microbiology. Between 2002 and 2009 she headed up the microbiology research programme at the Mahidol-Oxford Tropical Medicine Research Unit in Bangkok, Thailand. Following that, she was appointed Professor of Clinical Microbiology at the University of Cambridge and took up honorary positions with the Cambridge University Hospitals NHS Trust and the Health Protection Agency. These positions gave her the opportunity to start researching how to translate pathogen sequencing into clinical practice to help patients. She used her research on antimicrobial resistance to contribute to the Chief Medical Officer's Annual Report on the subject (UKRI; Falconbridge), and to lead the writing of a chapter on pathogen sequencing in the CMO's report, Generation Genome (CMO 2016).

Another researcher interested in the use of genomic sequencing to address hospital-acquired infections was Professor Judith Breuer, a molecular and clinical virologist based at University College London. Breuer has a long history of using genomics to study how the evolution of viruses impacts public health measures. In 2011, she received a five-year Wellcome Trust collaborative grant to work on norovirus, which, like SARS-CoV-2, is an RNA virus. Commonly causing outbreaks in winter months, norovirus is highly contagious and the most common cause of vomiting and diarrhoea, globally resulting in more than 200,000 deaths every year, especially in babies (UCL). One of the reasons Breuer was particularly interested in the virus is that it is one of the most frequent causes of outbreaks in semi-closed settings like hospitals, nursing homes and schools. By 2013, she and colleagues had established that WGS was more effective at picking up transmissions of norovirus in the hospital than the traditional methods used by infection control teams, which often picked up only a minority of the true transmissions (Kundu; Breuer transcript).

Figure 3.15: Photograph of Professor Judith Breuer, 25 May 2018. Credit: @Microbiology@UCL. Prior to COVID, Breuer had spent many years developing next generation sequencing methods to sequence hard-to-culture pathogens directly from clinical specimens and determining how they could be used to investigate hospital transmission and drug resistance (Academy of Medical Sciences; Breuer transcript; COG-UK June 2021).

Impressed by what the ARTIC network had achieved, a year before COVID-19 hit, Breuer approached Loman and Rambaut to see if the same approach might be used for norovirus in the UK. What struck her at the time was that a lot of investment had gone into community sequencing in low and middle-income countries but it had been relatively little explored in the UK. She argued that norovirus provided an ideal case for trying out the ARTIC approach closer to home because no one else was working on it. The idea had not gone very far by the time COVID-19 unfolded, but together with Loman and Rambaut she had formulated a plan to use the localised sequencing approach offered by ARTIC alongside centralised sequencing. This meant they were already thinking about the possibility of how to combine centralised and localised sequencing when efforts got underway to launch COG-UK (Breuer transcript).


Academy of Medical Sciences (n.d.) Professor Judith Breuer.

ARTIC Network (24 March 2020), SARS-CoV-2.

ARTIC Network LoCost.

Bonsall, D, Golubchik, T, de Cesare, M, et al (28 Aug 2018) 'A comprehensive genomics solution for HIV surveillance and clinical monitoring in a global health setting', bioRxiv.

Cambridge Enterprise (13 July 2015) 'Solexa: second-gen genetic sequencing'.

CDC (n.d.) Malaria's Impact Worldwide.

CMO (2016) Annual Report of the Chief Medical Officer: Generation Genome.

CLIMB Big Data 'World-leading microbial bioinformatics cyber-infrastructure'.

COG-UK (19 Nov 2020) 'CLIMB project receives honours for supporting COG-UK alongside other computing teams', COG-UK Blog.

COG-UK (June 2021) In conversation with the “Queen of Virology”, Professor Judy Breuer, COG-UK Blog.

Connor, TR, Loman, NJ, Thompson, S, et al (1 Sept 2016) 'CLIMB (the Cloud Infrastructure for Microbial Bioinformatics): an online resource for the medical microbiology community', Microbiology Society.

Dudas, G, Carvalho, LM, Bedford, T, et al (20 April 2017) 'Virus genomes reveal factors that spread and sustained the Ebola epidemic', Nature, 544, 309-15.

Gates Foundation (n.d.) Malaria.

Goes de Jesus J, et al (7 Aug 2019) 'Acute vector-borne viral infection: Zika and MinION Surveillance', Advances in Molecular Epidemiology of Infectious Diseases.

Goodfellow, Ian, email (unpublished) to Lara Marks, 13 Jan 2022.

Falconbridge, G (18 March 2021) 'UK's top COVID-19 virus hunter had a long and winding path to the top', Reuters.

Faria, NR, Sabino ES, Nunes MRT et al (25 May 2017) 'Mobile real-time surveillance of Zika virus in Brazil', Genome Medicine, 97.

Kwiatkowski, Dominic, Profile.

Kundu, S, Lockwood, Depledge, DP et al (Aug 2013) 'Next-generation whole genome sequencing identifies the direction of norovirus transmission in linked patients', Clinical Journal Infectious Disease, 57/3. 403-14.

Illumina, 'History of sequencing by synthesis'.

Quick, J, Loman, NJ, Duraffour, et al (3 Feb 2016) 'Real-time, portable genome sequencing for Ebola surveillance', Nature, 530/7589, 228–32.

Quick, J, Grubaugh, ND, Pullan ST et al (June 2017) 'Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples', Nature Protocols.

MalariaGen (10 Dec 2008) 'A global network for investigating the genomic epidemiology of malaria: The Malaria Genomic Epidemiology Network', Nature, 456, 732-37.

Marks, LV (2015) The path to DNA sequencing: The life and work of Fred Sanger.

Marks, LV (25 Feb 2021) 'Nanopore sequencing'.

NIHR (n.d.) 'Global Health Research Unit on Genomic Surveillance of Antimicrobial Resistance, University of Oxford'.

NIH (n.d.) 'Human Genome Project: Fact Sheet'.

Pallen, M (video) CLIMB Launch: Mark Pallen introduces the CLIMB project.

Roser, M, Ritchie, H (Feb 2022) Malaria, Our World in Data.

Tyson, JR, James, P, Stoddart, D et al (4 Sept 2020) 'Improvements to the ARTIC multiplex PCR method for SARS-CoV-2 genome sequencing using nanopore', bioRxiv.

UKRI (n.d.) 'The MRC Consortium for Medical Microbial Bioinformatics'.

University of Cambridge (3 Aug 2021) 'Fighting Ebola in Sierra Leone'.

UCL 'NOROPATROL: Why do Norovirus pandemics occur & how can we control them?'.

Varghese, B (19 March 2019) 'History of the cloud'.

Wellcome Sanger Institute (n.d.) Our History.

Wellcome Sanger Institute (20 Oct 2004) 'The finished human genome'.

Wellcome Sanger Institute (21 July 2017) New global health initiative for genomic surveillance of antimicrobial resistance funded by NIHR.

WHO (8 Dec 2022) Malaria fact sheet.

Interview transcripts

Note: The position listed by the people below is the one that they held when interviewed and may have subsequently changed.

Breuer, Judith, Professor of Virology, University College London (interviewed 24 Nov 2022, transcript unpublished).

Interview with Sir Jeremy Farrar, Director of the Wellcome Trust.

Goodfellow, Ian, Professor of Virology, University of Cambridge (interviewed 15 Dec 2022, transcript unpublished).

Interview with Professor Dominic Kwiatkowski, Head of Parasites and Microbes Programme at the Wellcome Sanger Institute in Cambridge and Professor of Genomics at University of Oxford.

Interview with Dr Joshua Quick, UKRI Future Leaders Fellow, University of Birmingham.

Interview with Sharon Peacock, Professor of Public Health and Microbiology in the Department of Medicine, Cambridge University and Executive Director of the COVID-19 Genomics UK (COG-UK) Consortium.

Interview with Dr Phil James (Associate Director, Clinical Applications), Dr David Stoddart (Senior Director of Sample Technology, Applications) and Sarah Foxton (Communications), Oxford Nanopore Technologies Ltd.

Interview with Dr Anthony Underwood, Head of Translational and Operational Bioinformatics, Centre for Genome Pathogen Surveillance, Oxford.
