Cracking Covid: The history of COG-UK

How a community came together to break the COVID code and brought about a revolution in genomic surveillance

Introduction to the exhibition

Written and collated by Dr Lara Marks, April 2023

Setting the scene

Seen as a defining moment in history, COVID-19 has been one of the most important social, economic and public health challenges the world has experienced since the Second World War. The disease was caused by a new human coronavirus, first identified by Chinese scientists on 7 January 2020. At the outset, no one knew how the respiratory disease would spread and manifest itself.

Subsequently called SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2), the new virus raised many questions. For example, where did it come from, how did it transmit between people and what was the best protection against it? With the number of cases reaching one million worldwide by 2 April 2020 (Moore), everyone was grappling with these issues. This was particularly the case for scientists and clinicians, who were immediately thrust on to the frontline of the pandemic. With hundreds of thousands, and possibly millions, of lives potentially depending on their efforts, they quickly swung into action.

The full implications and toll of COVID-19 will probably only become clear many years down the road. But it is already apparent that scientists have never before worked so collaboratively and efficiently at such a scale to find a solution as they did for COVID-19. This was most obvious in the development of a safe and effective vaccine to combat the disease. Developed in record time, the vaccine went from scratch to the first dose in a patient in less than a year, thereby helping to save millions of lives and change what the pandemic meant to humanity (GAVI).

COG-UK and pathogen genomic surveillance

Another scientific achievement, but one which received far less public fanfare at the time, was the unprecedented scale and speed with which genomic surveillance got off the ground. Utilising modern sequencing technologies, scientists tracked the transmission of the SARS-CoV-2 virus and identified new variants, some of which may well have had greater capacity to spread and cause more severe disease. The data gathered through this tracking process is key to measures to limit and contain further outbreaks of new COVID-19 variants at the local, national and global level, as well as to updating diagnostics, vaccines and therapeutics.

Genomic surveillance of the SARS-CoV-2 virus was galvanised by the establishment of the COVID-19 Genomics UK Consortium (COG-UK), the subject of this exhibition. Initiated in March 2020 by Professor Sharon Peacock, an academic clinical microbiologist at the University of Cambridge, COG-UK was pivotal in transforming genomic surveillance. Professor Oliver Pybus, a British biologist and expert in infectious diseases at Oxford University, argues that COG-UK helped shift genomic surveillance from what was originally a ‘theoretical backwater’ into a critical tool for making public health decisions (Cyranoski).

Whole genome sequencing (WGS)

The vaccine could not have been developed so fast, nor COG-UK launched as easily, without whole genome sequencing (WGS). This technique was made possible by the DNA sequencing method developed by the British biochemist Fred Sanger in the 1970s (Sanger, 1975). First used to decode the sequence of a virus that infects bacteria (Sanger, 1978), Sanger’s technique subsequently went through various modifications which allowed for automation and higher throughput (Marks).

WGS makes it possible to transcribe the entire set of genetic instructions an organism requires to develop and maintain itself. Originally a slow and time-consuming process, genome sequencing had become significantly faster, cheaper and more efficient by the time COVID-19 appeared. This was aided by major improvements to sequencing machines, facilitated by advances in microfluidics, and huge leaps in computing power brought about through the introduction of distributed computing over the internet, which expanded the capacity of data storage and analytics. Collectively known as next-generation sequencing technologies, these advances opened up the ability to rapidly sequence and read several genomes in parallel (Gertner). Developed as a result of a drive to increase the speed and reduce the cost of WGS, these new technologies have made it easier to analyse specific genes and genetic elements including RNA (PHG).

Just how much the sequencing process sped up can be seen from the fact that the first genome sequence of the SARS-CoV-2 virus was generated remarkably fast. It took a group of Chinese scientists, led by Yong-Zhen Zhang at the Shanghai Public Health Clinical Centre and School of Public Health, Fudan University, just two days to draft the SARS-CoV-2 sequence after receiving a sample from the first patient hospitalised with the pneumonia-like illness in Wuhan. It was then released into the public domain five days later. The speed at which this happened stands in marked contrast to the genome sequence of the coronavirus responsible for the first outbreak of SARS in 2002, which took several months to complete and to be published (CDC).

Figure 1.1: Professor Eddie Holmes’s tweet announcing the release of the first genome sequence of the new coronavirus, which had just been deposited in GenBank, an online collection of publicly available gene sequences run by the National Institutes of Health in the US. A British-born virologist based at Sydney University, Holmes first learnt of the sequence from Professor Yong-Zhen Zhang, a close colleague of his based at Fudan University with whom he had worked for eight years finding and identifying new animal viruses.

Zhang and his team worked out the sequence based on a sample taken from a 41-year-old patient who had been hospitalised with a pneumonia-like illness in Wuhan on 26 December 2019. They received the sample packed in dry ice in a metal box on 3 January 2020 and managed to sequence the genome by 2am on 5 January 2020 after working non-stop for 40 hours in the laboratory. Quickly realising that the virus closely resembled the first SARS virus, which infected more than 8,000 people in 37 countries and caused 774 deaths from 2002 to 2003, Zhang immediately informed the Chinese government. However, he was ordered not to publish anything (Farrar, p.19).

Concerned about the blocking of the information, Holmes reached out to Sir Jeremy Farrar, a virologist and Director of the Wellcome Trust in London, to find a way to get it released. They managed to do so on 11 January 2020 via a post on Virological.org, an open-source website run by Professor Andrew Rambaut, an evolutionary biologist and trusted contact at Edinburgh University. Just before this, on 7 January 2020, Zhang and Holmes had submitted a paper describing the sequence to the journal Nature (Farrar, p.17; Wu).

The first genome sequence of SARS-CoV-2 was completed very fast. But the sequence data could only answer the fundamental question of what was causing the new respiratory disease. What it did not reveal was where the virus originated and how it would evolve. Such issues could only be addressed through genomic surveillance, which entailed sequencing a substantial number of viral genomes over time, isolated from samples taken from as many humans, animals and places as possible, and then comparing them against each other. The process necessitated a combination of random and targeted sequencing, the latter focusing on specific populations, such as people with vaccine failure, or on situations where, for example, there was a rapid increase in cases within a geographical area. Few could have imagined, when COVID-19 first emerged, just how far this effort would go and how much it would transform genomic surveillance and further knowledge about the biology of the virus.

Genomic surveillance

Combined with epidemiology, genomic surveillance takes advantage of the fact that the replication machinery of a virus is not perfect and frequently incorporates errors into its genome. This means that each virus genome has a signature, which, when sequenced, can be grouped into distinct lineages. Scientists can use this information to draw what are called phylogenetic trees, which are like family trees. Genomic surveillance also requires details of when and where a sample was taken and information about the people whose samples are tested. When matched together with the genomes, these data make it possible to establish where there may be connections between different cases of COVID-19 and undetected spread of the virus. Armed with this information, scientists can track the path of the virus through a population and investigate its main points of transmission. The genomic data also allow scientists to identify geographical hotspots, and they can help in understanding how contagious a variant is by comparing the rate of spread of one variant with that of another. In addition, they can inform public health measures such as vaccination or treatments.
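The idea of a genome ‘signature’ can be illustrated with a minimal sketch in Python. It is not the method used by COG-UK or by any real surveillance pipeline: the sequences are invented, only twelve bases long and assumed to be already aligned, and the function names (mutations, pairwise_distance, group_by_signature) are hypothetical. The sketch simply records where each sample differs from a reference, counts differences between samples, and groups samples that carry the same changes.

```python
from collections import defaultdict

# Hypothetical toy data: four short, already aligned sequences.
# Real SARS-CoV-2 genomes are roughly 30,000 bases long.
REFERENCE = "ATGCCGTAAGCT"
SAMPLES = {
    "sample_A": "ATGCCGTAAGCT",  # identical to the reference
    "sample_B": "ATGTCGTAAGCT",  # one change
    "sample_C": "ATGTCGTAAGAT",  # the same change as sample_B plus one more
    "sample_D": "ATGCCGTTAGCT",  # an unrelated change
}


def mutations(reference, genome):
    """Return the genome's signature: a set of (position, old, new) changes."""
    if len(reference) != len(genome):
        raise ValueError("sequences must be aligned to the same length")
    return frozenset(
        (pos, ref_base, base)
        for pos, (ref_base, base) in enumerate(zip(reference, genome), start=1)
        if ref_base != base
    )


def pairwise_distance(genome_a, genome_b):
    """Count the positions at which two aligned genomes differ."""
    if len(genome_a) != len(genome_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(genome_a, genome_b))


def group_by_signature(reference, samples):
    """Group sample names that carry exactly the same set of changes."""
    groups = defaultdict(list)
    for name, genome in samples.items():
        groups[mutations(reference, genome)].append(name)
    return dict(groups)


if __name__ == "__main__":
    signatures = {name: mutations(REFERENCE, g) for name, g in SAMPLES.items()}
    for name, changes in signatures.items():
        print(name, sorted(changes))
    # sample_C carries every change found in sample_B plus one more, so in
    # this toy data it would sit on the same branch of a family tree.
    print(signatures["sample_B"] < signatures["sample_C"])
```

In practice the grouping is done with dedicated phylogenetic software on full-length genomes, and the resulting lineages are combined with the sampling dates and locations described above.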

At the start of the pandemic, no one could have predicted just how fast the SARS-CoV-2 virus would evolve. Historically, coronaviruses were known to mutate slowly, acquiring just one or two mutations a month. This is half the rate of change witnessed in influenza and a quarter that of HIV (Callaway). Because of this, many researchers were initially sceptical that genomic surveillance would pick up any meaningful mutations in the SARS-CoV-2 virus. They reasoned that the number of samples would be too small at first, which would mean that most of the sequenced genomes were likely to resemble each other and have little diversity. This would make it hard to draw any conclusions (Kupferschmidt). Peacock explains, ‘It's important to remember that at this early juncture people still didn't know what the trajectory was going to look like. It's incredibly easy to look back with hindsight and say, “Well, of course, we knew there were going to be billions of cases”, but we didn't.’ They also felt it was important to take action so they were prepared for any eventuality (Peacock transcript; Peacock email 11 Nov 2022).

The idea of using pathogen genomic sequencing to inform public health was not new. Public Health England (PHE), for example, had deployed it as a method for over a decade to identify and track the spread of bacterial foodborne disease and tuberculosis. Similarly, by 2019 pathogen genomics was an integral part of the infectious disease programme run by the US Centers for Disease Control and Prevention. Pathogen genomics had also been shown to be helpful in determining the likely sources of infection during the latter phase of the Ebola outbreak in Guinea. It had also proven useful for developing diagnostics and vaccines and for understanding the evolution of an epidemic caused by the Zika virus in Brazil (Armstrong).

COG-UK’s collaborative model

Set up with remarkable speed, COG-UK provided the first proof that pathogen genomic surveillance could be done in near real time at a national level. Scaling up existing sequencing capacities in the UK to meet this goal was an enormous challenge. Just how much of an undertaking it involved can be seen from the fact that before the pandemic, PHE was sequencing approximately 50,000 genomes each year to track tuberculosis and foodborne outbreaks. By contrast in April 2022, COG-UK managed to generate 70,000 SARS-CoV-2 genomes across the UK in just one week (Peacock, April 2022). Getting to this point required an enormous amount of collaboration on many different fronts, all glued together under the COG-UK umbrella.

One of the keys to the success of COG-UK was the distributed model it adopted. The aim was to take advantage of the expertise and equipment already available in different laboratories around the country and link them all together with a shared data system that also incorporated anonymised patient data. Intended to encourage sequencing from the ground up under a central framework, from the start the idea was to be as inclusive as possible so that everyone could contribute in their own way (Myers transcript; Peacock transcript; Parkhill transcript). The process looked like what Peacock describes as ‘a spider diagram; it was extremely complex’ (Peacock, April 2022).

Overall, the network was an innovative partnership of multiple institutions. It included 16 academic partners and four public health agencies conducting research and supporting service delivery; a central sequencing hub run by the Wellcome Sanger Institute, which delivered sequencing and supported research as well as funding; four Lighthouse Labs providing pillar 2 samples; 14 NHS Trust laboratories providing sequencing; and 65 additional NHS and other collaborators (Marjanovic Annexes). What is striking is how COG-UK managed to knit together the contrasting approaches and different goals of each of these partner members and collaborators into a unified sequencing network (COG-UK Nov 2020).

Fig 1.2: Overview of COG-UK participants. Credit: Darren Smith Presentation to COG-UK Together (1:30 mins).

Fig 1.3: List of all bodies involved in COG-UK. Credit: Marjanovic, Annexes.

The pragmatic approach adopted by COG-UK to use expertise and equipment already available in different places enabled it to get sequencing off the ground very quickly and at high volume without having to wait for the infrastructure to be built. This proved particularly effective for responding immediately to the unfolding emergency. At the start COG-UK had at its fingertips 17 sequencing sites with a total of 134 genomic sequencing machines (Marjanovic, Final Report). Having a distributed sequencing network meant that when one laboratory was full to capacity another could step in where needed.

COG-UK originally set out to sequence viral genomes from up to 230,000 patients, health-care workers, and other essential workers with COVID-19 in the UK (COG-UK July 2020). This was a highly ambitious goal. Prior to COVID, the largest dataset generated for genomic surveillance during an epidemic had been around 1,500 genomes as part of the effort to curb the Ebola virus in West Africa. This had been done over the course of several months between 2014 and 2016 (COG-UK Nov 2020). Yet COG-UK surpassed this figure within its first month: in their first four weeks of operation the COG-UK partners sequenced over 7,000 genomes (COG-UK July 2020). Two months after its launch COG-UK was able to report that it had managed to sequence and analyse 16,670 SARS-CoV-2 genomes, which accounted for more than half of the genomes reported globally. At this point the consortium had 15 sequencing sites on board and noted that its capacity exceeded demand (COG-UK Report #6).

Fig 1.4: Number of SARS-CoV-2 genome sequences reported in either MRC CLIMB or GISAID up to 27 May 2020. The diagram shows the UK’s early lead in sequencing the SARS-CoV-2 virus. Credit: COG-UK SAGE Report #7.

Over the next few months, COG-UK continued to push genome surveillance to unprecedented levels. Between October 2020 and January 2021, COG-UK doubled the total number of SARS-CoV-2 genomes sequenced from around 100,000 to more than 200,000 (COG-UK Nov 2020; COG-UK Report #2). By September 2021, eighteen months after COG-UK started its operations, the UK had uploaded over one million SARS-CoV-2 genome sequences to the international Global Initiative on Sharing Avian Influenza Data (GISAID) database. This represented 24% of all sequences uploaded worldwide to date (GOV-UK).

Many COG-UK participants described the process as a bit like assembling an aeroplane as it was getting ready to take off. Some equated it to ‘hurtling down the runway’ while still trying to work out how to construct the unprecedented national network and how it would work. What held everyone together was the sense of a common purpose and the potential of genomic data to help inform public measures taken to combat the pandemic. As a COG-UK blog posting poignantly pointed out, ‘the reality behind these genomes, and the samples from which they were sequenced’ is ‘a grim one’ of people who had died or were suffering from COVID-19 infection (COG-UK Nov 2020).

Fig 1.5: Cartoon of building the plane while trying to fly it. First used by software developers in Silicon Valley, the analogy of ‘building a plane as you fly’ has become common in many sectors. Credit: James Baylay (aka James the Scribe).

Capturing the human endeavour behind COG-UK

The high numbers of genomes sequenced by COG-UK could not have been achieved without the hard work of hundreds of individuals, driven by goodwill and a desire to help humanity, working both in and outside the laboratory. For many this meant toiling very long hours and sacrificing time with loved ones at the same time as coping with the wider social and economic upheaval of the pandemic. Their contributions lie at the heart of COG-UK’s achievement. Everyone who took part in the process has their own rich and moving story to tell, reflecting the wide range of skills and backgrounds of the individuals who helped propel the project forward. It is their stories which this exhibition aims to tell.

As years go by, people’s memories of the pandemic will fade. The influenza pandemic of 1918 to 1919, for example, which caused the death of tens of millions, was seared into the consciousness of public health experts but received little in the way of public memorials (Sridhar p.17; Honigsbaum). The 1918-19 pandemic provides a cautionary tale about how easily the record of the past can be lost (Jones). Now that the immediate crisis of COVID-19 has passed, the strong temptation to put it behind us is already in danger of changing the narrative of how the pandemic is understood, and the role that scientists played in helping to bring it under control risks being forgotten or rewritten. But how scientists experienced and successfully responded to the pandemic holds important lessons for the future, particularly for other infectious disease threats.

This exhibition is built upon the generosity of 85 COG-UK participants who volunteered to be interviewed in response to a call out to the consortium’s members to help capture its history. Commissioned by COG-UK, Dr Lara Marks, a historian of medicine, undertook the bulk of the interviews, and these were subsequently supplemented with some done by Mathilda Watson and Alison Cranage, respectively the archivist and science writer at the Sanger Institute.

At the time of the interviews, conducted mainly between December 2021 and September 2022, the next stage of COVID-19 was still highly uncertain, but many of the participants were beginning to resume their normal routines and their involvement with COG-UK was rapidly becoming a distant memory. Among the people interviewed were the key players in the foundation and management of COG-UK, laboratory leaders and technicians, clinicians involved in infection control, bioinformaticians, researchers embedded in public health agencies, policy makers and funders. Coming from a wide range of backgrounds and based in different parts of the country, all these people had diverse experiences and made vital contributions to COG-UK in their own way. Another invaluable source for the exhibition was the set of presentations individuals gave at COG-UK’s Together event held in October 2021 (COG-UK).

From its inception, COG-UK aimed to be as open and transparent as possible. For this reason, where possible, the words of the people interviewed have been transcribed. Their stories are highly individual but at the same time part of a massive collective endeavour. Together, the interviews provide an important historical resource for gaining a perspective on how COG-UK functioned as a whole from the bottom up, and they help to transform what can otherwise appear to be just abstract and dry numbers of sequences into lived experience. Critically, they help to bring alive the rich and diverse human reality behind the scientific progress made by COG-UK. In addition to opening up a window into how genomic surveillance evolved at speed at a crucial moment, the interviews highlight what it was like for scientists facing the national emergency and how they managed to triumph in the face of adversity. In this sense they are not only an invaluable source for the history of science but also form a vital record for the wider social history of the pandemic.

Based on a rich collection of interviews taken at a point when the full ramifications of COVID-19 are yet to be understood, this exhibition can only scratch the surface of the full history of COG-UK. As with any history this is likely to shift and change over time. Hopefully, this exhibition together with the interviews provides an important stepping stone to further insights which can help inform the future handling of pandemics.

References

Armstrong, GL, MacCannell, DR, Carleton, H, et al (26 Dec 2019) ‘Pathogen Genomics in Public Health’, New England Journal of Medicine, 381/26, 2569-80, doi: 10.1056/NEJMsr1813907.

Callaway, E (7 Dec 2021) ‘Beyond Omicron: what’s next for COVID’s viral evolution’, Nature News.

CDC (14 April 2003) ‘SARS-Associated Coronavirus (SARS-CoV) Sequencing’.

COG-UK (2 July 2020) ‘An integrated national scale SARS-CoV-2 genomic surveillance network’, The Lancet: Microbe, 1/3.

COG-UK (10 Nov 2020) ‘COG-UK passes 100K genomes’, COG-UK Blog.

COG-UK (27 Sept 2020) Coverage Report #2.

COG-UK (14 May 2020) SAGE Report #6.

COG-UK (21 Dec 2020) SAGE Report #7.

COG-UK (14 Oct 2021) COG-UK Together: Marking 18 months of endeavour and achievement, recordings.

Cyranoski, D (15 Jan 2021) ‘Alarming COVID variants show vital role of genomic surveillance’, Nature News.

Farrar, J, with Ahuja, A (2021) Spike: The Virus vs The People: The Inside Story.

GAVI (27 June 2022) ‘COVID-19 vaccines have saved 20 million lives so far, study estimates’.

Gertner, J (28 March 2021) ‘Unlocking the Covid code’, New York Times Magazine.

GOV-UK (11 Oct 2021) ‘UK completes over one million SARS-CoV-2 whole genome sequences’.

Honigsbaum, M (25 Oct 2018) ‘Why the 1918 Spanish flu defied both memory and imagination’, Wellcome Collection.

Jones, EW, Sweeney, S, Milligan, I, et al (15 April 2021) ‘Remembering is a form of honouring: preserving the COVID-19 archival record’, Facets.

Kupferschmidt, K (9 March 2020) ‘Mutations can reveal how the coronavirus moves—but they're easy to overinterpret’, Science.

Marks, LV (July 2015) Exhibition: The path to DNA sequencing: The life and work of Frederick Sanger.

Marjanovic, S, Romanelli, RJ, Ali, G-C, et al (2022) Evaluation of the COVID-19 Genomics UK (COG-UK) Consortium Final Report.

Marjanovic, S, Romanelli, RJ, Ali, G-C, et al (2022) Evaluation of the COVID-19 Genomics UK (COG-UK) Consortium Annexes.

Moore, S (28 Sept 2021) ‘History of COVID-19’, News Medical Life Sciences.

Peacock, Sharon, email to Lara Marks (unpublished) 11 Nov 2022.

Peacock, S (13 April 2022) ‘“An unprecedented collaborative effort”: How COG-UK and our partners came together to do something extraordinary’, COG-UK Blog.

PHG Foundation (2015) Pathogen Genomics into Practice.

Robson, S (14 Oct 2021) Presentation: COG-UK UCL, COG-UK Together Event.

Sanger, F, Coulson, AR (1975) ‘A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase’, Journal of Molecular Biology, 94, 441-48.

Sanger, F, Coulson, AR, Friedmann, T, et al (25 Oct 1978) ‘The nucleotide sequence of bacteriophage phiX174’, Journal of Molecular Biology, 125/2, 225-46.

Sridhar, D (2022) Preventable: How a pandemic changed the world and how to stop the next one.

Wu, F, Zhao, S, Chen, Y-M, et al (3 Feb 2020) ‘A new coronavirus associated with human respiratory disease in China’, Nature, 579, 265-69.

Interview transcripts

Note: The position listed by the people below is the one that they held when interviewed and may have subsequently changed.

Interview with Dr Richard Myers, Head of the Bioinformatics Unit at Public Health England (now UKHSA), Principal Investigator COG-UK.

Interview with Sharon Peacock, Professor of Public Health and Microbiology in the Department of Medicine, Cambridge University and Executive Director of the COVID-19 Genomics UK (COG-UK) Consortium.

Interview with Professor Julian Parkhill, Department of Veterinary Medicine, University of Cambridge.
