International dimension - COG-UK within the world context

UK - early world leader in SARS-CoV-2 sequencing

From the start, COG-UK was driven by the principle that all its data should be shared. This was underpinned by Professor Sharon Peacock's belief, which was also held across the consortium, that everything should be shared with other countries so that they could replicate or emulate the work undertaken by COG-UK (Maxwell transcript). COG-UK uploaded all of its genome sequence data to two international databases: GISAID and ENA. GISAID stands for Global Initiative on Sharing Avian Influenza Data, and ENA stands for European Nucleotide Archive.

Figure 14.1: Cartoon by Alex Cagan depicting the connectivity of COG-UK data to the rest of the world.

GISAID was founded in 2006 after 70 leading flu specialists raised concerns about the uphill challenge of getting nations to share data during a serious outbreak of Asian avian influenza A (H5N1) in 2004. Part of the problem stemmed from a concern among some nations that they would not get appropriate credit or benefit from the genomic information they generated (Khare). Taking 18 months to secure international consensus about how the data could be shared, in 2008 GISAID formally launched a publicly-accessible database to act as a repository of genomic data from flu virus. As soon as COVID-19 hit, the GISAID team expanded the database to enable laboratories around the world to upload SARS-CoV-2 virus genome sequence datasets. Thereafter, GISAID became highly used for depositing SARS-CoV-2 genomic data (Maxmen).

Figure 14.2: Tweet demonstrating the mounting number of SARS-CoV-2 sequences uploaded to GISAID from around the world in the first few months of the pandemic. The number continued to climb, reaching more than one million SARS-CoV-2 genome sequences by April 2021 and by August 2021 had reached 3.1 million (Chen; Maxmen).

Alongside GISAID, COG-UK regularly transferred and uploaded raw sequencing reads into ENA, Europe's primary nucleotide-sequence repository. First set up in 1980, ENA exchanges data with both the DNA Data Bank of Japan and GenBank, a public database set up by the National Institutes of Health in the United States. COG-UK data was also collected by Nextstrain. Founded by a group of collaborators in 2015 originally to track flu viruses to help with public health measures and surveillance, Nextstrain maintains a database of different viral genomes, a bioinformatics pipeline for phylodynamics analysis and an interactive visualisation platform.

COG-UK rapidly became a notable global leader in genomic sequencing of the SARS-CoV-2 virus. In May 2020, after two months of getting its network up and running, COG-UK reported to SAGE that 'With 17 sequencing centres now active, COG-UK has passed a significant milestone by sequencing and analysing more than 20K SARS-CoV-2 genomes to date, corresponding to 56% of the global total number of genomes' (COG-UK SAGE Report #7). The number of genomes sequenced and analysed continued to grow in the following months reaching more than 40,000 by August 2020, more than 50 per cent of the global total. That month COG-UK emphasised to SAGE: 'Beyond generating this unprecedented viral genomic dataset, COG-UK has already had a substantial impact on national and global efforts to understand and tackle the SARS-CoV-2 pandemic, as demonstrated by the suite of dedicated tools developed by COG-UK researchers, the growing list of ground-breaking publications using COG-UK data and tools, and by the increasing focus on integrating genomic insights into infection control decisions' (COG-UK SAGE Report #10).

The importance of COG-UK is borne out by the words of Professor Sir Mike Stratton, who at the start of the pandemic was the Director of the Sanger Institute. In February 2021 he wrote: 'The UK has become the world's microscope for COVID-19. With large scale genomic sequencing, we can see how the virus is evolving day by day. We can monitor for new variants and observe them as they move. As vaccines are being rolled out, we are also the world's binoculars, we can see what is coming over the horizon; how the virus will respond and evolve in response to those vaccines' (Sanger Institute Feb 2021).

Figure 14.3: Tweet posted on 19 May 2020 by Professor Matt Loose, COG-UK Principal Investigator at the University of Nottingham. It illustrates the early lead the UK had in sequencing SARS-CoV-2 internationally.

The impact of COG-UK internationally is also highlighted by Dr Emma Hodcroft, a molecular epidemiologist at the University of Bern in Switzerland and co-developer of Nextstrain. In January 2021 she described the UK's sequencing work to the New York Times as 'the moonshot of the pandemic' (Zimmer). Her respect for COG-UK's effort is revealed by her following words: 'When I heard the first comments from people that it actually seemed like the UK sequencing was something that was going to happen, I found it really amazing because the interest in getting into sequencing was not high for a lot of governments. But with COG-UK, it really sounded like the UK was going almost all-in on this, which I had not expected if I'm honest'. What COG-UK accomplished with just a small amount of money, Hodcroft says, was 'a scientists' dream' (Hodcroft transcript).

Figure 14.4: Number of genomic sequences of SARS-CoV-2 by country as of 10 Dec 2020. By this point 49 countries had published more than 100 genomic sequences with the UK (38.9%) and the USA (27.7%) accounting for the total 93,817 published genomic sequences. The number of genome sequences per reported COVID-19 cases varied between countries, with Iceland achieving the highest proportion (30%) of all cases. Credit: Fig 1, Furuse.

Efforts to roll out national sequencing in the United States saw funding being made available in February 2021. That month, US President Joe Biden announced an investment of nearly US$200 million to 'identify, track, and mitigate emerging strains of SARS-CoV-2 through genome sequencing'. The aim of this funding was to increase the sequencing capacity of the Centers for Disease Control and Prevention (CDC) from 7,000 samples per week to approximately 25,000 (White House). In April 2021, the SARS-CoV-2 Sequencing for Public Health Emergency Response, Epidemiology, and Surveillance (SPHERES) project was launched in the United States. Led by the CDC, SPHERES is made up of '60 federal, state, county, local public health laboratories, several large regional and national clinical diagnostic corporations, and academic and non-profit leaders in pathogen genomics, bioinformatics, and public health from across the country' (CDC).

Over time, the UK continued to provide a substantial number of the sequences uploaded to GISAID (Figure 14.4). By February 2022, 240 countries had submitted SARS-CoV-2 sequence data to GISAID, of which 2 million genome sequences were uploaded by the UK. At this point the UK had sequenced samples from over 10 per cent of all COVID-19 cases (GOV UK).

Figure 14.5: Number of SARS-CoV-2 genomes available from GISAID portal as of 26 February 2021. Only countries for which 1000 or more genomes are available are represented. Blue: 'novel' genomes, submitted between 24 January and 26 February 2021. Gray: genomes available in GISAID before 24 January 2021. Credit: COVID-19 Data portal.

According to Hodcroft, the national genomic sequencing established through COG-UK was a major game-changer because it showed what was possible. As she says, 'Coming up with an unheard-of sequencing programme that links together sequencing from across the UK and puts it in a database and organises all this metadata and makes that data available publicly - that was never a given. That was a wishlist.' She believes that it was an important inspiration for other countries. Importantly, 'COG-UK set a framework for other countries to try and pursue' (Hodcroft transcript). The same view is expressed by Professor Ian Goodfellow, who participated in COG-UK from Cambridge University. He believes that where COG-UK had an impact internationally 'was making clear that, if academics and Public Health people get together, with the right resources, there is value in doing sequencing on a national level' (Goodfellow transcript).

Hodcroft argues that the early sequencing undertaken in the UK and in other countries, such as Denmark, proved invaluable for working out what was happening in many countries in Europe. For her, the British and Danish sequencing was incredibly useful when it came to detecting new variants and tracking how they started to spread. She explains, 'Because in countries with less sequencing, you might pick up a sample here and there, but you might not be able to tell if it is actually transmitting in your country. With more sequencing, we could pick that up. For example, with Delta we could pick up clusters in the UK pretty early, and clusters in Denmark pretty early.' This was particularly helpful when it came to convincing governments about what was happening with the pandemic (Hodcroft transcript).

Genomic sequencing around the world

What was striking from the start of the pandemic was just how quickly SARS-CoV-2, first spotted in a single city in China, was able to spread across the world. The speed at which it spread globally was related in large part to the interconnected nature of the world. Before COVID-19, nearly 200,000 flights were crisscrossing the world each day. With so much human movement, it took a matter of days for the virus to spread beyond the confines of China. What also contributed was the fact that many people were unaware they were infected because they had no symptoms (Bailey).

In this situation, genomic sequencing was crucial to understanding the way the virus evolved and spread across borders. But this was hindered by the fact that the ability to carry out SARS-CoV-2 genomic sequencing and surveillance varied markedly across countries. As Hodcroft says, the 'pandemic really threw into sharp relief how non-equal sequencing is around the world and how limiting that really was' (Hodcroft transcript).

Figure 14.6: Global SARS-CoV-2 genomic surveillance, sequencing availability, and publicly deposited genomic data as of 20 August 2021. (A) The global distribution of three strategies of SARS-CoV-2 genomic surveillance. (B) The global availability of SARS-CoV-2 sequencing; countries with a high level of availability are those able to perform in-country SARS-CoV-2 sequencing alone. (C) The weekly number of publicly deposited SARS-CoV-2 genomic data by region. (D) Cumulative number of publicly deposited SARS-CoV-2 genomic data by country as of 20 August 2021. (E) The weekly proportion of infections sequenced by region. (F) Cumulative proportion of infections sequenced by country as of 20 August 2021, defined as the ratio of cumulative isolate sequences to cumulative confirmed cases. Credit: Fig 1, Chen.

In August 2021 an analysis of 5.1 million SARS-CoV-2 sequences from samples collected between 1 December 2019 and October 2021 downloaded from public repositories revealed that genomic surveillance and data sharing varied greatly across the globe (Figure 14.6). The analysis found: 'Globally, 38.1% of countries (45) had performed a high level of routine genomic surveillance, 14.4% (17) implemented a moderate level of routine genomic surveillance, 21.2% (25) implemented a low level of routine genomic surveillance, and 26.3% (31) had limited genomic surveillance. The remaining countries (76) had no data on genomic surveillance strategy identified.' Overall, Europe and America uploaded the majority of SARS-CoV-2 sequences to public repositories (Chen).

Some of the difficulties other countries experienced getting genomic sequencing off the ground for COVID-19 are described by Hodcroft. She recalls, 'Something I tried to work on very early in the pandemic was to help laboratories in different places. We had contacts from labs around the world, where they said, we have the expertise, we have someone here who knows how to do sequencing, but we don't have the reagents, we don't have the machine, can you help us? Can you get some money? And I tried to do some fundraising but unfortunately, it turns out it's really complicated and really hard to raise money.' Hodcroft says, 'it was a little disheartening to know that there were places that we could have been getting sequences much earlier, and we just had really nothing set up on any scale, globally, EU, and elsewhere to support getting countries sequencing faster' (Hodcroft transcript).


For the first year of the pandemic most of the SARS-CoV-2 genomes produced in Africa were generated by 38 out of the 54 African countries, but subsequently more African countries were able to participate as a result of capacity building and the provision of resources by the Africa Centres for Disease Control and Prevention (Africa CDC) and the World Health Organization Regional Office for Africa (WHO AFRO). This led to the launch of the Africa Pathogen Genomics Initiative in October 2020, with an initial investment of US$100 million (Xavier; African Union). Just how much capacity increased on the continent can be seen from the number of countries in the African Union with the necessary sequencing infrastructure, which reached 39 by October 2022, an increase from 15 countries at the start of the pandemic (Kwon).

Figure 14.7: Organisation of sequencing in Africa shared with COG-Train by Gerard Mbowa. Credit: Peter Thomas-McEwen.

By March 2022 the African consortium had sequenced more than 100,000 SARS-CoV-2 genomes. With this data Professor Tulio de Oliveira, a bioinformatician at Stellenbosch University and the University of KwaZulu-Natal in South Africa, and his colleagues were able to map and document when and how different variants were introduced into Africa. Importantly, the data revealed that 'most variants were imported into Africa more often than they were exported from the continent' (Kwon; Tegally). For de Oliveira, the work also marked a milestone because it demonstrated that African scientists could work together to produce 'high-level science', whereas 'Before, it was almost the norm that African scientists would work with a northern partner to produce that kind of level of science'. Just how much progress the African consortium has made can be seen from the fact that its work helped flag two of the world's five variants of concern (Beta and Omicron) (Kwon).

Following the discovery of the Beta variant, de Oliveira helped set up the COVID Variant Consortium in South Africa, which includes 500 scientists. Conferring together each week, de Oliveira says, these scientists not only carry out genomic surveillance to identify new variants of concern but also have the laboratory capacity to 'very quickly estimate vaccine effectiveness against the variant and the change in neutralisation and clinical severity' (Samarasekera).

COG-UK's support for other countries

Recognising the importance of widening the scope of sequencing beyond the UK, COG-UK took an active role in advising other countries how to set up their own national sequencing schemes. This was important because at the start of the pandemic genomic sequencing tended to be sporadic and highly localised. A number of countries also drew on COG-UK software tools and its data to get going (Marjanovic).

One of the countries it advised early on was Canada. Specifically, it provided assistance to Genome Canada, a non-profit organisation, to launch the Canadian COVID-19 Genomics Network (CanCOGeN). Like COG-UK, CanCOGeN is a 'consortium of Canadian federal, provincial and regional public-health authorities and their health-care partners, academia, industry, hospitals, research institutes and large-scale sequencing centres'. Set up in April 2020, CanCOGeN received $40 million in federal funding. Established around the same time as COG-UK, by September 2021 CanCOGeN had sequenced more than 210,000 genomes (Public Health Forum).

In addition to offering advice, several COG-UK centres helped support sequencing in a number of countries. For example, the Quadram Institute collaborated with teams in Ireland, Lebanon, Tunisia, Nepal and Zimbabwe. In the case of Zimbabwe, the Institute already had a long-standing relationship with researchers in the National Microbiology Reference Laboratory in Harare (Quadram Institute, Jan 2021). As a result, these researchers reached out to the Quadram Institute early on to help them start sequencing the SARS-CoV-2 virus. This data helped the government determine which vaccine to roll out in the wake of the discovery of the Beta variant (Trotter transcript).

The Quadram Institute also collaborated with teams in Bangladesh, Pakistan, Lebanon and The Gambia, building on pre-existing collaborations with researchers in each of the countries. By July 2021 the Institute had sequenced 33,000 SARS-CoV-2 genomes, assisted eight countries to set up sequencing and provided them with the analytical capability to address the ongoing pandemic. Alongside this, it also provided sequencing and analytical training for 133 scientists from 32 countries (Quadram Institute, July 2021).

The Sanger Institute also assisted Bangladesh as part of a consortium with researchers from the University of Bath and several Bangladesh-based institutes, including the Institute of Epidemiology, Disease Control and Research and the Institute for Developing Science and Health Initiatives. Working directly with the Bangladeshi government, in the first phase the collaborators sequenced and analysed 391 positive SARS-CoV-2 samples collected from Bangladeshi testing facilities between March and July 2020, and then a further 85 SARS-CoV-2 samples collected between November 2020 and April 2021. Overall, the data provided insights into how the virus evolved and made it possible to track the different times and locations where new lineages emerged. Armed with this knowledge, the Bangladeshi government was able to implement more effective interventions to curb the spread of localised outbreaks to other areas (Sanger Institute, Sept 2021; Cowley; Melvin).


As part of its mission to share its expertise and knowledge, COG-UK helped launch a global educational initiative to provide open-access learning in SARS-CoV-2 genomics. Called COG-Train, the scheme was funded by the Wellcome Trust and the Foreign, Commonwealth & Development Office and jointly led by COG-UK and Wellcome Connecting Science. The programme was designed to provide a series of online open-access courses on all aspects of SARS-CoV-2 sequencing together with week-long intensive virtual training courses, short expert workshops and multiple virtual classrooms for public health officials and other stakeholders to learn together across many countries (COG-UK).

Two of those who helped set up and manage COG-Train were Dr Leigh Jackson and Peter Thomas-McEwen. According to Jackson, COG-Train grew out of the recognition that 'it was becoming unsustainable for COG to continually engage with individual countries and commit to an hour-long meeting with maybe 10 people from the UK giving directed advice to a single country, and then that happening 20, 30 times over'. Reflecting on the time and energy taken up in helping countries, Thomas-McEwen also comments that it was 'unmanageable in the long run'. It quickly became apparent that it would be much 'better to build that capacity in-country' (Jackson and Thomas-McEwen transcript).

From the outset, the objective of COG-Train was to provide resources for people to learn together. Rather than seeking to impose the model developed in the UK, COG-Train set out to look at the different ways people have tackled sequencing. As Jackson explains, 'We're not saying that the way we did it was the best. In fact, most of our content from the COG consortium members is how much they'd change if they had the time to do it again. It's more people learning from our mistakes, but also, let's showcase what other people are doing and how they've tackled the same challenges in vastly different settings'. The COG-Train team deliberately distanced themselves from referring to the UK. Some of this reasoning is explained by Jackson: 'We're a training initiative in coronavirus genomics, we're not aligning ourselves to a country, to a background, to a nation' (Jackson and Thomas-McEwen transcript).


African Union, Africa CDC (12 Oct 2020) 'US$100 million Africa Pathogen Genomics Initiative to boost disease surveillance and emergency response capacity in Africa'.

Bailey, L (19 Feb 2021) 'For better or worse, the COVID-19 pandemic shows our interconnected world', Population Education.

CDC (9 April 2021) 'A National Open Genomics Consortium for the COVID-19 Response'.

Chen, Z, Azman, AS, Chen, Z, et al (8 Sept 2021) 'Landscape of SARS-CoV-2 genomic surveillance, public availability extent of genomic data, and epidemic shaped by variants: a global descriptive study', medRxiv.

COG-UK (28 May 2020) SAGE Report #7.

COG-UK (11 Aug 2020) SAGE Report #10.

COG-UK (n.d.) About COG-Train.

COVID-19 Data portal (10 March 2021), Italy.

Cowley, LA, Afrad, MH, Rahman, SIA, et al (8 Sept 2021) 'Genomics, social media and mobile phone data enable mapping of SARS-CoV-2 lineages to inform health policy in Bangladesh', Nature Microbiology, 6, 1271-78.

GOV UK (10 Feb 2022) 'UK completes over 2 million SARS-CoV-2 whole genome sequences'.

Fauver, JR, Petrone ME, Hodcroft, E, et al (28 May 2020) 'Coast-to-coast spread of SARS-CoV-2 during the early epidemic in the United States', Cell, 181/5, 990-96, e25.

Furuse, Y (Feb 2021) 'Genomic sequencing effort for SARS-CoV-2 by country during the pandemic', International Journal of Infectious Diseases, 103, 305-07.

Khare, S, Gurry, C, Freitas, L, et al (3 Dec 2021) '

Kwon, D (3 Oct 2022) '100,000 coronavirus genomes reveal COVID's evolution in Africa', Nature.

Marjanovic, S, Romanelli, R, Claire-Ali, et al (2022) Evaluation of the COVID-19 Genomics UK (COG-UK) Consortium, Final Report, RAND Europe.

Maxmen, A (23 April 2021) 'One million coronavirus sequences: popular genome site hits mega milestone', Nature.

Melvin (14 Sept 2021) 'To curb the spread of Covid-19, restrict intercity travel as soon as a lockdown is announced', University of Bath Press Release.

Quadram Institute (26 Jan 2021) 'Mapping the spread of SARS-CoV-2 in Zimbabwe using genomic epidemiology', Blog.

Quadram Institute (11 July 2021) 'Quadram goes global on COVID-19'.

Public Health Forum (25 Oct 2021) 'Sequencing the Crisis: How genomics morphed from a COVID-19 research tool to a critical part of the pandemic response'.

Robinshaw, JD, Alter, SM, Solano, JJ, et al (27 July 2021) 'Genomic surveillance to combat COVID-19: challenges and opportunities', The Lancet.

Samarasekera, U (6 Aug 2022) 'Tulio de Oliveira: collaborating to boost science in Africa', The Lancet, 400, 423.

Sanger Institute (5 Feb 2021) 'Sequencing COVID: our latest data', Wellcome Sanger Institute Blog.

Sanger Institute (14 Sept 2021) 'Evidence-based national policies are essential to curb local COVID-19 infections'.

Tegally, H, San, JE, Cotten, M, et al (15 Sept 2022) 'The evolving SARS-CoV-2 epidemic in Africa: Insights from rapidly expanding genomic surveillance', Science, 378/6615.

The White House Fact Sheet (17 Feb 2021) President Biden Announces New Actions to Expand and Improve COVID-19 Testing.

Xavier, JS, Moir, M, Tegally, H, et al (22 Dec 2022) 'SARS-CoV-2 Africa dashboard for real-time COVID-19 information', Nature Microbiology, 8, 1-4.

Zimmer, C (6 Jan 2021) 'U.S. is blind to contagious new virus variant, scientists warn', The New York Times.

Interview transcripts

Goodfellow, Ian, Professor of Virology, University of Cambridge (interviewed 15 Dec 2022, transcript unpublished)

Interview with Dr Emma Hodcroft, Molecular epidemiologist, Institute for Social and Preventive Medicine, University of Bern, co-developer of Nextstrain.

Interview with Professor Matthew Holden, Director of Impact at St Andrew’s University.

Interview with Dr Leigh Jackson (Lecturer in Genomic Medicine, University of Exeter and Scientific Lead, COG-Train) and Peter Thomas-McEwen (COG-Train Programme Manager, University of Cambridge).

Interview with Professor Patrick Maxwell, Physician and the Regius Professor of Physic at the University of Cambridge.

Interview with Sharon Peacock, Professor of Public Health and Microbiology in the Department of Medicine, Cambridge University and Executive Director of the COVID-19 Genomics UK (COG-UK) Consortium.

Interview with Dr Alex Trotter, Bioscience researcher, Quadram Institute.
