The Wellcome Sanger Institute becomes the central sequencing hub

The Wellcome Sanger Institute starts to discuss plans for sequencing COVID-19

Just as Peacock was beginning to get the momentum rolling behind COG-UK, the Wellcome Sanger Institute was also thinking about what it could do to help with the pandemic. According to Dr Cordelia Langford, the Director of Scientific Operations at the Sanger Institute, she and her colleagues first began thinking about how to use their expertise, equipment and processes for COVID-19 in January 2020. Continuing over the next weeks, these discussions had intensified by March. During this time, Langford remembers having brainstorming sessions with Sir Michael Stratton, the Director of the Institute, Dr Martin Dougherty, the Chief Operating Officer, and other scientific leads to think through how to tweak the Institute's technology, research activities, data analysis, algorithms and approaches so that they could be applied to the SARS-CoV-2 virus. Because Sharon Peacock has strong ties with the Sanger Institute, they also compared notes with her (Langford transcript).

One of Langford's vivid memories from this time was a conference call the team had with Sir Patrick Vallance in which they all crowded together in one office. The call took place just before lockdown, and Langford's perception of Vallance during it was that he 'was very open and keen to explore what sequencing may be able to provide. He was extremely grateful for being able to get an understanding of what it is that sequencing at scale may be able to offer.' She says the call was very much 'a seeding discussion' to let him and others know what was feasible, with not just the technology, but also the sort of scale, and the processes that were running at Sanger, and in some cases elsewhere around the world. Ahead of the call, the team had 'built some feasibility models' to work out how many samples they could sequence every week. Not having any idea what would ensue, Langford originally envisaged that the Sanger Institute might contribute a 'massive effort' for three or four months (Langford transcript).

Figure 8.1: Photograph of Dr Cordelia Langford. Credit: Wellcome Sanger Institute. Langford first started working at the Sanger Institute when it was founded. She was part of the team who took part in the original Human Genome Project. Appointed Director of Scientific Operations in 2017, Langford oversees a team of 300 people. Reflecting on her long period at the Sanger Institute, she believes that the Institute's years of investment in equipment, knowledge, people and strong collaborative networks were pivotal to its contribution to COG-UK. As she says, 'we can't under-sell that foundation that enabled us to almost fly in the moment that we had almost been preparing for over so many years' (Langford transcript).

Initially, Langford thought the Sanger Institute could develop its sequencing capacity for performing diagnostic tests, for which there was a desperate need at the start of the pandemic. She says that 'It was only when we started talking about the technicalities of diagnosis that it became more apparent that the sort of sequencing that we probably would be doing would be much better applied to genomic surveillance overall, rather than diagnosis specifically.' What helped crystallise her thinking was her discussions with other senior researchers at Sanger, like Professor Nick Thomson, who is now interim Head of the Parasites and Microbes Programme.

As soon as Professor Dominic Kwiatkowski started to hear about the outbreak of the new pneumonia-like illness in Wuhan, like many other infectious disease experts, he began 'watching the numbers on a daily basis and doing calculations and running Excel spreadsheets and saying, “heck, this really looks worrying”'. His concerns were not eased by his discussion with Sir Jeremy Farrar, the director of Wellcome, who happened to pass through the Sanger Institute in February 2020. Worried by what he was hearing, Kwiatkowski recalls saying to Farrar that while the Sanger Institute 'had decided that viruses were not our expertise and had not planned to go down that route strategically, that this was somewhat different and we would be happy to contribute.' Importantly, Kwiatkowski stressed to Farrar that the Sanger Institute 'not only had the sequencing technology' which many labs had, but had been 'thinking very hard about how you develop large scale, operational pipelines that let you drive this at a very different scale from routine sequencing and how you scale the whole process up. This was not just in terms of the process of the sequencing per se, but the whole information flow, the samples coming in and joining that up with all the other data sources that are needed to actually produce the end data that's actually usable and interpretable from a public health perspective' (Kwiatkowski transcript).

The Sanger Institute's role within COG-UK begins to take shape

All of the discussions highlighted by Langford and Kwiatkowski took place ahead of the first COG-UK meeting in London on 11 March 2020. Both of them attended the meeting together with Dr Matt Berriman, the senior leader of the Parasites Genomics Group. Langford says that they went along hoping to offer the Sanger Institute's 'operational leadership' which included 'building and scaling laboratory pipelines', equipment, large numbers of staff, 'data generation pipelines' and 'good relationships with commercial suppliers'. This they were keen to do alongside contributions suggested by other organisations represented at the meeting (Langford transcript).

What quickly became very apparent to Kwiatkowski at the meeting was that, unlike Loman and others, the Sanger Institute 'could not take a sample, push it quickly through a Nanopore system, get a result and get it back to the doctors on the ward within a very short time, turnaround times in the order of a day' because its 'systems were not geared up for that.' As he says, 'We had much larger scale systems that were much more clunky. Also, we were working on the Illumina platform, not the Nanopore platform. Although actually both can work well, the Nanopore is inherently better attuned to a small lab setting next to a test lab and getting the data back very quickly.' Given this situation, a lot of attention was paid to what type of value the Sanger Institute's sequencing could provide if it could not get the result back within a very short space of time. This was because, as Kwiatkowski remembers, 'at that point a lot of the use cases were around using the sequence data to look at outbreaks, particularly in hospitals and care homes where there might be a cluster of cases'. In that situation an alert would go out, and then the samples would need to be sequenced and resulting data pushed back as quickly as possible to the people investigating the outbreak. Not based in a hospital and having long turnaround times, the Sanger Institute could not easily contribute to such a scenario (Kwiatkowski transcript).

Beyond the management of outbreaks, the meeting participants were also interested in a much wider question which was how to systematically collect data that would allow them to explore something that was 'not necessarily announcing itself as a cluster, but due to a lineage emerging somewhere in the country and growing more rapidly' than expected. Kwiatkowski explains, 'In those days, we didn't necessarily have all the language for it. But we could see that identifying genome sequences that looked spookily like each other, that were expanding rapidly, even if there wasn't a very obvious epidemiological event, might signal either a massive super spreading event or some more transmissible lineage, or something else that was worrisome.' This did not require 'a comprehensive system', but it did need some way to 'provide even coverage over the country.' For this they would need to have 'a certain number of samples coming in each week, from a well distributed network of sentinel sites, or points of surveillance around a country' (Kwiatkowski transcript).

What emerged at the meeting was that there were many well-equipped laboratories in regional centres that could sequence samples from particular areas of the country. But, Kwiatkowski remembers, 'it was also clear that there were many parts of the country that weren't covered and there was clearly a need for some sort of surveillance in parts of the country that didn't have these labs next to hospitals.' Based on this, 'it was agreed that it'd be useful for Sanger to provide a framework for taking samples from places that didn't have the sequencing labs, and doing some sort of national surveillance.' The general consensus was that the Sanger Institute 'would focus on national surveillance, which would have a longer turnaround time, and would also act as an overflow if it took a while to get samples through, if the local labs were overloaded' (Kwiatkowski transcript).

According to Kwiatkowski, the idea of systematic national longitudinal surveillance was far better suited to what the Sanger Institute could offer and mirrored some of the priorities outlined in its funding bid to the Wellcome Trust back in December 2019. He argues the concept was 'more interesting to us than having an awful lot of samples one week that focus on problem place X and in the next week place Y. The latter strategy is fine if what you're explicitly trying to do is move in very, very quickly and look at an outbreak that happens to be going on in Leicester or Weston-super-Mare or wherever. But the former strategy is particularly useful if you're trying to understand the underlying dynamics of the whole viral population, pathogen population. For example, can you see mutations taking off at a particular rate and outcompeting each other? Can you see certain lineages coming up in some areas, but not in others? Can you see things moving around the country?' (Kwiatkowski transcript).

The Sanger Institute puts in place its sequencing operation

Following the meeting in London, one of the first things the Sanger Institute did was to assemble some teams to work on the project for COG-UK. Often assigning bird names to its large projects, the Institute named its effort for COVID-19 Project Heron. The operation marked a historic moment for the Sanger Institute. Previously, scientific teams had their own projects, grouped under themes and underpinned by large-scale, high-throughput scientific operations and facilities, including in sequencing and cellular biology. Now everyone was coming from different places with different skill sets to all pull in the same direction. Dr Naomi Parks, who helped build the pipeline for the project, says 'It was incredible how everyone from across different teams came together with a single goal. The focus and drive that brought was electric, and there has been no other time that this has happened, as normally we work on many different projects' (Mobley; Aigrain transcript; Goncalves transcript; Cutts transcript).

Initially, the project started with a handful of people led by Langford gathering in front of a whiteboard in an office to plan out what was needed. From the start, Langford established a number of guiding principles. The first, and probably the most important, was to make sure all the staff taking part in the operation were fully protected. For her it was crucial to 'work out how we could work in a way that didn't in any way undermine anyone's own health'. Another was for the Sanger Institute to design and scale up its pipeline to use equipment that was not in high demand by other people. This was driven by her awareness of the many supply-chain issues at the start of the pandemic which surfaced in the wake of a lot of scaling up 'especially at testing centres, where they were using certain types of equipment. There were these national calls for bits of equipment. And the Army were then coming out and then just transporting it to these testing centres' (Langford transcript).

To get the ball rolling with the whiteboard, Langford remembers she just wrote up 'in a green pen what roles we needed in a project team to make sure that we're thinking of everything that we need to get this moving and working. We brainstormed that together and then said, let's bring in a couple of other people and brought them in to ask “Who have we forgotten? What have we forgotten? Are you willing to be part of this team? Your name's against, you know, number three; what do you reckon?” It was everything from sample logistics, to automation, to sample tracking, handling data, communications, human resources. We knew that we were about to go into a lockdown and we needed to have support from human resources. Procurement, our stores department, health and safety. You name it, everything was up on that whiteboard'. This brainstorming planning then got expanded out on the walls with special paper that can be taken off. Langford explains, 'We just had loads of columns on the wall that had different subject titles… Whose brains do we need to pick to kind of help solve these problems? Which collaborators and companies do we need to get in touch to help us scale [up]?' (Langford transcript).

Figure 8.2: Doodle of a heron drawn by Dr Petra Korlevic, research fellow, Wellcome Sanger Institute. Going on to become a logo for Project Heron, Korlevic recalls how she came to create it. She says, 'At the beginning of the pandemic we were so confused as to why it was called Project Heron and where it came from. So in the first few weeks of absolute boredom, sitting at home, not knowing what is going on with the world, I ended up doodling a little logo of a heron destroying a COVID virus and somehow it circulated and took over everything and now you can see it all over campus' (Korlevic transcript). Photograph of Korlevic, credit: Dan Ross/Wellcome Sanger Institute.

One of those called in to help was Dr Rachel Nelson, the Head of the Cellular Generation & Phenotyping Core Facility. She remembers on 12 March 2020, 'I turned up late to a division meeting, and when I walked in I was told that the meeting was finished and I needed to go and speak to my boss. I was like, I didn't turn up that late! When I went into her office she said, “Sit down.” And she told me what our role was, about what the Sanger was about to become part of, and what we needed to do within a matter of weeks.' All of it felt much like a whirlwind to Nelson, who recalls that she went from doing 'nothing related to COVID' to all of a sudden being told 'Grab your coat, we're going to Addenbrooke's' and then being 'in a meeting room with senior professors talking about what it was that we were going to do' (Nelson transcript).

Adding to the intensity of the situation, Nelson remembers being told the same day that the Sanger Institute was going to shut down for an estimated three months because of lockdown, and that the COG-UK work needed to be kept confidential. This was not easy. For her the next two weeks 'was an interesting time as a manager, kind of surreptitiously trying to pull a team together whilst giving everyone the message to go home, except not you, and not you, and not you'. Essentially Nelson found herself having to manage 'a completely new team in-house', which she called 'the Dream Team'. Many of them were people she pulled together from her department who she 'knew were good at automation, LIMS [Laboratory Information Management System] development' and some came from the pathogens team who had dealt with previous outbreaks like swine flu and had previously worked at Public Health England and Addenbrooke's Hospital. She was also joined by a number of volunteers who responded to a call put out by the Institute. Nelson points out that these volunteers, who came from all parts of the Institute, were all regarded as taking a risk at that point. As she states, they were coming out of their houses 'despite the fact that the country has been put on lockdown, to come and work in a facility working with an active virus' (Nelson transcript).

Figure 8.3: Photograph of Dr Rachel Nelson. Credit: Dan Ross / Wellcome Sanger Institute (Midgley). Set up originally in 2012, to help generate hundreds of induced pluripotent stem cells for the Human Induced Pluripotent Stem Cell Initiative, the Cellular Generation and Phenotyping facility (CGaP) now has nearly 40 staff. Prior to COVID, Nelson had spent four years deploying her skills in process and change management to foster a very agile and adaptive team within CGaP so that they could be as versatile as possible and set up new projects at speed. Nelson argues this gave them a strong foundation to quickly pivot to processing COVID samples for Project Heron (Nelson; Nelson transcript).

Another person who was embedded in the early planning was Tanya Brooklyn, a skilled project manager who had previously helped with some of the Institute's other projects. Called in on 12th March 2020, Brooklyn remembers 'I literally walked into a room that was like a war room with sticky notes all over the walls. Post-It Notes, writing on sheets on the walls. I was introduced to a few people and responded, “Yes, it'll be fine. See you tomorrow.”' She recalls, 'It wasn't until the 13th of March, the next day, that the penny dropped what it was that we were going to be a part of, and then helping to sort out structure and organise that internally' (Brooklyn transcript).

What was different for Brooklyn from the previous projects she had helped manage at the Sanger Institute was the scale of sequencing envisaged as well as the immediacy of the work. As she puts it, 'There was very little time for anybody in the COG network to really think and plan. It was very much hitting the ground running, making sure that people were coordinating and speaking when they needed to speak, and just getting everybody together'. Brooklyn remembers 'It was like herding cats sometimes. But it was enjoyable. From my perspective, I've always said to my staff we are facilitators. We are just there to make sure that these people get from here to there, and that it delivers a successful outcome. We don't have to understand the granularity always of what they're doing, but we have to make sure that we get them from A to B. It's that whole giving support, facilitation, taking the burden away from people if they're going, “Oh, we've got these 20 things we need to do, and that's got to be sorted before we can even make a decision about that.” We need to say “Right, leave it with me. I'll go fix that thing”' (Brooklyn transcript).

All the relevant people had daily meetings together. Called 'stand-up meetings', these sessions were designed to talk through the plan, what progress had been made the previous day, what still needed to happen and what issues had arisen that needed to be addressed (Langford transcript). Because they could not accommodate all 300 people working on Project Heron, the meetings were carefully structured. Led by Brooklyn, each meeting would start with a person leading a particular area, and then anybody could contribute to the discussion (Goncalves transcript).

Just how valuable the regular meetings were to building the team is summed up by Dr Sonia Goncalves, who prior to COVID was the lead of end-to-end genomic surveillance activities within the Parasites and Microbes Programme at the Sanger Institute. She argues, 'If there was something that made this project successful, it was that everybody knew what we were there to do. There was a common understanding of what was the end goal and what we were being asked to do. Everybody knew that the little thing that they were doing was important to the whole big scheme of things. It was also very clear that everybody knew that whatever happens here impacts downstream, or, in other places in the process. So I think that common understanding, that sense of purpose, was there, and we were all there as a unit, united in doing that. It was fantastic, because we brought people from so many different teams. We had people from procurement, finance, stores, logistics, lab operations, automation and all the LIMS, IT. It was the first time I think, at least in Sanger, that we brought all of these different skills together and said, “Okay, this is the thing, this is the task. How are we going to do it as a team?” That worked really, really well' (Goncalves transcript).

Figure 8.4: Photograph of Tanya Brooklyn chairing a Project Heron meeting via Zoom. Credit: Dan Ross/Sanger Institute. Interested in science from an early age, Brooklyn was unable to pursue it as a career because she had to give up biology O-level as she could not cope with the microscope work, which she says was a big disappointment. Not being a scientist meant that she spent a lot of time on Project Heron looking up words to understand all the terminology. From her perspective this had both 'pluses and minuses'. As she says, 'Yes, I had to do extra homework to understand something, but it also gave me the ability to go “I really don't understand what you're saying. What does that mean in plain language?”' (Brooklyn transcript).

Figure 8.5: Photograph of Tanya Brooklyn working alone in the Sanger Institute. Credit: Dan Ross/Wellcome Sanger Institute. One of the advantages Brooklyn remembers from when she started working on Project Heron was that everyone had to leave the site because of the lockdown, 'except for people working on COVID'. She says that 'in some respects it was fortunate, because we didn't really have worries that other people were having around social distancing and the various procedures that they had to adopt to deal with that. Yes, we had very strict social distancing, hand sanitation, one-way systems, etc. Whilst those were implemented, because we didn't have the volume of staff on-site, it actually didn't make life too difficult. By way of example, my desk is in an area that's quite open-plan for 32 people. I was the only person there… So it felt safe. I was able to work. I used to email my colleagues and say, “Sorry, I've spread out all over your desk.” [laughs] So I was just able to consume as much area as I wanted to work in, and to map things out' (Brooklyn transcript).

One of the key challenges the Sanger Institute team faced at the start was how to build an end-to-end pipeline for processing the COVID-19 samples. This is captured by Dr Ewan Harrison, an early core member of COG-UK's management team as well as a postdoctoral researcher at Sanger. Present at one of the early whiteboard planning sessions, he had a first-hand view of the early days. As he says, the Sanger Institute had 'lots of process pipelines, but this was different in the sense that they're very used to getting a sample, something needs to be done with that sample, or here's a bunch of samples, something needs to be done with this bunch of samples. But what happens before that hasn't really been their expertise, so they have had to stand up all that bit. So it's the arrival en masse, having to pick the samples, as well as the scale. They were already doing a lot of samples of different stuff, but they weren't doing 60,000 of the same sample every week. It was about standing up all those parallel processes and so that you've got redundancy in the system so it can't all crash down and stuff like that' (Harrison Jermy transcript).

Figure 8.6: Photograph of Dr Sonia Goncalves, Head of Service Delivery in the Genomic Surveillance Unit at the Sanger Institute. Credit: Dan Ross / Wellcome Sanger Institute. Goncalves remembers being called in with John Sillitoe to discuss the work for COG-UK in the last week before the Institute was about to close down. She says that they were invited to join the meeting because they had experience of running a genomic surveillance service for malaria so could bring in 'the whole end-to-end perspective'. She was tasked with the front end of the process, organising relationships for the COVID-19 work (Goncalves transcript).

Figure 8.7: Photograph of Dr Ian Johnston surrounded by planning notes on a whiteboard in his office. Credit: Dan Ross / Wellcome Sanger Institute. Johnston studied genetics at university and then worked for 14 years in forensic DNA profiling. He heard about the setting up of the Sanger Institute to work on the Human Genome Project soon after he left university, and put it on his 'bucket list to work there one day.' He says 'I was fortunate enough to get that opportunity.' Heading a team of more than 80 staff, his group has expertise in sample extraction, quantification and preparation, library preparation, different sequencing platforms, data analysis and quality assurance (Johnston transcript).

The Project Heron team was broadly organised in two halves. The first focused on 'getting the samples through the sequencing machines, and all the technology associated with that'. It was led by Langford and Ian Johnston, the Head of Sequencing Operations. They worked closely with Nelson, who oversaw a group of people working to make sure the samples were safe to handle and appropriately processed for sequencing (Nelson transcript). The second 'involved liaising with the people collecting the samples, setting up the logistics operations'. Two of the key people involved in this work were Goncalves and Dr Cristina Ariani, who dropped her work on malaria genomics to head up the Genomic Surveillance Operations for the COVID-19 effort. They managed the 'relationships and logistics with all the testing labs' and 'the relationship with the Department of Health and Social Care'. Another crucial part of the operation involved software registration and building a data pipeline to make sure the sequencing information could be integrated with the metadata, which was overseen by John Sillitoe with Dr Rob Amato, both of whom worked closely with Kwiatkowski on the malaria programme (Kwiatkowski transcript).

Figure 8.8: Photograph of Dr Roberto Amato. Credit: Dan Ross / Wellcome Sanger Institute. Head of Data Analysis and Translation at the Sanger Institute, Amato took a key role in making sure metadata collected from hospitals was integrated with the sequencing results. This data, which included when and where the virus was collected, helped the public health bodies understand the transmission paths of the SARS-CoV-2 virus and monitor and manage outbreaks (Midgley).

Langford and Johnston's team managed to work out a suitable pipeline for sequencing samples within just a matter of days. But before they could start using it they needed to check that the sequencing actually worked. To do this they needed access to some samples. This turned out to be much more of a challenge because, as Johnston recalls, all the PHE laboratories at that time 'were under so much pressure' (Johnston transcript). Some of the difficulties are captured by Ewan Harrison, who spent about three or four days trying to get access to some samples from the PHE lab located within Addenbrooke's Hospital, which was down the road in Cambridge. He remembers that it required 'a bit of cajolery because people didn't want to release samples without the right permissions in place, and (I think) because at that time, people were a lot more worried about the handling of the virus in the lab than they are now' (Harrison Jermy transcript). In the end, the situation was solved with the help of Sharon Peacock who, because of her PHE connections, could approve their release. Johnston remembers heading over to Addenbrooke's around 16th March 2020 and at the end of the meeting they 'returned with the first set of around 90 samples'. Received from the entire region that the Cambridge PHE laboratory serves, all of the samples were known to be positive for COVID-19 (Johnston transcript).

Figure 8.9: Photograph of John Sillitoe. Credit: Wellcome Sanger Institute. An engineer by background, Sillitoe first joined the Sanger Institute in 2018 as Head of Surveillance Operations to help support the Malaria Genomic Epidemiology Network. He was invited to help with the COG-UK initiative due to his expertise in translating research knowledge and understanding into a large-scale operation. Since 2022 he has been the Director of the Genomic Surveillance Unit at the Sanger.

First sequenced on 18th March 2020 using a very manual process, the samples were relatively straightforward to work with. Supplied in tubes, the samples came as RNA extracts so were safe to handle because they did not contain any live virus. Nonetheless, a lot rested on the samples. Nelson says she 'volunteered to process the first lot of samples because we all felt quite a lot of pressure around all of this. It was not just because it was COVID, but because it all felt so important. Our work is always important, but never before has it had such social meaning, I guess. So I volunteered to process the first batch of samples just to break the ice, and calm things' (Nelson transcript).

But the team realised that other samples would be more challenging going forward because, as well as RNA extracts, they expected to process primary patient samples supplied from hospitals which could contain the live virus. They also anticipated receiving lysates, which consisted of swab samples taken at testing centres put in a tube with a lysis buffer to destroy the membrane of the virus, which in theory rendered it inactive. In both cases, these samples potentially posed more of a safety risk to lab workers than RNA extracts, so required a deactivation step before they could be sequenced at scale. The issue kept Nelson 'awake on many occasions' because, as head of the lab, she had the 'responsibility to keep everyone within that team safe' (Nelson transcript).

Harrison remembers the Sanger Institute spent huge amounts of time thinking through how they would handle such samples because 'guidance around the health and safety of sample handling changed' in that period. He says 'They had to clear out all the Category 3 labs (CL3) so that they could actually take primary samples, because they were imagining they were going to have to take hundreds of thousands of primary samples. They had to build all this pipeline, they had to stand up all the processes to inactivate the samples. There were massive shortages of all the extraction kits, so they bought this huge amount of this chemical that would inactivate the virus so that they could do a homebrew extraction' (Harrison Jermy transcript).

As part of this process, Nelson drew up a biological risk assessment along with health and safety to make sure everything was above board. She also invited a virologist from Cambridge University to look over the risk assessment. Taking him on a tour of the laboratory, she was greatly reassured when he told her 'You guys are going OTT [over the top], but better to do that than not'. The risk assessment also went through a biological safety committee for approval (Nelson transcript).

Figure 8.10: Photographs taken outside the CL3 labs. Credit: Dan Ross/Wellcome Sanger Institute.

Once the risk assessment was complete, Nelson's team did a number of mock runs in the CL3 to make sure they knew what to do when primary patient samples arrived. Nelson vividly remembers when the first real primary samples were processed. She says, 'we were all standing outside the CL3 lab. The lab has got a big glass window in it, and we all just sat watching as those first samples got unwrapped' (Nelson transcript).

Pillar 2 samples come on board

Having got everything in place, in the end Nelson's team did not process many primary samples because on 23rd March 2020, when lockdown officially started, the government announced it was going to set up the Lighthouse Labs for conducting tests in the community. As soon as he heard the news, Johnston realised this could be a game-changer because he and his colleagues were then wrestling with the complexity of how to handle different sample types. Importantly, the new testing service would be a way to access large quantities of RNA pre-extracted from positive COVID-19 samples. After hearing the news he remembers knocking on Langford's door and saying, 'There's a Lighthouse lab being set up. I think this solves our headache. If we can get those extracts, we'll be able to do tens of thousands' (Johnston transcript).

Figure 8.11: Tweet from Dr Tony Cox announcing the setting up of the Lighthouse Lab in Milton Keynes on 11 April 2020.

One of the attractions of the samples from the Lighthouse Labs was that the team would no longer need to deactivate the virus in primary samples, perform extractions, or source all the supplies needed for such work. Johnston also knew that the extracts would be in 96-well plates, a format that 'was perfect to go straight into one of our pre-existing processes and scale'. Another advantage with the Lighthouse Lab initiative was that its first centre in Milton Keynes was being headed up by Dr Tony Cox, who had previously headed up a team at the Sanger Institute working on DNA pipeline development. Encouraged by Langford, Johnston immediately contacted Cox and together they set the wheels in motion, working in tandem with Catherine Ludden and others at COG-UK (Johnston transcript).

It quickly became clear that the arrangement could also be beneficial to the Lighthouse labs. Instead of asking for samples, the Sanger Institute and COG-UK management team realised that it would be better to offer to take the waste. Ludden points out that this meant the Lighthouse did not 'need to dispose of their plates because Sanger was disposing of them after sequencing, which is expensive'. But, as she explains, 'it did mean Sanger needed to get storage in place to take all these plates, and find a way to cherry-pick the samples because there were multiple samples per plate' (Ludden Blane transcript).

Storage challenges

Undertaking to take the waste product from the Lighthouse Labs posed a major storage challenge to the Sanger Institute. Adding to the complexity, the team needed to be able to provide storage at different temperatures. As Brooklyn explains, 'depending on what stage you're at, from samples arriving in refrigerated transport, they are going into the lab and into another fridge, or they have got to be defrosted, or they are going straight into a minus-20 reefer [chilled container] because we haven't got all the data associated with those samples yet so we can't process them. So there was a lot around just the sheer logistics of getting them to site, designing and acquiring the right crates for the samples to go in so we weren't getting spillage and they were okay whilst they were being transported' (Brooklyn transcript).

Figure 8.12: Photograph of Nicholas Hough going into one of the reefers in the Sanger Institute's car park. Credit: Dan Ross/Wellcome Sanger Institute.

Figure 8.13: Photograph of Sanger Institute staff going into a chilled reefer. Credit: Dan Ross/Wellcome Sanger Institute.

In the early days, many of the samples the Sanger Institute received were 'loaded quite haphazardly into boxes'. Just how much work this caused is recalled by Kwiatkowski. 'We would get large crates delivered with lots of stacks of plates. And if we said “find a sample”, and it was in such and such a box, someone had to go in there and manually pick up that freight, unload all the plates, sort through them and find the sample in a refrigerated container. It was physically hard work and very taxing on the people that had to do it' (Kwiatkowski transcript).

Figure 8.14: Dr Petra Korlevic scanning in samples. Credit: Dan Ross/Wellcome Sanger Institute. Coming from the Parasites and Microbes Department, Korlevic had a lot of experience of handling samples for the malaria programme. She volunteered to help manage the samples for the Heron Project after receiving an email towards the end of March. The team handling samples had their first meeting in the car park in the last week of May 2020 and actually started handling samples in the middle of June 2020. She remembers at the start there 'was a lot of back and forth because there were millions of samples so no-one really knew how to handle this.' Korlevic continued volunteering until October 2020 when the Sanger Institute opened up its campus for people to go back to their work (Korlevic transcript).

Figure 8.15: Photograph of boxes of samples loaded in a van. Credit: Dan Ross/Wellcome Sanger Institute. Korlevic recalls that she and her team often borrowed a little van owned by the campus to shift the samples. They became very adept at knowing how many boxes could be packed in. Once this was done, she says, 'Sometimes one person was in the van holding everything so that it didn't move too much and then you drove from the Ogilvie building all the way to the car park, which took a whole two minutes. We would just stack everything. We had some fun times. I have some fun videos of October rain showers with us trying to move this -20 box into a -20 reefer and everything's cold and then you get some water on your shoes and you're just like scuttling around freezing.' Because it was quite physical work, 'on purpose they gave us really short shifts. It was maybe a maximum of two hours twice per week and then there was a second shift that also did two hours twice per week' (Korlevic transcript).

The storage problem had to be resolved very fast, before the first samples arrived, so Brooklyn and the facilities team installed five reefers, large containers with chilled freezers, on site. These were set up in a separate car park in a secure location at the end of the Sanger Institute's campus. Having three elements of refrigeration, the reefers could store up to about 9 million samples. Even with this, Brooklyn was always concerned about storage. As she says, 'I remember many, many times in stand-ups going, “We could run out of space” [laughs]. Along with the storage, the team also had to devise a process for the destruction of the samples which had to be done in accordance with ethics guidelines' (Brooklyn transcript). The operation also required building a 'team from scratch' with 'sample managers and porters to handle the on-site logistics' whose job was to 'receive samples', 'check them in', making 'sure that the barcodes all add up, and then deliver them to the various different labs.' Such staff not only helped handle daily shipments coming in with samples, but also shipments of waste because, as Langford says, 'there were lots of samples that generated lots of waste plastic ware' (Langford transcript).


Originally set up to sequence the human genome, the Sanger Institute now has one of the largest DNA sequencing facilities in the world. Since it was founded, the Institute has sequenced hundreds of thousands of human genomes as well as bacteria and other pathogens, thousands of other species, and millions of cells. The Sanger Institute has helped drive large-scale genomic science and has harnessed the potential offered by robotics and automation for much of this work. The reason for this is explained by Ariani. She points out that 'When you are doing a low throughput process, doing it manually is totally fine, and is actually very accurate and very good. But as soon as you start ramping up, as soon as you start becoming a high throughput place, doing things manually is a recipe for things potentially going wrong. So we've always avoided doing anything manually' (Ariani transcript).

Figure 8.16: Photograph of Dr Cristina Ariani. Credit: Wellcome Sanger Institute. Ariani joined the Sanger Institute in 2015 as a postdoctoral fellow after doing a doctorate in genetics at the University of Cambridge. She was appointed the lead of Genomics Surveillance Operations in 2020. Prior to COVID-19, Ariani says she was very focused on scientific research, but once she became involved with the Heron Project her work became much more operational than scientific. Ariani believes that her scientific background really helped her understand how to improve the process for sequencing of the SARS-CoV-2 samples and communicate the results with the government (Ariani transcript).

Figure 8.17: Photograph of Rich Livett, the Senior Scientific Manager of Laboratory Information Management Systems within the Sanger Institute. Credit: Dan Ross/Wellcome Sanger Institute. His team focuses on building software pipelines for high throughput sequencing for different projects within the Sanger Institute. He says the thing that was different for them with COVID was that they just stayed with the Parasites and Microbes group building pipelines for them. They also had to work remotely, which meant they had to rely on scientists sending them videos to show how the work was done in the laboratory so it could be translated into the software process. Since 2021 he has been Lead Scrum Master in the Genomic Surveillance Unit (Livett transcript).

Given this experience, the Sanger Institute was able to apply the same automated approach to the Heron Project. But here the volume of samples it needed to handle was on a totally new scale. One of the challenges Kwiatkowski remembers at the start was that the Institute's robots 'did not work for a while' and then when they tried to get new ones the 'tips weren't quite right so that had to be re-engineered' (Kwiatkowski transcript). After the first couple of months two robotic machines were installed capable of processing about 20,000 samples a week. Six months later they purchased more robotic equipment which increased the capacity to 64,000 samples a week (Livett transcript).

Automation was not only dependent on getting the right equipment. Faced with a deluge of 96-well plates from the Lighthouse Labs containing a mixture of both positive and negative samples, Rich Livett and his informatics team had to write software together with Lesley Shirley, the head of automation, and the R&D team to tell the robots to pick out only the positives and place them on to a different plate. This took about six weeks to build. Describing the robots in operation, Livett says they are 'quite mesmerising to watch'. He explains, 'the plates, which are a fairly flat plastic thing and have sample wells in them, get moved on by different sorts of robots. When they get moved on it's scheduled in a way that there's almost a dance to it, that things come on and come off in a very fluid way' (Livett transcript).
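The cherry-picking logic described above can be illustrated with a minimal sketch. It is not the Sanger Institute's software: the `Well` record, the `build_picklist` function, the `DEST-` barcode prefix and the column-by-column fill order are all assumptions made for illustration, and the sketch ignores any wells that would be reserved for controls on the destination plate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Well:
    """One well of a source plate, as a plate map might describe it (hypothetical fields)."""
    plate_barcode: str
    position: str      # e.g. "A1"
    sample_id: str
    positive: bool     # test result reported by the testing lab


def well_positions(rows="ABCDEFGH", cols=12):
    """Yield the 96 well positions column by column: A1, B1, ..., H12."""
    for col in range(1, cols + 1):
        for row in rows:
            yield f"{row}{col}"


def build_picklist(source_wells, dest_prefix="DEST"):
    """Assign every positive source well to the next free destination well.

    Negative wells are skipped. When a destination plate is full,
    the next positive sample starts a new destination plate.
    """
    positions = list(well_positions())
    positives = [w for w in source_wells if w.positive]
    picklist = []
    for i, well in enumerate(positives):
        plate_no, offset = divmod(i, len(positions))
        picklist.append({
            "sample_id": well.sample_id,
            "source_plate": well.plate_barcode,
            "source_well": well.position,
            "dest_plate": f"{dest_prefix}-{plate_no + 1:04d}",
            "dest_well": positions[offset],
        })
    return picklist
```

A robot scheduler could then work through such a list transfer by transfer, which is roughly the sense in which the software 'tells' the robot which wells to pick.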

Video of robot cherry picking samples. Credit: Wellcome Sanger Institute.

Just how much the robots improved the process can be seen from the interview with Shirley. She recalls that when the Sanger Institute first started to get the 96-well plates from the Lighthouse Labs, 'we would have to generate a manifesto manually via our admin team to tell individuals in the lab which plates they need to put on the deck and which ones need to be picked. They would physically have to go through boxes and find all those individual plates, load those onto a robot after spinning them all down in the centrifuge and take the seals off.' But, as she points out 'as we scaled up, we physically couldn't do that anymore.' So she and her team helped to design two automation procedures that made it possible 'to simply take these plates out of the crates they were supplied in, load them straight into a plate instrument that supplies the plates onto a robot system. That robot would also de-seal those plates, read the barcode that was on that plate, and we integrated the data upload as well. Everything that was stored on that plate was sent across to us from the Lighthouse Labs, what we call a plate map. We upload that information into the robot so the robot knows exactly what's on that plate. We don't have to generate any manual manifesto to tell it what to do; as soon as it reads that barcode it knows. It essentially takes the seal off the plate itself, takes the samples out, reseals it and puts it into a waste.' The advantage of the automation was it 'freed up processing time' and the 'manual data upload.' This was enormously important because 'it also prevented repetitive strain injury' (Shirley transcript).

Building the end-to-end sequencing process pipeline

One of the keys to getting moving was also establishing an end-to-end pipeline to process the samples from documenting when they arrived through to them being sequenced and then the results being analysed. Comparing the pipeline to 'a factory process', Livett says it had the advantage that 'once it's running, that's it, you just turn the handle on it. It doesn't matter if you do one or 1,000 samples' (Livett transcript). Fortunately, the process for COVID was not very complicated and did not need a long pipeline. Based on the pre-existing pipelines the Sanger Institute already had, the team decided to tweak one developed for single cell RNA sequencing used for the Human Cell Atlas. It had the advantage that it had high throughput capacity and was also a 'pretty simple process' (Johnston transcript). The pipeline required putting in a set of processes at the front end to manage all the logistics and register the samples coming in from different testing sites. Sillitoe and Amato helped design this based on the delivery systems they built for malaria (Kwiatkowski transcript).

A lot of thought also went into how the process of sequencing would work. One of those involved in the process was Dr Naomi Park. She recalls that, 'At the beginning of March 2020 when we were asked, “How are you going to sequence it?” the R&D team all got together round a white board and we mapped out all the different stages of how we saw this process working. We assigned different people to the different stages so who would have responsibility for each thing and this aligned with our prior experience. We made sure we had two people assigned to every part because of course this was in a pandemic, so we were also very acutely aware that at any point someone might get sick or someone's family might get sick or they might get a fever and they can't come in. So we had to make sure that there wasn't a single point of failure' (Parks transcript).

Figure 8.18: Photograph of whiteboard planning undertaken by the R&D team. Credit: Aigrain/Wellcome Sanger Institute.

Another person deeply embedded in the development of the end-to-end pipeline was Dr Louise Aigrain who prior to the pandemic had spent many years developing new pipelines and implementing 'them into the DNA pipelines where the staff runs samples day in and day out'. She says normally her job involved developing a pipeline, monitoring it and then optimising it. In normal circumstances she remembers her team had 'time to put things on hold, to check if the data is good before we proceed' but points out 'that was not an option at all' with COVID. In the new context 'We had to optimise the pipeline while it was running, and if something was going wrong, we needed to fix it immediately, because the issue would accumulate immediately. With 20,000 samples a week, if, for example, one in every 10 plates has an issue, the number of samples affected becomes really big very quickly. So that was a first, and at the same time exciting…. But … you can't just do that forever, because everything is due kind of yesterday. It really was a massive throughput and a fast turnaround time, with no option of putting it on hold. Those were really new things for us' (Aigrain transcript).

Figure 8.19: Photograph of Dr Naomi Parks, Senior Staff Scientist in DNA pipelines R&D, Sanger Institute. Credit: Dan Ross/Wellcome Sanger Institute. Parks got involved in the early planning for the Heron Project due to her expertise in multiplex PCR, which she first worked on for her doctorate. Having worked at the Sanger Institute for fifteen years she says she also 'had a really good comprehension of what was needed and understanding to be able to pass on that information. Basically that type of communication was really critical particularly for this project because it happened so fast. Lots of people missed bits of information, so to have people that could fill in the dots and take time to ask questions and make sure people were on the right track, I think was really helpful. I did anything I could do to help at that time' (Parks transcript).

Figure 8.20: Photograph of Dr Louise Aigrain, formerly senior staff scientist working in DNA Pipeline Research and Development group at the Sanger Institute. Credit: Wellcome Sanger Institute. Prior to the pandemic her main project had been working on the UK Biobank Sequencing Project, which involves sequencing the genome of all participants. Being a very large project, Aigrain says this prepared her quite well for the COG-UK project. Although she points out 'we thought we were doing things fast for the UK Biobank, but it was in months rather than in weeks or days, so obviously, it was really nothing comparable, in terms of number of samples and how fast it had to go' (Aigrain transcript).

Overall the end-to-end pipeline captures the process all the way from the receipt of a viral RNA sample to getting validated data at the end (Figure 8.21). To start with, positive samples needed to be separated from negative samples and put into 96-well plates. Some of this work can be automated, but it still requires some work by hand. Before being sequenced, the viral RNA has to be converted into DNA using a method known as reverse transcription, and the DNA is then amplified using PCR. After this the samples go through library preparation to prepare the biological material for sequencing (Midgley).

Figure 8.21: Diagram of the key steps of the end-to-end pipeline for sequencing COVID-19 samples at Sanger Institute. Credit: Lara Marks.

It also includes 'a software to track that sample, and all the information about that sample being linked together, so that you know exactly what sequencing data belongs to that sample and which testing lab in the UK that came from' (Aigrain transcript). This software, known as the Laboratory Information Management System, or LIMS, was developed by Livett and his team. Aigrain emphasises that 'they were instrumental as we were, for sure, because the work we do, if you can't track the samples, it's useless' (Aigrain transcript).

As well as closely collaborating with the LIMS team, Aigrain's team also needed to have very close ties with the procurement team. Aigrain explains this was especially important because, due to the pandemic and Brexit they 'found it very difficult to get consumables, the plastic ware [like pipettes and tips] that we use in the lab.' As she says, 'Normally we develop a pipeline using specific consumables, we need them, we know we're going to use them and that we can't change them. But we couldn't find them any more, so we had to constantly change consumables, revalidate the pipelines with those new consumables to make sure that it worked. It constantly added work that would have been unnecessary, but we had to be pragmatic. The procurement team was really heavily involved in finding those new consumables' (Aigrain transcript).

A large part of Aigrain's job was also to build quality assurance and controls into the pipeline. These are essential for checking that nothing has gone wrong, such as cross contamination between samples, which as she says is 'the nightmare of the lab scientist'. To help this process, a positive and negative control gets added to each plate of 96 samples. In this case, the negative control 'is just water', and the positive control 'a specific fragment of COVID RNA'. Aigrain had to work out how they were going to monitor that process. This meant she had to consider 'how we were going to treat them, how we were going to see that a sample plate failed because we see some contamination, or a plate failed because a positive control is not as good, so something has gone wrong. A big part of my job was setting all those thresholds and testing them and making sure, with the bioinformaticians, that they were realistic' (Aigrain transcript).
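The kind of plate-level check described here can be sketched in a few lines. This is only an illustration of the idea of control thresholds: the source does not give the actual metrics or values used, so the two measurements (reads seen in the water control, genome coverage of the positive control) and both threshold numbers below are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
MAX_NEGATIVE_CONTROL_READS = 100       # water control should yield almost nothing
MIN_POSITIVE_CONTROL_COVERAGE = 0.90   # fraction of the control fragment covered


@dataclass(frozen=True)
class PlateControls:
    """Control measurements for one plate (hypothetical fields)."""
    plate_barcode: str
    negative_control_reads: int
    positive_control_coverage: float


def assess_plate(controls):
    """Return ("pass" | "fail", reasons) based on the two plate controls."""
    reasons = []
    if controls.negative_control_reads > MAX_NEGATIVE_CONTROL_READS:
        # Viral reads in a water-only well suggest cross contamination.
        reasons.append("possible contamination: reads in negative control")
    if controls.positive_control_coverage < MIN_POSITIVE_CONTROL_COVERAGE:
        # A weak positive control suggests the chemistry or a run step failed.
        reasons.append("positive control underperformed")
    return ("fail" if reasons else "pass", reasons)
```

Loosening or tightening the two constants is the code-level equivalent of the trade-off Aigrain describes between caution and keeping as much data as possible.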

Figure 8.22: Photographs of the sequencing operation team members. 1) left to right: Irfaan Mamun, Marcella Ferrero, Howerd Fordham, Tristram Bellerby, Shaun Wright; 2) Joe Dawson; 3) Lesley Shirley and Naomi Park; 4) Mia Williams; 5) Catarina Caetano (Midgley). Credit: Dan Ross/Wellcome Sanger Institute.

Another factor that Aigrain had to take into consideration was that they 'really wanted to save as many samples as possible.' She points out, 'Normally, when we see any doubt of contamination, we don't take it. Or any doubt that maybe we don't have the yield, or, if they don't work as well as it should have been, they go in the bin. But in this case we were very aware that any single sample could contain interesting information. We were not initially calling them Variants of Interest, but we had that idea that they could be interesting, and that each sample could be of interest.' For this reason they 'tried to find thresholds in order to save as much data as we could, rather than bin data just to be very cautious.' Aigrain spent a lot of time working 'with the bioinformatics side of things to see what would make sense without at all impacting the quality of the data, but allowing us to keep as much as we could' (Aigrain transcript).

Streamlining the process

From the start of Project Heron the Sanger Institute team made a commitment to continually look at every step of the process to make improvements. Brooklyn points out this was a 'collective group effort' where no 'stones were unturned'. She says it was 'making sure that throughout the end-to-end process there was constant assessment of what we were doing, why we were doing it, and making sure that all of the activity, whilst maintaining the integrity of the process, that it added value' (Brooklyn transcript). A lot of work went into trying to shorten the turnaround time, improve the sample quality and decrease failure rates. This was important because, as Ariani points out, 'the information was at its most value the quicker it was out there for decision-makers to use and for the scientific community as well to use' (Aigrain transcript; Goncalves transcript).

The streamlining effort was led by Lee Walker, a contractor with experience in continuous improvement. Originally called in to help with the malaria project six months before the pandemic, Walker engaged with all the teams to look very carefully at each of the stages to see where there was waste in terms of time and materials (Brooklyn transcript; Aigrain transcript; Kwiatkowski transcript). The process was greatly helped by the LIMS because, as Livett points out, it meant 'every single event has got a date stamp on it' so they had 'a lot of signposting to where there is waste in the process' (Livett transcript).
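To illustrate how date-stamped events can signpost waste, the sketch below turns one sample's events into the time spent between consecutive steps and picks out the slowest one. The event names, the timestamps and the two helper functions are hypothetical; the source says only that every LIMS event carried a date stamp.

```python
from datetime import datetime


def stage_durations(events):
    """Return (from_stage, to_stage, hours) for consecutive events of one sample.

    `events` is a list of (stage_name, ISO 8601 timestamp) pairs in any order.
    """
    ordered = sorted((datetime.fromisoformat(ts), stage) for stage, ts in events)
    return [
        (a_stage, b_stage, (b_time - a_time).total_seconds() / 3600)
        for (a_time, a_stage), (b_time, b_stage) in zip(ordered, ordered[1:])
    ]


def slowest_step(events):
    """Return the consecutive pair of stages with the longest gap between them."""
    return max(stage_durations(events), key=lambda step: step[2])


# Made-up events for a single sample.
sample_events = [
    ("received", "2020-06-01T09:00:00"),
    ("cherry_picked", "2020-06-03T09:00:00"),
    ("sequenced", "2020-06-04T21:00:00"),
    ("uploaded", "2020-06-05T09:00:00"),
]
```

Here `slowest_step(sample_events)` points at the gap between receipt and cherry-picking, the sort of signal a continuous improvement review could follow up.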

At the same time as this was happening, Parks spent a lot of time looking at ways to optimise the ARTIC multiplex PCR method, which was a vital component of preparing the samples for sequencing. While the method had the advantage that it could be slotted straight into the Sanger Institute's existing processes, the downside Parks says was that it was 'quite lengthy in terms of the number of steps. After we'd done that PCR we needed to clean it up, quantify it. There's a number of other enzymatic steps – end-repair, ligation, then another PCR - and each one of those requires separate automation, separate quality control (QC) checks to check if this looks all right or not. So because of that it required a lot of different bits of kit which also meant that there was a lot more opportunity for anything to go wrong. It can also become quite laborious for staff if they're running these things day in, day out' (Parks transcript).

Convinced she could find a more streamlined approach, Parks spent a number of months experimenting in the laboratory to get a tailed primer method working. Already using tailed primers for other processes in the Sanger Institute, she thought she could just copy that. But to her dismay it did not work. She recalls 'That was a massive challenge because I'd already said to everyone I'd get this working. Those in the lab having to run the early process were like, “Brilliant. When are you going to get it working because this is not a lot of fun?” And I found it didn't work. The reason for that was because the viral template copy number can be very variable and in some cases very low. By using these tailed primers right in at the beginning it didn't get the same sensitivity. We had a dropout with some of the amplicons which fundamentally meant parts of the genome were not covered. If it drops out it's just not acceptable' (Parks transcript).

Figure 8.23: Tweet put out by Dr Naomi Parks on 4 November announcing the new tailed primer method.

Parks eventually cracked the problem with the help of Scott Goodwin, at the Sanger Institute, and Joshua Quick, at the University of Birmingham, who were also thinking about how to optimise the PCR process with tailed primers. The final piece of the puzzle fell into place when she hit upon putting the 'tailed primers in the second PCR step' (Parks transcript). To her relief it worked. She says 'I will never forget the moment I saw the data the next day and that I had been successful. The realisation of what that meant for our operations – I haven't won the lottery but I imagine it's a similar feeling' (Mobley). Importantly it massively streamlined the process (Parks transcript).

SARS-CoV-2 genomes sequenced at Sanger Institute

In the end the constant refining of the process helped the Sanger Institute team build a well-oiled operation. Just how much progress they made can be judged by the fact that they managed to reduce the turnaround time from 12 to 14 days right down to three days. This is the time it takes from receiving a positive sample from the Lighthouse Labs to uploading the result on to the central CLIMB database. Langford points out this was achieved through 'stepwise increments. We started making really big gains towards the late summer and autumn time of the first year.' For her the game-changer was getting the Lighthouse Labs involved, which enabled 'us to form a national logistics set-up, with samples flowing, originally it was a couple of times a week, now it's daily, into a huge storage centre' (Langford transcript).

How big the operation became can be seen from the fact that the Sanger Institute team handled just over 26 million samples in under two years (Brooklyn transcript). The high throughput operation meant the Sanger Institute soon became the largest contributor to the COG-UK effort (Figure 8.24). Its capacity was further strengthened by funding from the UKHSA in the summer of 2021 which allowed it to open a dedicated laboratory for sequencing the SARS-CoV-2 virus. This was a seven-days-a-week operation with more than 300 staff working rotas. As of 1 October 2021 the Sanger Institute had contributed 788,538 of the 1,154,347 SARS-CoV-2 virus genomes sequenced by COG-UK, a share of 68%. Overall, in 2021 the Sanger Institute contributed approximately 20% of the world's publicly available SARS-CoV-2 genome sequences (Ariani; Wellcome Sanger Institute 2021-22).

Established to provide a separate sequencing service from the blue sky research undertaken at the Sanger, the dedicated laboratory continues to sequence COVID-19 positive samples with funding from the UKHSA. This is done as part of the Genomic Surveillance Unit, which aims to globally support partners to monitor the ever-changing evolution of pathogenic microbes and their vectors so that they can develop strategies to curb and eliminate the spread of disease. Directed by John Sillitoe, the Unit's mission is to build on the research and surveillance work undertaken in both malaria and COVID-19 (Wellcome Sanger Institute Genomic).

Figure 8.24: Number of SARS-CoV-2 genomes sequenced and analysed by COG-UK centres as of 11th Aug 2020, Source: COG-UK Report #10, Figure 1.


Ariani, C (14 Oct 2021) Presentation: COG-UK UCL, COG-UK Together Event.

COG-UK (11 Aug 2020) COG-UK Report #10.

Midgley, M (22 Oct 2022) 'Sequencing COVID-19 at the Sanger Institute', Wellcome Sanger Institute Blog.

Mobley, E (8 March 2022) 'Rising in Stem with Naomi Parks', Wellcome Sanger Institute Blog.

Nelson, R (18 May 2021) 'Car manufacturing, cellular biology and COVID', Wellcome Sanger Institute Blog.

Wellcome Sanger Institute (n.d.) Our History.

Wellcome Sanger Institute (2021-2022) Wellcome Sanger Institute Annual Highlights 2021-2022.

Wellcome Sanger Institute (n.d.) Genomic Surveillance Unit.

Interview transcripts

Note: The position listed by the people below is the one that they held when interviewed and may have subsequently changed.

Interview with Dr Cristina Ariani, Lead Genomic Surveillance Operations, Wellcome Sanger Institute.

Interview with Dr Louise Aigrain, Former head of Research Operations, Wellcome Sanger Institute and now part of the MRC Epidemiology Unit at Addenbrooke’s Hospital.

Interview with Tanya Brooklyn, Genomics Surveillance Implementation Manager, Wellcome Sanger Institute.

Interview with Tim Cutts, Formerly Head of Scientific Computing, Wellcome Sanger Institute.

Interview with Dr Sonia Goncalves, Head of Service Delivery, Genomic Surveillance in the Genomic Surveillance Unit, Wellcome Sanger Institute.

Interview with Ewan Harrison (Deputy Director COG-UK and UKRI Innovation Fellow, Wellcome Sanger Institute, Senior Research Associate, Department of Medicine, University of Cambridge) and Dr Andrew Jermy (External Communications Advisor COG-UK).

Interview with Dr Ian Johnston, Head Of Sequencing Operations & R&D, Wellcome Sanger Institute.

Interview with Dr Petra Korlevic, Research Fellow, Wellcome Sanger Institute.

Interview with Professor Dominic Kwiatkowski, Head of Parasites and Microbes Programme at the Wellcome Sanger Institute in Cambridge and Professor of Genomics at University of Oxford.

Interview with Dr Cordelia Langford, Director of Scientific Operations, Wellcome Sanger Institute.

Interview with Rich Livett, Senior Scientific Manager of LIMS for core informatics, Wellcome Sanger Institute.

Interview with Dr Catherine Ludden, Director of Operations, COG-UK and Beth Blane, Logistics Manager for COG-UK, Research Assistant in the Department of Medicine, University of Cambridge.

Interview with Dr Rachel Nelson, Head of CGaP, Cellular Generation & Phenotyping Core Facility, Wellcome Sanger Institute.

Interview with Dr Naomi Parks, Senior Staff Scientist in DNA pipelines R&D, Wellcome Trust Sanger Institute.

Interview with Lesley Shirley, Head of Automation, Wellcome Sanger Institute.

Interview with Dr John Sillitoe, Head of Surveillance Operations, Wellcome Sanger Institute.
