Biology and antigens
SARS-CoV-2 is an enveloped, positive-sense single-stranded RNA virus in the family Coronaviridae. There are more than 35 known coronaviruses (6), which infect various mammals and birds, including bats, pigs, cattle, chickens, dogs and cats (7). Seven of these viruses are currently known to infect humans. The virus family is further divided into four genera: alpha-, beta-, gamma- and delta-coronavirus. SARS-CoV-2 is a beta-coronavirus, as are two other feared human-pathogenic coronaviruses with epidemic potential, namely SARS-CoV-1 and MERS-CoV (Middle East respiratory syndrome coronavirus).
In 2003, SARS-CoV-1 caused a major global outbreak that spread rapidly from Guangdong province in China to at least 26 countries, resulting in 8 096 infections and 774 deaths (8). MERS-CoV has to date mainly caused outbreaks limited to the Arabian Peninsula, with more than 2 000 confirmed cases and over 700 deaths (9). In addition, four coronaviruses circulate in the human population and give rise to colds and respiratory tract infections of varying severity, but rarely cause serious illness. These viruses, two of which are beta-coronaviruses (HCoV-HKU1 and HCoV-OC43) and two of which are alpha-coronaviruses (HCoV-229E and HCoV-NL63), are thought to cause 15–30 % of all respiratory tract infections (7).
The genome of SARS-CoV-2 consists of approximately 30 000 nucleotides and encodes four structural proteins: the spike protein (S), membrane protein (M), nucleocapsid protein (N) and envelope protein (E) (10) (Figure 1). The genome also encodes 16 non-structural proteins, including an RNA polymerase (10, 11). S proteins characterise all coronaviruses: they cover the surface of the virus and give it a crown-like (corona) appearance on electron microscopy, hence the family name Coronaviridae. The S protein of SARS-CoV-2 contains a domain that interacts with the human receptor angiotensin-converting enzyme 2 (ACE2), which facilitates uptake of the virus into the body's cells (12). SARS-CoV-1 and the common cold virus HCoV-NL63 also use this uptake mechanism, but the other human coronaviruses do not (2, 13). ACE2 is expressed on epithelial and endothelial cells in tissues including the lungs, intestines, kidneys and heart (14).
The S protein is the primary antigenic target in the development of vaccines against SARS-CoV-2 (Figure 2), the aim being to generate an immune response that prevents the virus from interacting with ACE2. Specifically, the vaccine molecule should mimic epitopes in the area of the S protein that is in direct contact with the human receptor, the receptor binding domain (RBD). It is now well established that an immune response to the S protein is generated as a result of infection with SARS-CoV-2, and in vitro studies with convalescent sera have shown that anti-S antibodies are able to neutralise the virus in cell culture and prevent its interaction with ACE2.
The three other structural proteins (M, N and E) are currently considered less important in the context of a vaccine, and the relevance of antibodies against these antigens is unclear. The N protein is not a surface protein, but is associated with the viral RNA inside the membrane that surrounds the virus. Nevertheless, antibodies against the N protein are also formed during infection with SARS-CoV-2 (15, 16), and previous development of vaccines against SARS-CoV-1 has shown that the N protein can generate T cell responses that confer immunological memory (17). The inclusion of T cell epitopes located on the N protein may therefore be important for achieving immunity to COVID-19.