We know less about viruses than any other lifeform. Fortunately, metagenomics has led to a massive expansion in the known diversity of the virosphere. Here, we discuss how metagenomics has changed our understanding of RNA viruses and present some of the remaining challenges, including characterization of the “dark matter” of divergent viral genomes.

Viruses are the most abundant source of genetic material on Earth, likely infecting all cellular organisms and even other viruses. They are ubiquitous in all environments and can account for a substantial proportion of the RNA within hosts. However, since viruses were discovered at the end of the 19th century, they have mainly been studied as agents of disease in humans or in economically important animals and plants. Although these viruses are undoubtedly important, they represent only a tiny proportion of the total universe of viruses (the virosphere) and have led to a biased view of virus diversity and function. In reality, we know less about viruses than about any group of organisms, with likely only a fraction of one percent of all viruses characterized (Geoghegan and Holmes, 2017). This restricted view of the virosphere has two interlinked causes. First, most sampling has been directed at a subset of hosts, particularly those that experience overt disease or that act as important reservoirs for viruses that jump species boundaries.
Second, virus discovery was traditionally a laborious process, involving complex isolation and sometimes culturing procedures, so characterizing a novel virus necessarily took time. Fortunately, metagenomics is overcoming both of these limitations and is rapidly transforming our understanding of the virosphere. Although a variety of metagenomic techniques are available, including those that specifically enrich for viruses, arguably the most informative approach is total transcriptome sequencing, or “meta-transcriptomics” (Shi et al., 2016). This involves large-scale RNA sequencing of individual tissues, species, or assemblages of species, usually after depletion of ribosomal RNA (rRNA), and can reveal, in an unbiased manner, the entire virome (along with other microorganisms) present in a sample. The resulting transcriptome data are a rich source of evolutionary, genomic, and functional information and have revealed enormous levels of untapped viral genetic diversity. For example, meta-transcriptomics has revealed remarkable levels of RNA virus diversity in invertebrates (Li et al., 2015; Shi et al., 2016), with new viral taxa (species, genera, families, and orders) identified on a regular basis.
These newly described RNA viruses have also helped to fill some of the evolutionary gaps among known families and orders and have shown that the RNA virosphere is very different from what we thought based on culturable or disease-causing agents (Figure 1). Although there were glimpses of this new virosphere before metagenomics, such as the discovery of giant DNA viruses of eukaryotes that blurred the formerly clear line between viral and cellular life, the metagenomic transformation has been truly dramatic. Metagenomics is heralding a new era in virology, in which virus discovery is performed primarily with genomic technology and full phenotypic characterization is reserved for the small subset of viruses of special interest (Shi et al., 2018). Paradoxically, however, the more we delve into the virosphere, the more apparent it becomes that we have characterized only a minuscule proportion of it, with a systematic bias against identifying the most divergent genomes. For example, the most likely explanation for the “gaps” (i.e., the absence of branching events) on phylogenetic trees, particularly those between different virus families and orders, is that the taxa that exist on these branches have yet to be sampled. With more expansive surveys, the phylogeny of RNA viruses is likely to turn from a tree into a bush. The metagenomic discovery and characterization of more viruses, particularly those that are structurally divergent, will provide critical new data on viral genetic and phenotypic diversity. This will transform our understanding of the virosphere and of the evolutionary processes that have shaped it, as well as of fundamental aspects of virus biology, including the virus-host interactions that can occasionally lead to disease emergence.
Herein, we briefly review some of the key advances of this new virological age of discovery and some of the challenges for the future. We necessarily focus on RNA viruses, in which the expansion in diversity has been greatest and the challenges are perhaps most profound, although a similar story could and should be told for DNA viruses.

The most obvious and dramatic impact of metagenomics has been to transform our understanding of the extent and structure of viral biodiversity, including the discovery of a plethora of new viruses (Shi et al., 2018). Every known family of RNA viruses has expanded in breadth during the metagenomics revolution, and there is greater evolutionary continuity between virus taxa than we ever thought existed (Shi et al., 2016). Indeed, phylogenetic analysis of metagenomic data has revealed that virus genera, families, orders, and currently unclassified lineages can often be grouped into larger clusters. For example, the so-called “Hepe-Virga” group (Shi et al., 2016), defined on the basis of a relatively conserved cluster of RNA-dependent RNA polymerase (RdRp) sequences, contains 11 orders, families, and floating genera with distinct host groups and genome structures: the order Tymovirales; the families Virgaviridae, Togaviridae, Bromoviridae, Closteroviridae, Endornaviridae, Alphatetraviridae, and Hepeviridae; and the floating genera Idaeovirus, Negevirus, and Cilevirus. However, to date, the main impact of metagenomics has been to expand the diversity of existing families rather than to define new ones. Although, at face value, this suggests that we are beginning to exhaust the deepest levels of viral diversity, it is more probable that we are unable to detect highly divergent groups of viruses because of limitations in currently available similarity-search algorithms. Indeed, an inherent limitation of metagenomics is its heavy dependence on the accuracy and sensitivity of sequence assembly and similarity searching. A notable exception was the recent discovery of chuviruses, qinviruses, and yueviruses (Li et al., 2015; Shi et al., 2016), which have now been recognized as new families (or orders) of negative-sense RNA viruses.
Not only do these viruses fall between the major clusters of segmented and unsegmented viruses on phylogenetic trees, but they possess either bi-segmented genomes or a mix of bi-segmented and unsegmented genomes, as well as circular genomes in the case of some chuviruses (Li et al., 2015), highlighting the complexity of genome-scale evolution (see below).

The most dramatic expansion of virus genetic diversity has occurred in the invertebrates. Until recently, invertebrates (which usually meant arthropods) were thought of almost exclusively as vectors of viruses transmitted among vertebrates (such as the agents of dengue and Zika), with little understanding of their “natural” viromes beyond these pathogens. Metagenomics has radically changed this view, revealing such a massive expansion of RNA virus diversity in these animals that disease-causing viruses are now the exception rather than the rule (Junglen and Drosten, 2013; Li et al., 2015; Marklewitz et al., 2015; Shi et al., 2016; Webster et al., 2015). Intriguingly, in some cases the RNA viruses present in invertebrates may be ancestral to those found in vertebrates, which in turn suggests that some families of RNA viruses have existed for the entire evolutionary history of the animals, although the phylogenetic distance between viruses from different animal groups is often substantial. Although the expansion in the diversity of invertebrate RNA viruses has been dramatic, it should not surprise us: these animals are highly diverse and often have the huge population sizes that are the ecological prerequisite for high viral diversity. The case of the invertebrates also reminds us of the sampling biases inherent in virus discovery. For example, the vast majority of sampled invertebrate viruses still come from a single phylum, the arthropods. Although arthropods may be of particular importance because of their diversity (they comprise more than 80% of all described living animal species) and their strong ecological relationships with both plants and vertebrates, it is possible that similar amounts of virus diversity will be discovered in other invertebrate phyla once these are sampled more intensively.
In particular, we suggest that viruses from basal metazoans, such as the phyla Cnidaria (corals, jellyfish) and Porifera (sponges), will provide important insights into the origin and evolution of animal viruses, and determining whether vertebrate viruses ultimately originate from invertebrates will clearly require denser sampling. Sampling biases are equally profound when we consider groups other than invertebrates. For example, our knowledge of vertebrate RNA viruses is heavily skewed toward mammals and, to a lesser extent, birds, with relatively little known about the viruses that infect amphibians, reptiles, and the diverse groups of fish. While the bias toward mammalian viruses is understandable from the perspective of human disease emergence, because most human viruses are of mammalian origin, it makes little sense in terms of animal biodiversity. Indeed, recent work has hinted that fish may be a particularly rich source of virus diversity (Lauber et al., 2017).

Finally, the diversity of RNA viruses discovered by metagenomic methods will clearly pose major challenges to virus taxonomy (Shi et al., 2018; Simmonds et al., 2017).
In particular, metagenomic data are likely to become the new standard for taxonomic studies, with full virological analysis restricted to a subset of interesting exemplars. However, as well as increasing the known biological diversity, metagenomic data also create new challenges for those wishing to classify viruses. For example, reliance on trees inferred from a single gene, usually the RdRp gene because it is the most conserved, may paint an incomplete picture of phylogenetic relationships if different genes have different evolutionary histories, as commonly appears to be the case. Similarly, simple bifurcating phylogenetic trees are evidently neither the best nor the most accurate descriptors of evolutionary relationships in the face of widespread lateral gene transfer and recombination.

Not only has metagenomics led to a revolution in our view of the phylogenetic diversity of RNA viruses, but it has also led to a new understanding of the range and structure of virus genomes and of the evolutionary processes that have given rise to them. Indeed, it now appears that despite their (currently) universally small genomes, RNA viruses experience processes of genome evolution as complex as those of large DNA viruses and utilize a wide range of replication-expression strategies. Hence, RNA virus genomes are more diverse, with more intricate structures and a wider range of lengths (including mixes of segmented, unsegmented, and circular genomes), than previously anticipated. Similarly, processes such as gene duplication and loss, genomic rearrangement, and lateral gene transfer occur far more frequently than ever imagined (with some viruses even lacking structural genes), and there are an increasing number of examples of cellular genes integrated into viral genomes (Shi et al., 2016) (Figure 2).
For example, phylogenetic analysis revealed that two related exonuclease genes of eukaryotic origin present in divergent sea slater (Ligia oceanica) RNA viruses were acquired independently and relatively recently from cellular organisms (Shi et al., 2016). Lateral gene transfer events that occurred earlier in evolutionary history are likely to remain undetected, both because sequence homology will be obscured by rapid evolution and intense positive selection after integration and because the length of insertions in RNA viruses is usually restricted by constraints on genome size.

RNA virus genomes are clearly dynamic and flexible entities. For example, until recently, families of RNA viruses were thought to be characterized by a specific segmentation type, such as the presence or absence of segmented genomes or a certain number of segments. Metagenomic data now tell us that segmentation is not a strong taxon-defining trait: combinations of segmented and unsegmented genomes have been observed within families of RNA viruses such as the Flaviviridae, Partitiviridae, Picobirnaviridae, Tombusviridae, and Luteoviridae. Even the order Mononegavirales, whose name denotes a single segment of negative-sense RNA, in fact has a small number of segmented members. Genome segmentation is therefore a flexible evolutionary trait, illustrating that RNA viruses can re-program their genomes more easily than previously thought. Again, invertebrates appear to be a particularly rich source of genomic diversity, and their viruses often have genomes that are more complex in both size and structure than those of related viruses found in vertebrates (although, again, major sampling biases may be at play).
A good example is provided by the flaviviruses and the related “flavi-like” viruses. Flaviviruses were conventionally thought to be unsegmented positive-sense RNA viruses that infected vertebrates, with any association with invertebrates (usually mosquitoes and ticks) simply reflecting their vector-borne transmission. Meta-transcriptomic studies have radically changed this view, and “insect-specific” flaviviruses are now commonplace (Qin et al., 2014). More dramatically, these insect-associated flaviviruses can possess very large genomes (∼26 kb), and their genomes can be arranged in four or five segments (Qin et al., 2014) that may even be packaged into separate virus particles (Ladner et al., 2016) (Figure 2). The latter observation is particularly striking, as such multicomponent viruses were previously thought to be exclusive to plant RNA viruses. This raises questions about how viruses with this distinctive structure originated and what selective processes are responsible for their evolutionary maintenance.
Similarly, the existence of viruses that lack glycoprotein or nucleoprotein genes, or even both (Li et al., 2015), challenges what we mean by a “complete” virus genome and raises questions about how these viruses function within hosts.

Metagenomics is also beginning to shed new light on the interactions between viruses and their hosts. Central to this are measures of viral abundance. Meta-transcriptomics provides an in-built way to measure the relative abundance of viruses within hosts: the proportion of all transcripts (excluding rRNA) from a host that map to RNA viruses. Although this is only a relative measure and is likely subject to a variety of biases, one of the most striking results is that RNA viruses appear to be consistently more abundant in invertebrates than in vertebrates. It is tempting to think that this reflects, at least in part, the evolution of adaptive immunity in the latter. Measures of abundance are also an important means of assessing the true host range of an RNA virus, which is always difficult to determine from metagenomic data alone. In particular, the higher the abundance of a virus in a host (particularly if it is detected in multiple tissues), the more likely it is that the virus really infects the host it was sampled from rather than being associated with its diet or with another (micro)organism present in that host. More fundamentally, data on virus abundance are challenging the long-held view that viruses are usually the agents of overt disease in the hosts they infect.
Although there is a growing list of examples in which viruses (or endogenous retroviruses) can be beneficial to hosts, viruses have traditionally been considered agents of disease, particularly given the substantial “immune” resources that hosts of all types devote to their removal. However, metagenomic studies question the automatic association between virus and illness. Again, invertebrates present the most interesting case, as not only can these animals carry an enormous diversity of viruses, but these viruses can be at huge abundance within hosts, sometimes forming the majority of the non-rRNA component of the host transcriptome (Shi et al., 2016). How are invertebrates able to tolerate such remarkably high viral loads? While it is possible that some of these high-abundance viruses will ultimately cause disease in their hosts, it is tempting to think that these viruses do not cause overt illness and that some species have evolved mechanisms to tolerate such a richness of viruses. Should this be the case, determining exactly how invertebrates tolerate viral infection will clearly be an important research question for the future. It will also be important to test the idea that overt disease is most commonly associated with cross-species transmission (i.e., host-jumping), as often appears to be the case with emerging human diseases. If so, this would have important implications for our understanding of the evolution of virulence in viruses.
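The relative-abundance measure used in this section (the share of a host's non-rRNA transcripts that map to RNA viruses, which in some invertebrates can exceed half) reduces to simple arithmetic on read counts. The Python sketch below is illustrative only. It assumes that per-library read counts have already been produced by an upstream assembly and mapping pipeline and uses reads as a proxy for transcripts. The class and function names (`ReadCounts`, `relative_abundance`, `flag_likely_infection`), the abundance cutoff, and the minimum number of tissues are all hypothetical choices, not values taken from any published study.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ReadCounts:
    """Read counts for one sequencing library (e.g., one tissue of one host)."""
    total: int              # all reads in the library
    rrna: int               # reads classified as ribosomal RNA
    viral: Dict[str, int]   # reads mapped to each (hypothetical) virus contig


def relative_abundance(counts: ReadCounts) -> Dict[str, float]:
    """Fraction of non-rRNA reads that map to each virus."""
    non_rrna = counts.total - counts.rrna
    if non_rrna <= 0:
        raise ValueError("library contains no non-rRNA reads")
    return {virus: n / non_rrna for virus, n in counts.viral.items()}


def flag_likely_infection(per_tissue: Dict[str, ReadCounts], virus: str,
                          min_fraction: float = 1e-4, min_tissues: int = 2) -> bool:
    """Hypothetical heuristic: a virus that is abundant in several tissues is
    more likely to infect the sampled host than to come from its diet or from
    another organism. Both cutoffs are arbitrary placeholders."""
    hits = sum(
        1 for counts in per_tissue.values()
        if relative_abundance(counts).get(virus, 0.0) >= min_fraction
    )
    return hits >= min_tissues


# Made-up counts for two tissues of a single (hypothetical) invertebrate host.
gut = ReadCounts(total=1_000_000, rrna=200_000,
                 viral={"virus_X": 80_000, "virus_Y": 40})
leg = ReadCounts(total=500_000, rrna=100_000, viral={"virus_X": 12_000})

gut_abundance = relative_abundance(gut)            # virus_X: 0.1, virus_Y: 5e-05
total_viral_share = sum(gut_abundance.values())    # share of non-rRNA reads that are viral
print(gut_abundance, total_viral_share)
print(flag_likely_infection({"gut": gut, "leg": leg}, "virus_X"))  # True
print(flag_likely_infection({"gut": gut, "leg": leg}, "virus_Y"))  # False
```

In this toy case, virus_X is abundant in both tissues and would be flagged as plausibly infecting the sampled host, whereas the low-abundance virus_Y, seen in only one library, could equally reflect diet or a co-occurring microorganism. Any real analysis would still need to account for the biases inherent in a purely relative measure.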
Metagenomic data, particularly when combined with phylogenetic analysis, have also told us that long-term virus evolution reflects a complex mix of virus-host co-divergence over many millions of years and frequent cross-species transmission, and that it is sometimes difficult to disentangle these two processes, particularly as single viruses can be associated with multiple hosts (Geoghegan et al., 2017). Indeed, an emerging view is that virus-host co-divergence can extend back many millions, and perhaps even billions, of years and that cross-species transmission can occur frequently on this backbone of co-divergence: it is this pattern that shapes the macroevolution of RNA viruses and forms the evolutionary background to virus emergence. As a case in point, major clades of RNA viruses can be sampled from diverse eukaryotic groups, including fungi, plants, invertebrates, and vertebrates, in a manner that does not always mirror the host phylogeny (Shi et al., 2016). On a more local scale, although hantaviruses were conventionally thought to have strictly co-diverged with their rodent hosts, more detailed sampling and analysis has revealed multiple instances of cross-species transmission (Holmes and Zhang, 2015).
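One way to make the contrast between co-divergence and cross-species transmission concrete is to ask whether a virus tree has the same branching pattern as the tree of its hosts. The Python sketch below computes a rooted Robinson-Foulds-style distance (the number of clades found in one tree but not the other) between two small trees written as nested tuples. It is a deliberately crude illustration under stated assumptions: each virus tip is labeled with the name of its host, each host carries exactly one virus, and the host labels and tree shapes are hypothetical. It is not the event-based reconciliation approach that formal co-phylogenetic analyses typically use, and a nonzero distance alone cannot distinguish host-switching from other sources of incongruence, such as phylogenetic error.

```python
from typing import FrozenSet, Set, Tuple, Union

# A rooted tree written as nested tuples; leaves are (hypothetical) host labels.
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the leaf set below every internal node of a rooted tree."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])  # a single leaf is not recorded as a clade
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def rooted_rf(host_tree: Tree, virus_tree: Tree) -> int:
    """Number of clades present in exactly one of the two trees."""
    return len(clades(host_tree) ^ clades(virus_tree))


# Hypothetical host phylogeny and two virus phylogenies (tips named by host).
host = (("host_A", "host_B"), ("host_C", "host_D"))
virus_codiverged = (("host_B", "host_A"), ("host_D", "host_C"))
virus_switched = (("host_A", "host_C"), ("host_B", "host_D"))

print(rooted_rf(host, virus_codiverged))  # 0: topologies match
print(rooted_rf(host, virus_switched))    # 4: {A,B},{C,D} vs {A,C},{B,D}
```

A distance of 0 means the virus tree mirrors the host tree, as expected under strict co-divergence; the rearranged virus tree shares only the root clade with the host tree, the kind of mismatch that, in real data, prompts closer investigation of possible cross-species transmission.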
Metagenomics is transforming our understanding of the breadth and diversity of the virosphere, showing it to be far larger, richer, more complex, and thus more interesting than previously envisioned. It is now clear that RNA virus evolution is not simply the product of rampant mutation, but that a variety of genomic processes, such as lateral gene transfer and changes in the number of segments, have played a major role in rewiring viral genomes. At the same time, the more we dig into the virosphere, the clearer it becomes that there are fundamental gaps in our understanding of the virus world and that our view of what it means to be a virus is largely a function of the achingly small subset of viruses sampled so far. We have therefore only just begun to scratch the surface of the true diversity of viruses, and we know little of the factors that shape this diversity and its evolution within ecosystems and over long evolutionary timescales.