The chosen ones or how to select species to genome sequence

Recent years have seen a growing number of sequencing consortia established in support of the Earth BioGenome Project’s (EBP) goal of sequencing the genome of every eukaryotic species. In Europe, examples include Darwin Tree of Life (DToL), EBP-Norway (EBP-Nor), ATLAsea and Biodiversity Genomics Europe (BGE). One thing they currently have in common is that there are more species to sequence than there are resources to do so. Hence, a selection process has to be implemented. In most consortia, this is done top-down: a committee established by the institutional consortium members governs the selection. The BGE consortium includes a task in which scientists can suggest European species they want to provide for genome sequencing, and species are then selected from these suggestions. This naturally led to the question: how do we select the species?

Results from empirical data, showing the relative enrichment of each category among the selected species versus all species, and the overlap of selected species between the different models applied.

Instead of establishing a selection committee of representatives from the institutional BGE members, we chose another procedure and involved the larger European Reference Genome Atlas (ERGA) initiative in two ways. ERGA differs from the other genomic consortia in that it is a bottom-up community of individual researchers rather than of scientific institutions: any European researcher can become a member, whereas DToL, EBP-Nor and BGE are consortia of institutions. First, the Sampling and Sample Processing (SSP) committee of ERGA developed several possible species selection processes, involving the larger community, including researchers who are not part of the BGE project. The selection models were tested with simulated and empirical data to assess whether they actually had the desired effect. Second, the final decision on the species selection process was made by the ERGA council, which has at least one representative (and in most cases two) from each European country.
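
To make the model comparison more concrete, here is a minimal Python sketch of the two quantities such a test could report: the relative enrichment of a category among the selected species versus all species, and the overlap between the selections of two models. The function names, the dictionary `category_of` and the use of the Jaccard index as the overlap measure are assumptions made for illustration; they are not taken from the ERGA analysis itself.

```python
from collections import Counter


def relative_enrichment(selected, all_species, category_of):
    """Share of each category among the selected species divided by its
    share among all suggested species (hypothetical helper).

    A value above 1 means the category is over-represented in the selection.
    """
    if not selected or not all_species:
        raise ValueError("both species lists must be non-empty")
    selected_counts = Counter(category_of[s] for s in selected)
    total_counts = Counter(category_of[s] for s in all_species)
    return {
        category: (selected_counts[category] / len(selected))
        / (total_counts[category] / len(all_species))
        for category in total_counts
    }


def jaccard_overlap(selection_a, selection_b):
    """Overlap of two selections as the Jaccard index (an assumed measure):
    size of the intersection divided by size of the union."""
    a, b = set(selection_a), set(selection_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

Comparing the enrichment values across models shows whether a model favours the categories it was designed to favour, while the overlap indicates how strongly the choice of model changes the final list.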

Procedure of the species selection process with four different stages and the models and criteria applied at each stage.

This bottom-up, community-based decision process resulted in a unique selection procedure. It is based on objective criteria rather than on motivation statements or other subjective assessments of the suggested species, and, except for the last stage, it is fully automated, without human intervention in the selection itself. Hence, no committee is necessary and the species selection can be completed in a few hours. It is even possible to explore the outcome of different selection models. The process has four stages: (1) an exclusion stage, (2) a prioritization stage employing a decision-tree model and additional ranking to ensure country and researcher representation, (3) a feasibility check with an additional adjustment for genera with multiple species suggestions and (4) a final check of legal compliance. In total, the selection is based on 28 criteria. The details of the process are given above. Both the overall procedure and the selection process itself might serve as a template for other projects. Its strengths are the involvement of the larger community, the commitment to objective criteria and the potential for automation, which allows easy scaling-up for projects that need to select from not only hundreds of suggestions but thousands to hundreds of thousands.
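
The four stages can be pictured as a small pipeline. The Python sketch below is only an illustration under strong assumptions: the `Suggestion` fields, the single `score` standing in for the decision-tree output, the greedy ranking that favours countries and researchers not yet represented, and the `max_per_genus` cap are all hypothetical and do not reproduce the 28 actual criteria.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    # All fields are hypothetical stand-ins for the real criteria.
    species: str
    genus: str
    country: str
    researcher: str
    score: float                    # assumed output of the prioritization model
    excluded: bool = False          # stage 1 outcome
    feasible: bool = True           # stage 3 outcome
    legally_compliant: bool = True  # stage 4 outcome


def select_species(suggestions, n_slots, max_per_genus=1):
    # Stage 1: exclusion.
    remaining = [s for s in suggestions if not s.excluded]

    # Stage 2: prioritization. Greedily pick the best-scoring suggestion,
    # preferring countries and researchers that are not yet represented.
    ranked, countries, researchers = [], set(), set()
    while remaining:
        best = max(
            remaining,
            key=lambda s: (s.country not in countries,
                           s.researcher not in researchers,
                           s.score),
        )
        remaining.remove(best)
        ranked.append(best)
        countries.add(best.country)
        researchers.add(best.researcher)

    # Stages 3 and 4: walk down the ranking, skipping infeasible or
    # non-compliant suggestions and capping the number per genus.
    selected, per_genus = [], Counter()
    for s in ranked:
        if len(selected) == n_slots:
            break
        if not s.feasible or not s.legally_compliant:
            continue
        if per_genus[s.genus] >= max_per_genus:
            continue
        selected.append(s)
        per_genus[s.genus] += 1
    return selected
```

Because every stage is a plain function of the suggestion data, the same input can be run through alternative rankings or caps to explore how different selection models change the outcome.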
