The Biodiversity Genomics Europe (BGE) Project is a community-driven effort funded by the EU to establish a framework for coordinating DNA barcoding and genome sequencing across Europe. Within the genome stream of BGE, reference genomes are sequenced for species representing critical European biodiversity. One important task in this effort was ‘Critical biodiversity community sequencing’.
This task is one of the six interconnected tasks within the Work Package “Genome Sampling”. The overarching goal of the work package is to provide high-quality biological material to support the generation of reference genomes from both European critical biodiversity and biodiversity hot-spots while ensuring legal and ethical compliance during the collection procedures. Additional contributions include the development and implementation of species prioritization guidelines, species vouchering and biobanking, comprehensive metadata documentation and standard definitions, and research and development of standard operating procedures (SOPs) for genome size estimation, karyotyping and cell cultures.
In a previous task, ‘Gap Analysis and Species Prioritising’, also led by us, the scientific community nominated species to BGE. The community subsequently provided biological samples for those species that BGE finally selected for DNA sequencing to obtain reference genomes as part of the task ‘Critical biodiversity community sequencing’. Last month, this task was successfully completed, and we learned a great deal about the challenges of coordinating such a complex task across Europe.
Two community calls for species suggestions resulted in 596 nominated species. The final goal was to deliver samples with a total genome span of up to 150 Gb. In the end, the European community delivered samples of 94 species totaling 143.7 Gb, which is 96% of the target. Of the 596 species nominated by the European community, 137 were initially selected, and 43 of these 137 were subsequently deselected. Even though we had initially selected 50% more species than we had capacity to sequence, we completely exhausted our waiting list in the end because species had to be deselected. Species were deselected for reasons such as sampling failure or sample loss by the provider, overlap with species already being sequenced by other genome consortia, sample delivery delays exceeding what the project time frame allowed, and difficulties in extracting DNA of sufficient quantity and quality for genome sequencing.
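The selection funnel and genome-span figures above can be cross-checked with a few lines of arithmetic. The following is only a minimal bookkeeping sketch that reuses the counts stated in the text; the constant names are illustrative and it is not a tool that was used in the project.

```python
# Illustrative cross-check of the reported selection funnel and genome span.
# All numbers are taken from the text; names are hypothetical.
NOMINATED = 596          # species nominated in the two community calls
INITIALLY_SELECTED = 137 # species initially selected by BGE
DESELECTED = 43          # species later deselected
TARGET_GB = 150.0        # target total genome span (Gb)
DELIVERED_GB = 143.7     # total genome span actually delivered (Gb)

delivered_species = INITIALLY_SELECTED - DESELECTED
selection_rate = INITIALLY_SELECTED / NOMINATED
share_of_target = DELIVERED_GB / TARGET_GB

print(f"Species delivered: {delivered_species}")
print(f"Initially selected: {selection_rate:.1%} of nominations")
print(f"Share of genome-span target: {share_of_target:.1%}")
```

Run as is, this reports 94 delivered species and a delivered genome span of 95.8% of the target, which the text rounds to 96%.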
All remaining selected species (94) have been sampled and shipped to the corresponding BGE sequencing centres. In addition, contextual data for each sample (e.g., where and when it was collected; for details see Böhne et al. 2024) have been recorded in a metadata manifest transferred to BGE. The BGE samples therefore also come with high-quality non-sequence data. Furthermore, the 94 species have been or will be accessioned in a biobank: 66 are already stored, while the remaining 28 will be accessioned after sequencing if leftover material (extracted DNA or tissue) remains.
As many museums were involved in this consortium and in this work package, vouchering of specimens was regarded as a very important part of the work. Providing a specimen voucher or an e-voucher was therefore mandatory in the nomination process. Accordingly, 90 of the 94 species (95.7%) already have an accession number for a specimen voucher and/or an e-voucher (a digital voucher, i.e. a photo of the specimen) deposited in the EBI BioImages repository. An ideal voucher is what remains of the specimen after subsamples have been removed for sequencing and biobanking, and these remains should still allow morphological identification. For small specimens that are fully consumed during DNA/RNA extraction, a proxy voucher from the same clone or population may be provided. As a last option, a photo of the original specimen can serve as an e-voucher; ideally, such a photo accompanies every specimen anyway. Reliance on an e-voucher is necessary in certain circumstances: endangered species (for which only tissue samples can be taken), very large specimens (for which it is often not possible to preserve the entire organism), species with low abundance and very small specimens (for which all specimens are needed for sequencing and/or biobanking), and specimens that cannot be permanently preserved (e.g., gastrotrichs). More specifically, for 40 species (42.6%), both an e-voucher and a morphological voucher were provided, while for 15 species (16.0%) only an e-voucher was provided and for 35 species (37.2%) only a morphological voucher. The sample providers of the remaining four species (4.3%) still need to obtain an accession number or upload a photo of their voucher specimen.
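The voucher breakdown above can be tallied the same way. This is a minimal sketch, assuming only the four categories named in the text; the category labels are illustrative, and the percentages are computed relative to the 94 delivered species.

```python
# Illustrative tally of the voucher categories reported for the 94 species.
# Counts come from the text; category labels are hypothetical.
TOTAL_SPECIES = 94
voucher_counts = {
    "e-voucher and morphological voucher": 40,
    "e-voucher only": 15,
    "morphological voucher only": 35,
    "voucher still pending": 4,
}

# The four categories should account for every delivered species.
assert sum(voucher_counts.values()) == TOTAL_SPECIES

for category, n in voucher_counts.items():
    print(f"{category}: {n} ({n / TOTAL_SPECIES:.1%})")

with_voucher = TOTAL_SPECIES - voucher_counts["voucher still pending"]
print(f"With a voucher or e-voucher: {with_voucher} ({with_voucher / TOTAL_SPECIES:.1%})")
```

The category shares come out as 42.6%, 16.0%, 37.2% and 4.3%, and the species that already have a voucher and/or e-voucher amount to 90 of 94, or 95.7%.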
The course of this task, with all its challenges, also allowed us to identify ways of optimizing the sample collection process in future projects. Potential mitigating measures include strategies for broader recruitment of species nominations, better community education on the selection criteria, and improved low-cost technology for effectively collecting, storing, and shipping samples.