Machine learning-based classification of roses using 18 SNP markers for optimized genebank management
Abstract
Background: Reliable classification of rose cultivars is complicated by their long and complex breeding history, frequent hybridization, and the coexistence of traditional horticultural categories with genetically heterogeneous groups. While molecular marker sets such as SSRs have been applied to assess genetic relationships, studies spanning both cultivated roses and species are rare. Genotyping 1,345 accessions with 18 SNP markers, in combination with machine learning, now offers an opportunity to systematically evaluate how well horticultural classes align with underlying genomic structure and to provide robust tools for the management of large germplasm collections.
Results: Using a panel of 18 SNP markers across 1,345 rose accessions from the Europa Rosarium Sangerhausen, multiple unsupervised (hierarchical, spectral, k-means, DBSCAN, HDBSCAN) and supervised (SVM, decision tree, naive Bayes, XGBoost) machine learning approaches were applied to identify genetic clusters and predict horticultural classifications. Across clustering methods, certain groups consistently emerged as genetically distinct, such as the alba and damask roses, which clustered together with low internal diversity, reflecting their shared historic origin. In contrast, tea, bengal, lutea, and remontant hybrids were repeatedly grouped together and predicted with high classification accuracies (up to 100%) but displayed high within-group diversity, which is consistent with complex breeding backgrounds. Miniature, kordesii, and rubiginosa hybrids also tended to cluster together, despite their differing horticultural labels. Overall, the labels obtained from unsupervised clustering were consistently confirmed by supervised models, which achieved balanced accuracies of up to 100%, highlighting the robustness of the observed groupings.
Conclusions: Our results demonstrate that machine learning applied to SNP marker data can robustly resolve genetic relationships among rose cultivars and provide novel insights into the alignment of horticultural classifications with genomic structure. The high predictive accuracies obtained suggest that marker-based classification can serve as a reliable complementary tool for genebank management, cultivar identification, and reassessment of traditional rose categories.
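The balanced accuracies reported in the Results can be read with the following minimal sketch in mind. It assumes the common definition of balanced accuracy as the mean of per-class recall; the abstract does not state how the metric was computed. The function name and the toy labels are hypothetical illustrations, not data from the study.

```python
from collections import defaultdict


def balanced_accuracy(true_labels, predicted_labels):
    """Mean of per-class recall, so every class counts equally regardless of size.

    Assumed definition; the abstract does not specify the exact formula used.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("label sequences must not be empty")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, prediction in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == prediction:
            correct[truth] += 1

    recalls = [correct[cls] / total[cls] for cls in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels (not from the study): one damask accession
# is misassigned to the alba group.
true_groups = ["alba"] * 2 + ["tea"] * 6 + ["damask"] * 2
predicted_groups = ["alba"] * 2 + ["tea"] * 6 + ["damask", "alba"]

plain_accuracy = sum(
    t == p for t, p in zip(true_groups, predicted_groups)
) / len(true_groups)

print(f"plain accuracy:    {plain_accuracy:.3f}")
print(f"balanced accuracy: {balanced_accuracy(true_groups, predicted_groups):.3f}")
```

In this toy case a single error in a small class lowers the balanced accuracy more than the plain accuracy, which is why the metric suits collections whose horticultural classes differ widely in size.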
Details
- Organisation(s): Section Molecular Plant Breeding
- Type: Article
- Journal: PLANT METHODS
- Volume: 22
- ISSN: 1746-4811
- Publication date: 06.01.2026
- Publication status: Published
- Peer reviewed: Yes
- ASJC Scopus subject areas: Biotechnology, Genetics, Plant Science
- Electronic version(s): https://doi.org/10.1186/s13007-025-01496-0 (Access: Open)