Machine learning-based classification of roses using 18 SNP markers for optimized genebank management

Authored by

Laurine Patzer, Marcus Linde, Thomas Debener

Abstract

Background: Reliable classification of rose cultivars is complicated by their long and complex breeding history, frequent hybridization, and the coexistence of traditional horticultural categories with genetically heterogeneous groups. While molecular marker sets such as SSRs have been applied to assess genetic relationships, studies across cultivated roses and species are rare. Using 18 SNP markers on 1,345 accessions in combination with machine learning now offers an opportunity to systematically evaluate how well horticultural classes align with underlying genomic structure and to provide robust tools for the management of large germplasm collections.
Results: Using a panel of SNP markers across 1,345 rose accessions from the Europa Rosarium Sangerhausen, multiple unsupervised (hierarchical, spectral, k-means, DBSCAN, HDBSCAN) and supervised (SVM, decision tree, naive Bayes, XGBoost) machine learning approaches were applied to identify genetic clusters and predict horticultural classifications. Across clustering methods, certain groups consistently emerged as genetically distinct, such as the alba and damask roses, which clustered together with low internal diversity, reflecting their shared historic origin. In contrast, tea, bengal, lutea, and remontant hybrids were repeatedly grouped together and predicted with high classification accuracies (up to 100%) but displayed high within-group diversity, which is consistent with complex breeding backgrounds. Miniature, kordesii, and rubiginosa hybrids also tended to cluster together, despite their differing horticultural labels. Overall, the labels obtained from unsupervised clustering were consistently confirmed by supervised models, which achieved balanced accuracies of up to 100%, highlighting the robustness of the observed groupings.
Conclusions: Our results demonstrate that machine learning applied to SNP marker data can robustly resolve genetic relationships among rose cultivars and provide novel insights into the alignment of horticultural classifications with genomic structure. The high predictive accuracies obtained suggest that marker-based classification can serve as a reliable complementary tool for genebank management, cultivar identification, and reassessment of traditional rose categories.
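
The abstract reports balanced accuracies for the supervised models. As a reading aid, the sketch below shows the commonly used definition of balanced accuracy, the mean of per-class recall, so that small classes weigh as much as large ones. It is not taken from the article: the function name `balanced_accuracy` and the toy labels are hypothetical and chosen only for illustration, and the article itself should be consulted for how the metric was actually computed.

```python
from collections import defaultdict


def balanced_accuracy(true_labels, predicted_labels):
    """Return the mean of per-class recall (each class counts equally)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, prediction in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == prediction:
            correct[truth] += 1

    recalls = [correct[cls] / total[cls] for cls in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy data: a small class (2 samples) and a large class (8 samples).
true_labels = ["alba"] * 2 + ["tea"] * 8
predicted_labels = ["alba", "tea"] + ["tea"] * 8

score = balanced_accuracy(true_labels, predicted_labels)
plain_accuracy = sum(
    t == p for t, p in zip(true_labels, predicted_labels)
) / len(true_labels)
```

In this toy case one of the two samples in the small class is misclassified, so plain accuracy remains high while balanced accuracy drops noticeably. This is why the metric is often preferred when classes differ strongly in size.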

Details

Organisation(s)
Section Molecular Plant Breeding
Type
Article
Journal
Plant Methods
Volume
22
ISSN
1746-4811
Publication date
06.01.2026
Publication status
Published
Peer reviewed
Yes
ASJC Scopus subject areas
Biotechnology, Genetics, Plant Science
Electronic version(s)
https://doi.org/10.1186/s13007-025-01496-0 (Access: Open)