An Ensemble Method for Calling and Ranking Somatic Structural Variants Using Long and Short Reads
Walter Gallego Gomez, Elena Grassi, Andrea Bertotti, Gianvito Urgese
Politecnico di Torino & University of Torino
Structural Variants (SVs) are large-scale DNA changes such as deletions, insertions, and rearrangements. In cancer, they are not just errors; they are often key drivers of tumor growth. But finding them is like spotting a single incorrect piece in a million-piece puzzle. 🧩
We must distinguish new tumor-specific (somatic) changes from inherited (germline) variants.
Tumors are a mix of normal and cancer cells, so somatic variants are often present at low allele frequencies.
Traditional short-read sequencing struggles to see large SVs, while newer long-read tools can be imprecise.
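Why "low frequencies" matters can be made concrete with a standard back-of-the-envelope model. This is our illustration, not part of the poster's method, and it assumes a diploid region with no copy-number change and a heterozygous somatic SV carried by a fraction $c$ of the cancer cells in a sample of tumor purity $p$:

```latex
\mathrm{VAF} \approx \frac{p \cdot c}{2},
\qquad \text{e.g. } p = 0.4,\; c = 1 \;\Rightarrow\; \mathrm{VAF} \approx 0.2
```

At a hypothetical 30x tumor coverage, such a variant would be supported by only about $30 \times 0.2 = 6$ reads on average, and a subclonal event ($c < 1$) by fewer still.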
We don't rely on a single source of truth. Our method combines the strengths of multiple tools and data types to build a high-confidence, unified result.
Input Data: paired tumor and normal samples, each sequenced with both long (ONT) and short (Illumina) reads.
Run three long-read SV callers in parallel: NanomonSV, SAVANA, and CuteSV.
Merge the three call sets, flagging SVs reported by more than one caller, since agreement between callers increases confidence.
Use the short-read data to look for supporting evidence for each candidate SV: alignment gaps, soft-clipped reads, and anomalous insert sizes.
Compute a final score for each SV that combines the evidence from every caller and from the short reads.
Produce a single, ranked list of high-confidence somatic deletions, ready for validation.
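The merge, validate, score, and rank steps can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the authors' implementation: the poster does not specify how calls are merged or scored, so the 50% reciprocal-overlap threshold, the 50 bp breakpoint window, the insert-size model (mean 400, SD 50, 3-SD cutoff), the scoring weights, and the rule that any normal-sample support zeroes the score are all hypothetical choices. `ShortRead` is a simplified stand-in for an aligned read, not a pysam object.

```python
from dataclasses import dataclass, field

@dataclass
class Deletion:
    chrom: str
    start: int
    end: int
    callers: set = field(default_factory=set)

@dataclass
class ShortRead:
    # Hypothetical simplified aligned read (not a pysam object)
    chrom: str
    pos: int     # 0-based leftmost reference position of the alignment
    cigar: list  # [(op, length), ...] with SAM ops such as "M", "D", "S"
    tlen: int    # template length; positive on the leftmost mate, as in SAM

def reciprocal_overlap(a, b):
    """Intersection divided by the longer interval, i.e. the smaller overlap fraction."""
    if a.chrom != b.chrom:
        return 0.0
    inter = min(a.end, b.end) - max(a.start, b.start)
    if inter <= 0:
        return 0.0
    return inter / max(a.end - a.start, b.end - b.start)

def merge_calls(call_sets, min_ro=0.5):
    """Greedily cluster deletions from several callers; the first call seen keeps its coordinates."""
    merged = []
    for caller, calls in call_sets.items():
        for chrom, start, end in calls:
            new = Deletion(chrom, start, end, {caller})
            for m in merged:
                if reciprocal_overlap(m, new) >= min_ro:
                    m.callers.add(caller)  # same event reported by another caller
                    break
            else:
                merged.append(new)
    return merged

def short_read_support(sv, reads, window=50, mean_isize=400, sd_isize=50):
    """Count reads with a spanning gap, a clip at a breakpoint, or an oversized insert."""
    support = {"gap": 0, "softclip": 0, "insert_size": 0}
    for r in reads:
        if r.chrom != sv.chrom:
            continue
        ref = r.pos
        gap = clip = False
        for op, length in r.cigar:
            if op == "D" and abs(ref - sv.start) <= window and abs(ref + length - sv.end) <= window:
                gap = True
            if op == "S" and min(abs(ref - sv.start), abs(ref - sv.end)) <= window:
                clip = True
            if op in ("M", "D", "N", "=", "X"):  # ops that consume the reference
                ref += length
        support["gap"] += gap
        support["softclip"] += clip
        # Leftmost mate of a pair spanning the deletion with an unusually large insert
        if r.tlen > mean_isize + 3 * sd_isize and r.pos < sv.start and r.pos + r.tlen > sv.end:
            support["insert_size"] += 1
    return support

def score(sv, tumor_support, normal_support, w_caller=1.0, w_read=0.5, max_reads=10):
    """Hypothetical score: caller agreement plus capped short-read evidence."""
    if sum(normal_support.values()) > 0:
        return 0.0  # evidence in the normal suggests a germline variant
    reads = min(sum(tumor_support.values()), max_reads)
    return w_caller * len(sv.callers) + w_read * reads

def rank(call_sets, tumor_reads, normal_reads):
    scored = [(score(sv, short_read_support(sv, tumor_reads),
                     short_read_support(sv, normal_reads)), sv)
              for sv in merge_calls(call_sets)]
    scored.sort(key=lambda x: x[0], reverse=True)
    return scored

# Toy usage with invented coordinates
call_sets = {
    "nanomonsv": [("chr1", 1000, 2000)],
    "savana": [("chr1", 1010, 1990), ("chr2", 5000, 5600)],
    "cutesv": [("chr1", 995, 2005)],
}
tumor = [
    ShortRead("chr1", 950, [("M", 50), ("D", 1000), ("M", 100)], 400),  # gap
    ShortRead("chr1", 900, [("M", 100), ("S", 50)], 400),               # soft clip
    ShortRead("chr1", 700, [("M", 150)], 1500),                         # large insert
]
for s, sv in rank(call_sets, tumor, normal_reads=[]):
    print(sv.chrom, sv.start, sv.end, sorted(sv.callers), s)
# chr1 1000 2000 ['cutesv', 'nanomonsv', 'savana'] 4.5
# chr2 5000 5600 ['savana'] 1.0
```

Capping the read contribution keeps a deep-coverage region from outranking an event seen by all three callers; a real pipeline would tune the weights against a benchmark rather than fixing them by hand.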
Tested on the Espejo Valle-Inclán benchmark (COLO829 cell line).
Found 35 of 38 true somatic deletions.
Only 3 high-scoring false positives.
Most of the 71 false positives received a low rank.
Our final score successfully separates high-confidence true positives from low-confidence noise.
Figure: final scores of true positives and false positives.
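The reported counts translate into standard metrics. This is our arithmetic, assuming the 38 benchmark deletions split into 35 true positives and 3 false negatives, and that the 71 false positives are counted over the whole candidate list before any score cutoff:

```latex
\text{Recall} = \frac{TP}{TP + FN} = \frac{35}{35 + 3} \approx 0.921,
\qquad
\text{Precision} = \frac{TP}{TP + FP} = \frac{35}{35 + 71} \approx 0.330
```

The modest unranked precision is why the ranking matters: since only 3 false positives score high, precision among the high-scoring calls is considerably better, although computing it exactly would require the score cutoff and the number of true positives above it, which are not reported here.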
Each component plays a crucial role in the final, accurate result.
The Finder: High recall, finds almost everything, but with some noise.
The Confirmer: High precision, very stringent, misses some real events.
The Validator: An independent check that confirms events with orthogonal data.
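The Finder/Confirmer trade-off is a precision/recall trade-off, measured by comparing each component's calls to a truth set such as the COLO829 benchmark. The sketch below shows one way to compute it; the matching rule (same chromosome, both breakpoints within 100 bp) is a hypothetical choice, and the coordinates are invented for illustration only.

```python
def matches(call, truth, tol=100):
    """Hypothetical matching rule: same chromosome, both breakpoints within tol bp."""
    return (call[0] == truth[0]
            and abs(call[1] - truth[1]) <= tol
            and abs(call[2] - truth[2]) <= tol)

def evaluate(calls, truth_set, tol=100):
    """Return (precision, recall) for a list of (chrom, start, end) deletions."""
    # Calls that match at least one truth deletion
    tp_calls = sum(any(matches(c, t, tol) for t in truth_set) for c in calls)
    # Truth deletions recovered by at least one call
    found = sum(any(matches(c, t, tol) for c in calls) for t in truth_set)
    precision = tp_calls / len(calls) if calls else 0.0
    recall = found / len(truth_set) if truth_set else 0.0
    return precision, recall

truth = [("chr1", 1000, 2000), ("chr2", 5000, 5600), ("chr3", 800, 900)]
# Toy "Finder": recovers everything but adds one noisy call
finder = [("chr1", 1010, 1990), ("chr2", 5020, 5590), ("chr3", 790, 905), ("chr4", 10, 500)]
# Toy "Confirmer": a single, correct call
confirmer = [("chr1", 1000, 2000)]
print(evaluate(finder, truth))     # (0.75, 1.0)
print(evaluate(confirmer, truth))  # precision 1.0, recall 1/3
```

Combining the two, so that the Finder supplies candidates and agreement with the Confirmer or the Validator raises their rank, is what lets an ensemble keep the Finder's recall without inheriting all of its noise.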
Our method provides a robust, prioritized list of somatic SVs, which means researchers can:
Prioritize experimental validation on the most promising candidates.
More reliably identify cancer-driving SVs for downstream analysis.
Move towards a reproducible, gold-standard pipeline for somatic SV detection.
Our ensemble approach leverages the strengths of long-read callers and short-read validation to produce a high-quality, ranked list of somatic deletions, improving on the individual tools.