Phase transition in the computational complexity of the shortest common superstring and genome assembly


arXiv:2210.09986v2 Announce Type: replace-cross
Abstract: Genome assembly, the process of reconstructing a long genetic sequence by aligning and merging short fragments, or reads, is known to be NP-hard, either as a version of the shortest common superstring problem or in a Hamiltonian-cycle formulation. That is, the computing time is believed to grow exponentially with the problem size in the worst case. Despite this fact, high-throughput technologies and modern algorithms currently allow bioinformaticians to handle datasets of billions of reads. Using methods from statistical mechanics, we address this conundrum by demonstrating the existence of a phase transition in the computational complexity of the problem and showing that practical instances always fall in the 'easy' phase (solvable by polynomial-time algorithms). In addition, we propose a Markov-chain Monte Carlo method that outperforms common deterministic algorithms in the hard regime.
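
To make the shortest common superstring formulation concrete, here is a minimal Python sketch of the classic greedy-merge heuristic, one example of a common deterministic approach. It is an illustration only and is not taken from the paper: the function names `overlap` and `greedy_superstring` and the four example reads are assumptions chosen for this sketch, and the heuristic does not guarantee a shortest superstring in general.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(reads: list[str]) -> str:
    """Greedy heuristic (hypothetical helper, not the paper's method):
    repeatedly merge the ordered pair of reads with the largest overlap."""
    # Remove exact duplicates and reads fully contained in another read.
    reads = list(dict.fromkeys(reads))
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]

    while len(reads) > 1:
        best_k, best_i, best_j = -1, 0, 1
        # Scan all ordered pairs; ties keep the first pair encountered.
        for i, j in permutations(range(len(reads)), 2):
            k = overlap(reads[i], reads[j])
            if k > best_k:
                best_k, best_i, best_j = k, i, j
        # Merge the best pair, writing the overlapping part only once.
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]

    return reads[0] if reads else ""


if __name__ == "__main__":
    # Illustrative reads (an assumption for this sketch, not real data).
    example = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
    print(greedy_superstring(example))
```

Each round of this sketch scans all ordered pairs of remaining reads, so it runs in polynomial time; the NP-hardness mentioned in the abstract concerns finding a provably shortest superstring, which this heuristic does not promise.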


Latest Change: March 12, 2024, 7:34 a.m.

