Publication date:
2015
Abstract:
The analysis of the 3D structures of proteins is a very important problem in life
sciences, since the geometric set-up of proteins has a deep relevance in many
biological processes. The complexity of the analysis and the continuous increase
in the number of proteins whose 3D structure is known call for efficient and quick
algorithms. Parallel processing is becoming an enabling tool for such research.
A key component in the geometric description of a protein is the structural
motif, a 3D element which appears in a variety of molecules and is usually made
of just a few simpler structures, the secondary structure elements (SSEs).
This paper is an extended version of Ferretti and Musci [1], and presents the
Cross Motif Search (CMS) and the Complete CMS (CCMS) algorithms, two
highly optimized and efficient parallel methods to detect the presence and location
of all common motifs of secondary structures in a given protein pair (CMS)
or across an arbitrarily large dataset of proteins (CCMS). The analysis builds on
existing approaches, such as Secondary Structure Co-Occurrences (SSC), based
on the General Hough Transform (GHT) technique. The main difference between
our proposal and the state of the art is the innovative focus that CMS
puts on the geometric description of the structural motifs, which could be simply
viewed as vectors in a 3D space, rather than on the topological/biological
description employed by competing algorithms, such as Prosmos, Promotif or
MASS. The advantage of a geometrical approach is that it makes it possible to retrieve
the exact location of the common substructures in a protein pair.
The paper analyzes all possible forms of serial and parallel optimization
of the proposed algorithms, both shared memory and message passing. It introduces
a complete parallel implementation of CMS, based on OpenMP, and
discusses its scalability on shared-memory architectures. Both small-scale and
medium-scale testing show that the method produces very interesting results
in real applications, and scales nicely up to the eight-processor limit. More in-depth
testing also shows that the scalability limit is due to the inner structure of
the problem, and that the similarities among proteins and the chosen tolerance
for the analysis greatly impact the overall performance.
CRIS type:
1.1 Journal article
Authors:
Ferretti, Marco; Musci, Mirto
Link to the full record:
Link to the full text:
Published in: