RFQAmodel: Random Forest Quality Assessment to identify a predicted protein structure in the correct fold
Authors:
Clare E. West aff001; Saulo H. P. de Oliveira aff002; Charlotte M. Deane aff001
Author affiliations:
aff001: Department of Statistics, University of Oxford, Oxford, England, United Kingdom
aff002: SLAC National Accelerator Laboratory, Stanford University, Menlo Park, California, United States of America
aff003: Bioengineering, Stanford University, Stanford, California, United States of America
Published in:
PLoS ONE 14(10)
Category:
Research Article
doi:
https://doi.org/10.1371/journal.pone.0218149
Abstract
While template-free protein structure prediction protocols now produce good-quality models for many targets, modelling failure remains common. For these methods to be useful, users must be able both to choose the best model from the hundreds to thousands of models commonly generated for a target and to determine whether that model is likely to be correct. We have developed Random Forest Quality Assessment (RFQAmodel), which assesses whether models produced by a protein structure prediction pipeline have the correct fold. RFQAmodel combines existing quality assessment scores with two predicted contact map alignment scores. These alignment scores identify correct models for targets that would otherwise be missed. Our classifier was trained on a large set of protein domains that are structurally diverse and evenly balanced in terms of protein features known to affect modelling success, and then tested on a second set of 244 protein domains with a similar spread of properties. When models for each target in this second set were ranked according to the RFQAmodel score, the highest-ranking model had a high-confidence RFQAmodel score for 67 modelling targets, of which 52 had the correct fold. At the other end of the scale, RFQAmodel correctly predicted that for 59 targets the highest-ranked model was incorrect. In comparison with other methods, we found that RFQAmodel is better able to identify correct models for targets where only a few of the models are correct. RFQAmodel achieved a similar performance on the model sets for CASP12 and CASP13 free-modelling targets. Finally, by iteratively generating models and running RFQAmodel until a model is produced that is predicted to be correct with high confidence, we demonstrate how such a protocol can be used to focus computational effort on difficult modelling targets. RFQAmodel and the accompanying data can be downloaded from http://opig.stats.ox.ac.uk/resources.
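The iterative protocol described at the end of the abstract can be sketched as a simple loop: generate a batch of models, score each one, and stop as soon as the best score reaches a high-confidence cutoff or the modelling budget runs out. The sketch below is illustrative only and is not the authors' implementation. The names `iterative_modelling`, `generate_batch` and `score_model`, the batch size and the cutoff value are all hypothetical; in practice `score_model` would stand in for a call to RFQAmodel and `generate_batch` for the structure prediction pipeline.

```python
from typing import Callable, List, Optional, Tuple

Model = str  # placeholder type: a real pipeline would pass structure files or objects


def iterative_modelling(
    generate_batch: Callable[[int], List[Model]],  # hypothetical: returns up to n new models
    score_model: Callable[[Model], float],         # hypothetical: stands in for an RFQAmodel score
    high_confidence: float,                        # assumed cutoff; not specified in the abstract
    batch_size: int = 100,
    max_models: int = 1000,
) -> Tuple[Optional[Model], float, int]:
    """Generate models in batches until one scores at or above the cutoff.

    Returns the best model seen, its score, and the number of models generated.
    """
    best_model: Optional[Model] = None
    best_score = float("-inf")
    n_generated = 0

    while n_generated < max_models:
        # Never request more models than the remaining budget allows.
        batch = generate_batch(min(batch_size, max_models - n_generated))
        if not batch:
            break  # the generator produced nothing, so stop rather than loop forever
        n_generated += len(batch)

        # Rank by score: keep only the highest-scoring model seen so far.
        for model in batch:
            score = score_model(model)
            if score > best_score:
                best_model, best_score = model, score

        # Stop early once the top-ranked model is predicted correct with high confidence.
        if best_score >= high_confidence:
            break

    return best_model, best_score, n_generated


if __name__ == "__main__":
    # Toy data: six fake models with made-up scores, generated two at a time.
    toy_scores = {"m0": 0.10, "m1": 0.20, "m2": 0.15, "m3": 0.90, "m4": 0.30, "m5": 0.40}
    names = list(toy_scores)
    cursor = [0]

    def toy_generator(n: int) -> List[Model]:
        out = names[cursor[0]:cursor[0] + n]
        cursor[0] += n
        return out

    print(iterative_modelling(toy_generator, toy_scores.__getitem__,
                              high_confidence=0.5, batch_size=2, max_models=6))
```

Easy targets exit the loop after the first batch or two, so most of the compute budget is left for targets where no model has yet reached the cutoff.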
Keywords:
Multiple alignment calculation – Protein domains – Protein structure – Protein structure comparison – Protein structure prediction – Sequence alignment – Trees
This article was published in
PLOS One
2019 Issue 10