CavBench: A benchmark for protein cavity detection methods
Authors:
Sérgio Dias aff001; Tiago Simões aff001; Francisco Fernandes aff003; Ana Mafalda Martins aff005; Alfredo Ferreira aff003; Joaquim Jorge aff003; Abel J. P. Gomes aff001
Author affiliations:
aff001: Instituto de Telecomunicações, Delegação da Covilhã, Covilhã, Portugal
aff002: Universidade da Beira Interior, Departamento de Informática, Covilhã, Portugal
aff003: INESC-ID, Lisboa, Portugal
aff004: Universidade de Lisboa, IST, Lisboa, Portugal
aff005: Universidade Europeia, Lisboa, Portugal
Published in:
PLoS ONE 14(10)
Category:
Research Article
doi:
https://doi.org/10.1371/journal.pone.0223596
Abstract
Extensive research has been devoted to discovering new techniques and methods for modeling protein-ligand interactions. In particular, considerable effort has focused on identifying candidate binding sites, which quite often are active sites that correspond to protein pockets or cavities. These cavities therefore play an important role in molecular docking. However, there is no established benchmark for assessing the accuracy of new cavity detection methods. In practice, each new technique is evaluated on a small set of proteins with known binding sites as ground truth. Studies supported by large datasets of known cavities and/or binding sites, together with statistical classification (i.e., counts of false positives, false negatives, true positives, and true negatives), would yield much stronger and more reliable assessments. To this end, we propose CavBench, a generic and extensible benchmark that compares cavity detection methods against diverse ground-truth datasets (e.g., PDBsum) using statistical classification methods.
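To make the statistical-classification idea concrete, the following minimal Python sketch counts true positives, false positives, and false negatives for one protein and derives precision, recall, and F-score from them. It is an illustration only, not CavBench's actual procedure: the function names `classify` and `scores`, the representation of each cavity by a single 3D center, the greedy nearest-center matching, and the 4.0 Å threshold `max_dist` are all assumptions introduced here. True negatives are left out because counting them requires a definition of negative candidates that the abstract does not give.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def classify(predicted, ground_truth, max_dist=4.0):
    """Return (tp, fp, fn) for predicted vs. ground-truth cavity centers.

    Hypothetical matching rule: a predicted center is a true positive if
    its nearest still-unmatched ground-truth center lies within max_dist.
    """
    unmatched = list(ground_truth)
    tp = 0
    for p in predicted:
        # Nearest ground-truth center not yet claimed by another prediction.
        best = min(unmatched, key=lambda g: dist(p, g), default=None)
        if best is not None and dist(p, best) <= max_dist:
            unmatched.remove(best)
            tp += 1
    fp = len(predicted) - tp   # predictions with no ground-truth partner
    fn = len(unmatched)        # ground-truth cavities that were missed
    return tp, fp, fn


def scores(tp, fp, fn):
    """Return (precision, recall, f1), using 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up coordinates for illustration; not taken from any real protein.
predicted = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (50.0, 50.0, 50.0)]
ground_truth = [(1.0, 0.0, 0.0), (10.0, 1.0, 0.0), (-30.0, 0.0, 0.0)]

tp, fp, fn = classify(predicted, ground_truth)
precision, recall, f1 = scores(tp, fp, fn)
print(tp, fp, fn, precision, recall, f1)
```

Summing the counts over a large dataset before computing the ratios is one way to obtain the dataset-wide assessment the abstract argues for. Other matching criteria, such as overlap between cavity and ligand atoms, would change the counts.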
Keywords:
Database and informatics methods – Drug discovery – Protein interactions – Protein structure – Software design – Statistical data – Parsers – Drug design
Article published in
PLOS One
2019, Issue 10