A new resolution function to evaluate tree shape statistics
Authors:
Maryam Hayati; Bita Shadgar; Leonid Chindelevitch
Author affiliations:
School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
Published in:
PLoS ONE 14(11)
Category:
Research Article
doi:
https://doi.org/10.1371/journal.pone.0224197
Abstract
Phylogenetic trees are frequently used in biology to study the relationships between a number of species or organisms. The shape of a phylogenetic tree contains useful information about patterns of speciation and extinction, so powerful tools are needed to investigate the shape of a phylogenetic tree. Tree shape statistics are a common approach to quantifying the shape of a phylogenetic tree by encoding it with a single number. In this article, we propose a new resolution function to evaluate the power of different tree shape statistics to distinguish between dissimilar trees. We show that the new resolution function requires less time and space in comparison with the previously proposed resolution function for tree shape statistics. We also introduce a new class of tree shape statistics, which are linear combinations of two existing statistics that are optimal with respect to a resolution function, and show evidence that the statistics in this class converge to a limiting linear combination as the size of the tree increases. Our implementation is freely available at https://github.com/WGS-TB/TreeShapeStats.
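To make the idea of "encoding a tree with a single number" concrete, here is a minimal sketch in Python. It is not taken from the authors' implementation. It assumes trees are written as nested tuples, and it uses the Sackin and Colless indices purely as illustrative examples, since the abstract does not name the two statistics it combines. The `combined` function and its `weight` parameter are hypothetical, and the weight shown is arbitrary rather than optimized against any resolution function.

```python
# Illustrative sketch only: trees are nested 2-tuples, leaves are any non-tuple label.

def leaf_count(tree):
    """Number of leaves below (and including) this node."""
    if not isinstance(tree, tuple):
        return 1
    left, right = tree
    return leaf_count(left) + leaf_count(right)


def sackin(tree, depth=0):
    """Sackin index: sum of the depths of all leaves."""
    if not isinstance(tree, tuple):
        return depth
    left, right = tree
    return sackin(left, depth + 1) + sackin(right, depth + 1)


def colless(tree):
    """Colless index: sum over internal nodes of |leaves(left) - leaves(right)|."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    imbalance = abs(leaf_count(left) - leaf_count(right))
    return imbalance + colless(left) + colless(right)


def combined(tree, weight):
    """Hypothetical linear combination of the two statistics (weight is arbitrary)."""
    return sackin(tree) + weight * colless(tree)


# Two four-leaf trees with very different shapes.
caterpillar = ((("a", "b"), "c"), "d")   # maximally unbalanced
balanced = (("a", "b"), ("c", "d"))      # perfectly balanced

for name, tree in [("caterpillar", caterpillar), ("balanced", balanced)]:
    print(name, sackin(tree), colless(tree), combined(tree, 0.5))
```

Both statistics assign a larger value to the unbalanced caterpillar tree than to the balanced one. Choosing the weight of such a combination so that it separates dissimilar trees as well as possible is the kind of question a resolution function is meant to answer.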
Keywords:
Computing methods – Eigenvalues – Epidemiological statistics – Leaves – Phylogenetic analysis – Phylogenetics – Speciation – Viral evolution
Article published in
PLOS One
2019, Issue 11