Publication

Data-specific substitution models improve protein-based phylogenetics

dc.contributor.author: Brazão, João
dc.contributor.author: Foster, Peter G.
dc.contributor.author: Cox, Cymon J.
dc.date.accessioned: 2023-12-18T10:41:19Z
dc.date.available: 2023-12-18T10:41:19Z
dc.date.issued: 2023-08
dc.description.abstract: Calculating amino-acid substitution models that are specific for individual protein data sets is often difficult due to the computational burden of estimating large numbers of rate parameters. In this study, we tested the computational efficiency and accuracy of five methods used to estimate substitution models, namely Codeml, FastMG, IQ-TREE, P4 (maximum likelihood), and P4 (Bayesian inference). Data-specific substitution models were estimated from simulated alignments (with different lengths) that were generated from a known simulation model and simulation tree. Each of the resulting data-specific substitution models was used to calculate the maximum likelihood score of the simulation tree and simulated data that was used to calculate the model, and compared with the maximum likelihood scores of the known simulation model and simulation tree on the same simulated data. Additionally, the commonly-used empirical models, cpREV and WAG, were assessed similarly. Data-specific models performed better than the empirical models, which under-fitted the simulated alignments, had the highest difference to the simulation model maximum-likelihood score, clustered further from the simulation model in principal component analysis ordination, and inferred less accurate trees. Data-specific models and the simulation model shared statistically indistinguishable maximum-likelihood scores, indicating that the five methods were reasonably accurate at estimating substitution models by this measure. Nevertheless, tree statistics showed differences between optimal maximum likelihood trees. Unlike other model estimating methods, trees inferred using data-specific models generated with IQ-TREE and P4 (maximum likelihood) were not significantly different from the trees derived from the simulation model in each analysis, indicating that these two methods alone were the most accurate at estimating data-specific models.
To show the benefits of using data-specific protein models, several published data sets were reanalysed using IQ-TREE-estimated models. These newly estimated models were a better fit to the data than the empirical models that were used by the original authors, often inferred longer trees, and resulted in different tree topologies in more than half of the re-analysed data sets. The results of this study show that software availability and high computation burden are not limitations to generating better-fitting data-specific amino-acid substitution models for phylogenetic analyses. [pt_PT]
dc.description.sponsorship: CRESC ALG-01-0145-FEDER-022121, ALG-01-0145-FEDER-022231 [pt_PT]
dc.description.version: info:eu-repo/semantics/publishedVersion [pt_PT]
dc.identifier.doi: 10.7717/peerj.15716 [pt_PT]
dc.identifier.issn: 2167-8359
dc.identifier.uri: http://hdl.handle.net/10400.1/20245
dc.language.iso: eng [pt_PT]
dc.peerreviewed: yes [pt_PT]
dc.publisher: PEERJ [pt_PT]
dc.relation: Algarve Centre for Marine Sciences
dc.relation: Algarve Centre for Marine Sciences
dc.relation: Centre for Marine and Environmental Research
dc.relation: Application of a novel amino-acid substitution model to determining land plant evolution
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [pt_PT]
dc.subject: Amino-acid substitution models [pt_PT]
dc.subject: Phylogenetics [pt_PT]
dc.subject: Model estimation [pt_PT]
dc.subject: Protein evolution [pt_PT]
dc.subject: Data-specific models [pt_PT]
dc.title: Data-specific substitution models improve protein-based phylogenetics [pt_PT]
dc.type: journal article
dspace.entity.type: Publication
oaire.awardTitle: Algarve Centre for Marine Sciences
oaire.awardTitle: Algarve Centre for Marine Sciences
oaire.awardTitle: Centre for Marine and Environmental Research
oaire.awardTitle: Application of a novel amino-acid substitution model to determining land plant evolution
oaire.awardURI: info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDB%2F04326%2F2020/PT
oaire.awardURI: info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDP%2F04326%2F2020/PT
oaire.awardURI: info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/LA%2FP%2F0101%2F2020/PT
oaire.awardURI: info:eu-repo/grantAgreement/FCT//SFRH%2FBD%2F134422%2F2017/PT
oaire.citation.startPage: e15716 [pt_PT]
oaire.citation.title: PeerJ [pt_PT]
oaire.citation.volume: 11 [pt_PT]
oaire.fundingStream: 6817 - DCRRNI ID
oaire.fundingStream: 6817 - DCRRNI ID
oaire.fundingStream: 6817 - DCRRNI ID
person.familyName: Brazão
person.familyName: Cox
person.givenName: João
person.givenName: Cymon
person.identifier.ciencia-id: 3D16-F5B7-F333
person.identifier.ciencia-id: 6B15-9771-1D04
person.identifier.orcid: 0000-0003-0212-3023
person.identifier.orcid: 0000-0002-4927-979X
person.identifier.rid: D-1303-2012
person.identifier.scopus-author-id: 7402112716
project.funder.identifier: http://doi.org/10.13039/501100001871
project.funder.identifier: http://doi.org/10.13039/501100001871
project.funder.identifier: http://doi.org/10.13039/501100001871
project.funder.identifier: http://doi.org/10.13039/501100001871
project.funder.name: Fundação para a Ciência e a Tecnologia
project.funder.name: Fundação para a Ciência e a Tecnologia
project.funder.name: Fundação para a Ciência e a Tecnologia
project.funder.name: Fundação para a Ciência e a Tecnologia
rcaap.rights: openAccess [pt_PT]
rcaap.type: article [pt_PT]
relation.isAuthorOfPublication: 33c67880-e6b8-4711-9313-2d34cdd2e246
relation.isAuthorOfPublication: 82c3689c-60b6-440d-9d7b-49e6dbd6861b
relation.isAuthorOfPublication.latestForDiscovery: 33c67880-e6b8-4711-9313-2d34cdd2e246
relation.isProjectOfPublication: fafa76a6-2cd2-4a6d-a3c9-772f34d3b91f
relation.isProjectOfPublication: 15f91d45-e070-47d8-b6b8-efd4de31d9a8
relation.isProjectOfPublication: 794d4c77-c731-471e-bc96-5a41dcd3d872
relation.isProjectOfPublication: 1a0e4b7e-880f-45a0-8983-2e6e8af21ea5
relation.isProjectOfPublication.latestForDiscovery: 794d4c77-c731-471e-bc96-5a41dcd3d872

Files

Original bundle
Name: peerj-15716.pdf
Size: 1.24 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 3.46 KB
Format: Item-specific license agreed upon to submission
Description: