Published at 2020-06-26 10:05
Author: zhixy
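Compositional heterogeneity among sequences can bias phylogenetic reconstruction. Besides using models that allow composition to vary across the tree, recoding the sequences can also remove part of this effect.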
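One rough way to quantify compositional heterogeneity is to count the states in each taxon and compute a Pearson chi-square statistic on the resulting taxa × states table. This is a common descriptive check rather than something prescribed here. The minimal sketch below assumes plain Python strings as input, computes only the statistic (no p-value), ignores gaps and ambiguity codes, and uses a hypothetical helper name, `composition_chi2`.

```python
from collections import Counter

def composition_chi2(seqs, alphabet):
    """Pearson chi-square statistic for a taxa x states count table.

    seqs: dict mapping taxon name -> sequence string.
    alphabet: the states to count (e.g. "ACGT"); gaps and any other
    symbol are ignored. Larger values mean state frequencies differ
    more across taxa.
    """
    counts = {name: Counter(c for c in s.upper() if c in alphabet)
              for name, s in seqs.items()}
    states = sorted(alphabet)
    row_tot = {n: sum(c[s] for s in states) for n, c in counts.items()}
    col_tot = {s: sum(c[s] for c in counts.values()) for s in states}
    grand = sum(row_tot.values())

    chi2 = 0.0
    for name, c in counts.items():
        for s in states:
            expected = row_tot[name] * col_tot[s] / grand
            if expected > 0:  # skip states absent from every taxon
                chi2 += (c[s] - expected) ** 2 / expected
    return chi2
```

With two toy taxa of identical composition the statistic is 0.0; a GC-rich "GGGGCCCC" against an AT-rich "AAAATTTT" gives 16.0.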
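For DNA sequences, a common recoding is RY coding: the purines (A, G) are coded as R and the pyrimidines (C, T) as Y.
RY coding can remove compositional heterogeneity from the data, but it also discards transitions (A↔G and C↔T substitutions), which are often informative: a transition stays within the purines or within the pyrimidines, so it becomes invisible after recoding.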
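As a minimal Python sketch (the helper name `recode_ry` is hypothetical), RY coding is a simple character mapping. Gaps are kept, and any other symbol (N, ?, most ambiguity codes) is treated as missing, which is an assumption of this sketch.

```python
# Purines -> R, pyrimidines -> Y; U is treated like T, and existing
# R/Y ambiguity codes are already purine/pyrimidine classes.
RY = {"A": "R", "G": "R", "R": "R",
      "C": "Y", "T": "Y", "U": "Y", "Y": "Y"}

def recode_ry(seq):
    """Recode a nucleotide sequence into R/Y; gaps kept, others -> '?'."""
    return "".join("-" if c == "-" else RY.get(c, "?") for c in seq.upper())
```

With it, the GC-rich "GGGGCCCC" and the AT-rich "AAAATTTT" both become "RRRRYYYY", so their compositional difference disappears. At the same time an A↔G difference (a transition) is erased, while an A↔C difference (a transversion) survives as R versus Y.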
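For protein sequences, a common scheme is Dayhoff-6 coding, which groups the 20 amino acids into six classes of physicochemically similar residues: (A, G, P, S, T), (D, E, N, Q), (H, K, R), (F, Y, W), (I, L, M, V) and (C).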
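Dayhoff-6 recoding can be sketched the same way. The six groups below are the Dayhoff classes; the digit labels 1 to 6 simply follow the order listed in the code and are arbitrary, so they are not claimed to match the digits P4 writes. `recode_dayhoff6` is a hypothetical helper, not part of P4's API.

```python
# Dayhoff-6 classes; the label of each class is its 1-based position here.
DAYHOFF6 = ["AGPST", "DENQ", "HKR", "FWY", "ILMV", "C"]

# Flatten into a residue -> label lookup table.
AA_TO_GROUP = {aa: str(i)
               for i, group in enumerate(DAYHOFF6, start=1)
               for aa in group}

def recode_dayhoff6(seq):
    """Replace each amino acid by its Dayhoff-6 label ('1'-'6').

    Gaps stay '-'; X, B, Z, '?' and anything else become '?'.
    """
    return "".join("-" if c == "-" else AA_TO_GROUP.get(c, "?")
                   for c in seq.upper())
```

Every one of the 20 standard amino acids appears in exactly one class, so the lookup table has 20 entries.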
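Building on this, F, Y, W and I, L, M, V can be merged into a single group and C treated as missing data. This leaves four states, so the data can be encoded with the four DNA letters, just like a nucleotide alignment.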
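A sketch of this four-state variant follows. Which DNA letter stands for which group is an arbitrary choice made for illustration; only the grouping itself (and C as missing) comes from the text. `recode_dayhoff4_dna` is a hypothetical helper name.

```python
# Four groups after merging FWY with ILMV; C is deliberately left out so
# that it is written as missing data. The letter assignment is arbitrary.
DAYHOFF4_AS_DNA = {"AGPST": "A", "DENQ": "C", "HKR": "G", "FWYILMV": "T"}

AA_TO_NT = {aa: nt for group, nt in DAYHOFF4_AS_DNA.items() for aa in group}

def recode_dayhoff4_dna(seq):
    """Recode a protein sequence into A/C/G/T; gaps kept, C and others -> '?'."""
    return "".join("-" if c == "-" else AA_TO_NT.get(c, "?")
                   for c in seq.upper())
```

The output can then be written out and analysed with nucleotide models, with the caveat that the "bases" are really amino-acid classes.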
Alternatively, amino acids can be split into two groups by hydrophobicity/polarity (hp coding).
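All of these protein schemes are partitions of the amino-acid alphabet, so a generic recoder that takes the groups as input also covers the hp case. Since the hp group membership is not listed here, the sketch leaves it to the caller. `make_recoder` is a hypothetical helper, and the groups used in the check below are toy values, not the real hp classes.

```python
def make_recoder(groups, missing="?", gap="-"):
    """Build a recoding function from (members, label) pairs.

    Raises ValueError if a residue is assigned to more than one group,
    since the groups must form a partition. Residues not listed in any
    group are written as `missing`; gaps are kept.
    """
    table = {}
    for members, label in groups:
        for aa in members.upper():
            if aa in table:
                raise ValueError(f"{aa} appears in more than one group")
            table[aa] = label

    def recode(seq):
        return "".join(gap if c == gap else table.get(c, missing)
                       for c in seq.upper())

    return recode

# Toy two-group partition for illustration only (not the hp scheme).
toy_recode = make_recoder([("ILV", "h"), ("DEK", "p")])
```

For hp coding one would pass the hydrophobic and polar sets with labels such as "h" and "p"; the same function also reproduces Dayhoff-6 when given its six groups.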
Sequence recoding is implemented in PhyloBayes (in the serial, single-core version, via the -recode option) and in P4 (see the example below).
(base) [user@server ~]# p4
p4 v 1.3.0 [2018-07-28], 28 July, 2018
usage:
p4
or
p4 [-i] [-x] [-d] [yourScriptOrDataFile] [anotherScriptOrDataFile ...]
or
p4 --help
p4 is a Python package for phylogenetics.
p4 is also the name of a Python script that loads the p4 package.
There is documentation at http://p4.nhm.ac.uk
Using the p4 script, after reading in the (optional) files on the
command line, p4 goes interactive unless one of the files on the
command line is a Python script. Use the -i option if you want to go
interactive even if you are running a script. Use the -x option to
force exit, even if there was no Python script read. If you use the
-d option, then p4 draws any trees that are read in on the command
line, and then exits.
Peter Foster
The Natural History Museum, London
p.foster@nhm.ac.uk
(Control-d to quit.)
p4> read("example.phylip")
p4> aln = var.alignments[0]
p4> aln.recodeDayhoff()  # recode in place using the Dayhoff-6 groups; aln can now be used in downstream phylogenetic analyses
p4> aln.writePhylip('example_recoded.phy')
(base) [user@server ~]# head -n 20 example_recoded.phy
10 599
Aeropyrum0 25432462253223412512324331224624535242555246236564
Arabidopsi 25232452253445413215235231224626535242564526542552
Archaeoglo 25532452553225414212242231224624535222554526245524
Candida_al 25332452253436413212335231224624535242564526524544
Chytridiom ------------------------31224624535242564326522544
Cryptococc 25532452253436413215325231224624535242564326554544
Cryptospor 25332552223645513214535421224624535242564526522554
Dictyostel 25532252253423413212225231224624535242564526532554
Drosophila 25532432553422413212235231224624535242564526524254
Encephalit 25535452223456513212233621224624535242564526524544

55322145125423424236552255552252542242523554333355
55415163126324624236555355252222542243525554433645
55322123125324224236655255252252542244553554533645
21325154126333624236555255252222542264523554233645
55325156125323624236555255252223542264523554233645
55315162125323624236555255252222542264523554633645
55415162126325413226653255252222542254533554533645
55415164125323624236552255252222542244523554233645
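To sanity-check a recoded file, it can be read back and the states counted per taxon. The sketch below assumes the interleaved layout seen in the output above: an `ntax nchar` header line, 10-character names in the first block only, and later blocks in the same taxon order. It is a minimal reader with hypothetical helper names, not a full PHYLIP parser.

```python
from collections import Counter

def read_interleaved_phylip(text, name_width=10):
    """Return ({name: sequence}, nchar) from interleaved PHYLIP text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]  # drop blank lines
    ntax, nchar = map(int, lines[0].split())
    names, seqs = [], []
    # First block: fixed-width name field followed by sequence data.
    for ln in lines[1:ntax + 1]:
        names.append(ln[:name_width].strip())
        seqs.append("".join(ln[name_width:].split()))
    # Later blocks: sequence only, cycling through taxa in the same order.
    for i, ln in enumerate(lines[ntax + 1:]):
        seqs[i % ntax] += "".join(ln.split())
    return dict(zip(names, seqs)), nchar

def state_counts(seqs):
    """Count states per taxon, ignoring gaps and '?'."""
    return {name: Counter(c for c in s if c not in "-?")
            for name, s in seqs.items()}
```

For example, `seqs, nchar = read_interleaved_phylip(open("example_recoded.phy").read())` followed by `state_counts(seqs)` should, as in the output above, show only the digits 1 to 6 once gaps are excluded. On a file truncated with `head`, the sequences will of course be shorter than `nchar`.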
Dayhoff, M.O.; Schwartz, R.M. A Model of Evolutionary Change in Proteins. In Atlas of Protein Sequence and Structure; National Biomedical Research Foundation: Washington, DC, USA, 1978.