
CORDEA blog

Mostly about programming and Linux OSes such as Fedora 21.

Several ways to output a matrix with row/column names in Python

awk Python

Four approaches, from the straightforward to the offbeat.

  • Update 2014/11/12

 Method 3 had a problem, so I have fixed it.
 I also added Method 4.

 

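This is the matrix I want to output: tab-separated, with the names in the first row and first column, and each cell holding the row name followed by the column name.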

	Ile	Val	Leu	Phe	Cys	Met	Ala	Gly	Thr	Trp	Ser	Tyr	Pro	His	Asp	Glu	Asn	Gln	Lys	Arg
Ile	IleIle	IleVal	IleLeu	IlePhe	IleCys	IleMet	IleAla	IleGly	IleThr	IleTrp	IleSer	IleTyr	IlePro	IleHis	IleAsp	IleGlu	IleAsn	IleGln	IleLys	IleArg
Val	ValIle	ValVal	ValLeu	ValPhe	ValCys	ValMet	ValAla	ValGly	ValThr	ValTrp	ValSer	ValTyr	ValPro	ValHis	ValAsp	ValGlu	ValAsn	ValGln	ValLys	ValArg
Leu	LeuIle	LeuVal	LeuLeu	LeuPhe	LeuCys	LeuMet	LeuAla	LeuGly	LeuThr	LeuTrp	LeuSer	LeuTyr	LeuPro	LeuHis	LeuAsp	LeuGlu	LeuAsn	LeuGln	LeuLys	LeuArg
Phe	PheIle	PheVal	PheLeu	PhePhe	PheCys	PheMet	PheAla	PheGly	PheThr	PheTrp	PheSer	PheTyr	PhePro	PheHis	PheAsp	PheGlu	PheAsn	PheGln	PheLys	PheArg
Cys	CysIle	CysVal	CysLeu	CysPhe	CysCys	CysMet	CysAla	CysGly	CysThr	CysTrp	CysSer	CysTyr	CysPro	CysHis	CysAsp	CysGlu	CysAsn	CysGln	CysLys	CysArg
Met	MetIle	MetVal	MetLeu	MetPhe	MetCys	MetMet	MetAla	MetGly	MetThr	MetTrp	MetSer	MetTyr	MetPro	MetHis	MetAsp	MetGlu	MetAsn	MetGln	MetLys	MetArg
Ala	AlaIle	AlaVal	AlaLeu	AlaPhe	AlaCys	AlaMet	AlaAla	AlaGly	AlaThr	AlaTrp	AlaSer	AlaTyr	AlaPro	AlaHis	AlaAsp	AlaGlu	AlaAsn	AlaGln	AlaLys	AlaArg
Gly	GlyIle	GlyVal	GlyLeu	GlyPhe	GlyCys	GlyMet	GlyAla	GlyGly	GlyThr	GlyTrp	GlySer	GlyTyr	GlyPro	GlyHis	GlyAsp	GlyGlu	GlyAsn	GlyGln	GlyLys	GlyArg
Thr	ThrIle	ThrVal	ThrLeu	ThrPhe	ThrCys	ThrMet	ThrAla	ThrGly	ThrThr	ThrTrp	ThrSer	ThrTyr	ThrPro	ThrHis	ThrAsp	ThrGlu	ThrAsn	ThrGln	ThrLys	ThrArg
Trp	TrpIle	TrpVal	TrpLeu	TrpPhe	TrpCys	TrpMet	TrpAla	TrpGly	TrpThr	TrpTrp	TrpSer	TrpTyr	TrpPro	TrpHis	TrpAsp	TrpGlu	TrpAsn	TrpGln	TrpLys	TrpArg
Ser	SerIle	SerVal	SerLeu	SerPhe	SerCys	SerMet	SerAla	SerGly	SerThr	SerTrp	SerSer	SerTyr	SerPro	SerHis	SerAsp	SerGlu	SerAsn	SerGln	SerLys	SerArg
Tyr	TyrIle	TyrVal	TyrLeu	TyrPhe	TyrCys	TyrMet	TyrAla	TyrGly	TyrThr	TyrTrp	TyrSer	TyrTyr	TyrPro	TyrHis	TyrAsp	TyrGlu	TyrAsn	TyrGln	TyrLys	TyrArg
Pro	ProIle	ProVal	ProLeu	ProPhe	ProCys	ProMet	ProAla	ProGly	ProThr	ProTrp	ProSer	ProTyr	ProPro	ProHis	ProAsp	ProGlu	ProAsn	ProGln	ProLys	ProArg
His	HisIle	HisVal	HisLeu	HisPhe	HisCys	HisMet	HisAla	HisGly	HisThr	HisTrp	HisSer	HisTyr	HisPro	HisHis	HisAsp	HisGlu	HisAsn	HisGln	HisLys	HisArg
Asp	AspIle	AspVal	AspLeu	AspPhe	AspCys	AspMet	AspAla	AspGly	AspThr	AspTrp	AspSer	AspTyr	AspPro	AspHis	AspAsp	AspGlu	AspAsn	AspGln	AspLys	AspArg
Glu	GluIle	GluVal	GluLeu	GluPhe	GluCys	GluMet	GluAla	GluGly	GluThr	GluTrp	GluSer	GluTyr	GluPro	GluHis	GluAsp	GluGlu	GluAsn	GluGln	GluLys	GluArg
Asn	AsnIle	AsnVal	AsnLeu	AsnPhe	AsnCys	AsnMet	AsnAla	AsnGly	AsnThr	AsnTrp	AsnSer	AsnTyr	AsnPro	AsnHis	AsnAsp	AsnGlu	AsnAsn	AsnGln	AsnLys	AsnArg
Gln	GlnIle	GlnVal	GlnLeu	GlnPhe	GlnCys	GlnMet	GlnAla	GlnGly	GlnThr	GlnTrp	GlnSer	GlnTyr	GlnPro	GlnHis	GlnAsp	GlnGlu	GlnAsn	GlnGln	GlnLys	GlnArg
Lys	LysIle	LysVal	LysLeu	LysPhe	LysCys	LysMet	LysAla	LysGly	LysThr	LysTrp	LysSer	LysTyr	LysPro	LysHis	LysAsp	LysGlu	LysAsn	LysGln	LysLys	LysArg
Arg	ArgIle	ArgVal	ArgLeu	ArgPhe	ArgCys	ArgMet	ArgAla	ArgGly	ArgThr	ArgTrp	ArgSer	ArgTyr	ArgPro	ArgHis	ArgAsp	ArgGlu	ArgAsn	ArgGln	ArgLys	ArgArg


 

Method 1: Play it safe and use numpy


The orthodox approach (at least if you have numpy installed).

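import numpy as np

Aminos = ["Ile", "Val", "Leu", "Phe", "Cys", "Met", "Ala", "Gly", "Thr", "Trp", "Ser", "Tyr", "Pro", "His", "Asp", "Glu", "Asn", "Gln", "Lys", "Arg"]

# Cell (i, j) is the row name followed by the column name
data = [[a1+a2 for a2 in Aminos] for a1 in Aminos]
with open("matrix.txt", "w") as f:
    # Header row: an empty corner cell, then the column names
    f.write("\t" + "\t".join(Aminos) + "\n")
    # Prepend the row names as the first column; column_stack is used because
    # zip() returns an iterator on Python 3, which np.hstack cannot stack
    np.savetxt(f, np.column_stack([Aminos, data]), fmt="%s", delimiter="\t")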
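If numpy is not installed, the standard library's csv module can produce the same tab-separated layout. The following is a minimal sketch that assumes Python 3. The helper name matrix_rows is hypothetical and does not come from the original code.

```python
import csv

Aminos = ["Ile", "Val", "Leu", "Phe", "Cys", "Met", "Ala", "Gly", "Thr", "Trp",
          "Ser", "Tyr", "Pro", "His", "Asp", "Glu", "Asn", "Gln", "Lys", "Arg"]


def matrix_rows(names):
    """Hypothetical helper: header row plus one row per name."""
    rows = [[""] + names]  # empty corner cell, then the column names
    for a1 in names:
        # Row name, then row name + column name for every column
        rows.append([a1] + [a1 + a2 for a2 in names])
    return rows


# newline="" lets the csv module control line endings, as its docs recommend;
# lineterminator="\n" replaces csv's default "\r\n"
with open("matrix.txt", "w", newline="") as f:
    csv.writer(f, delimiter="\t", lineterminator="\n").writerows(matrix_rows(Aminos))
```

Keeping the row construction separate from the writer makes it easy to check the data before anything touches the file.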

 

Method 2: Just print everything and hand it to awk


If you're comfortable with awk, this is arguably the easy way.

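# matrix_test.py: print the corner cell, the column names, then every row, one item per line
Aminos = ["Ile", "Val", "Leu", "Phe", "Cys", "Met", "Ala", "Gly", "Thr", "Trp", "Ser", "Tyr", "Pro", "His", "Asp", "Glu", "Asn", "Gln", "Lys", "Arg"]
print("")  # empty corner cell of the header row
for a1 in Aminos:
    print(a1)  # column names
for a1 in Aminos:
    print(a1)  # row name
    for a2 in Aminos:
        print(a1 + a2)

Then let awk join every 21 lines (a row name plus 20 cells) into one tab-separated line:

% python matrix_test.py | awk 'BEGIN{c=1}{if(c%21 == 0){print $0}else{printf "%s\t", $0};c++}' > matrix.txt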
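The awk one-liner just glues every 21 consecutive lines into one tab-separated line. The rough Python 3 sketch below does the same grouping. The name join_every is a hypothetical helper. In practice it would read the piped output of matrix_test.py, for example via sys.stdin.

```python
def join_every(lines, n):
    """Hypothetical helper: join each run of n lines with tabs, like the awk one-liner."""
    # Drop the trailing newline each input line may carry
    items = [line.rstrip("\n") for line in lines]
    # Chunks items[0:n], items[n:2n], ... each become one tab-separated row
    return ["\t".join(items[i:i + n]) for i in range(0, len(items), n)]


# Small example: 2 names, so each row is 3 lines (name + 2 cells) instead of 21
rows = join_every(["", "A", "B", "A", "AA", "AB", "B", "BA", "BB"], 3)
print("\n".join(rows))
```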

Method 3: Output with print


An approach that looks right but actually isn't, plus how to fix it.

Aminos = ["Ile", "Val", "Leu", "Phe", "Cys", "Met", "Ala", "Gly", "Thr", "Trp", "Ser", "Tyr", "Pro", "His", "Asp", "Glu", "Asn", "Gln", "Lys", "Arg"]

print "\t",
for a1 in Aminos:
    if Aminos[-1] == a1:
        print "%s\n" % a1,
    else:
        print "%s\t" % a1,
for a1 in Aminos:
    print "%s\t" % a1,
    for a2 in Aminos:
        if Aminos[-1] == a2:
            print "%s\n" % (a1+a2),
        else:
            print "%s\t" % (a1+a2),
Problem 1

The following makes it clear why this is a problem:

for i in range(10):
    print "#",
% python print_test.py
# # # # # # # # # #

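In other words, a trailing comma on a Python 2 print statement makes it write a space instead of a newline.
The output may look fine, but when you reuse it, a mix of spaces and tabs makes splitting it back into fields a pain. (Strictly speaking, Python 2 skips that space when the previously printed string ends in whitespace other than a plain space, such as \t or \n. Whether a space appears therefore depends on what was printed last, which is exactly why relying on this behavior is fragile.)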

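In Python 3 this can be avoided with

for i in range(10):
    print("#", end="")

and in Python 2 with

import sys
for i in range(10):
    sys.stdout.write("#")

Both print the # characters with nothing in between. (On Python 2.6 and later, from __future__ import print_function also makes the Python 3 form available.)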
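In Python 3, print() also accepts sep and file arguments, so a whole row can be written in a single call with tabs only between the items. The following is a minimal sketch that assumes Python 3. The name print_matrix is a hypothetical helper, not part of the original code.

```python
import sys

Aminos = ["Ile", "Val", "Leu", "Phe", "Cys", "Met", "Ala", "Gly", "Thr", "Trp",
          "Ser", "Tyr", "Pro", "His", "Asp", "Glu", "Asn", "Gln", "Lys", "Arg"]


def print_matrix(names, file=sys.stdout):
    """Hypothetical helper: print the matrix with one print() call per row."""
    # sep="\t" puts a single tab between items; the default end="\n" closes the row
    print("", *names, sep="\t", file=file)  # empty corner cell, then column names
    for a1 in names:
        print(a1, *[a1 + a2 for a2 in names], sep="\t", file=file)


with open("matrix.txt", "w") as f:
    print_matrix(Aminos, file=f)
```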

 

Problem 2

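It happens to work here, but a list can contain duplicates, so instead of

for a1 in Aminos:
    if Aminos[-1] == a1:
        print "%s\n" % a1,
    else:
        print "%s\t" % a1,

it is safer to compare positions:

for i in range(len(Aminos)):
    if len(Aminos) == i+1:
        print "%s\n" % Aminos[i],
    else:
        print "%s\t" % Aminos[i],

If the last element also appears earlier in the list, the value comparison Aminos[-1] == a1 is true for that earlier occurrence too, and a newline gets printed in the middle of the row. The index comparison only fires at the final position.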
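To make the difference concrete, here is a small Python 3 sketch using a hypothetical three-element list whose last element also appears first. The two functions mimic the value-comparison and index-comparison loops above, returning the header text instead of printing it. Their names are illustrative only. str.join avoids the last-item special case altogether.

```python
def header_by_value(names):
    """Mimics the value-comparison loop: newline wherever an item equals the last item."""
    out = ""
    for a1 in names:
        out += "%s\n" % a1 if names[-1] == a1 else "%s\t" % a1
    return out


def header_by_index(names):
    """Mimics the index-comparison loop: newline only at the last position."""
    out = ""
    for i, a1 in enumerate(names):
        out += "%s\n" % a1 if i == len(names) - 1 else "%s\t" % a1
    return out


names = ["Ile", "Val", "Ile"]  # hypothetical list with a duplicate
print(repr(header_by_value(names)))   # 'Ile\nVal\tIle\n'  (row broken too early)
print(repr(header_by_index(names)))   # 'Ile\tVal\tIle\n'
print(repr("\t".join(names) + "\n"))  # 'Ile\tVal\tIle\n'
```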

Method 4: Just use write

Aminos = ["Ile", "Val", "Leu", "Phe", "Cys", "Met", "Ala", "Gly", "Thr", "Trp", "Ser", "Tyr", "Pro", "His", "Asp", "Glu", "Asn", "Gln", "Lys", "Arg"]

with open("matrix.txt", "w") as f:
    f.write("\t")
    for i in range(len(Aminos)):
        if len(Aminos) == i+1:
            f.write("%s\n" % Aminos[i])
        else:
            f.write("%s\t" % Aminos[i])
    for j in range(len(Aminos)):
        f.write("%s\t" % Aminos[j])
        for i in range(len(Aminos)):
            if len(Aminos) == i+1:
                f.write("%s\n" % (Aminos[j]+Aminos[i]))
            else:
                f.write("%s\t" % (Aminos[j]+Aminos[i]))
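Method 4 can be shortened with str.join, which puts tabs only between items, so the last column needs no special case. The following is a sketch that assumes Python 3. The name write_matrix is a hypothetical helper.

```python
Aminos = ["Ile", "Val", "Leu", "Phe", "Cys", "Met", "Ala", "Gly", "Thr", "Trp",
          "Ser", "Tyr", "Pro", "His", "Asp", "Glu", "Asn", "Gln", "Lys", "Arg"]


def write_matrix(f, names):
    """Hypothetical helper: write a names x names matrix as TSV to the file object f."""
    # Header row: empty corner cell, then the column names
    f.write("\t" + "\t".join(names) + "\n")
    for a1 in names:
        # Row name, then row name + column name for every column
        f.write(a1 + "\t" + "\t".join(a1 + a2 for a2 in names) + "\n")


with open("matrix.txt", "w") as f:
    write_matrix(f, Aminos)
```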

 

References


How to print in Python without newline or space? - Stack Overflow