2014-10-28

I got tired of typing docker run options, so I wrote a command

Python

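Introduction

This is for people who reuse the same user name and similar settings across containers.

I have been having so much fun with Docker lately that I tinker with it all day.

The one thing that bothers me every time is how many options I have to specify:

docker run -it -u cordea -w /home/cordea --name hoge hoge/hoge /usr/bin/zsh

/usr/bin/zsh is not a problem, since it can be specified in the Dockerfile.

The user and the working directory are another matter, though.
I searched around, but few people seem to specify them in the first place.

So I wrote a simple tool in Python.

For details, see README.md on GitHub (linked below).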

Usage

See GitHub for installation instructions.
That said, a quick search will probably turn up write-ups that are more detailed than the README.md.

Before

docker run -it -u cordea -w /home/cordea --name hoge hoge/hoge /usr/bin/zsh

After

dockerun hoge/hoge

Of course, you can also override only some of the options, or use options that the run command already provides, like this:

dockerun -w / --name huge hoge/hoge
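The idea behind such a wrapper can be sketched as follows. This is not the actual dockerun source: it uses argparse rather than optparse, and the default user, the default command, and the rule that the container name defaults to the last part of the image name are all assumptions read off the example commands in this post.

```python
import argparse

# Assumed defaults, taken from the example command in this post.
DEFAULT_USER = "cordea"
DEFAULT_COMMAND = "/usr/bin/zsh"


def build_command(argv):
    """Expand short wrapper arguments into a full `docker run` argument list."""
    parser = argparse.ArgumentParser(prog="dockerun-sketch")
    parser.add_argument("-u", "--user", default=DEFAULT_USER)
    parser.add_argument("-w", "--workdir")
    parser.add_argument("--name")
    parser.add_argument("image")
    # Options the parser does not know are collected and passed through unchanged.
    args, passthrough = parser.parse_known_args(argv)

    workdir = args.workdir or "/home/" + args.user
    # Assumption: the container name defaults to the last part of the image name.
    name = args.name or args.image.split("/")[-1].split(":")[0]
    return (["docker", "run", "-it", "-u", args.user, "-w", workdir, "--name", name]
            + passthrough + [args.image, DEFAULT_COMMAND])


if __name__ == "__main__":
    print(" ".join(build_command(["hoge/hoge"])))
    print(" ".join(build_command(["-w", "/", "--name", "huge", "hoge/hoge"])))
```

The function only builds the argument list; handing it to subprocess would be the remaining step in a real tool.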

However, to pass any other option, optparse apparently requires a "--" separator first, as in "-- -a".
If you rely heavily on other options, it is probably easier to edit dockerun itself or to just use docker run...
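The "--" behavior is easy to reproduce with optparse on its own. The parser below is a made-up minimal one with a single -w option, not the one in dockerun.

```python
from optparse import OptionParser

parser = OptionParser()
parser.add_option("-w", dest="workdir")

# Everything after "--" is treated as a positional argument, not as an option.
options, args = parser.parse_args(["-w", "/", "--", "-a", "hoge/hoge"])
print(options.workdir)
print(args)

# Without "--", the unknown option -a makes optparse report an error and exit.
try:
    parser.parse_args(["-w", "/", "-a", "hoge/hoge"])
    exited = False
except SystemExit as exc:
    exited = True
    print("optparse exited with status %s" % exc.code)
```

The wrapper can then append the leftover positional arguments to the docker run command line unchanged.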

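I put this together in a hurry, so it probably has plenty of bugs.

If you find one, please fix it yourself or let me know.

There are probably similar tools out there if you look, but I am happy with this for now.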

GitHub

<a href="https://github.com/CORDEA/DockerFiles">CORDEA/DockerFiles</a>

2014-10-12

echo ${array[0]} ... are you sure about that?

echo ${array[0]}

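Depending on the shell, the command above does not necessarily print what you expect.

In short, array indices start at 0 in some shells and at 1 in others.
This may be common knowledge for people who use several shells, but it was new to me, so here is a note.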

sh

sh-3.2$ array=(0 1 2 3)
sh-3.2$ echo ${array[*]}
0 1 2 3
sh-3.2$ echo ${array[0]}
0
sh-3.2$ echo ${array[1]}
1

bash

CORDEA@macrou:~$ array=(0 1 2 3)
CORDEA@macrou:~$ echo ${array[*]}
0 1 2 3
CORDEA@macrou:~$ echo ${array[0]}
0
CORDEA@macrou:~$ echo ${array[1]}
1

zsh

CORDEA@macrou% array=(0 1 2 3)
CORDEA@macrou% echo ${array[*]}
0 1 2 3
CORDEA@macrou% echo ${array[0]}

CORDEA@macrou% echo ${array[1]}
0

csh

[CORDEA@macrou ~]$ set array = ( 0 1 2 3 )
[CORDEA@macrou ~]$ echo ${array[*]}
0 1 2 3
[CORDEA@macrou ~]$ echo ${array[0]}

[CORDEA@macrou ~]$ echo ${array[1]}
0

tcsh

[CORDEA@macrou ~]$ set array = ( 0 1 2 3 )
[CORDEA@macrou ~]$ echo ${array[*]}
0 1 2 3
[CORDEA@macrou ~]$ echo ${array[0]}

[CORDEA@macrou ~]$ echo ${array[1]}
0

Calculating GC content

Bash

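Introduction

Give it a file in multi-FASTA format and it calculates the GC content.
Behavior is not guaranteed when the input contains characters other than A, T, G, and C (such as N).

I used "Nucleic and/or Amino Acid contents" to check the results.

Update (2018/7/16)

This post still seems to be referenced from time to time, so I have put a slightly revised version at the link below.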

github.com

Mathematica (Wolfram)

fasta := URLDownload["http://example.com/hoge.fasta"]
lines = Import[fasta, {"FASTA", "LabeledData"}]
gcContent = Map[Module[{map = Counts[Characters[Last[#]]]}, First[#] -> Interpreter["PercentFraction"][N[(map["G"] + map["C"]) / Total[map]]]] &, lines]
Do[Echo[Last[x], First[x] <> ": "], {x, gcContent}]

Python

Usage

python gc.py sequence.ffn

Code

#!/usr/bin/env python
# encoding:utf-8
#
# Author:   Yoshihiro Tanaka
# Created:  2014-10-09
#
import sys

BASES = ['A', 'T', 'G', 'C']


def report(header, counts):
    # Print the header line, then the GC fraction: (G + C) / (A + T + G + C).
    sys.stdout.write(header)
    print("gc%: " + str((counts[2] + counts[3]) / float(sum(counts))))


with open(sys.argv[1], "r") as infile:
    header = None
    counts = [0] * len(BASES)
    for line in infile:
        if line.startswith(">"):
            # A new record starts: report the previous one, then reset the counters.
            if header is not None:
                report(header, counts)
            header = line
            counts = [0] * len(BASES)
        else:
            for i, base in enumerate(BASES):
                counts[i] += line.count(base)

# Report the last record in the file.
if header is not None:
    report(header, counts)
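For a quick sanity check without a file, the same calculation can be sketched as a function over in-memory text. The function name and the two-record sample are made up for this example, and unlike the script above this sketch upper-cases each line before counting.

```python
def gc_fractions(fasta_text):
    """Return a list of (header, GC fraction) pairs for multi-FASTA text."""
    results = []
    header, gc, total = None, 0, 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Close the previous record before starting a new one.
            if header is not None:
                results.append((header, gc / float(total)))
            header, gc, total = line, 0, 0
        else:
            seq = line.strip().upper()
            gc += seq.count("G") + seq.count("C")
            # Only A, T, G and C contribute to the denominator.
            total += sum(seq.count(base) for base in "ATGC")
    if header is not None:
        results.append((header, gc / float(total)))
    return results


# Hypothetical two-record input, made up for this example.
sample = ">seq1\nATGC\nGGCC\n>seq2\nAATT\n"
for header, fraction in gc_fractions(sample):
    print("%s gc: %.2f" % (header, fraction))
```

In the sample, seq1 has 6 G or C bases out of 8, so its fraction is 0.75, and seq2 has none.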

awk

awk version 20070501
Tested on Mac OS X only.

Usage

awk -f gc.awk sequence.ffn

Code

#!/usr/bin/awk -f
# encoding:utf-8
#
# Author:   Yoshihiro Tanaka
# Created:  2014-10-26
#

# Return the GC content (in percent) from the per-base counts in atgc.
function calc_gc(atgc,    base, sum) {
    sum = 0
    for (base in atgc) {
        sum += atgc[base]
    }
    return ((atgc["G"] + atgc["C"]) / sum) * 100
}
{
    if ($0 ~ /^>/) {
        # A new header: report the previous record, if there was one.
        if (seen) {
            printf("%s%0.2f\n", "gc%: ", calc_gc(atgc))
        }
        seen = 1
        print $0
        # Reset the counters for the new record.
        for (i = 1; i <= 4; ++i) {
            atgc[substr("ATGC", i, 1)] = 0
        }
    }
    else {
        # Count only A, T, G and C; any other character is skipped.
        for (i = 1; i <= length($0); ++i) {
            base = substr($0, i, 1)
            if (base in atgc) {
                ++atgc[base]
            }
        }
    }
}
END {
    printf("%s%0.2f\n", "gc%: ", calc_gc(atgc))
}

[awk] Printing a file with more than one delimiter

awk

Nothing but small tips lately, but here is another one.

Suppose you have a file like this:

0,1,2,3,4,5,6,7,8,9
a,b,c,d,e,f,g,h,i,j

and you want to print it like this, with the first four fields separated by tabs and the remaining ones by colons:

0	1	2	3	4:5:6:7:8:9
a	b	c	d	e:f:g:h:i:j

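#!/usr/bin/awk -f

BEGIN {
    FS=","
}
{
    for(i=1; i<=NF; i++) {
        if (i>4) {
            # Fields 5 and later are joined with ":"; the last one ends the line.
            if (i==NF) {
                print $i
            } else {
                printf "%s:", $i
            }
        } else {
            # Each of the first four fields is followed by a tab.
            printf "%s\t", $i
        }
    }
}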

I wonder if this could be written a bit more cleanly...
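For comparison, here is one way the same transformation might look in Python, where slicing and join remove the per-field branching. The function name and the head parameter are made up for this sketch.

```python
def join_fields(line, head=4):
    """Join the first `head` comma-separated fields with tabs and the rest with colons."""
    fields = line.rstrip("\n").split(",")
    parts = fields[:head]
    if fields[head:]:
        # The remaining fields become a single colon-separated column.
        parts.append(":".join(fields[head:]))
    return "\t".join(parts)


for row in ["0,1,2,3,4,5,6,7,8,9", "a,b,c,d,e,f,g,h,i,j"]:
    print(join_fields(row))
```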

2014-09-29

Specifying the stacking order in a matplotlib polar chart

Python

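The stacking order of bar charts and legends is covered on Stack Overflow, but the stacking order in a polar chart can be a little confusing if you are not used to it, so here is a note.

In matplotlib, the stacking order is specified with zorder.

Suppose you want to make something like this:
f:id:CORDEA:20140929143312p:plain,h200

If you do not specify the stacking order properly, you can end up with this instead:
f:id:CORDEA:20140929143632p:plain,h200

The artist given the larger value with set_zorder is drawn on top.

sample

In this case the bars are plotted from the inside out, so zorder is assigned in descending order: 10, 9, 8, 7, 6, ...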

import numpy as np
import matplotlib.pyplot as plt

theta = np.zeros(10)                # every bar starts at angle 0
radii = np.arange(2.0, 22.0, 2.0)   # 2, 4, ..., 20
width = np.pi * 2                   # each bar spans the full circle

ax = plt.subplot(111, polar=True)
bars = ax.bar(theta, radii, width=width, bottom=0.0)

for c, bar in enumerate(bars):
    # Set zorder here: inner (smaller) bars get larger values so they stay on top
    bar.set_zorder(10 - c)
    # Alternate black and white rings
    if c % 2 == 0:
        bar.set_facecolor('#000000')
    else:
        bar.set_facecolor('#ffffff')
    bar.set_alpha(1.0)

plt.show()
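If the radii are not already sorted from small to large, the zorder values can be derived from the radii instead of from the loop counter. The helper below is a hypothetical pure-Python sketch of that idea; it does not call matplotlib itself.

```python
def zorders_by_radius(radii, base=1):
    """Give smaller radii larger zorder values so inner bars are drawn on top."""
    # Rank the radii from largest to smallest; the largest gets `base`.
    order = sorted(range(len(radii)), key=lambda i: radii[i], reverse=True)
    zorders = [0] * len(radii)
    for rank, i in enumerate(order):
        zorders[i] = base + rank
    return zorders


print(zorders_by_radius([2.0, 4.0, 6.0, 8.0]))   # [4, 3, 2, 1]
print(zorders_by_radius([6.0, 2.0, 4.0]))        # [1, 3, 2]
```

Each value would then be passed to bar.set_zorder for the bar with the matching radius.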

As an aside, to save a matplotlib plot as a PNG with a transparent background, use savefig:

plt.savefig('***.png', transparent=True)

References

<a href="http://stackoverflow.com/questions/16770049/strange-matplotlib-zorder-behavior-with-legend-and-errorbar">strange matplotlib zorder behavior with legend and errorbar</a>
<a href="http://stackoverflow.com/questions/22019789/matplotlib-zorder-of-elements-in-polar-plot-superimposed-on-cartesian-plot">matplotlib zorder of elements in polar plot superimposed on cartesian plot</a>
artists — Matplotlib 1.4.0 documentation

2014-09-23

A logo-like image made with matplotlib and pi

Python

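I was reading a Wolfram|Alpha blog post and suddenly felt like making something out of pi, so I wrote this with matplotlib.

The result looks like this:

f:id:CORDEA:20140923181237p:plain,h200

The further along a digit is in pi, the farther it sits from the center, and the larger the digit (0-9), the larger its circle.
The circles are also scaled up as the digit position increases, because I did not like how the empty space grew toward the outside.

It came out as a nice circular shape, so I am thinking of using it as a logo.

Code

Also available as a Gist.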

import numpy as np
import matplotlib.pyplot as plt

pi="14159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706798214"

lst = np.array([int(d) for d in pi])   # digit values (0-9)
ind = np.arange(1, len(pi) + 1)        # 1-based digit positions

r      = 2 * ind
theta  = (np.pi * ind) / 10
area   = 120 * lst * (ind * 0.012)
colors = theta

ax = plt.subplot(111, polar=True)
c  = plt.scatter(theta, r, c=colors, s=area, cmap=plt.cm.hsv, edgecolors='#636363')
c.set_alpha(0.75)

ax.spines['polar'].set_visible(False)
ax.axes.get_xaxis().set_visible(False)
ax.axes.get_yaxis().set_visible(False)

plt.show()
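To see what the three formulas produce for individual digits, they can be evaluated without matplotlib. The marker function below is a made-up helper that just restates the r, theta, and area expressions from the code above.

```python
import math


def marker(position, digit):
    """Return (r, theta, area) for the digit at a 1-based position after the decimal point."""
    r = 2 * position                          # distance from the center
    theta = math.pi * position / 10           # 20 digits make one full turn
    area = 120 * digit * (position * 0.012)   # grows with both digit and position
    return r, theta, area


digits = "14159"
for position, ch in enumerate(digits, start=1):
    r, theta, area = marker(position, int(ch))
    print("digit %s: r=%d theta=%.3f area=%.2f" % (ch, r, theta, area))
```

Two consequences follow from the formulas: every 20 digits complete one revolution, and a digit of 0 gets an area of 0, so it leaves a gap in the spiral.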

References

Introducing Tweet-a-Program—Wolfram|Alpha Blog
Pi - Wolfram|Alpha
pie_and_polar_charts example code: polar_scatter_demo.py — Matplotlib 1.4.0 documentation
Adding new scales and projections to matplotlib — Matplotlib 1.4.0 documentation
python - turn off axis border for polar matplotlib plot - Stack Overflow
python - Hiding axis text in matplotlib plots - Stack Overflow
Python scatter plot. Size and style of the marker - Stack Overflow

2014-08-19

Introducing CodernityDB

Database Python

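This post introduces CodernityDB, a database written in pure Python.
There did not seem to be an introduction in Japanese, so here is a brief one.
I had thought about writing it up as proper documentation, but I could not organize it that far...
If anything here is wrong, I would appreciate it if you let me know.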

What is CodernityDB?

CodernityDB is a NoSQL database with features such as:

Open source
Native Python database
Fast
Multi-platform
Schema-less
Multiple indexes

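About indexes

CodernityDB currently implements two main kinds of index.

Hash Index

Pros

  Fast

Cons

  Records are not kept in insert/update/delete order.
  The only supported operations are querying a specific key and iterating over all keys.

B Plus Tree Index

Pros

  Records are kept in order (depending on the key)
  Range queries are supported

Cons

  Slower than hash-based indexes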
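The trade-off between the two kinds of index can be illustrated with plain Python data structures. This is only an analogy, not how CodernityDB implements either index: a dict stands in for the hash index and a sorted list with bisect stands in for the tree index, and the records are made up.

```python
import bisect

records = [(30, "c"), (10, "a"), (20, "b"), (40, "d")]  # made-up (key, value) pairs

# Hash-style index: direct lookup by exact key, but no useful ordering.
hash_index = dict(records)
print(hash_index[20])

# Ordered index: keys are kept sorted, so a range query is a pair of binary searches.
sorted_keys = sorted(key for key, _ in records)


def range_query(low, high):
    """Return the keys k with low <= k <= high."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_right(sorted_keys, high)
    return sorted_keys[start:end]


print(range_query(15, 35))
```

The exact-key lookup needs no ordering at all, while the range query only works because the keys are kept sorted, which is the extra work an ordered index pays for.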

About formats

CodernityDB supports many key formats.

See the official "Key-format" documentation and the "Format characters" section of the Python documentation.

Basic usage

CodernityDB is very simple to use.

Below is simple code for five operations: Insert, Count, Update, Delete, and Get.

For operations not covered here, or for more detail, see the official Quick tutorial.

Insert(simple)

from CodernityDB.database import Database

db = Database('/tmp/tut1')
db.create()

insertDict = {'x': 1}
print(db.insert(insertDict))

This is the simplest form of insert. The _id field is generated automatically, but you cannot search for a specific record this way.

Insert

from CodernityDB.database import Database
from CodernityDB.hash_index import HashIndex

class WithXIndex(HashIndex):
    def __init__(self, *args, **kwargs):
        kwargs['key_format'] = 'I'
        super(WithXIndex, self).__init__(*args, **kwargs)

    def make_key_value(self, data):
        a_val = data.get("x")
        if a_val is not None:
            return a_val, None
        return None

    def make_key(self, key):
        return key

db = Database('/tmp/tut2')
db.create()

x_ind = WithXIndex(db.path, 'x')
db.add_index(x_ind)

print(db.insert({'x': 1}))

Count

from CodernityDB.database import Database

db = Database('/tmp/tut2')
db.open()

# 'x' is the index added in the Insert example, so count against that database
print(db.count(db.all, 'x'))

Get

from CodernityDB.database import Database

db = Database('/tmp/tut2')
db.open()

print(db.get('x', 1, with_doc=True))

Delete

from CodernityDB.database import Database

db = Database('/tmp/tut2')
db.open()

curr = db.get('x', 1, with_doc=True)
doc  = curr['doc']

db.delete(doc)

Update

from CodernityDB.database import Database

db = Database('/tmp/tut2')
db.open()

curr = db.get('x', 1, with_doc=True)
doc  = curr['doc']

doc['Updated'] = True
db.update(doc)

Tips

error

raise IndexConflict("Already exists")

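This error is raised by db.create or db.add_index when the database or index already exists.

struct.error: 'I' format requires 0 <= number <= 4294967295

This error occurs when an index grows beyond 4 GB.
When it occurs, you need to change the format to 'Q', change the index, or take some similar measure.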
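The limit comes from the struct module itself and can be reproduced without CodernityDB. The snippet below is a small sketch using only the standard library; the variable names are made up.

```python
import struct

LIMIT_I = 2 ** 32 - 1   # largest value an unsigned 4-byte 'I' field can hold

print(struct.calcsize("<I"), struct.calcsize("<Q"))   # field sizes in bytes

struct.pack("<I", LIMIT_I)            # still fits
try:
    struct.pack("<I", LIMIT_I + 1)    # one past the limit
    overflowed = False
except struct.error as exc:
    overflowed = True
    print("struct.error: %s" % exc)

packed = struct.pack("<Q", LIMIT_I + 1)   # an 8-byte 'Q' field holds it
print(len(packed))
```

An offset stored in an 'I' field therefore cannot point past 4 GB, while 'Q' doubles the field size to 8 bytes and lifts that limit.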

If db.path/id_buck is larger than 4 GB, the cause is not an index you created but the main index that CodernityDB creates by default. In that case, passing with_id_index=False to db.create may solve the problem. This option prevents the main index from being created.

However, if you specify with_id_index=False, you must either create the id index yourself using UniqueHashIndex with format 'Q', or shard it using Sharded indexes. With format 'I' in a Sharded Index, 10 shards give you 10 * 4 GB of index. The number of shards is set with sh_nums.
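The arithmetic behind sharding can be sketched as follows. The routing function is a generic illustration using a checksum modulo the shard count; it is an assumption made for this example and not the rule CodernityDB actually uses to pick a shard.

```python
import zlib

SHARDS = 10                 # the value used for sh_nums in this post
LIMIT_PER_SHARD = 2 ** 32   # bytes addressable with a 4-byte 'I' offset

total = SHARDS * LIMIT_PER_SHARD
print("%d shards x 4 GiB = %d GiB" % (SHARDS, total // 2 ** 30))


def pick_shard(key, shards=SHARDS):
    """Hypothetical routing rule: a stable checksum of the key modulo the shard count."""
    return zlib.crc32(key) % shards


print(pick_shard(b"example-key"))
```

Because each shard has its own 4 GB limit, the total capacity grows linearly with the number of shards as long as keys are spread evenly.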

Example using UniqueHashIndex

from CodernityDB.database import Database
from CodernityDB.hash_index import HashIndex, UniqueHashIndex

class BigIDIndex(UniqueHashIndex):
    def __init__(self, *args, **kwargs):
        kwargs['key_format'] = '<32s8sQIcQ'
        super(BigIDIndex, self).__init__(*args, **kwargs)

class MyIDIndex(HashIndex):
    def __init__(self, *args, **kwargs):
        kwargs['key_format'] = 'Q'
        super(MyIDIndex, self).__init__(*args, **kwargs)

    def make_key_value(self, data):
        a_val = data.get("x")
        if a_val is not None:
            return a_val, None
        return None

    def make_key(self, key):
        return key

db = Database('/tmp/tut1')
db.create(with_id_index=False)

db.add_index(BigIDIndex(db.path, 'id'))
db.add_index(MyIDIndex(db.path, 'x'))

db.insert({'x': 1})

I have seen this approach raise "ValueError: bad marshal data" and "TypeError: 'str' object does not support item assignment", and I have not found a solution so far.
If you need to insert a huge amount of data, it may be better to use a Sharded Index or to consider splitting the data.

Example using Sharded Index

from CodernityDB.database import Database
from CodernityDB.sharded_hash import ShardedUniqueHashIndex, ShardedHashIndex
from CodernityDB.tree_index import TreeBasedIndex

class CustomIdSharded(ShardedUniqueHashIndex):
    custom_header = 'from CodernityDB.sharded_hash import ShardedUniqueHashIndex'
    def __init__(self, *args, **kwargs):
        kwargs['sh_nums'] = 10
        super(CustomIdSharded, self).__init__(*args, **kwargs)

class TreeIndex(TreeBasedIndex):
    def __init__(self, *args, **kwargs):
        kwargs['node_capacity'] = 10
        kwargs['key_format'] = 'I'
        super(TreeIndex, self).__init__(*args, **kwargs)

    def make_key_value(self, data):
        t_val = data.get('x')
        if t_val is not None:
            return t_val, None
        return None

    def make_key(self, key):
        return key

db = Database('/tmp/tut1')
db.create(with_id_index=False)

db.add_index(CustomIdSharded(db.path, 'id'))
db.add_index(TreeIndex(db.path, 'x'))

db.insert({'x': 1})

For details on this error, see issue #14 in the references below.

References

CodernityDB pure python, fast, NoSQL database — CodernityDB
codernity / CodernityDB / issues / #14 - id_stor not greater than 4G — Bitbucket
7.3. struct — Interpret strings as packed binary data — Python v2.7.8 documentation
CodernityDB, Pure Python NoSQL database - YouTube
