==============================
Slicing and dicing with k-mers
==============================

(Note, this won't work with amplified data.)

Extra resources:

* plotting notebook

----

At the command line, create a new directory and extract some data::

   cd /mnt
   mkdir slice
   cd slice

We're going to work with half the read data set for speed reasons -- ::

   gunzip -c ../mapping/SRR1976948.abundtrim.subset.pe.fq.gz | \
       head -6000000 > SRR1976948.half.fq

In a Jupyter Notebook (go to 'http://' + machine name + ':8000'), password
'davis', create new Python notebook "conda root", run::

   cd /mnt/slice

and then in another cell::

   !load-into-counting.py -M 4e9 -k 31 SRR1976948.kh SRR1976948.half.fq

and in another cell::

   !abundance-dist.py SRR1976948.kh SRR1976948.half.fq SRR1976948.dist

and in yet another cell::

   %matplotlib inline
   import numpy
   from pylab import *
   dist1 = numpy.loadtxt('SRR1976948.dist', skiprows=1, delimiter=',')

   plot(dist1[:,0], dist1[:,1])
   axis(ymax=10000, xmax=1000)

Then::

   python2 ~/khmer/sandbox/calc-median-distribution.py SRR1976948.kh \
       SRR1976948.half.fq SRR1976948.readdist

And::

   python2 ~/khmer/sandbox/slice-reads-by-coverage.py SRR1976948.kh SRR1976948.half.fq slice.fq -m 0 -M 60

Assemble the slice
------------------

(Re)install megahit::

   cd
   git clone https://github.com/voutcn/megahit.git
   cd megahit
   make

Go back to the slice directory and extract paired ends::

   cd /mnt/slice
   extract-paired-ends.py slice.fq

Assemble! ::

   ~/megahit/megahit --12 slice.fq.pe -o slice

The contigs will be in ``slice/final.contigs.fa``.
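
To make the counting and slicing steps above more concrete, here is a minimal
pure-Python sketch of the underlying idea: count k-mers, tabulate how many
distinct k-mers occur at each abundance, take the median k-mer abundance of
each read, and keep the reads whose median falls in a chosen range. This is
not khmer's implementation. It uses exact counts in a ``Counter`` rather than
khmer's counting table, ignores reverse complements, uses a toy ``k`` of 5 on
made-up reads, and treats both bounds as inclusive; all of these are
assumptions made for illustration, and the function names are hypothetical. ::

   from collections import Counter
   from statistics import median

   K = 5  # toy k-mer size for this sketch; the tutorial uses k=31


   def kmers(seq, k=K):
       """Yield every substring of length k in seq, left to right."""
       for i in range(len(seq) - k + 1):
           yield seq[i:i + k]


   def count_kmers(reads, k=K):
       """Exact k-mer counts over all reads (assumption: no reverse complements)."""
       counts = Counter()
       for read in reads:
           counts.update(kmers(read, k))
       return counts


   def abundance_dist(counts):
       """Map abundance -> number of distinct k-mers seen that many times."""
       return Counter(counts.values())


   def median_abundance(read, counts, k=K):
       """Median count of the k-mers in one read; 0 if the read is shorter than k."""
       abunds = [counts[km] for km in kmers(read, k)]
       return median(abunds) if abunds else 0


   def slice_by_coverage(reads, counts, min_cov, max_cov, k=K):
       """Keep reads with min_cov <= median abundance <= max_cov (assumed inclusive)."""
       return [r for r in reads
               if min_cov <= median_abundance(r, counts, k) <= max_cov]


   # Made-up reads: three copies of a "high coverage" read, one "low coverage" read.
   reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAC", "TTGCAGGCTA"]
   counts = count_kmers(reads)
   dist = abundance_dist(counts)
   low = slice_by_coverage(reads, counts, 0, 2)
   high = slice_by_coverage(reads, counts, 3, 60)

With these toy reads, ``low`` holds only the single low-coverage read and
``high`` holds the three repeated reads, which mirrors what choosing different
``-m`` and ``-M`` values does to the real data set.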