Slicing and dicing with k-mers

(Note: this won’t work with amplified data.)

Extra resources:

At the command line, create a new directory and extract some data:

cd /mnt
mkdir slice
cd slice

To save time, we’ll work with half of the read data set:

gunzip -c ../mapping/ | \
   head -6000000 > SRR1976948.half.fq
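FASTQ stores each read as four lines (header, sequence, `+` separator, quality string). That means `head -6000000` keeps the first 1,500,000 reads, and any line count passed to `head` should be a multiple of four so that no record is cut in half. A minimal Python sketch of the same "take the first N records" step (the function name is illustrative, not part of khmer):

```python
from itertools import islice

LINES_PER_RECORD = 4  # FASTQ: @header, sequence, '+' line, qualities


def take_fastq_records(lines, n_reads):
    """Return the first n_reads FASTQ records (each a list of 4 lines)."""
    it = iter(lines)
    records = []
    for _ in range(n_reads):
        record = list(islice(it, LINES_PER_RECORD))
        if len(record) < LINES_PER_RECORD:
            break  # input ended (or last record was truncated): stop here
        records.append(record)
    return records


# Number of reads kept by `head -6000000`.
print(6000000 // LINES_PER_RECORD)  # → 1500000
```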

In a Jupyter Notebook (go to ‘http://’ + machine name + ‘:8000’; the password is ‘davis’), create a new Python notebook using the “conda root” kernel and run:

cd /mnt/slice

and then in another cell:

! -M 4e9 -k 31 SRR1976948.half.fq
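This step counts every 31-mer (`-k 31`) in the reads; `-M 4e9` presumably caps the counting table's memory at about 4 GB. khmer keeps counts in a fixed-size probabilistic table, so hash collisions can inflate (but never reduce) a count. For intuition only, here is a sketch that counts k-mers exactly with a Python `Counter`, which is practical only for small inputs. It assumes, as khmer does by default, that a k-mer and its reverse complement are counted together, and (as its own simplification) skips k-mers containing `N`:

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def count_kmers(sequences, k=31):
    """Exact canonical k-mer counts over read sequences (illustration only)."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" in kmer:
                continue  # sketch choice: ignore k-mers with ambiguous bases
            counts[canonical(kmer)] += 1
    return counts
```

With `k=3`, `count_kmers(["ACGTT", "AACGT"], k=3)` gives `ACG` a count of 4, because `ACG` and `CGT` are reverse complements of each other.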

and in another cell:

! SRR1976948.half.fq SRR1976948.dist
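The `.dist` file is a k-mer abundance histogram: for each abundance (how many times a k-mer was seen), it records how many distinct k-mers had that abundance. The plotting cell reads it as comma-separated text with one header line. The sketch below builds the same kind of table from a `{kmer: count}` mapping; the four-column layout (`abundance,count,cumulative,cumulative_fraction`) is an assumption about the file format, and the function names are hypothetical:

```python
from collections import Counter


def abundance_distribution(kmer_counts):
    """Rows of (abundance, n_kmers, cumulative, cumulative_fraction)."""
    hist = Counter(kmer_counts.values())  # abundance -> number of distinct k-mers
    total = sum(hist.values())
    rows, cumulative = [], 0
    for abundance in sorted(hist):
        cumulative += hist[abundance]
        rows.append((abundance, hist[abundance], cumulative, cumulative / total))
    return rows


def format_dist_csv(rows):
    """Render rows as CSV text with one header line (assumed layout)."""
    lines = ["abundance,count,cumulative,cumulative_fraction"]
    lines += [",".join(str(x) for x in row) for row in rows]
    return "\n".join(lines) + "\n"
```

In a typical genome data set, a large pile of abundance-1 k-mers mostly reflects sequencing errors, while true sequence forms a peak at higher abundance; that shape is what the plot makes visible.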

and in yet another cell:

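%matplotlib inline
import numpy
import matplotlib.pyplot as plt

# Load the k-mer abundance histogram (comma-separated, one header line).
dist1 = numpy.loadtxt('SRR1976948.dist', skiprows=1, delimiter=',')

# Column 0 is the k-mer abundance, column 1 the number of k-mers at that abundance.
plt.plot(dist1[:, 0], dist1[:, 1])
plt.axis(ymax=10000, xmax=1000)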


python2 ~/khmer/sandbox/ \
   SRR1976948.half.fq SRR1976948.readdist
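The `.readdist` output name suggests a per-read distribution. A common way to estimate a read's coverage from k-mer counts, and the one khmer's digital normalization uses, is the median abundance of the read's k-mers. Assuming that is what this step tabulates, a sketch that computes each read's median k-mer abundance and histograms the results (function names are hypothetical; the lower median keeps the result an integer):

```python
from collections import Counter
from statistics import median_low

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def read_median_abundance(seq, kmer_counts, k=31):
    """Lower median of the read's k-mer counts; 0 if the read is shorter than k."""
    abundances = [kmer_counts.get(canonical(seq[i:i + k]), 0)
                  for i in range(len(seq) - k + 1)]
    return median_low(abundances) if abundances else 0


def read_distribution(seqs, kmer_counts, k=31):
    """Sorted (median_abundance, number_of_reads) pairs."""
    hist = Counter(read_median_abundance(s, kmer_counts, k) for s in seqs)
    return sorted(hist.items())
```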


python2 ~/khmer/sandbox/ SRR1976948.half.fq slice.fq -m 0 -M 60
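This command writes a coverage "slice" of the reads to `slice.fq`. Assuming `-m` and `-M` are the minimum and maximum allowed coverage, and that coverage is estimated per read as the median k-mer abundance, the filtering logic can be sketched as follows. Treating both bounds as inclusive is an assumption, and `slice_by_coverage` is a hypothetical name, not the script's API:

```python
from statistics import median_low

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def slice_by_coverage(reads, kmer_counts, min_cov=0, max_cov=60, k=31):
    """Yield (name, seq) reads whose median k-mer abundance is in [min_cov, max_cov].

    Inclusive bounds are an assumption about how -m/-M behave.
    """
    for name, seq in reads:
        abundances = [kmer_counts.get(canonical(seq[i:i + k]), 0)
                      for i in range(len(seq) - k + 1)]
        if not abundances:
            continue  # read shorter than k: no coverage estimate, drop it
        if min_cov <= median_low(abundances) <= max_cov:
            yield name, seq
```

Keeping only low-to-moderate coverage reads (here up to 60) is what makes the later assembly a "slice" of the community rather than the whole data set.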

Assemble the slice

(Re)install megahit:

git clone
cd megahit

Go back to the slice directory and extract paired ends:

cd /mnt/slice slice.fq
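Filtering reads one at a time can keep one mate of a pair and drop the other, so the sliced file has to be split into intact pairs and orphans before assembly. MEGAHIT's `--12` option takes interleaved paired-end reads (mate 1, mate 2, mate 1, ...). A minimal sketch, assuming mates are named with `/1` and `/2` suffixes and that surviving mates sit next to each other in the file; the function names are hypothetical:

```python
def mate_base_name(name):
    """Drop a trailing '/1' or '/2' mate suffix (assumed naming convention)."""
    if name.endswith(("/1", "/2")):
        return name[:-2]
    return name


def split_pairs(reads):
    """Split (name, seq) records into adjacent mate pairs and orphans."""
    pairs, orphans = [], []
    i = 0
    while i < len(reads):
        if (i + 1 < len(reads)
                and mate_base_name(reads[i][0]) == mate_base_name(reads[i + 1][0])):
            pairs.append((reads[i], reads[i + 1]))
            i += 2
        else:
            orphans.append(reads[i])
            i += 1
    return pairs, orphans


def interleave(pairs):
    """Flatten pairs back into mate1, mate2, mate1, mate2, ... order."""
    return [record for pair in pairs for record in pair]
```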


~/megahit/megahit --12 -o slice

The contigs will be in slice/final.contigs.fa.
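A quick way to sanity-check the assembly is to count the contigs and compute their total length and N50 (the length L such that contigs of length L or longer hold at least half of the assembled bases). A small sketch that reads FASTA lines, allowing sequences wrapped over several lines; the function names are illustrative:

```python
def read_fasta_lengths(lines):
    """Return the sequence length of each FASTA record (multi-line records allowed)."""
    lengths, current = [], None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)  # close the previous record
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)  # close the last record
    return lengths


def n50(lengths):
    """Smallest length L such that contigs >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly
```

For example, open `slice/final.contigs.fa`, pass the file handle to `read_fasta_lengths`, and report `len(lengths)`, `sum(lengths)` and `n50(lengths)`.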

LICENSE: This documentation and all textual/graphic site content is released under Creative Commons - 0 (CC0) -- fork @ github.