MutationSeq is software for somatic SNV detection using next-generation sequencing (NGS) data from matched tumour/normal samples. It uses
a feature-based classifier trained on validated somatic mutation samples while benefiting from other available information such as base quality,
mapping quality, strand bias and tail distance.
model_single_v4.0.1.npz and model_single_v4.0.2_anaconda_sk_0.13.1.npz are missing from this repository and can be found at ftp://ftp.bcgsc.ca/public/shahlab/MutationSeq/. Once you’ve acquired those files, place model_single_v4.0.1.npz in the root of this repo, and place model_single_v4.0.2_anaconda_sk_0.13.1.npz in the
MutationSeq can be downloaded from http://compbio.bccrc.ca/software/mutationseq or https://bitbucket.org/shahlabbcca/mutationseq. Once you have downloaded it, extract it:
tar -xzvf museq_4.3.8.tar.gz
This will extract the contents into a folder named mutationseq. We will move the files into $HOME/usr/museq/4.3.8 for organization purposes:
mkdir -p $HOME/usr/museq/4.3.8
mv mutationseq/* $HOME/usr/museq/4.3.8
Now we need to install MutationSeq. MutationSeq requires Python (v2.7) and several key package dependencies.
The best way to install all of this is to use either Miniconda or Anaconda. We will use Miniconda here. First, download Miniconda (for Python 2.7) and then run:
Then follow the instructions. When you have finished, you should have Python installed:
Now we can install the required dependencies (NOTE: versions are specified to ensure compatibility):
conda install -c bioconda numpy=1.7.1 scipy=0.12.0 scikit-learn=0.13.1 matplotlib=1.2.1 intervaltree
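Because the versions are pinned, it can be useful to confirm that the active environment actually matches them. The following is a minimal sketch, not part of MutationSeq: the helper names `installed_versions` and `find_mismatches` are hypothetical, and the pins are copied from the conda command above (scikit-learn is imported under the name `sklearn`).

```python
from __future__ import print_function

import importlib

# Pins copied from the conda command above; keys are import names
# (scikit-learn is imported as "sklearn").
PINNED = {
    "numpy": "1.7.1",
    "scipy": "0.12.0",
    "sklearn": "0.13.1",
    "matplotlib": "1.2.1",
}


def installed_versions(names):
    """Return {import_name: version string, or None if not importable}."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            found[name] = getattr(module, "__version__", None)
    return found


def find_mismatches(installed, pinned):
    """Return {name: (installed, expected)} for every pin not met exactly."""
    return {
        name: (installed.get(name), expected)
        for name, expected in pinned.items()
        if installed.get(name) != expected
    }


if __name__ == "__main__":
    mismatches = find_mismatches(installed_versions(PINNED), PINNED)
    for name, (have, want) in sorted(mismatches.items()):
        print("{}: found {}, expected {}".format(name, have, want))
    if not mismatches:
        print("All pinned versions match.")
```

Running the script with the Python interpreter from the conda environment lists any package whose version differs from the pin, or reports that everything matches.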
One last thing needed before we can install MutationSeq is the Boost C++ libraries. We only need to download them from http://www.boost.org/. Once you have downloaded them (tested on 1.51), just extract them to a location. For example, you could put them into
Once this is in place, we can proceed to compiling a dependency:
make PYTHON=python BOOSTPATH=$HOME/usr/boost/1.51 -B
Now when you run:
python $HOME/usr/museq/4.3.8/museq/classify.py --version
This indicates that you have successfully installed MutationSeq.
An important thing to note is that MutationSeq comes bundled with a trained classifier built with the scikit-learn library and packaged as a pickle. The pickle is only compatible with the specific scikit-learn version it was built with. As we are using Miniconda here with scikit-learn version 0.13.1, we will have to use the models associated with that version. See Calling Variants - Using MutationSeq for more details.
To call variants using MutationSeq, we use the following command:
mkdir -p museq/results; \
python $HOME/usr/museq/4.3.8/museq/classify.py \
-c $HOME/usr/museq/4.3.8/museq/metadata.config \
Notice how for the model parameter, we used $HOME/usr/museq/4.3.8/museq/models_anaconda/model_v4.1.2_anaconda_sk_0.13.1.npz. As mentioned in the Installing MutationSeq section, this model specifically works with conda and the scikit-learn library (v0.13.1), which is how we installed MutationSeq in the workshop. If you are not using conda, you will have to use the model version that is compatible with the scikit-learn library installed for your Python.
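Choosing a model that matches the installed scikit-learn can be sketched as below. This is an illustration only, not part of MutationSeq: it assumes the scikit-learn version is encoded in the file name as `_sk_<version>.npz`, a convention inferred from the file names in this document, and the helper names `sklearn_tag` and `pick_model` are hypothetical.

```python
import re

# Assumed naming convention (inferred from the file names in this document):
# "..._sk_<scikit-learn version>.npz"
_SK_TAG = re.compile(r"_sk_(\d+(?:\.\d+)*)\.npz$")


def sklearn_tag(model_filename):
    """Return the scikit-learn version encoded in the file name, or None."""
    match = _SK_TAG.search(model_filename)
    return match.group(1) if match else None


def pick_model(model_filenames, sklearn_version):
    """Return the first file name tagged with sklearn_version, or None."""
    for name in model_filenames:
        if sklearn_tag(name) == sklearn_version:
            return name
    return None
```

In practice, one might pass the listing of the models directory and `sklearn.__version__` to `pick_model`. A model without a version tag returns `None` from `sklearn_tag`, so its compatibility has to be checked by other means.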