3. Running the Actual Assembly

Make sure you’ve got the PROJECT location defined, and your data is there:

set -u
printf "\nMy diginormed files are in $PROJECT/diginorm/, and consist of $(ls -1 ${PROJECT}/diginorm/*.keep.abundfilt.fq.gz | wc -l) files\n\n"
set +u

Important: If you get an error above or the count of files is wrong... STOP!! Revisit the installation instructions for your compute platform!
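
If you would rather have this check fail loudly than read the count by eye, a small function along these lines may help. This is only a sketch: the name check_diginorm_inputs is made up for this example (it is not part of khmer or Trinity), and it assumes the same *.keep.abundfilt.fq.gz naming used above.

```shell
# Hypothetical helper (not part of khmer or Trinity): print the number of
# diginormed files under <project>/diginorm/, or fail if there are none.
check_diginorm_inputs() {
    local project="${1:-}"
    local count=0
    local f
    if [ -z "${project}" ]; then
        echo "PROJECT is not set" >&2
        return 1
    fi
    for f in "${project}"/diginorm/*.keep.abundfilt.fq.gz; do
        # An unmatched glob stays literal, so only count files that exist.
        if [ -e "${f}" ]; then
            count=$((count + 1))
        fi
    done
    if [ "${count}" -eq 0 ]; then
        echo "No *.keep.abundfilt.fq.gz files in ${project}/diginorm/" >&2
        return 1
    fi
    echo "${count}"
}
```

Calling check_diginorm_inputs "${PROJECT:-}" prints the file count on success and returns a non-zero status otherwise, so it can guard the rest of a script.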

Also, be sure you have activated the Python virtual environment that holds the right packages:

source ~/venv/bin/activate

Build the files to assemble

Let’s make another working directory for the assembly:

cd ${PROJECT}
mkdir -p assembly
cd assembly

For paired-end data, Trinity expects two files, ‘left’ and ‘right’; orphan (unpaired) sequences can be present as well. So, below, we split each of our interleaved pair files in two, and then add the single-ended seqs to one of ‘em:

for file in ../diginorm/*.pe.qc.keep.abundfilt.fq.gz
do
   split-paired-reads.py ${file}
done

cat *.1 > left.fq
cat *.2 > right.fq

gunzip -c ../diginorm/orphans.keep.abundfilt.fq.gz >> left.fq
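
Before handing these files to Trinity, it can be worth a quick sanity check on how many reads ended up in each. The sketch below assumes plain four-line FASTQ records (no wrapped sequence lines); count_fastq_records is a name invented for this example.

```shell
# Hypothetical helper: number of records in an uncompressed FASTQ file,
# assuming exactly four lines per record.
count_fastq_records() {
    local lines
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}
```

If the interleaved files were properly paired, count_fastq_records left.fq should come out at least as large as count_fastq_records right.fq, with the difference being the orphans appended to left.fq.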

Assembling with Trinity

Run the assembler! Prepare a Slurm batch script to run the assembly on the cluster; the one below is saved as ‘assemble.sh’. See the documentation on running jobs on boqueron.

#!/bin/bash
#SBATCH --mem-per-cpu=7000
#SBATCH --time=5:00:00
#SBATCH --job-name=Trinity
#SBATCH --mail-user=humberto.ortiz@upr.edu
#SBATCH --mail-type=ALL
#SBATCH --workdir=/work/eelpondworkshop/humberto/assembly
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2

/work/eelpondworkshop/trinityrnaseq-Trinity-v2.4.0/Trinity --no_bowtie --left left.fq \
 --right right.fq --seqType fq --max_memory 14G \
 --CPU 2

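Note that the last two options (--max_memory 14G --CPU 2) set the maximum amount of memory and the number of CPUs Trinity will use. You can increase (or decrease) them based on what machines you are running on, but keep them in line with what the job requests from Slurm: here --cpus-per-task=2 and --mem-per-cpu=7000 (in MB) add up to 2 CPUs and roughly 14 GB.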
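
To keep the Trinity options and the Slurm request from drifting apart, the two flags can be derived from the request itself. The following is a sketch under a few assumptions: trinity_resource_flags is a hypothetical helper, memory is given in MB per CPU (as with --mem-per-cpu), and the total is rounded down to whole gigabytes to stay under the allocation.

```shell
# Hypothetical helper: turn a CPU count and a per-CPU memory request (in MB)
# into Trinity's two resource flags. Integer division rounds the memory down.
trinity_resource_flags() {
    local cpus="$1"
    local mem_per_cpu_mb="$2"
    local total_gb=$((cpus * mem_per_cpu_mb / 1024))
    echo "--max_memory ${total_gb}G --CPU ${cpus}"
}
```

Inside a batch script, Slurm should export SLURM_CPUS_PER_TASK and SLURM_MEM_PER_CPU when the matching #SBATCH options are given, so trinity_resource_flags "${SLURM_CPUS_PER_TASK}" "${SLURM_MEM_PER_CPU}" would produce the flags for the current job. With the request above (2 CPUs, 7000 MB each) it rounds down to 13G instead of 14G.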

To submit and run this script, use the ‘sbatch’ command:

sbatch assemble.sh

Once this completes, you’ll have an assembled transcriptome in ${PROJECT}/assembly/trinity_out_dir/Trinity.fasta.
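
A quick way to confirm the job actually produced something is to count the sequences in that file. This is a minimal sketch; count_fasta_records is a name made up for this example, and it simply counts header lines.

```shell
# Hypothetical helper: count FASTA records (header lines starting with '>').
# Fails if the file is missing or empty.
count_fasta_records() {
    if [ ! -s "$1" ]; then
        echo "Missing or empty FASTA file: $1" >&2
        return 1
    fi
    # grep -c exits non-zero when the count is 0; still report the 0.
    grep -c '^>' "$1" || true
}
```

For example, count_fasta_records ${PROJECT}/assembly/trinity_out_dir/Trinity.fasta prints the number of assembled sequences.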

Next: 4. Evaluating your transcriptome assembly


LICENSE: This documentation and all textual/graphic site content is licensed under the Creative Commons - 0 License (CC0).