=========================================
4. Evaluating your transcriptome assembly
=========================================

We will be using Transrate and BUSCO!

.. shell start

.. ::

   set -x
   set -e

Be sure you have loaded the right Python packages::

   source ~/venv/bin/activate

Transrate
----------

`Transrate <http://hibberdlab.com/transrate/getting_started.html>`__ serves two main purposes. It can compare two assemblies to see how similar they are, or it can compute a score that represents the proportion of input reads that provide positive support for the assembly. Here we will use Transrate to score the assembly, using the trimmed reads as input. For a further explanation of the metrics and of how to run the reference-based Transrate, see the documentation: http://hibberdlab.com/transrate/metrics.html and the paper by `Smith-Unna et al. 2016 <http://genome.cshlp.org/content/early/2016/06/01/gr.196469.115>`__.


Make a new directory and combine the trimmed reads into one left and one right file (gzip files can be concatenated directly with ``cat``):

::

   cd ${PROJECT}
   mkdir -p evaluation
   cd evaluation

   cat ${PROJECT}/quality/*R1*.qc.fq.gz > left.fq.gz
   cat ${PROJECT}/quality/*R2*.qc.fq.gz > right.fq.gz
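Paired-end tools generally expect the left and right files to hold the same number of reads in the same order. As a quick sanity check, here is a minimal sketch that counts the reads in each file. It assumes standard four-line FASTQ records, and ``count_reads`` is a helper name made up for this example, not part of any tool used here.

```shell
# Count reads in a gzipped FASTQ file.
# Assumes unwrapped FASTQ: exactly 4 lines per record.
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Only compare when both files exist in the current directory.
if [ -f left.fq.gz ] && [ -f right.fq.gz ]; then
    left_n=$(count_reads left.fq.gz)
    right_n=$(count_reads right.fq.gz)
    echo "left: ${left_n} reads, right: ${right_n} reads"
    if [ "${left_n}" != "${right_n}" ]; then
        echo "WARNING: read counts differ; pairs may be out of sync" >&2
    fi
fi
```

If the two counts differ, check that every ``R1`` file in the quality directory has a matching ``R2`` file before going further.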

Transrate doesn't like pipes (``|``) in sequence names. This version of Trinity doesn't write pipes into the sequence names, but others do, so let's replace any pipes with dashes to be safe.

::

   sed 's_|_-_g' ${PROJECT}/assembly/trinity_out_dir/Trinity.fasta > Trinity.fixed.fasta
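To confirm that the substitution worked, you can count the header lines that still contain a pipe. This is only a sketch: ``count_pipe_headers`` is a hypothetical helper name, and the check assumes a plain-text FASTA file whose headers start with ``>``.

```shell
# Count FASTA header lines that contain a pipe character.
# grep -c exits non-zero when the count is 0, so "|| true" keeps
# a script running under "set -e" from stopping here.
count_pipe_headers() {
    grep -c '^>.*|' "$1" || true
}

# Report on the fixed assembly if it is present.
if [ -f Trinity.fixed.fasta ]; then
    echo "headers with pipes: $(count_pipe_headers Trinity.fixed.fasta)"
fi
```

A count of zero means no sequence name in the fixed file contains a pipe.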
  
Now, run the actual command::

   module load transrate
   
   transrate --assembly=Trinity.fixed.fasta --threads=2 \
     --left=left.fq.gz \
     --right=right.fq.gz \
     --output=${PROJECT}/evaluation/nema
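Once the run finishes, the summary can be easier to read one metric per line. The sketch below assumes that Transrate writes a summary file named ``assemblies.csv`` into the ``--output`` directory, with one header line and one data row; check your own output directory, since the file name and layout may differ between versions. ``show_summary`` is a helper name invented for this example.

```shell
# Print a one-row CSV summary as "name<TAB>value" lines.
# Assumes a header line followed by one data row, with no quoted commas.
show_summary() {
    awk -F',' 'NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i }
               NR == 2 { for (i = 1; i <= NF; i++) print name[i] "\t" $i }' "$1"
}

# Assumed location of the summary file; adjust the path if your
# Transrate version writes its results elsewhere.
summary="${PROJECT:-.}/evaluation/nema/assemblies.csv"
if [ -f "${summary}" ]; then
    show_summary "${summary}"
fi
```

The metrics documentation linked above explains what each reported value means.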