5. Tutorial

This step-by-step tutorial shows how to use the key options of miRNAture to annotate the bona fide miRNA complement on selected contigs from the coelacanth (Latimeria chalumnae) genome. All files required for this tutorial are included in the miRNAture source files, in the miRNAture/Tutorial folder.

5.1. Annotating let-7 on coelacanth

Fig. 5.1 Latimeria chalumnae. Source: Alberto Fernandez Fernandez / CC BY-SA

Based on the miRNA annotation retrieved from Ensembl release 100, the coelacanth genome features 8 let-7 loci distributed across 6 contigs. For this tutorial, 3 contigs with varying numbers of miRNA and let-7 annotations were selected:

Contig            Length (Mb)   Num. miRNAs   Let-7 loci
JH126571.1        5.981         5             1
JH129429.1        0.248         3             3
AFYH01291077.1    0.001         1             0
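The contig lengths listed above can be cross-checked against the multi-fasta file in Data/. The following is a minimal sketch, not part of miRNAture: the function name fasta_lengths is hypothetical, and it assumes a plain FASTA file in which each header starts with ">" followed by the contig identifier.

```shell
# Hypothetical helper (not part of miRNAture): print "<contig_id><TAB><length_in_bp>"
# for every sequence in a FASTA file.
fasta_lengths() {
    awk '
        /^>/ {
            if (id != "") print id "\t" len   # flush the previous record
            id = substr($1, 2)                # header up to the first blank, without ">"
            len = 0
            next
        }
        { len += length($0) }                 # sequence lines may be wrapped
        END { if (id != "") print id "\t" len }
    ' "$1"
}
```

From the Tutorial/ folder it could be called as: fasta_lengths Data/latimeria_chalumnae_genome.fa. Lengths are reported in base pairs, so divide by 1,000,000 to compare them with the Mb values in the table.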

The main goal is to identify the let-7 loci on these contigs. To do so, the Tutorial folder contains all the input files in Data/, the required wrapper script for miRNAture in Code/, and a Results/ folder where all predictions will be stored. As you can imagine, homology comparisons tend to involve a large number of both input and output files. miRNAture spares you the manual curation of the detected hits: life is too short to perform all of those steps by hand!

5.1.1. Folder structure

The miRNAture folder tree looks like this:

$ tree -L 1 miRNAture/
miRNAture/
├── Build.PL
├── Changes
├── ignore.txt
├── index.md
├── lib/
├── LICENSE
├── MANIFEST
├── META.json
├── META.yml
├── mirnature_logo.png
├── miRNAture-Manual/
├── miRNAture.yml
├── README
├── README.md
├── README.rst
├── script/
├── t/
├── Tutorial/
└── xt/

Our target folder is located in Tutorial/:

$ cd Tutorial/
$ tree -L 2 .
Tutorial/
├── Code
│   ├── list_miRNAs_to_search.txt
│   ├── Precalculated-Data-tutorial
│   ├── tutorial_test_selected_models.sh
│   └── User_Test_Data
├── Data
│   ├── latimeria_chalumnae_genome.fa
│   └── QueriesToTest
└── Results

The Tutorial folder is composed of the following subfolders. Code/ holds all the scripts needed to run miRNAture. Data/ keeps the coelacanth contigs described above in a multi-fasta file, latimeria_chalumnae_genome.fa. In the same folder, QueriesToTest/ provides let-7 annotations from 11 metazoans [1] as queries.

Note

Together with the query files, the file queries_description.txt controls which sequence datasets are used in the blastn comparisons. Each line must contain three columns:

<Name_fasta_file.fa> miRNA <Origin_of_sequence>

The first column is the file name, the second must be miRNA, and the third is the name of the source species in the format Genus species. If you do not know the source, a valid name is: Unknown specie. If the file is omitted, miRNAture creates it automatically from all fasta files in this folder, with an Unknown origin.
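As an illustration of the three-column layout, the sketch below prints one line of queries_description.txt. The helper name query_line is hypothetical, and the tab separator is an assumption: the note above only states that three columns are needed, so check an existing description file before relying on it.

```shell
# Hypothetical helper: print one line of queries_description.txt.
# ASSUMPTION: columns are separated by tabs; the tutorial text does not state the separator.
# $1 = path or name of the query fasta file
# $2 = source of the sequences (falls back to "Unknown specie" when not given)
query_line() {
    printf '%s\tmiRNA\t%s\n' "$(basename "$1")" "${2:-Unknown specie}"
}
```

For a query file from Danio rerio, a call such as query_line my_queries.fa "Danio rerio" >> queries_description.txt would append the corresponding line (my_queries.fa is a placeholder file name).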

The Results/ folder will hold all the output files generated by miRNAture.

5.1.2. Input files

To run miRNAture, go to the Code/ folder:

$ cd Code/
$ tree -L 1 .
.
├── list_miRNAs_to_search.txt
├── Precalculated-Data-tutorial
├── tutorial_test_selected_models.sh
└── User_Test_Data

In this path, tutorial_test_selected_models.sh is a bash script that organizes all the code needed to run miRNAture. Keeping the commands in a script is preferred because it makes your computational experiments reproducible. The script gives a general idea of how to run miRNAture; let's go through it in detail:

#!/bin/bash

# All paths are resolved relative to the directory the script is run from (Code/)
current=$( pwd )
specie_tag="Lach"
specie_genome="$current/../Data/latimeria_chalumnae_genome.fa"
specie_name="Latimeria_chalumnae"

workdir="$current/../Results"
mkdir -p "$workdir"
mode="Blast,HMM,Infernal,Other_CM,Final"
strategy="5,6,ALL"
blastQueriesFolder="$current/../Data/QueriesToTest"
user_models="$current/User_Test_Data"
data_precalculated_folder="$current/Precalculated-Data-tutorial"

### Step by step: homology->validation->evaluation->summarise
# Run only the homology searches
#miRNAture -stage homology -sublist "$current/list_miRNAs_to_search.txt" \
#    -dataF "$data_precalculated_folder" -speG "$specie_genome" -speN "$specie_name" \
#    -speT "$specie_tag" -w "$workdir" -m "$mode" -pe 0 -str "$strategy" \
#    -blastq "$blastQueriesFolder" -rep relax,150,100 -usrM "$user_models"
# Run the validation stage (detection of mature sequences)
#miRNAture -stage validation -dataF "$data_precalculated_folder" -speG "$specie_genome" \
#    -speN "$specie_name" -speT "$specie_tag" -w "$workdir" -m "$mode" -pe 0 -usrM "$user_models"
# Run the evaluation stage
#miRNAture -stage evaluation -dataF "$data_precalculated_folder" -speG "$specie_genome" \
#    -speN "$specie_name" -speT "$specie_tag" -w "$workdir" -m "$mode" -pe 0
# Create the summary report
#miRNAture -stage summarise -dataF "$data_precalculated_folder" -speG "$specie_genome" \
#    -speN "$specie_name" -speT "$specie_tag" -w "$workdir" -m "$mode" -pe 0

# Run all miRNAture stages in one call
miRNAture -stage complete -sublist "$current/list_miRNAs_to_search.txt" \
    -dataF "$data_precalculated_folder" -speG "$specie_genome" -speN "$specie_name" \
    -speT "$specie_tag" -w "$workdir" -m "$mode" -pe 0 -str "$strategy" \
    -blastq "$blastQueriesFolder" -rep relax,150,100 -usrM "$user_models"

Before running the script, activate the conda environment called miRNAture; it must be installed and activated before miRNAture is run. All the dependencies are described in the file miRNAture.yml, which the folder tree above shows at the top level of the miRNAture/ folder.
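A minimal sketch of the environment setup follows. The two conda commands in the comments are standard conda usage and assume that miRNAture.yml defines an environment named miRNAture, as stated above. The helper in_conda_env is hypothetical; it only checks CONDA_DEFAULT_ENV, the variable that conda sets when an environment is activated.

```shell
# Create the environment once (run from the folder that holds miRNAture.yml):
#   conda env create -f miRNAture.yml
# Activate it in every new shell session before running the tutorial script:
#   conda activate miRNAture

# Hypothetical helper: return 0 when the active conda environment has the
# expected name (default: miRNAture).
in_conda_env() {
    [ "${CONDA_DEFAULT_ENV:-}" = "${1:-miRNAture}" ]
}
```

A guard such as in_conda_env || echo "activate the miRNAture environment first" can be placed before the miRNAture call to catch a forgotten activation early.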

The script above shows the two steps required to run miRNAture:

Declare the input and output locations. This makes it easier to assign the miRNAture flags and to reproduce the experiment. In this case, we used the following options (flags in parentheses):

  • Processing stage (-stage): Stage of miRNAture to run. Here, complete was selected to run all the stages. To run step by step, this flag accepts homology, validation, evaluation and summarise. Run them in this order to obtain the same final results as with the complete option.

  • Subset of miRNA models to run (-sublist): Subset of reference miRNA families to be searched on the target sequence. See the list_miRNAs_to_search.txt file for an example. If not provided, all miRNA RFAM models are searched.

  • Pre-calculated data location (-dataF): Location of the pre-calculated data required by miRNAture. It includes hidden Markov models, covariance models and curated input files used to annotate mature sequences [2].

  • Species genome (-speG): Target sequence to be annotated.

  • Species name (-speN): Scientific name of the species to which the subject sequence(s) belong.

  • Species tag (-speT): Short tag for the species name. The suggested tag joins the first two letters of the genus with the first two letters of the species (e.g. Homo sapiens = Hosa).

  • Working directory (-w): Output directory where the miRNAture results are written.

  • Running mode (-m): Select one or any combination of the miRNA search strategies Blast, HMM, Infernal and Other_CM. To merge the complete results from the selected homology search modes, add Final at the end.

  • Parallel jobs using SLURM (-pe): Activate (1) or not (0).

  • Blast strategies (-str): Numbers of the desired blastn strategies. Possible strategies are 1,2,3,4,5,6. To merge all results, add ALL at the end.

  • Path of blastn queries (-blastq): Path to the annotated miRNA query sequences. Indicating the folder is enough.

  • Homology repetition detection (-rep): Sets the maximum number of loci that will be evaluated by the mature annotation stage. By default (default,200,100), miRNAture detects miRNA families that report a high number of loci (> 200 loci) and selects the top 100 candidates, ranked by alignment score, for the validation stage. Modify these values with relax,Number_Loci,Candidates_to_evaluate; the tutorial script uses relax,150,100.

  • User hidden Markov/covariance models (-usrM): Directory with additional hidden Markov models (HMMs) or covariance models (CMs) provided by the user to be searched on the target sequence.
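Two of the flag values described in this list follow simple rules that can be checked before launching a run. The sketch below is illustrative only and is not part of miRNAture: species_tag and valid_modes are hypothetical helper names. The first builds the suggested -speT tag, the second checks that a -m value contains only the modes named above.

```shell
# Hypothetical helper: build the suggested -speT tag from a species name given
# as "Genus species" or "Genus_species" (first two letters of each word).
species_tag() (
    set -- $(printf '%s' "$1" | tr '_' ' ')   # $1 = genus, $2 = species
    printf '%s%s\n' "$(printf '%s' "$1" | cut -c1-2)" "$(printf '%s' "$2" | cut -c1-2)"
)

# Hypothetical helper: return 0 when every entry of a comma-separated -m value
# is one of Blast, HMM, Infernal, Other_CM or Final.
valid_modes() (
    [ -n "$1" ] || exit 1                     # at least one mode is needed
    IFS=','
    for m in $1; do
        case $m in
            Blast|HMM|Infernal|Other_CM|Final) ;;
            *) echo "unknown mode: $m" >&2; exit 1 ;;
        esac
    done
)
```

For the tutorial species, species_tag "Latimeria_chalumnae" gives Lach, the value of specie_tag in the script, and valid_modes "Blast,HMM,Infernal,Other_CM,Final" succeeds for the mode string used there.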

Then, run miRNAture through this script:

$ ./tutorial_test_selected_models.sh
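Because the script sets current=$( pwd ), it has to be started from inside Code/, and it needs the executable bit (chmod +x tutorial_test_selected_models.sh) to be called as ./tutorial_test_selected_models.sh. A minimal sketch of a wrapper that takes care of the working directory and keeps a log of the run is shown below; run_logged is a hypothetical helper, not a miRNAture command.

```shell
# Hypothetical helper: run an executable script from its own folder and save
# everything it prints (stdout and stderr) to a log file.
# $1 = path to the script, $2 = path to the log file
run_logged() (
    script=$1
    case $2 in
        /*) log=$2 ;;
        *)  log=$PWD/$2 ;;                    # make the log path absolute before cd
    esac
    cd "$(dirname "$script")" || exit 1
    "./$(basename "$script")" > "$log" 2>&1
    status=$?
    cat "$log"                                # show the captured output once the run ends
    exit "$status"                            # pass on the exit status of the script
)
```

For example, run_logged Code/tutorial_test_selected_models.sh Results/run.log could be called from the Tutorial/ folder. The output is shown only after the script has finished, so for a live view of a long run it is simpler to start the script directly.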

Note

The complete list of flags can be displayed by typing: miRNAture -h or miRNAture -man.

Footnotes

[1] Anolis carolinensis, Branchiostoma belcheri, Branchiostoma floridae, Ciona robusta, Ciona savignyi, Danio rerio, Eptatretus burgeri, Petromyzon marinus, Strongylocentrotus purpuratus, Xenopus laevis and Xenopus tropicalis.

[2] Pre-calculated data should be downloaded from https://zenodo.org/record/4531376#.YDqO4-bTVTZ