========
Tutorial
========

Through this step-by-step tutorial you will use key options of ``miRNAture`` to annotate the *bona fide* miRNA complement of selected contigs from the coelacanth (*Latimeria chalumnae*) genome. All the files required to run this tutorial are included in the ``miRNAture`` source files, in the ``miRNAture/Tutorial`` folder.

Annotating let-7 on coelacanth
******************************

.. figure:: coel.jpg
   :width: 320px
   :align: center
   :height: 116px
   :alt: Coelacanth image
   :figclass: align-center

   *Latimeria chalumnae*. Source: Alberto Fernandez Fernandez / CC BY-SA.

Based on the miRNA annotation retrieved from ``Ensembl`` release 100, the coelacanth genome features 8 let-7 loci distributed along 6 contigs. For the purposes of this tutorial, 3 contigs with a variable number of miRNA/let-7 annotations were selected, as follows:

============== =========== ============= ==========
Contig         Length (Mb) No. of miRNAs Let-7 loci
============== =========== ============= ==========
JH126571.1     5.981       5             1
JH129429.1     0.248       3             3
AFYH01291077.1 0.001       1             0
============== =========== ============= ==========

The main goal is to identify the let-7 loci on these contigs. To do so, the ``Tutorial`` folder contains all the input files in ``Data/``, the wrapper script that runs ``miRNAture`` in ``Code/``, and a ``Results/`` folder where all the predictions will be stored. As you can imagine, homology comparisons tend to create a high number of both input and output files. ``miRNAture`` avoids manual curation of the detected hits: life is too short to perform all of those steps by hand!
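To cross-check the contigs in ``Data/latimeria_chalumnae_genome.fa`` against the table above, a short ``awk`` sketch like the following can print each sequence ID with its length. It is not part of ``miRNAture``: the function name ``contig_lengths`` is hypothetical, and it assumes that the contig accession is the first word of each FASTA header.

.. code-block:: bash

   # Hypothetical helper (not part of miRNAture): print the ID, the length in
   # nucleotides and the length in Mb of every sequence in a multi-FASTA file.
   contig_lengths() {
       awk '
           /^>/ {
               # Print the previous record before starting a new one
               if (id != "") printf "%s\t%d\t%.3f\n", id, len, len / 1e6
               id = substr($1, 2)   # first header word, without the leading >
               len = 0
               next
           }
           { gsub(/[ \t\r]/, ""); len += length($0) }   # count bases only
           END { if (id != "") printf "%s\t%d\t%.3f\n", id, len, len / 1e6 }
       ' "$1"
   }

For example, running ``contig_lengths Data/latimeria_chalumnae_genome.fa`` from ``Tutorial/`` prints one tab-separated line per contig; the third column, rounded to three decimals, can be compared with the *Length (Mb)* column of the table.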
Folder structure
================

The folder tree of ``miRNAture`` looks like this::

   $ tree -L 1 miRNAture/
   miRNAture/
   ├── Build.PL
   ├── Changes
   ├── ignore.txt
   ├── index.md
   ├── lib/
   ├── LICENSE
   ├── MANIFEST
   ├── META.json
   ├── META.yml
   ├── mirnature_logo.png
   ├── miRNAture-Manual/
   ├── miRNAture.yml
   ├── README
   ├── README.md
   ├── README.rst
   ├── script/
   ├── t/
   ├── Tutorial/
   └── xt/

Our target folder is ``Tutorial/``::

   $ cd Tutorial/
   $ tree -L 2 .
   .
   ├── Code
   │   ├── list_miRNAs_to_search.txt
   │   ├── Precalculated-Data-tutorial
   │   ├── tutorial_test_selected_models.sh
   │   └── User_Test_Data
   ├── Data
   │   ├── latimeria_chalumnae_genome.fa
   │   └── QueriesToTest
   └── Results

The ``Tutorial`` folder is composed of three subfolders:

* ``Code/``, where all the scripts necessary to run ``miRNAture`` are located.
* ``Data/``, which keeps the described coelacanth contigs in the multi-FASTA file ``latimeria_chalumnae_genome.fa``. In the same folder, ``QueriesToTest/`` provides let-7 annotations from 11 metazoans [#species]_ to be used as queries.
* ``Results/``, which will store all the output files generated by ``miRNAture``.

.. note::

   Together with the query files, the file ``queries_description.txt`` is required to control which sequence datasets will be used in the ``blastn`` comparisons. It needs three columns: the first one corresponds to the file name, the second one has to be ``miRNA``, and the third one is the name of the source species in the format ``Genus species``. If you do not know the source, a valid name is ``Unknown specie``. If this file is omitted, ``miRNAture`` will create it automatically from all the FASTA files in the queries folder (here ``QueriesToTest/``), labeling them with an unknown origin.

Input files
===========

To run ``miRNAture``, go directly to the ``Code/`` folder::

   $ cd Code/
   $ tree -L 1 .
   .
   ├── list_miRNAs_to_search.txt
   ├── Precalculated-Data-tutorial
   ├── tutorial_test_selected_models.sh
   └── User_Test_Data

In this path, ``tutorial_test_selected_models.sh`` is a ``bash`` script that organizes all the code needed to run ``miRNAture``. Running it through a script is preferred because it makes your computational experiments reproducible. The script gives a general idea of how to run ``miRNAture``; it is explained in detail below:

.. code-block:: bash

   #!/bin/bash

   # Input and output locations
   current=$( pwd )
   specie_tag="Lach"
   specie_genome="$current/../Data/latimeria_chalumnae_genome.fa"
   specie_name="Latimeria_chalumnae"
   workdir="$current/../Results"
   mkdir -p "$workdir"

   # Search modes, blastn strategies and additional inputs
   mode="Blast,HMM,Infernal,Other_CM,Final"
   strategy="5,6,ALL"
   blastQueriesFolder="$current/../Data/QueriesToTest"
   user_models="$current/User_Test_Data"
   data_precalculated_folder="$current/Precalculated-Data-tutorial"

   ### Step by step: homology->validation->evaluation->summarise
   # Run only homology searches
   #miRNAture -stage homology -sublist "$current/list_miRNAs_to_search.txt" \
   #    -dataF "$data_precalculated_folder" -speG "$specie_genome" -speN "$specie_name" \
   #    -speT "$specie_tag" -w "$workdir" -m "$mode" -pe 0 -str "$strategy" \
   #    -blastq "$blastQueriesFolder" -rep relax,150,100 -usrM "$user_models"

   # Run validation (detection of mature sequences)
   #miRNAture -stage validation -dataF "$data_precalculated_folder" -speG "$specie_genome" \
   #    -speN "$specie_name" -speT "$specie_tag" -w "$workdir" -m "$mode" -pe 0 -usrM "$user_models"

   # Run evaluation
   #miRNAture -stage evaluation -dataF "$data_precalculated_folder" -speG "$specie_genome" \
   #    -speN "$specie_name" -speT "$specie_tag" -w "$workdir" -m "$mode" -pe 0

   # Create the summary report
   #miRNAture -stage summarise -dataF "$data_precalculated_folder" -speG "$specie_genome" \
   #    -speN "$specie_name" -speT "$specie_tag" -w "$workdir" -m "$mode" -pe 0

   # Run all the stages at once (complete)
   miRNAture -stage complete -sublist "$current/list_miRNAs_to_search.txt" \
       -dataF "$data_precalculated_folder" -speG "$specie_genome" -speN "$specie_name" \
       -speT "$specie_tag" -w "$workdir" -m "$mode" -pe 0 -str "$strategy" \
       -blastq "$blastQueriesFolder" -rep relax,150,100 -usrM "$user_models"

Before running the script, activate the ``conda`` environment called ``miRNAture`` (``conda activate miRNAture``). This environment must be installed beforehand; all the dependencies are described in the file ``miRNAture.yml``, located in the root ``miRNAture/`` folder.

The script above shows the two steps required to run ``miRNAture``: first, declare the input and output locations as variables, which makes it easy to assign the ``miRNAture`` flags and to reproduce the experiment; second, call ``miRNAture`` with those flags. In this case, we used the following options (flags indicated in parentheses):

* Processing stage (``-stage``): Stage of ``miRNAture`` to run. In this case ``complete`` was selected to run all the stages. To run step by step, this flag accepts ``homology``, ``validation``, ``evaluation`` and ``summarise``; run all of them in this order to obtain the same final results as with the ``complete`` option.
* Subset of miRNA models to run (``-sublist``): Subset of miRNA family references to be searched on the target sequence. See the file ``list_miRNAs_to_search.txt`` as an example. If not provided, all the Rfam miRNA models will be searched.
* Pre-calculated data location (``-dataF``): Location of the pre-calculated data required by ``miRNAture``. It includes hidden Markov models, covariance models and curated input files to annotate mature sequences [#ImportantNote]_.
* Species genome (``-speG``): Target sequence(s) to annotate, here the multi-FASTA file ``latimeria_chalumnae_genome.fa``.
* Species name (``-speN``): Scientific name of the species to which the subject sequence(s) belong, here ``Latimeria_chalumnae``.
* Species tag (``-speT``): Short tag for the species name. We suggest joining the first two letters of the genus with the first two letters of the species epithet (e.g. *Homo sapiens* = ``Hosa``; *Latimeria chalumnae* = ``Lach``).
* Working directory (``-w``): Output directory, i.e. the final path of the ``miRNAture`` results.
* Running mode (``-m``): Select at least one, or any combination, of the miRNA search strategies ``Blast``, ``HMM``, ``Infernal`` and ``Other_CM``.
  To merge the complete results of those homology search modes, add ``Final`` at the end (e.g. ``Blast,HMM,Infernal,Other_CM,Final``).
* Parallel jobs using SLURM (``-pe``): Set to ``1`` to activate parallel jobs through SLURM, or ``0`` to deactivate them.
* Blast strategies (``-str``): Comma-separated numbers of the desired ``blastn`` strategies. Possible strategies are ``1,2,3,4,5,6``. To merge all results, add ``ALL`` at the end (this tutorial uses ``5,6,ALL``).
* Path of ``blastn`` queries (``-blastq``): Path to the annotated miRNA query sequences. It is enough to indicate the folder that contains them; the individual files do not need to be listed.
* Homology repetition detection (``-rep``): Sets the maximum number of loci that will be evaluated by the mature annotation stage. By default, ``miRNAture`` detects miRNA families that report a high number of loci (> 200 loci) and then selects the top 100 candidates, in terms of alignment scores, for the validation stage (``default,200,100``). Modify these values using ``relax,Number_Loci,Candidates_to_evaluate``; this tutorial uses ``relax,150,100``.
* User hidden Markov/covariance models (``-usrM``): Directory with additional hidden Markov models (HMMs) or covariance models (CMs), provided by the user, to be searched on the target sequence.

Then, run ``miRNAture`` through this script::

   $ ./tutorial_test_selected_models.sh

.. note::

   The complete list of flags can be displayed with ``miRNAture -h`` or ``miRNAture -man``.

.. rubric:: Footnotes

.. [#species] *Anolis carolinensis*, *Branchiostoma belcheri*, *Branchiostoma floridae*, *Ciona robusta*, *Ciona savignyi*, *Danio rerio*, *Eptatretus burgeri*, *Petromyzon marinus*, *Strongylocentrotus purpuratus*, *Xenopus laevis* and *Xenopus tropicalis*.
.. [#ImportantNote] Pre-calculated data should be downloaded from https://zenodo.org/record/4531376#.YDqO4-bTVTZ
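Before launching ``tutorial_test_selected_models.sh`` as described in *Input files*, it may help to confirm that every input path declared in the script exists. The sketch below is not part of ``miRNAture``; the function name ``check_inputs`` is hypothetical, and it only tests that the given paths exist, not that their contents are valid.

.. code-block:: bash

   # Hypothetical helper (not part of miRNAture): report every path that
   # does not exist and return a non-zero status if any is missing.
   check_inputs() {
       local path missing=0
       for path in "$@"; do
           if [ ! -e "$path" ]; then
               echo "Missing input: $path" >&2
               missing=$((missing + 1))
           fi
       done
       # Succeeds (status 0) only when every path was found
       [ "$missing" -eq 0 ]
   }

For example, from ``Code/``, ``check_inputs ../Data/latimeria_chalumnae_genome.fa ../Data/QueriesToTest list_miRNAs_to_search.txt Precalculated-Data-tutorial User_Test_Data && ./tutorial_test_selected_models.sh`` starts the tutorial script only if all five paths are found.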