Quick Start Guide
=================

This guide will help you get started with ePLACE quickly.

Prerequisites
-------------

Before you begin, ensure you have:

1. Installed ePLACE and its dependencies (see :doc:`installation`)
2. BLAST+ tools installed (``blastn``, ``blastdbcmd``)
3. TaxonKit installed
4. (Optional) MAFFT and IQTree for alignment and phylogenetic analysis

Your First Analysis
-------------------

Step 1: Download the NCBI Database
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

First, download the NCBI BLAST database:

.. code-block:: bash

   eplace download

This will download the ``core_nt`` database to your BLASTDB location (``$BLASTDB`` or ``~/blastdb``).

.. note::
   This is a large download (several GB) and may take some time. You only need to do this once.

Step 2: Prepare Your Query Sequences
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Create a FASTA file with your query sequences:

.. code-block:: text

   >query_sequence_1
   ATGCATGCATGCATGCATGCATGC
   >query_sequence_2
   GCTAGCTAGCTAGCTAGCTAGCTA

Step 3: Run Search Analysis
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Run a basic search analysis (BLAST by default; use ``--search-tool mmseqs2`` for MMseqs2):

.. code-block:: bash

   eplace search query.fasta output_dir

This will:

1. Run BLAST search against the core_nt database
2. Filter results by identity (≥90%) and coverage (≥80%)
3. Extract representative sequences for each query at the genus level
4. Align sequences using MAFFT
5. Build phylogenetic trees using IQTree

View Results
~~~~~~~~~~~~

After the analysis completes, check the output directory:

.. code-block:: bash

   ls -R output_dir/

You'll find:

* ``blast_results.txt`` - Raw BLAST results
* ``blast_results_annotated.txt`` - BLAST results with taxonomic annotations
* One directory per query sequence containing:
  
  * Representative sequences (FASTA)
  * Multiple sequence alignment
  * Phylogenetic tree files

Common Workflows
----------------

Individual Analysis
~~~~~~~~~~~~~~~~~~~

Analyze each query sequence independently with custom parameters:

.. code-block:: bash

   eplace search query.fasta output_dir \
       --rank genus \
       --min-identity 95 \
       --min-coverage 85 \
       --num-threads 4

This creates one phylogenetic tree per query sequence.

Grouped Analysis
~~~~~~~~~~~~~~~~

Group queries by taxonomic classification:

.. code-block:: bash

   eplace grouped query.fasta output_dir \
       --rank genus \
       --group-rank family \
       --num-threads 4

This groups queries that match to the same family and creates one tree per group.

Relabel Existing Trees
~~~~~~~~~~~~~~~~~~~~~~~

Relabel an existing tree with taxonomic names at different ranks:

.. code-block:: bash

   # Relabel tree with genus names
   eplace relabel blast_results.txt input.treefile output_genus.treefile --rank genus
   
   # Relabel tree with species names (binomial nomenclature)
   eplace relabel blast_results.txt input.treefile output_species.treefile --rank species

This is useful when you want to create multiple versions of the same tree with different taxonomic labels without rebuilding the tree.

BLAST Only (No Alignment)
~~~~~~~~~~~~~~~~~~~~~~~~~~

If you only want BLAST results without alignment and tree building:

.. code-block:: bash

   eplace search query.fasta output_dir --skip-alignment

High Stringency Search
~~~~~~~~~~~~~~~~~~~~~~

For more stringent matching:

.. code-block:: bash

   eplace search query.fasta output_dir \
       --min-identity 98 \
       --min-coverage 95

Custom Database Location
~~~~~~~~~~~~~~~~~~~~~~~~~

If your BLAST database is in a non-standard location:

.. code-block:: bash

   eplace search query.fasta output_dir \
       --blastdb-path /path/to/custom/blastdb

Using as a Python Library
--------------------------

You can also use ePLACE programmatically in your Python scripts.

Basic BLAST Workflow
~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from pathlib import Path
   from eplace_lib import run_blast_search, process_blast_results_for_taxonomy

   # Run BLAST search with filtering
   success, filtered_hits = run_blast_search(
       query_fasta=Path("query.fasta"),
       output_file=Path("blast_results.txt"),
       min_identity=90.0,
       min_coverage=80.0
   )

   # Extract representative sequences by taxonomic rank
   results = process_blast_results_for_taxonomy(
       blast_hits=filtered_hits,
       output_dir=Path("output"),
       rank="genus"
   )

   # Print results
   for query_id, output_fasta in results.items():
       print(f"{query_id}: {output_fasta}")

Database Download
~~~~~~~~~~~~~~~~~

.. code-block:: python

   from eplace_lib import setup_ncbi_database

   # Download the core_nt database
   success, message = setup_ncbi_database()
   print(f"Success: {success}, Message: {message}")

FASTA File Reading
~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from pathlib import Path
   from eplace_lib.blast_analysis import FastaReader

   # Read sequences from FASTA file
   sequences = FastaReader.read_fasta(Path("input.fasta"))

   # Get sequence lengths
   lengths = FastaReader.get_sequence_lengths(Path("input.fasta"))

   for seq_id, length in lengths.items():
       print(f"{seq_id}: {length} bp")

Understanding Output Files
---------------------------

BLAST Results
~~~~~~~~~~~~~

* ``blast_results.txt`` - Tabular BLAST output with standard columns
* ``blast_results_annotated.txt`` - Same as above but with taxonomic annotations

Per-Query Directories
~~~~~~~~~~~~~~~~~~~~~

Each query sequence gets its own directory (``query_id/``) containing:

* ``query_id_representatives.fasta`` - Representative sequences selected by taxonomic rank
* ``query_id_with_query.fasta`` - Query sequence plus representatives
* ``query_id_trimmed.fasta`` - Sequences trimmed to aligned regions
* ``query_id_aligned.fasta`` - Multiple sequence alignment (MAFFT output)
* ``query_id_tree.treefile`` - Phylogenetic tree (Newick format)
* ``query_id_tree_labeled.treefile`` - Tree with taxonomic labels
* Additional IQTree output files

Grouped Analysis Output
~~~~~~~~~~~~~~~~~~~~~~~

For grouped analysis, you'll additionally see directories named by taxonomic group:

* ``Taxonomic_Group_Name/``
  
  * ``Taxonomic_Group_Name_combined.fasta`` - All queries and unique references
  * ``Taxonomic_Group_Name_trimmed.fasta`` - Trimmed sequences
  * ``Taxonomic_Group_Name_aligned.fasta`` - Multiple sequence alignment
  * ``Taxonomic_Group_Name_tree.treefile`` - Phylogenetic tree
  * Additional tree files

Choosing Between Workflows
---------------------------

Individual Workflow (``eplace search``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Use when:**

* You want to analyze each query in its own phylogenetic context
* Queries may be from diverse taxonomic groups
* You need separate trees for each sequence

**Advantages:**

* Independent analysis per query
* Clear interpretation per sequence
* No assumptions about relatedness

Grouped Workflow (``eplace grouped``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Use when:**

* You have multiple queries from related organisms
* You want to see queries together in phylogenetic context
* You want to reduce computational time for related sequences

**Advantages:**

* Combined phylogenetic analysis
* Fewer alignment/tree operations
* Better for comparative analysis

Command Reference
-----------------

For detailed command-line options, see:

* :doc:`cli` - Complete CLI reference
* ``eplace --help`` - General help
* ``eplace download --help`` - Download command help
* ``eplace search --help`` - Search command help
* ``eplace grouped --help`` - Grouped command help
* ``eplace relabel --help`` - Relabel command help

Next Steps
----------

* Read the detailed :doc:`workflows` documentation
* Learn about :doc:`blast_workflow` process
* Explore the :doc:`api` for programmatic access
* Check :doc:`ncbi_download` for database management

Troubleshooting
---------------

No BLAST hits found
~~~~~~~~~~~~~~~~~~~

If you get no BLAST hits:

1. Check that your sequences are in correct FASTA format
2. Try lowering ``--min-identity`` and ``--min-coverage`` thresholds
3. Verify your sequences are nucleotide sequences (not protein)
4. Ensure the BLAST database is properly installed

Command not found
~~~~~~~~~~~~~~~~~

If ``eplace`` command is not found:

1. Verify installation: ``pip show eplace``
2. Check that installation directory is in PATH
3. Try reinstalling: ``pip install --force-reinstall .``

Out of memory
~~~~~~~~~~~~~

If you run out of memory:

1. Process fewer sequences at a time
2. Use ``--skip-alignment`` to skip memory-intensive steps
3. Reduce ``--num-threads`` parameter
4. Consider using a machine with more RAM

Getting Help
------------

If you encounter issues:

1. Check this documentation
2. Review error messages carefully
3. Open an issue on GitHub: https://github.com/linsalrob/eplace/issues
4. Include error messages, command used, and system information