Quick Start Guide

This guide will help you get started with ePLACE quickly.

Prerequisites

Before you begin, ensure you have:

  1. ePLACE and its dependencies installed (see Installation)

  2. The BLAST+ tools installed (blastn, blastdbcmd)

  3. TaxonKit installed

  4. (Optional) MAFFT and IQTree installed, for alignment and phylogenetic analysis
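
One way to confirm these prerequisites before running anything is to look the executables up on your PATH. The sketch below is not part of ePLACE; the executable names are assumptions based on how each tool is usually installed (IQTree, for example, may be installed as iqtree or iqtree2 on your system).

```python
import shutil

# Executable names are assumptions, not taken from the ePLACE documentation.
# Adjust them to match your installation (e.g. "iqtree2" instead of "iqtree").
REQUIRED = ["eplace", "blastn", "blastdbcmd", "taxonkit"]
OPTIONAL = ["mafft", "iqtree"]


def missing_tools(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        missing = missing_tools(names)
        status = "all found" if not missing else "missing: " + ", ".join(missing)
        print(f"{label} tools: {status}")
```

A missing optional tool only matters if you plan to run the alignment and tree-building steps.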

Your First Analysis

Step 1: Download the NCBI Database

First, download the NCBI BLAST database:

eplace download

This will download the core_nt database to your BLASTDB location ($BLASTDB or ~/blastdb).

Note

This is a large download (several GB) and may take some time. You only need to do this once.
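
If you are unsure where the database ended up, the lookup order described above ($BLASTDB first, then ~/blastdb) can be reproduced in a few lines. This is a minimal sketch, not ePLACE's own code: the helper names are hypothetical, it treats $BLASTDB as a single directory, and it assumes the database files share the core_nt name prefix.

```python
import os
from pathlib import Path


def resolve_blastdb_dir(env=None):
    """Return $BLASTDB if it is set and non-empty, otherwise ~/blastdb."""
    env = os.environ if env is None else env
    configured = env.get("BLASTDB", "").strip()
    if configured:
        return Path(configured).expanduser()
    return Path.home() / "blastdb"


def has_database(db_dir, name="core_nt"):
    """Return True if db_dir contains at least one file starting with name.

    Assumption: the database volumes are stored as files whose names begin
    with the database name.
    """
    db_dir = Path(db_dir)
    return db_dir.is_dir() and any(db_dir.glob(f"{name}*"))


if __name__ == "__main__":
    location = resolve_blastdb_dir()
    print(f"BLAST database directory: {location}")
    print(f"core_nt files present: {has_database(location)}")
```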

Step 2: Prepare Your Query Sequences

Create a FASTA file with your query sequences:

>query_sequence_1
ATGCATGCATGCATGCATGCATGC
>query_sequence_2
GCTAGCTAGCTAGCTAGCTAGCTA
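
Before starting a long search, it can be worth checking that the file parses as FASTA and contains nucleotide data. The following sketch is independent of ePLACE (it does not use the library's own reader); the function names and the accepted alphabet (IUPAC nucleotide codes plus the gap character) are choices made for this illustration.

```python
# IUPAC nucleotide codes plus "-" for gaps; an assumption for this sketch.
NUCLEOTIDE_ALPHABET = frozenset("ACGTURYSWKMBDHVN-")


def parse_fasta(text):
    """Parse FASTA text into {sequence_id: sequence}, joining wrapped lines."""
    records = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("header line without a sequence ID")
            current = parts[0]
            if current in records:
                raise ValueError(f"duplicate sequence ID: {current}")
            records[current] = []
        elif current is None:
            raise ValueError("sequence data before the first header")
        else:
            records[current].append(line)
    return {seq_id: "".join(chunks) for seq_id, chunks in records.items()}


def non_nucleotide_ids(records):
    """Return IDs whose sequences contain characters outside the alphabet."""
    return [
        seq_id
        for seq_id, seq in records.items()
        if set(seq.upper()) - NUCLEOTIDE_ALPHABET
    ]


if __name__ == "__main__":
    example = (
        ">query_sequence_1\nATGCATGCATGCATGCATGCATGC\n"
        ">query_sequence_2\nGCTAGCTAGCTAGCTAGCTAGCTA\n"
    )
    parsed = parse_fasta(example)
    print({seq_id: len(seq) for seq_id, seq in parsed.items()})
    print("suspicious records:", non_nucleotide_ids(parsed))
```

An empty list from non_nucleotide_ids only means that no unexpected characters were found; it does not prove a sequence is biologically meaningful.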

Step 3: Run Search Analysis

Run a basic search analysis (BLAST by default; use --search-tool mmseqs2 for MMseqs2):

eplace search query.fasta output_dir

This will:

  1. Run a BLAST search against the core_nt database

  2. Filter results by identity (≥90%) and coverage (≥80%)

  3. Extract representative sequences for each query at the genus level

  4. Align sequences using MAFFT

  5. Build phylogenetic trees using IQTree
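
To make the filtering in step 2 concrete, here is a rough sketch of how identity and coverage thresholds can be applied to tabular BLAST hits. It rests on two assumptions that this guide does not confirm: that the results use the default 12-column tabular layout, and that coverage means the aligned share of the query length. ePLACE may compute coverage differently.

```python
# Default column order of BLAST tabular output (-outfmt 6); assumed here.
FIELDS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)
FLOAT_FIELDS = ("pident", "evalue", "bitscore")
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")


def parse_hit(line):
    """Turn one tab-separated BLAST line into a dict with typed values."""
    values = line.rstrip("\n").split("\t")
    if len(values) < len(FIELDS):
        raise ValueError(f"expected at least {len(FIELDS)} columns, got {len(values)}")
    hit = dict(zip(FIELDS, values))
    for key in FLOAT_FIELDS:
        hit[key] = float(hit[key])
    for key in INT_FIELDS:
        hit[key] = int(hit[key])
    return hit


def query_coverage(hit, query_length):
    """Percentage of the query spanned by the alignment (assumed definition)."""
    aligned = abs(hit["qend"] - hit["qstart"]) + 1
    return 100.0 * aligned / query_length


def passes_thresholds(hit, query_length, min_identity=90.0, min_coverage=80.0):
    """Apply the default thresholds named in this guide to a single hit."""
    return (
        hit["pident"] >= min_identity
        and query_coverage(hit, query_length) >= min_coverage
    )
```

The query length is passed in separately because it is not one of the default tabular columns.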

View Results

After the analysis completes, check the output directory:

ls -R output_dir/

You’ll find:

  • blast_results.txt - Raw BLAST results

  • blast_results_annotated.txt - BLAST results with taxonomic annotations

  • One directory per query sequence containing:

    • Representative sequences (FASTA)

    • Multiple sequence alignment

    • Phylogenetic tree files

Common Workflows

Individual Analysis

Analyze each query sequence independently with custom parameters:

eplace search query.fasta output_dir \
    --rank genus \
    --min-identity 95 \
    --min-coverage 85 \
    --num-threads 4

This creates one phylogenetic tree per query sequence.
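
The --rank option controls the taxonomic level at which representatives are chosen. This guide does not describe the selection rule, so the sketch below simply assumes that the highest-scoring hit is kept for each taxon name; the dictionary keys (sseqid, genus, bitscore) are hypothetical and chosen only for illustration.

```python
def best_hit_per_taxon(hits, rank="genus"):
    """Keep the highest-bitscore hit for each taxon name at the given rank.

    Assumption: "representative" means "best-scoring hit per taxon".
    ePLACE's actual rule may differ.
    """
    best = {}
    for hit in hits:
        taxon = hit.get(rank)
        if not taxon:
            continue  # skip hits with no annotation at this rank
        if taxon not in best or hit["bitscore"] > best[taxon]["bitscore"]:
            best[taxon] = hit
    return best


if __name__ == "__main__":
    # Illustrative values only, not real search results.
    example_hits = [
        {"sseqid": "ref_a", "genus": "Escherichia", "bitscore": 100.0},
        {"sseqid": "ref_b", "genus": "Escherichia", "bitscore": 150.0},
        {"sseqid": "ref_c", "genus": "Salmonella", "bitscore": 90.0},
    ]
    for genus, hit in best_hit_per_taxon(example_hits).items():
        print(genus, hit["sseqid"])
```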

Grouped Analysis

Group queries by taxonomic classification:

eplace grouped query.fasta output_dir \
    --rank genus \
    --group-rank family \
    --num-threads 4

This groups queries that match the same family and creates one tree per group.
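
Conceptually, the grouping step collects query IDs under the taxon they were assigned at the --group-rank level. The sketch below illustrates that idea with a plain dictionary; the input mapping and the function name are hypothetical, and how ePLACE assigns a family to each query is not covered here.

```python
from collections import defaultdict


def group_queries(query_to_group):
    """Invert {query_id: group_name} into {group_name: [query_ids]}."""
    groups = defaultdict(list)
    for query_id, group_name in query_to_group.items():
        groups[group_name].append(query_id)
    # Sort the IDs so the result does not depend on input order.
    return {name: sorted(ids) for name, ids in groups.items()}


if __name__ == "__main__":
    # Illustrative assignments only.
    assignments = {
        "query_sequence_1": "Enterobacteriaceae",
        "query_sequence_2": "Enterobacteriaceae",
        "query_sequence_3": "Bacillaceae",
    }
    for family, members in group_queries(assignments).items():
        print(f"{family}: {len(members)} queries, one tree")
```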

Relabel Existing Trees

Relabel an existing tree with taxonomic names at different ranks:

# Relabel tree with genus names
eplace relabel blast_results.txt input.treefile output_genus.treefile --rank genus

# Relabel tree with species names (binomial nomenclature)
eplace relabel blast_results.txt input.treefile output_species.treefile --rank species

This is useful when you want to create multiple versions of the same tree with different taxonomic labels without rebuilding the tree.
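
Relabeling leaves the tree topology and branch lengths untouched and only swaps leaf names. As a rough illustration of that idea (not the eplace relabel implementation), the sketch below rewrites leaf labels in a Newick string using a lookup table. It assumes unquoted labels with no whitespace and a tree with at least one pair of parentheses; the label-to-name mapping is hypothetical.

```python
import re

# A leaf label directly follows "(" or ","; internal node labels and
# support values follow ")" and are therefore left unchanged.
LEAF_LABEL = re.compile(r"(?<=[(,])[^():,;]+")


def relabel_newick(newick, mapping):
    """Replace leaf labels found in mapping; leave all other labels as-is."""
    return LEAF_LABEL.sub(lambda m: mapping.get(m.group(0), m.group(0)), newick)


if __name__ == "__main__":
    tree = "((A:0.1,B:0.2)90:0.05,C:0.3);"
    names = {"A": "Escherichia", "C": "Bacillus"}  # illustrative names only
    print(relabel_newick(tree, names))
```

Labels missing from the mapping are kept, so a partial mapping never drops a leaf.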

BLAST Only (No Alignment)

If you only want BLAST results without alignment and tree building:

eplace search query.fasta output_dir --skip-alignment

Custom Database Location

If your BLAST database is in a non-standard location:

eplace search query.fasta output_dir \
    --blastdb-path /path/to/custom/blastdb

Using as a Python Library

You can also use ePLACE programmatically in your Python scripts.

Basic BLAST Workflow

from pathlib import Path
from eplace_lib import run_blast_search, process_blast_results_for_taxonomy

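# Run BLAST search with filtering
success, filtered_hits = run_blast_search(
    query_fasta=Path("query.fasta"),
    output_file=Path("blast_results.txt"),
    min_identity=90.0,
    min_coverage=80.0
)

# Stop early if the search did not complete
if not success:
    raise SystemExit("BLAST search failed; check the query file and the database")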

# Extract representative sequences by taxonomic rank
results = process_blast_results_for_taxonomy(
    blast_hits=filtered_hits,
    output_dir=Path("output"),
    rank="genus"
)

# Print results
for query_id, output_fasta in results.items():
    print(f"{query_id}: {output_fasta}")

Database Download

from eplace_lib import setup_ncbi_database

# Download the core_nt database
success, message = setup_ncbi_database()
print(f"Success: {success}, Message: {message}")

FASTA File Reading

from pathlib import Path
from eplace_lib.blast_analysis import FastaReader

# Read sequences from FASTA file
sequences = FastaReader.read_fasta(Path("input.fasta"))

# Get sequence lengths
lengths = FastaReader.get_sequence_lengths(Path("input.fasta"))

for seq_id, length in lengths.items():
    print(f"{seq_id}: {length} bp")

Understanding Output Files

BLAST Results

  • blast_results.txt - Tabular BLAST output with standard columns

  • blast_results_annotated.txt - Same as above but with taxonomic annotations

Per-Query Directories

Each query sequence gets its own directory (query_id/) containing:

  • query_id_representatives.fasta - Representative sequences selected by taxonomic rank

  • query_id_with_query.fasta - Query sequence plus representatives

  • query_id_trimmed.fasta - Sequences trimmed to aligned regions

  • query_id_aligned.fasta - Multiple sequence alignment (MAFFT output)

  • query_id_tree.treefile - Phylogenetic tree (Newick format)

  • query_id_tree_labeled.treefile - Tree with taxonomic labels

  • Additional IQTree output files
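
To check quickly whether a run produced everything listed above, you can build the expected paths from the naming pattern and test which ones exist. This is a small sketch based only on the file names in this section; the helper names are hypothetical, and runs that skip alignment will legitimately lack the alignment and tree files.

```python
from pathlib import Path

# File name suffixes taken from the list in this section.
SUFFIXES = (
    "representatives.fasta",
    "with_query.fasta",
    "trimmed.fasta",
    "aligned.fasta",
    "tree.treefile",
    "tree_labeled.treefile",
)


def expected_outputs(output_dir, query_id):
    """Build the per-query file paths that follow the documented pattern."""
    base = Path(output_dir) / query_id
    return [base / f"{query_id}_{suffix}" for suffix in SUFFIXES]


def missing_outputs(output_dir, query_id):
    """Return the expected paths that do not exist on disk."""
    return [path for path in expected_outputs(output_dir, query_id) if not path.exists()]


if __name__ == "__main__":
    for path in missing_outputs("output_dir", "query_sequence_1"):
        print(f"not found: {path}")
```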

Grouped Analysis Output

For grouped analysis, you’ll additionally see directories named by taxonomic group:

  • Taxonomic_Group_Name/

    • Taxonomic_Group_Name_combined.fasta - All queries and unique references

    • Taxonomic_Group_Name_trimmed.fasta - Trimmed sequences

    • Taxonomic_Group_Name_aligned.fasta - Multiple sequence alignment

    • Taxonomic_Group_Name_tree.treefile - Phylogenetic tree

    • Additional tree files

Choosing Between Workflows

Grouped Workflow (eplace grouped)

Use when:

  • You have multiple queries from related organisms

  • You want to see queries together in phylogenetic context

  • You want to reduce computational time for related sequences

Advantages:

  • Combined phylogenetic analysis

  • Fewer alignment/tree operations

  • Better for comparative analysis

Command Reference

For detailed command-line options, see:

  • Command-Line Interface - Complete CLI reference

  • eplace --help - General help

  • eplace download --help - Download command help

  • eplace search --help - Search command help

  • eplace grouped --help - Grouped command help

  • eplace relabel --help - Relabel command help

Troubleshooting

No BLAST hits found

If you get no BLAST hits:

  1. Check that your sequences are in correct FASTA format

  2. Try lowering --min-identity and --min-coverage thresholds

  3. Verify your sequences are nucleotide sequences (not protein)

  4. Ensure the BLAST database is properly installed

Command not found

If the eplace command is not found:

  1. Verify installation: pip show eplace

  2. Check that the installation directory is on your PATH

  3. Try reinstalling: pip install --force-reinstall .

Out of memory

If you run out of memory:

  1. Process fewer sequences at a time

  2. Use --skip-alignment to skip memory-intensive steps

  3. Reduce --num-threads parameter

  4. Consider using a machine with more RAM
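
For the first suggestion, a large query file can be split into smaller batches that are then searched one at a time. The sketch below splits FASTA text into batches of a fixed number of records; the function name and batch size are arbitrary choices for illustration, and each batch would need its own output directory.

```python
def split_fasta(text, batch_size):
    """Split FASTA text into a list of FASTA strings of up to batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    records = []
    for line in text.splitlines():
        if line.startswith(">"):
            records.append([line])  # start a new record at each header
        elif line.strip():
            if not records:
                raise ValueError("sequence data before the first header")
            records[-1].append(line)
    return [
        "\n".join("\n".join(record) for record in records[start:start + batch_size]) + "\n"
        for start in range(0, len(records), batch_size)
    ]


if __name__ == "__main__":
    from pathlib import Path

    source = Path("query.fasta")
    if source.exists():
        for index, batch in enumerate(split_fasta(source.read_text(), 50), start=1):
            Path(f"query_batch_{index}.fasta").write_text(batch)
```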

Getting Help

If you encounter issues:

  1. Check this documentation

  2. Review error messages carefully

  3. Open an issue on GitHub: https://github.com/linsalrob/eplace/issues

  4. Include the error messages, the command you ran, and your system information