Quick Start Guide

This guide walks you through a first ePLACE analysis, the most common command-line workflows, and basic use as a Python library.

Prerequisites

Before you begin, ensure you have:

Installed ePLACE and its dependencies (see Installation)
BLAST+ tools installed (blastn, blastdbcmd)
TaxonKit installed
(Optional) MAFFT and IQTree for alignment and phylogenetic analysis

Your First Analysis

Step 1: Download the NCBI Database

First, download the NCBI BLAST database:

eplace download

This will download the core_nt database to your BLASTDB location ($BLASTDB or ~/blastdb).

Note

This is a large download (several GB) and may take some time. You only need to do this once.
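If you want to check from a script where the database is expected to live, the lookup described above can be sketched as follows. This is a minimal illustration, not ePLACE's own code: the helper names `resolve_blastdb_dir` and `looks_downloaded` are hypothetical, and the sketch assumes $BLASTDB holds a single directory and that database files start with the name core_nt.

```python
import os
from pathlib import Path


def resolve_blastdb_dir(env=None):
    """Return $BLASTDB if it is set and non-empty, otherwise ~/blastdb."""
    env = os.environ if env is None else env
    configured = env.get("BLASTDB", "").strip()
    if configured:
        return Path(configured).expanduser()
    return Path.home() / "blastdb"


def looks_downloaded(db_dir, db_name="core_nt"):
    """Heuristic check: does any file in db_dir start with the database name?"""
    return db_dir.is_dir() and any(db_dir.glob(f"{db_name}*"))


if __name__ == "__main__":
    db_dir = resolve_blastdb_dir()
    print(f"Database directory: {db_dir}")
    print(f"core_nt files present: {looks_downloaded(db_dir)}")
```

A check like this only tells you that some matching files exist; it does not verify that the download is complete.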

Step 2: Prepare Your Query Sequences

Create a FASTA file with your query sequences:

>query_sequence_1
ATGCATGCATGCATGCATGCATGC
>query_sequence_2
GCTAGCTAGCTAGCTAGCTAGCTA
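Before starting a long search it can help to confirm that the query file parses cleanly. The following is a small standalone sketch, independent of ePLACE (which ships its own FastaReader); the function name `read_fasta` and the set of accepted characters (IUPAC nucleotide codes plus the gap symbol) are assumptions made for this example.

```python
from pathlib import Path

# IUPAC nucleotide codes plus the gap character (an assumption for this sketch)
VALID_BASES = set("ACGTUNRYSWKMBDHV-")


def read_fasta(path):
    """Parse a FASTA file into {sequence_id: sequence}, raising on malformed input."""
    records = {}
    current_id = None
    for line_no, raw in enumerate(Path(path).read_text().splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError(f"line {line_no}: empty header")
            current_id = header[0]  # ID is the first word of the header
            if current_id in records:
                raise ValueError(f"line {line_no}: duplicate ID {current_id!r}")
            records[current_id] = ""
        elif current_id is None:
            raise ValueError(f"line {line_no}: sequence data before first header")
        else:
            bad = set(line.upper()) - VALID_BASES
            if bad:
                raise ValueError(f"line {line_no}: unexpected characters {sorted(bad)}")
            records[current_id] += line.upper()
    return records
```

Calling `read_fasta("query.fasta")` on the file above would return a dictionary with two entries, one per query.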

Step 3: Run Search Analysis

Run a basic search analysis (BLAST by default; use --search-tool mmseqs2 for MMseqs2):

eplace search query.fasta output_dir

This will:

Run BLAST search against the core_nt database
Filter results by identity (≥90%) and coverage (≥80%)
Extract representative sequences for each query at the genus level
Align sequences using MAFFT
Build phylogenetic trees using IQTree
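To make the identity and coverage thresholds concrete, here is a rough sketch of that kind of filter applied to BLAST tabular output. It is an illustration under stated assumptions, not ePLACE's implementation: it assumes the default 12-column tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and defines coverage as the aligned fraction of the query, which may differ from the definition ePLACE uses. The function name `filter_hits` is hypothetical.

```python
def filter_hits(lines, query_lengths, min_identity=90.0, min_coverage=80.0):
    """Keep hits whose percent identity and query coverage meet both thresholds.

    lines         -- iterable of tab-separated BLAST rows (12 default columns)
    query_lengths -- {query_id: length in bp}, needed to compute coverage
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, sseqid = fields[0], fields[1]
        pident = float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        # Coverage here = percent of the query spanned by the alignment
        coverage = 100.0 * (abs(qend - qstart) + 1) / query_lengths[qseqid]
        if pident >= min_identity and coverage >= min_coverage:
            kept.append(
                {"query": qseqid, "subject": sseqid,
                 "identity": pident, "coverage": coverage,
                 "bitscore": float(fields[11])}
            )
    return kept
```

With the defaults shown, a hit at 95% identity that spans a whole 100 bp query is kept, while a 99% identity hit that spans only half of the query is dropped.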

View Results

After the analysis completes, check the output directory:

ls -R output_dir/

You’ll find:

blast_results.txt - Raw BLAST results
blast_results_annotated.txt - BLAST results with taxonomic annotations
One directory per query sequence containing:
- Representative sequences (FASTA)
- Multiple sequence alignment
- Phylogenetic tree files

Common Workflows

Individual Analysis

Analyze each query sequence independently with custom parameters:

eplace search query.fasta output_dir \
    --rank genus \
    --min-identity 95 \
    --min-coverage 85 \
    --num-threads 4

This creates one phylogenetic tree per query sequence.
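The --rank option sets the taxonomic level at which representative sequences are chosen. One way to picture this step is to keep a single best hit per query and taxon. The sketch below assumes "best" means highest bit score and that each hit already carries a taxon label at the chosen rank; both are assumptions for illustration, and ePLACE's actual selection rule may differ. The function name `pick_representatives` is hypothetical.

```python
def pick_representatives(hits):
    """Return the highest-scoring hit for each (query, taxon) pair.

    hits -- iterable of dicts with keys 'query', 'subject', 'taxon', 'bitscore'
    """
    best = {}
    for hit in hits:
        key = (hit["query"], hit["taxon"])
        # Replace the stored hit only when the new one scores strictly higher
        if key not in best or hit["bitscore"] > best[key]["bitscore"]:
            best[key] = hit
    return best


if __name__ == "__main__":
    example = [
        {"query": "q1", "subject": "s1", "taxon": "Escherichia", "bitscore": 180.0},
        {"query": "q1", "subject": "s2", "taxon": "Escherichia", "bitscore": 200.0},
        {"query": "q1", "subject": "s3", "taxon": "Salmonella", "bitscore": 150.0},
    ]
    for (query, taxon), hit in pick_representatives(example).items():
        print(query, taxon, hit["subject"])
```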

Grouped Analysis

Group queries by taxonomic classification:

eplace grouped query.fasta output_dir \
    --rank genus \
    --group-rank family \
    --num-threads 4

This groups queries that match the same family and creates one tree per group.
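The grouping idea itself is simple and can be sketched in a few lines. The example assumes each query has already been assigned a taxon name at the grouping rank (how ePLACE derives that assignment is not covered here), and the function name `group_queries` is hypothetical.

```python
from collections import defaultdict


def group_queries(query_to_taxon):
    """Invert {query_id: taxon} into {taxon: sorted list of query_ids}."""
    groups = defaultdict(list)
    for query_id, taxon in query_to_taxon.items():
        groups[taxon].append(query_id)
    # Sort for stable, reproducible output
    return {taxon: sorted(ids) for taxon, ids in sorted(groups.items())}


if __name__ == "__main__":
    assignments = {
        "query_sequence_1": "Enterobacteriaceae",
        "query_sequence_2": "Bacillaceae",
        "query_sequence_3": "Enterobacteriaceae",
    }
    for family, members in group_queries(assignments).items():
        print(f"{family}: {len(members)} queries -> one tree")
```

In this picture, each key of the returned dictionary corresponds to one group directory and one tree.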

Relabel Existing Trees

Relabel an existing tree with taxonomic names at different ranks:

# Relabel tree with genus names
eplace relabel blast_results.txt input.treefile output_genus.treefile --rank genus

# Relabel tree with species names (binomial nomenclature)
eplace relabel blast_results.txt input.treefile output_species.treefile --rank species

This is useful for producing several versions of the same tree with different taxonomic labels without rebuilding it.
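Conceptually, relabeling rewrites the leaf names of a Newick tree and leaves the topology and branch lengths untouched. The following is a minimal sketch of that idea, not the code behind eplace relabel: it assumes unquoted leaf labels with no surrounding whitespace, and the output label format (taxon name followed by the original ID) is a hypothetical choice made to keep leaves unique.

```python
import re

# A leaf label directly follows "(" or "," and runs until the next structural
# character. Internal node labels and support values follow ")" and are skipped.
LEAF = re.compile(r"(?<=[(,])([^():,;]+)")


def relabel_newick(newick, names):
    """Prefix each leaf label found in `names` with its taxon name."""
    def swap(match):
        label = match.group(1)
        taxon = names.get(label)
        if taxon is None:
            return label  # leave unknown leaves unchanged
        # Spaces are not allowed in unquoted Newick labels
        return f"{taxon.replace(' ', '_')}_{label}"

    return LEAF.sub(swap, newick)


if __name__ == "__main__":
    tree = "((NR_1:0.1,NR_2:0.2)95:0.05,NR_3:0.3);"
    print(relabel_newick(tree, {"NR_1": "Escherichia", "NR_2": "Homo sapiens"}))
```

For trees with quoted labels or embedded comments, a proper Newick parser is a safer choice than a regular expression.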

BLAST Only (No Alignment)

If you only want BLAST results without alignment and tree building:

eplace search query.fasta output_dir --skip-alignment

High Stringency Search

For more stringent matching:

eplace search query.fasta output_dir \
    --min-identity 98 \
    --min-coverage 95

Custom Database Location

If your BLAST database is in a non-standard location:

eplace search query.fasta output_dir \
    --blastdb-path /path/to/custom/blastdb

Using as a Python Library

You can also use ePLACE programmatically in your Python scripts.

Basic BLAST Workflow

from pathlib import Path
from eplace_lib import run_blast_search, process_blast_results_for_taxonomy

# Run BLAST search with filtering
success, filtered_hits = run_blast_search(
    query_fasta=Path("query.fasta"),
    output_file=Path("blast_results.txt"),
    min_identity=90.0,
    min_coverage=80.0
)

# Stop early if the search did not succeed
if not success:
    raise SystemExit("BLAST search failed")

# Extract representative sequences by taxonomic rank
results = process_blast_results_for_taxonomy(
    blast_hits=filtered_hits,
    output_dir=Path("output"),
    rank="genus"
)

# Print results
for query_id, output_fasta in results.items():
    print(f"{query_id}: {output_fasta}")

Database Download

from eplace_lib import setup_ncbi_database

# Download the core_nt database
success, message = setup_ncbi_database()
print(f"Success: {success}, Message: {message}")

FASTA File Reading

from pathlib import Path
from eplace_lib.blast_analysis import FastaReader

# Read sequences from FASTA file
sequences = FastaReader.read_fasta(Path("input.fasta"))

# Get sequence lengths
lengths = FastaReader.get_sequence_lengths(Path("input.fasta"))

for seq_id, length in lengths.items():
    print(f"{seq_id}: {length} bp")

Understanding Output Files

BLAST Results

blast_results.txt - Tabular BLAST output with standard columns
blast_results_annotated.txt - Same as above but with taxonomic annotations

Per-Query Directories

Each query sequence gets its own directory (query_id/) containing:

query_id_representatives.fasta - Representative sequences selected by taxonomic rank
query_id_with_query.fasta - Query sequence plus representatives
query_id_trimmed.fasta - Sequences trimmed to aligned regions
query_id_aligned.fasta - Multiple sequence alignment (MAFFT output)
query_id_tree.treefile - Phylogenetic tree (Newick format)
query_id_tree_labeled.treefile - Tree with taxonomic labels
Additional IQTree output files
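After a run, a quick way to see which of these files were produced for each query is to scan the output directory. This sketch relies only on the file names listed above and assumes each per-query directory is named after the query ID; the function name `missing_outputs` is hypothetical. Runs that skip alignment or use grouped analysis will not contain every file, so a "missing" entry is not necessarily an error.

```python
from pathlib import Path

# File name suffixes taken from the per-query listing above
EXPECTED_SUFFIXES = [
    "_representatives.fasta",
    "_with_query.fasta",
    "_trimmed.fasta",
    "_aligned.fasta",
    "_tree.treefile",
    "_tree_labeled.treefile",
]


def missing_outputs(output_dir):
    """Map each subdirectory name to the expected files it does not contain."""
    report = {}
    subdirs = sorted(p for p in Path(output_dir).iterdir() if p.is_dir())
    for query_dir in subdirs:
        query_id = query_dir.name
        report[query_id] = [
            suffix
            for suffix in EXPECTED_SUFFIXES
            if not (query_dir / f"{query_id}{suffix}").exists()
        ]
    return report


if __name__ == "__main__":
    out = Path("output_dir")
    if out.is_dir():
        for query_id, missing in missing_outputs(out).items():
            status = "complete" if not missing else f"missing {missing}"
            print(f"{query_id}: {status}")
```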

Grouped Analysis Output

For grouped analysis, you’ll additionally see directories named by taxonomic group:

Taxonomic_Group_Name/
- Taxonomic_Group_Name_combined.fasta - All queries and unique references
- Taxonomic_Group_Name_trimmed.fasta - Trimmed sequences
- Taxonomic_Group_Name_aligned.fasta - Multiple sequence alignment
- Taxonomic_Group_Name_tree.treefile - Phylogenetic tree
- Additional tree files

Choosing Between Workflows

Individual Workflow (`eplace search`)

Use when:

You want to analyze each query in its own phylogenetic context
Queries may be from diverse taxonomic groups
You need separate trees for each sequence

Advantages:

Independent analysis per query
Clear interpretation per sequence
No assumptions about relatedness

Grouped Workflow (`eplace grouped`)