Quick Start Guide
This guide will help you get started with ePLACE quickly.
Prerequisites
Before you begin, ensure you have:
Installed ePLACE and its dependencies (see Installation)
BLAST+ tools installed (blastn, blastdbcmd)
TaxonKit installed
(Optional) MAFFT and IQTree for alignment and phylogenetic analysis
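One way to confirm that the external tools are reachable is to look them up on your PATH. The sketch below uses only the Python standard library. The executable names are assumptions about how the tools are installed on your system (IQ-TREE in particular may be installed as iqtree or iqtree2), so adjust the lists as needed.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED = ["blastn", "blastdbcmd", "taxonkit"]
OPTIONAL = ["mafft", "iqtree2"]


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names):
    """Return the names that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]


if __name__ == "__main__":
    for name in missing_tools(REQUIRED):
        print(f"missing required tool: {name}")
    for name in missing_tools(OPTIONAL):
        print(f"missing optional tool: {name}")
```

An empty result from missing_tools(REQUIRED) only shows that the programs are on PATH; it does not check their versions.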
Your First Analysis
Step 1: Download the NCBI Database
First, download the NCBI BLAST database:
eplace download
This will download the core_nt database to your BLASTDB location ($BLASTDB or ~/blastdb).
Note
This is a large download (several GB) and may take some time. You only need to do this once.
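The location rule described above ($BLASTDB if set, otherwise ~/blastdb) can be sketched in a few lines. This illustrates the rule as stated in this guide and is not ePLACE's own code; resolve_blastdb_dir is a hypothetical helper name.

```python
import os
from pathlib import Path


def resolve_blastdb_dir(env=None):
    """Return $BLASTDB if it is set and non-empty, otherwise ~/blastdb.

    Hypothetical helper: it mirrors the rule described in this guide and
    treats the variable as a single directory.
    """
    env = os.environ if env is None else env
    value = env.get("BLASTDB", "").strip()
    return Path(value) if value else Path.home() / "blastdb"
```

Calling resolve_blastdb_dir() before starting the download is a simple way to check which directory needs enough free disk space.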
Step 2: Prepare Your Query Sequences
Create a FASTA file with your query sequences:
>query_sequence_1
ATGCATGCATGCATGCATGCATGC
>query_sequence_2
GCTAGCTAGCTAGCTAGCTAGCTA
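If your sequences already live in a Python script, a small writer can produce the same file. This is a minimal sketch using only the standard library; write_fasta is a hypothetical helper and is not part of eplace_lib.

```python
def write_fasta(records, path, width=60):
    """Write (identifier, sequence) pairs to `path` in FASTA format.

    Sequences longer than `width` characters are wrapped across lines.
    """
    with open(path, "w") as handle:
        for identifier, sequence in records:
            handle.write(f">{identifier}\n")
            for start in range(0, len(sequence), width):
                handle.write(sequence[start:start + width] + "\n")


# The two example records from this step.
records = [
    ("query_sequence_1", "ATGCATGCATGCATGCATGCATGC"),
    ("query_sequence_2", "GCTAGCTAGCTAGCTAGCTAGCTA"),
]
```

Calling write_fasta(records, "query.fasta") creates the input file used in the next step.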
Step 3: Run Search Analysis
Run a basic search analysis (BLAST by default; use --search-tool mmseqs2 for MMseqs2):
eplace search query.fasta output_dir
This will:
Run a BLAST search against the core_nt database
Filter results by identity (≥90%) and coverage (≥80%)
Extract representative sequences for each query at the genus level
Align sequences using MAFFT
Build phylogenetic trees using IQTree
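To make the identity and coverage thresholds concrete, the sketch below applies them to a single tabular BLAST hit. It rests on two assumptions that this guide does not confirm: that hits use the default 12-column tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), and that coverage means the aligned query span divided by the query length. ePLACE may compute coverage differently.

```python
def parse_hit(line):
    """Parse one tab-separated hit in the assumed 12-column tabular layout."""
    fields = line.rstrip("\n").split("\t")
    return {
        "qseqid": fields[0],
        "sseqid": fields[1],
        "pident": float(fields[2]),
        "qstart": int(fields[6]),
        "qend": int(fields[7]),
    }


def query_coverage(qstart, qend, query_length):
    """Percent of the query covered by one alignment (1-based, inclusive)."""
    return 100.0 * (abs(qend - qstart) + 1) / query_length


def passes_filters(hit, query_length, min_identity=90.0, min_coverage=80.0):
    """Return True when the hit meets both thresholds."""
    coverage = query_coverage(hit["qstart"], hit["qend"], query_length)
    return hit["pident"] >= min_identity and coverage >= min_coverage
```

A hit must meet both thresholds to be kept, so a near-identical match that spans only half of the query is rejected on coverage.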
View Results
After the analysis completes, check the output directory:
ls -R output_dir/
You’ll find:
blast_results.txt - Raw BLAST results
blast_results_annotated.txt - BLAST results with taxonomic annotations
One directory per query sequence containing:
Representative sequences (FASTA)
Multiple sequence alignment
Phylogenetic tree files
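The same overview can be produced from Python, which is handy when you want to post-process the results. This sketch only walks the directory tree and makes no assumption about which files ePLACE writes; summarize_output is a hypothetical helper name.

```python
from pathlib import Path


def summarize_output(output_dir):
    """Return {relative directory: sorted file names} for all files under output_dir."""
    root = Path(output_dir)
    summary = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Files directly inside output_dir are listed under ".".
            key = path.parent.relative_to(root).as_posix()
            summary.setdefault(key, []).append(path.name)
    return summary
```

Looping over summarize_output("output_dir").items() prints one entry per directory.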
Common Workflows
Individual Analysis
Analyze each query sequence independently with custom parameters:
eplace search query.fasta output_dir \
--rank genus \
--min-identity 95 \
--min-coverage 85 \
--num-threads 4
This creates one phylogenetic tree per query sequence.
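When the search is launched from a script, building the argument list in Python avoids shell quoting problems. The sketch below uses only the flags shown in the command above; build_search_command and run_search are hypothetical wrapper names, and running the command requires eplace to be installed and on PATH.

```python
import subprocess


def build_search_command(query, output_dir, rank="genus",
                         min_identity=95, min_coverage=85, num_threads=4):
    """Assemble the argument list for `eplace search` with the flags shown above."""
    return [
        "eplace", "search", str(query), str(output_dir),
        "--rank", rank,
        "--min-identity", str(min_identity),
        "--min-coverage", str(min_coverage),
        "--num-threads", str(num_threads),
    ]


def run_search(query, output_dir, **options):
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    command = build_search_command(query, output_dir, **options)
    return subprocess.run(command, check=True)
```

Passing a list to subprocess.run, rather than a single string, means file names containing spaces are handled without extra escaping.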
Grouped Analysis
Group queries by taxonomic classification:
eplace grouped query.fasta output_dir \
--rank genus \
--group-rank family \
--num-threads 4
This groups queries that match the same family and creates one tree per group.
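The grouping idea can be illustrated in isolation. The sketch below assumes you already have a mapping from each query to the family of its matches; how ePLACE derives that mapping from BLAST hits is not shown here, and the query names and families are made-up example data.

```python
from collections import defaultdict


def group_queries(query_to_taxon):
    """Invert {query_id: taxon_name} into {taxon_name: [query_id, ...]}."""
    groups = defaultdict(list)
    for query_id, taxon in query_to_taxon.items():
        groups[taxon].append(query_id)
    return {taxon: sorted(ids) for taxon, ids in groups.items()}


# Made-up example: three queries falling into two families.
example = {
    "query_1": "Enterobacteriaceae",
    "query_2": "Bacillaceae",
    "query_3": "Enterobacteriaceae",
}
```

With this input, two groups result, so two trees would be built instead of three.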
Relabel Existing Trees
Relabel an existing tree with taxonomic names at different ranks:
# Relabel tree with genus names
eplace relabel blast_results.txt input.treefile output_genus.treefile --rank genus
# Relabel tree with species names (binomial nomenclature)
eplace relabel blast_results.txt input.treefile output_species.treefile --rank species
This is useful when you want to create multiple versions of the same tree with different taxonomic labels without rebuilding the tree.
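Relabeling works because a Newick tree stores leaf names as plain text, so names can be swapped without touching the topology or branch lengths. The sketch below is a simplified illustration, not ePLACE's implementation: it handles only unquoted leaf names, and the label mapping is supplied by hand rather than derived from blast_results.txt.

```python
import re

# An unquoted leaf name directly follows "(" or ","; internal node labels
# and support values follow ")" and are therefore left untouched.
LEAF = re.compile(r"(?<=[(,])([^():,;\[\]\s]+)")


def relabel_newick(newick, labels):
    """Replace leaf names found in `labels`; leave all other names unchanged."""
    def swap(match):
        name = match.group(1)
        return labels.get(name, name)
    return LEAF.sub(swap, newick)
```

Replacement names should avoid spaces and Newick punctuation; species names are commonly written with an underscore, for example Escherichia_coli.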
BLAST Only (No Alignment)
If you only want BLAST results without alignment and tree building:
eplace search query.fasta output_dir --skip-alignment
High Stringency Search
For more stringent matching:
eplace search query.fasta output_dir \
--min-identity 98 \
--min-coverage 95
Custom Database Location
If your BLAST database is in a non-standard location:
eplace search query.fasta output_dir \
--blastdb-path /path/to/custom/blastdb
Using as a Python Library
You can also use ePLACE programmatically in your Python scripts.
Basic BLAST Workflow
from pathlib import Path
from eplace_lib import run_blast_search, process_blast_results_for_taxonomy
# Run BLAST search with filtering
success, filtered_hits = run_blast_search(
    query_fasta=Path("query.fasta"),
    output_file=Path("blast_results.txt"),
    min_identity=90.0,
    min_coverage=80.0,
)
# Extract representative sequences by taxonomic rank
results = process_blast_results_for_taxonomy(
    blast_hits=filtered_hits,
    output_dir=Path("output"),
    rank="genus",
)
# Print results
for query_id, output_fasta in results.items():
    print(f"{query_id}: {output_fasta}")
Database Download
from eplace_lib import setup_ncbi_database
# Download the core_nt database
success, message = setup_ncbi_database()
print(f"Success: {success}, Message: {message}")
FASTA File Reading
from pathlib import Path
from eplace_lib.blast_analysis import FastaReader
# Read sequences from FASTA file
sequences = FastaReader.read_fasta(Path("input.fasta"))
# Get sequence lengths
lengths = FastaReader.get_sequence_lengths(Path("input.fasta"))
for seq_id, length in lengths.items():
    print(f"{seq_id}: {length} bp")
Understanding Output Files
BLAST Results
blast_results.txt - Tabular BLAST output with standard columns
blast_results_annotated.txt - Same as above but with taxonomic annotations
Per-Query Directories
Each query sequence gets its own directory (query_id/) containing:
query_id_representatives.fasta - Representative sequences selected by taxonomic rank
query_id_with_query.fasta - Query sequence plus representatives
query_id_trimmed.fasta - Sequences trimmed to aligned regions
query_id_aligned.fasta - Multiple sequence alignment (MAFFT output)
query_id_tree.treefile - Phylogenetic tree (Newick format)
query_id_tree_labeled.treefile - Tree with taxonomic labels
Additional IQTree output files
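A quick completeness check for one query can be written against the file names listed above. This sketch assumes the directory and file prefix match the query identifier exactly, as the naming pattern suggests; missing_outputs is a hypothetical helper.

```python
from pathlib import Path

# File name suffixes taken from the list above.
SUFFIXES = [
    "_representatives.fasta",
    "_with_query.fasta",
    "_trimmed.fasta",
    "_aligned.fasta",
    "_tree.treefile",
    "_tree_labeled.treefile",
]


def missing_outputs(output_dir, query_id):
    """Return the expected file names that are absent from output_dir/query_id/."""
    query_dir = Path(output_dir) / query_id
    return [
        query_id + suffix
        for suffix in SUFFIXES
        if not (query_dir / (query_id + suffix)).is_file()
    ]
```

Files from the alignment and tree steps are expected to be reported as missing when the run used --skip-alignment.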
Grouped Analysis Output
For grouped analysis, you’ll additionally see directories named by taxonomic group:
Taxonomic_Group_Name/
Taxonomic_Group_Name_combined.fasta - All queries and unique references
Taxonomic_Group_Name_trimmed.fasta - Trimmed sequences
Taxonomic_Group_Name_aligned.fasta - Multiple sequence alignment
Taxonomic_Group_Name_tree.treefile - Phylogenetic tree
Additional tree files
Choosing Between Workflows
Individual Workflow (eplace search)
Use when:
You want to analyze each query in its own phylogenetic context
Queries may be from diverse taxonomic groups
You need separate trees for each sequence
Advantages:
Independent analysis per query
Clear interpretation per sequence
No assumptions about relatedness
Grouped Workflow (eplace grouped)
Use when:
You have multiple queries from related organisms
You want to see queries together in phylogenetic context
You want to reduce computational time for related sequences
Advantages:
Combined phylogenetic analysis
Fewer alignment/tree operations
Better for comparative analysis
Command Reference
For detailed command-line options, see:
Command-Line Interface - Complete CLI reference
eplace --help - General help
eplace download --help - Download command help
eplace search --help - Search command help
eplace grouped --help - Grouped command help
eplace relabel --help - Relabel command help
Next Steps
Read the detailed Workflows documentation
Learn about the BLAST Sequence Comparison Module
Explore the API Reference for programmatic access
Check NCBI Database Download Module for database management
Troubleshooting
No BLAST hits found
If you get no BLAST hits:
Check that your sequences are in correct FASTA format
Try lowering the --min-identity and --min-coverage thresholds
Verify your sequences are nucleotide sequences (not protein)
Ensure the BLAST database is properly installed
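The first and third checks can be automated before running a search. The sketch below parses FASTA text and estimates whether each record looks like nucleotide data. The 0.9 threshold is an arbitrary assumption, and the check is a heuristic: short or unusual sequences can be misjudged.

```python
def parse_fasta(text):
    """Return {identifier: sequence}; raise ValueError on malformed input."""
    records = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("header line has no identifier")
            current = parts[0]
            records[current] = ""
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[current] += line
    return records


def nucleotide_fraction(sequence):
    """Fraction of characters that are A, C, G, T, U or N (case-insensitive)."""
    if not sequence:
        return 0.0
    return sum(base in "ACGTUN" for base in sequence.upper()) / len(sequence)


def looks_like_nucleotide(sequence, threshold=0.9):
    """Heuristic check; the default threshold is an assumption, not a standard."""
    return nucleotide_fraction(sequence) >= threshold
```

Any record for which looks_like_nucleotide returns False is worth inspecting by hand before you adjust the search thresholds.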
Command not found
If the eplace command is not found:
Verify installation: pip show eplace
Check that the installation directory is in PATH
Try reinstalling: pip install --force-reinstall .
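The first two checks can also be run from Python, which helps when several environments are installed side by side. This sketch assumes the distribution is named eplace, as in the pip show command above; diagnose is a hypothetical helper.

```python
import shutil
from importlib import metadata


def diagnose(command="eplace", distribution="eplace"):
    """Report where the command resolves on PATH and which version is installed."""
    try:
        version = metadata.version(distribution)
    except metadata.PackageNotFoundError:
        version = None
    return {"on_path": shutil.which(command), "installed_version": version}
```

A version with no path usually means the package is installed in this interpreter but its scripts directory is not on PATH.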
Out of memory
If you run out of memory:
Process fewer sequences at a time
Use --skip-alignment to skip memory-intensive steps
Reduce the --num-threads parameter
Consider using a machine with more RAM
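Processing fewer sequences at a time amounts to splitting the input into batches and running each batch separately. A minimal sketch, assuming the records are already loaded as (identifier, sequence) pairs; both function names are hypothetical.

```python
def batch_records(records, batch_size):
    """Yield consecutive slices of `records` with at most `batch_size` items each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def to_fasta(records):
    """Format (identifier, sequence) pairs as FASTA text."""
    return "".join(f">{identifier}\n{sequence}\n" for identifier, sequence in records)
```

Writing each batch to its own file and giving each run a separate output directory keeps the results from overwriting one another.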
Getting Help
If you encounter issues:
Check this documentation
Review error messages carefully
Open an issue on GitHub: https://github.com/linsalrob/eplace/issues
Include error messages, command used, and system information