Command-Line Interface
The eplace command provides a unified interface to all ePLACE functionality through four subcommands.
Installation Verification
After installing the package with pip install . or pip install -e ., the eplace command will be available:
# Verify installation
eplace --help
# Check version
eplace --version
Commands Overview
| Command | Description |
|---|---|
| eplace download | Download NCBI BLAST database |
| eplace search | Run individual search workflow (one tree per query; BLAST by default, MMseqs2 via --search-tool mmseqs2) |
| eplace grouped | Run grouped BLAST workflow (one tree per taxonomic group) |
| eplace relabel | Relabel phylogenetic tree with taxonomic names |
eplace download
Download and set up the NCBI core_nt BLAST database and/or the MMseqs2 NT database.
Usage
eplace download [--target {blast,mmseqs2,both}] [--force] [MMSEQS_OPTIONS]
Options
- --target {blast,mmseqs2,both}
Database backend(s) to download
Default:
blast
- --force
Force redownload even if database exists
- --mmseqs-db-dir PATH
Path to MMseqs2 database root directory. Defaults to
$MMSEQS_DB_DIR, then $MMSEQS2DB, then ~/mmseqs2db.
- --mmseqs-threads INT
Number of threads for MMseqs2 download/taxonomy commands
- --add-taxonomy
Add taxonomy sidecar files to MMseqs2 NT database after download.
- --ncbi-taxonomy PATH
Path to NCBI taxonomy dump directory containing
nodes.dmp, names.dmp, and merged.dmp. Required with --add-taxonomy.
- --acc2taxid-dir PATH
Path to accession2taxid files. Defaults to
$ACC2TAXID_DIR or <ncbi-taxonomy>/accession2taxid.
- --taxonomy-workdir PATH
Working directory for MMseqs taxonomy mapping files.
- --skip-memory-check
Skip RAM preflight checks for MMseqs2 download/taxonomy.
Examples
# Download BLAST database to default location ($BLASTDB or ~/blastdb)
eplace download
# Download MMseqs2 NT database
eplace download --target mmseqs2 --mmseqs-db-dir /path/to/mmseqs_db
# Download MMseqs2 NT database and add taxonomy sidecar files
eplace download --target mmseqs2 --add-taxonomy --ncbi-taxonomy /path/to/ncbi/taxonomy/current
Notes
BLAST DB location: $BLASTDB or ~/blastdb
MMseqs2 DB location: $MMSEQS_DB_DIR, then $MMSEQS2DB, then ~/mmseqs2db
MMseqs2 NT download typically requires at least 64 GiB RAM
MMseqs2 taxonomy integration typically requires at least 128 GiB RAM
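The lookup order in the notes above can be sketched in Python as follows. This is a minimal illustration, not ePLACE's actual code: the function names resolve_mmseqs_db_dir and resolve_blast_db_dir are hypothetical, and treating an empty environment variable the same as an unset one is an assumption.

```python
import os
from pathlib import Path


def resolve_mmseqs_db_dir(cli_value=None, env=None):
    """Return the MMseqs2 database root using the documented precedence."""
    env = os.environ if env is None else env
    if cli_value:  # an explicit --mmseqs-db-dir value wins
        return Path(cli_value).expanduser()
    for var in ("MMSEQS_DB_DIR", "MMSEQS2DB"):
        value = env.get(var)
        if value:  # assumption: empty variables are skipped like unset ones
            return Path(value).expanduser()
    return Path.home() / "mmseqs2db"


def resolve_blast_db_dir(env=None):
    """Return $BLASTDB if set, otherwise ~/blastdb."""
    env = os.environ if env is None else env
    value = env.get("BLASTDB")
    return Path(value).expanduser() if value else Path.home() / "blastdb"
```

Passing env as a plain dictionary makes the precedence easy to check without touching the real environment.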
eplace search
Run BLAST search with individual taxonomy analysis. Creates one phylogenetic tree per query sequence.
Usage
eplace search QUERY_FASTA OUTPUT_DIR [OPTIONS]
Required Arguments
- QUERY_FASTA
Path to query FASTA file containing sequences to search
- OUTPUT_DIR
Output directory for results (will be created if it doesn’t exist)
Optional Arguments
Taxonomy Options
- --rank {phylum,class,order,family,genus,species}
Taxonomic rank for representative selection
Default:
genus
- --tree-label-rank {phylum,class,order,family,genus,species}
Taxonomic rank for tree labeling
Default:
genus
Filtering Options
- --min-identity FLOAT
Minimum percent identity for BLAST hits
Default:
90.0
- --min-coverage FLOAT
Minimum query coverage percentage
Default:
80.0
Database Options
- --database NAME
BLAST database name
Default:
core_nt
- --blastdb-path PATH
Path to BLAST database directory
Performance Options
- --num-threads INT
Number of threads for BLAST and alignment
Default:
1
Workflow Options
- --overwrite-existing-blast
Overwrite existing BLAST results
- --skip-alignment
Skip alignment and tree building steps
- --output-classification PATH
Path to output classification TSV file
MMseqs2 Options
These options apply only when --search-tool mmseqs2 is specified.
- --mmseqs-memory-limit LIMIT
Maximum RAM for the MMseqs2 prefilter/index step, passed as
--split-memory-limit to mmseqs easy-search.
Default:
400G
On smaller hosts (laptops/small VMs), set a lower value explicitly, for example 16G or 32G.
Accepts MMseqs2-style memory strings: a positive integer (no leading zeros) immediately followed by a single-character unit K, M, G, or T (no extra suffix). Invalid values (e.g. 0G, which is not positive; 01G, which has a leading zero; 400GB, where the B makes a double unit; or fourhundredG, which has a non-numeric prefix) cause an error before the search starts.
- --mmseqs-sensitivity FLOAT
MMseqs2 sensitivity setting (1–7.5)
Default:
5.7
- --mmseqs-search-type INT
MMseqs2 search type (2 = translated, 3 = nucleotide, 4 = translated nucleotide backtrace)
Default:
3
- --mmseqs-database NAME
MMseqs2 database name (default: same as
--database)
- --mmseqs-db-path PATH
Path to the MMseqs2 database directory
- --mmseqs-db-source LABEL
Provenance label for the MMseqs2 database, recorded in
search_metadata.json
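The format rules for --mmseqs-memory-limit described above map onto a short regular expression. The sketch below is illustrative only: is_valid_memory_limit is a hypothetical helper, and accepting upper-case units only is an assumption based on the units listed in the option description.

```python
import re

# Positive integer without leading zeros, then exactly one unit letter.
# Assumption: units are upper-case only, as listed in the option description.
_MEMORY_LIMIT = re.compile(r"[1-9][0-9]*[KMGT]")


def is_valid_memory_limit(value):
    """Return True for strings such as 16G, 400G, or 1T."""
    return _MEMORY_LIMIT.fullmatch(value) is not None
```

Checking a value this way before submitting a long job catches typos such as 400GB early.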
Examples
# Basic usage with default parameters
eplace search query.fasta output_dir
# With custom parameters
eplace search query.fasta output_dir \
--rank genus \
--min-identity 95 \
--min-coverage 85 \
--num-threads 4
# Skip alignment and tree building (BLAST only)
eplace search query.fasta output_dir --skip-alignment
# Use custom BLAST database location
eplace search query.fasta output_dir --blastdb-path /path/to/blastdb
# Use MMseqs2 with memory limit for large NT database
eplace search query.fasta output_dir \
--search-tool mmseqs2 \
--mmseqs-db-path /path/to/mmseqs_db \
--mmseqs-memory-limit 400G
Output Structure
output_dir/
├── blast_results.txt # Raw BLAST results
├── blast_results_annotated.txt # BLAST results with taxonomic annotations
├── query1_id/
│ ├── query1_id_representatives.fasta # Representative sequences
│ ├── query1_id_with_query.fasta # Query + representatives
│ ├── query1_id_trimmed.fasta # Trimmed to aligned regions
│ ├── query1_id_aligned.fasta # Multiple sequence alignment
│ ├── query1_id_tree.treefile # Phylogenetic tree
│ ├── query1_id_tree_labeled.treefile # Tree with taxonomic labels
│ └── query1_id_tree.* (other IQTree files)
└── ...
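For downstream scripting, the layout above can be walked to collect the labeled tree of every query. This is a minimal sketch that assumes the directory and file naming shown in the listing; labeled_trees is a hypothetical helper, not part of ePLACE.

```python
from pathlib import Path


def labeled_trees(output_dir):
    """Map each query directory name to its labeled tree, if one exists."""
    trees = {}
    for query_dir in sorted(p for p in Path(output_dir).iterdir() if p.is_dir()):
        tree = query_dir / f"{query_dir.name}_tree_labeled.treefile"
        if tree.is_file():  # queries run with --skip-alignment have no tree
            trees[query_dir.name] = tree
    return trees
```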
eplace grouped
Run BLAST search with grouped taxonomy analysis. Groups queries by taxonomic rank and creates one phylogenetic tree per group.
Usage
eplace grouped QUERY_FASTA OUTPUT_DIR [OPTIONS]
Required Arguments
- QUERY_FASTA
Path to query FASTA file containing sequences to search
- OUTPUT_DIR
Output directory for results (will be created if it doesn’t exist)
Optional Arguments
Taxonomy Options
- --rank {phylum,class,order,family,genus,species}
Taxonomic rank for representative selection
Default:
genus
- --group-rank {phylum,class,order,family,genus,species}
Taxonomic rank for grouping sequences
Default:
class
- --tree-label-rank {phylum,class,order,family,genus,species}
Taxonomic rank for tree labeling
Default:
genus
- --combined-tree-label-rank {phylum,class,order,family,genus,species}
Taxonomic rank for labeling the combined tree (optional)
Default: Not set (combined tree will not be built)
The grouped workflow can create a combined tree from all groups, but this is optional because it can be very time-consuming with large datasets. If you want to build the combined tree, specify this parameter with the desired taxonomic rank for labeling. If not provided, only individual group trees will be built.
Filtering Options
- --min-identity FLOAT
Minimum percent identity for BLAST hits
Default:
90.0
- --min-coverage FLOAT
Minimum query coverage percentage
Default:
80.0
Database Options
- --database NAME
BLAST database name
Default:
core_nt
- --blastdb-path PATH
Path to BLAST database directory
Performance Options
- --num-threads INT
Number of threads for BLAST and alignment
Default:
1
Workflow Options
- --overwrite-existing-blast
Overwrite existing BLAST results
- --skip-alignment
Skip alignment and tree building steps
- --alignment-tolerance INT
Maximum coordinate difference for alignment consistency
Default:
50
- --output-classification PATH
Path to output classification TSV file
MMseqs2 Options
These options apply only when --search-tool mmseqs2 is specified.
- --mmseqs-memory-limit LIMIT
Maximum RAM for the MMseqs2 prefilter/index step, passed as
--split-memory-limit to mmseqs easy-search.
Default:
400G
Accepts MMseqs2-style memory strings: a positive integer (no leading zeros) immediately followed by a single-character unit K, M, G, or T (no extra suffix, e.g. 64G, 400G, 1T). Invalid values (e.g. 0G, which is not positive; 01G, which has a leading zero; 400GB, which has an extra B suffix; or fourhundredG, which has a non-numeric prefix) cause an error before the search starts.
- --mmseqs-sensitivity FLOAT
MMseqs2 sensitivity setting (1–7.5)
Default:
5.7
- --mmseqs-search-type INT
MMseqs2 search type (2 = translated, 3 = nucleotide, 4 = translated nucleotide backtrace)
Default:
3
- --mmseqs-database NAME
MMseqs2 database name (default: same as
--database)
- --mmseqs-db-path PATH
Path to the MMseqs2 database directory
- --mmseqs-db-source LABEL
Provenance label for the MMseqs2 database, recorded in
search_metadata.json
Examples
# Basic usage (group by class, default)
eplace grouped query.fasta output_dir
# Group by different taxonomic rank
eplace grouped query.fasta output_dir --group-rank order
# Specify both representative and grouping ranks
eplace grouped query.fasta output_dir --rank genus --group-rank family
# Skip alignment and tree building
eplace grouped query.fasta output_dir --skip-alignment
# Use MMseqs2 with memory limit for large NT database
eplace grouped query.fasta output_dir \
--search-tool mmseqs2 \
--mmseqs-db-path /path/to/mmseqs_db \
--mmseqs-memory-limit 400G
Output Structure
output_dir/
├── blast_results.txt # Raw BLAST results
├── blast_results_annotated.txt # BLAST results with taxonomic annotations
├── query1_id/ # Per-query representative sequences
│ └── query1_id_representatives.fasta
├── Taxonomic_Group_1/ # One directory per taxonomic group
│ ├── Taxonomic_Group_1_combined.fasta # All queries + unique references
│ ├── Taxonomic_Group_1_trimmed.fasta # Trimmed to aligned regions
│ ├── Taxonomic_Group_1_aligned.fasta # Multiple sequence alignment
│ ├── Taxonomic_Group_1_tree.treefile # Phylogenetic tree
│ ├── Taxonomic_Group_1_tree_labeled.treefile # Tree with taxonomic labels
│ └── Taxonomic_Group_1_tree.* (other IQTree files)
├── combined_all_groups_trimmed.fasta # Combined alignment of all groups
├── combined_all_groups_aligned.fasta # Multiple sequence alignment
├── combined_all_groups_tree.treefile # Combined phylogenetic tree
├── combined_all_groups_tree_labeled.treefile # Combined tree with taxonomic labels
└── ...
eplace relabel
Relabel a phylogenetic tree with taxonomic names from BLAST results. This command allows you to replace sequence IDs in an existing tree with taxonomic names at any specified rank.
Usage
eplace relabel BLAST_OUTPUT TREE_FILE OUTPUT_TREE [OPTIONS]
Required Arguments
- BLAST_OUTPUT
Path to BLAST output file (tabular format with taxonomy)
The BLAST results file should contain taxonomic information for the sequences in the tree.
- TREE_FILE
Path to input tree file (Newick format)
The phylogenetic tree to be relabeled with taxonomic names.
- OUTPUT_TREE
Path to output relabeled tree file
The new tree file with taxonomic labels will be written to this path.
Optional Arguments
Taxonomy Options
- --rank {phylum,class,order,family,genus,species}
Taxonomic rank for tree labeling
Default:
genus
When using species, the output will use binomial nomenclature (genus + species).
Database Options
- --blastdb-path PATH
Path to BLAST database directory
Optional; not required for the relabeling operation.
Examples
# Relabel tree with genus names (default)
eplace relabel blast_results.txt input.treefile output_labeled.treefile
# Relabel tree with species names (genus + species binomial)
eplace relabel blast_results.txt input.treefile output_species.treefile --rank species
# Relabel tree with family names
eplace relabel blast_results.txt input.treefile output_family.treefile --rank family
# Relabel tree with order names
eplace relabel blast_results.txt input.treefile output_order.treefile --rank order
Key Features
Flexible Taxonomic Ranks: Supports all standard taxonomic ranks from phylum to species
Binomial Nomenclature: Species rank uses “genus species” format for proper scientific names
Topology Preservation: Maintains the original tree structure while updating labels
Format Compatibility: Works with standard Newick format trees
Reversed Sequences: Handles sequences with _R_ prefix (from MAFFT orientation)
Label Cleaning: Automatically cleans labels for Newick format compatibility
Use Cases
The eplace relabel command is useful in several scenarios:
Re-labeling at Different Ranks
Generate multiple versions of the same tree with different taxonomic granularity without rebuilding the tree:
# Create genus-level tree
eplace relabel blast_results.txt tree.treefile tree_genus.treefile --rank genus
# Create family-level tree from same data
eplace relabel blast_results.txt tree.treefile tree_family.treefile --rank family
External Tree Tools
Add taxonomic labels to trees generated by external phylogenetic tools:
# After building tree with RAxML, IQTree, or FastTree
eplace relabel blast_results.txt external_tree.nwk labeled_tree.nwk --rank genus
Visualization Preparation
Create publication-ready trees with appropriate taxonomic labels:
# For species-level visualization
eplace relabel blast_results.txt tree.treefile publication_tree.treefile --rank species
Updating Taxonomy
Update tree labels when taxonomy databases are updated or corrected:
# Re-run BLAST with updated database, then relabel
eplace relabel new_blast_results.txt old_tree.treefile updated_tree.treefile
Notes
The BLAST results file must contain taxonomic information for the sequences
Tree file must be in Newick format (standard phylogenetic tree format)
Sequence IDs in the tree must match accession numbers in the BLAST results
Labels are automatically cleaned to comply with Newick format requirements (spaces, colons, parentheses are replaced)
If taxonomy information is missing for a sequence, it will be skipped with a warning
Output
The output is a Newick format tree file with sequence IDs replaced by taxonomic names at the specified rank.
Input tree:
((NZ_CP012345:0.01,NZ_CP067890:0.02):0.03,(NC_012345:0.01,NC_067890:0.02):0.03);
Output tree (--rank genus):
((Escherichia:0.01,Salmonella:0.02):0.03,(Bacillus:0.01,Staphylococcus:0.02):0.03);
Output tree (--rank species):
((Escherichia_coli:0.01,Salmonella_enterica:0.02):0.03,(Bacillus_subtilis:0.01,Staphylococcus_aureus:0.02):0.03);
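The transformation shown above can be approximated in a few lines of Python. This is a hedged sketch rather than ePLACE's implementation: clean_label and relabel_newick are hypothetical names, the accession-to-name mapping is assumed to have been extracted from the BLAST results already, underscores are an assumed replacement character, and quoted Newick labels are not handled.

```python
import re

# Whitespace and characters with structural meaning in Newick.
_UNSAFE = re.compile(r"[\s:(),;]+")
# A leaf label directly follows "(" or "," and runs up to the next delimiter.
_LEAF = re.compile(r"(?<=[(,])([^():,;\s]+)")


def clean_label(name):
    """Replace Newick-unsafe characters with underscores (assumed replacement)."""
    return _UNSAFE.sub("_", name.strip()).strip("_")


def relabel_newick(newick, names):
    """Swap leaf IDs for taxonomic names; leave unknown IDs untouched."""
    def swap(match):
        leaf = match.group(1)
        # MAFFT marks reverse-complemented sequences with an _R_ prefix.
        key = leaf[3:] if leaf.startswith("_R_") else leaf
        return clean_label(names[key]) if key in names else leaf
    return _LEAF.sub(swap, newick)
```

Only leaf labels are rewritten; branch lengths, internal node labels, and the bracket structure pass through unchanged, which is what keeps the topology intact.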
Workflow Comparison
Individual Workflow (eplace search)
Best for: Analyzing each query sequence in its own phylogenetic context
Process:
Run BLAST search for all queries
Extract representative sequences for each query at specified rank
Create one directory per query
Build one alignment and tree per query
Output: Separate phylogenetic trees for each query sequence
Grouped Workflow (eplace grouped)
Best for: Analyzing multiple related queries together in a single phylogenetic context
Process:
Run BLAST search for all queries
Extract representative sequences for each query
Group queries by specified taxonomic rank (e.g., class, order)
Combine all queries in a group with unique reference sequences
Build one alignment and tree per group
Output: Phylogenetic trees with multiple queries grouped by taxonomy
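The grouping and de-duplication steps of the grouped workflow can be pictured with the following sketch. It is a simplified illustration under stated assumptions: group_queries is a hypothetical function, and both input dictionaries (query ID to taxon name at the grouping rank, query ID to representative accessions) are assumed to have been derived from the annotated search results.

```python
from collections import defaultdict


def group_queries(query_taxa, representatives):
    """Group queries by taxon and pool each group's unique references.

    query_taxa:      query ID -> taxon name at the grouping rank
    representatives: query ID -> list of reference accessions
    """
    groups = defaultdict(lambda: {"queries": [], "references": []})
    for query, taxon in query_taxa.items():
        group = groups[taxon]
        group["queries"].append(query)
        for ref in representatives.get(query, []):
            if ref not in group["references"]:  # keep the first occurrence only
                group["references"].append(ref)
    return dict(groups)
```

Each resulting group corresponds to one alignment and one tree, which is why closely related queries end up in the same phylogenetic context.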
Common Use Cases
Quick BLAST search without trees
eplace search query.fasta results --skip-alignment
Relabel existing trees
# Relabel tree with different taxonomic ranks
eplace relabel blast_results.txt input.treefile genus_tree.treefile --rank genus
eplace relabel blast_results.txt input.treefile family_tree.treefile --rank family
# Use with trees from external tools
eplace relabel blast_results.txt raxml_tree.nwk labeled_tree.nwk --rank species
High stringency search
eplace search query.fasta results \
--min-identity 95 \
--min-coverage 90
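To make the effect of the two thresholds concrete, the sketch below applies them to a single hit. It is an assumption-laden illustration: passes_filters is a hypothetical helper, coverage is computed here from one alignment's query coordinates, and the thresholds are treated as inclusive. ePLACE may compute coverage differently.

```python
def passes_filters(pident, qstart, qend, qlen, min_identity=90.0, min_coverage=80.0):
    """Return True if a hit meets both thresholds (percentages, 0-100)."""
    # Query coverage: aligned query positions as a share of the query length.
    coverage = 100.0 * (abs(qend - qstart) + 1) / qlen
    return pident >= min_identity and coverage >= min_coverage
```

With --min-identity 95 and --min-coverage 90, a hit at 96.5% identity that spans positions 1 to 950 of a 1000 bp query passes, while the same hit spanning only positions 1 to 850 does not.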
Multi-threaded analysis
eplace search query.fasta results --num-threads 8
Troubleshooting
MMseqs2 memory limits
When searching against large MMseqs2 databases such as NCBI NT, the MMseqs2 prefilter step may require hundreds of GB of RAM. If insufficient memory is available, MMseqs2 may fail with an error similar to:
Cannot fit databases into 22G. Please use a computer with more main memory.
Error: Prefilter died
Error: Search step died
Error: Search died
The workflow exposes the option:
--mmseqs-memory-limit 400G
which is passed to MMseqs2 as:
--split-memory-limit 400G
For full NCBI NT, we recommend running on a high-memory node. Suggested starting values are:
| Node RAM | Suggested |
|---|---|
| 128 GB | |
| 256 GB | |
| 512 GB | |
| 1 TB | |
For 16S amplicon/ZOTU workflows, consider using a smaller curated 16S database such as SILVA, GTDB rRNA, RDP, or NCBI 16S rather than full NT. These require far less RAM and often produce equally good taxonomic assignments for 16S data.
Command not found
If the eplace command is not found after installation:
# Check if it's installed
pip show eplace
# Reinstall
pip install --force-reinstall .
# Or add to PATH
export PATH="$HOME/.local/bin:$PATH"
Dependencies missing
Some features require external tools:
BLAST+: Required for all workflows
sudo apt-get install ncbi-blast+ # Ubuntu/Debian
brew install blast # macOS
TaxonKit: Required for taxonomy lookups
conda install -c bioconda taxonkit
MAFFT: Required for alignment (unless --skip-alignment)
sudo apt-get install mafft # Ubuntu/Debian
brew install mafft # macOS
IQTree: Required for tree building (unless --skip-alignment)
sudo apt-get install iqtree # Ubuntu/Debian
brew install iqtree # macOS
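A quick way to see which of these tools are missing is to look them up on PATH. The sketch below uses the standard-library shutil.which; the executable names are assumptions (IQ-TREE, for instance, may be installed as iqtree2), and missing_tools is a hypothetical helper rather than an ePLACE command.

```python
import shutil

# Executable names are assumptions; e.g. IQ-TREE may be installed as "iqtree2".
REQUIRED_TOOLS = {
    "BLAST+": "blastn",
    "TaxonKit": "taxonkit",
    "MAFFT": "mafft",
    "IQTree": "iqtree",
}


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of tools whose executable is not found on PATH."""
    return [name for name, exe in tools.items() if which(exe) is None]
```

Calling missing_tools() with no arguments checks the current environment; an empty list means every listed executable was found.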
See Also
Installation - Installation instructions
Workflows - Detailed workflow documentation
BLAST Sequence Comparison Module - Complete workflow guide
NCBI Database Download Module - Database management details