Command-Line Interface

The eplace command provides a unified interface to all ePLACE functionality through three subcommands.

Installation Verification

After installing the package with pip install . or pip install -e ., the eplace command will be available:

# Verify installation
eplace --help

# Check version
eplace --version

Commands Overview

Command	Description
`eplace download`	Download NCBI BLAST database
`eplace search`	Run individual search workflow (one tree per query; BLAST by default, MMseqs2 via `--search-tool`)
`eplace grouped`	Run grouped BLAST workflow (one tree per taxonomic group)
`eplace relabel`	Relabel phylogenetic tree with taxonomic names

eplace download

Download and setup the NCBI core_nt BLAST database and/or MMseqs2 NT database.

Usage

eplace download [--target {blast,mmseqs2,both}] [--force] [MMSEQS_OPTIONS]

Options

--target {blast,mmseqs2,both}

Database backend(s) to download

Default: blast

--force: Force redownload even if database exists

--mmseqs-db-dir PATH: Path to MMseqs2 database root directory. Defaults to $MMSEQS_DB_DIR, then $MMSEQS2DB, then ~/mmseqs2db.

--mmseqs-threads INT: Number of threads for MMseqs2 download/taxonomy commands

--add-taxonomy: Add taxonomy sidecar files to MMseqs2 NT database after download.

--ncbi-taxonomy PATH: Path to NCBI taxonomy dump directory containing nodes.dmp, names.dmp, and merged.dmp. Required with --add-taxonomy.

--acc2taxid-dir PATH: Path to accession2taxid files. Defaults to $ACC2TAXID_DIR or <ncbi-taxonomy>/accession2taxid.

--taxonomy-workdir PATH: Working directory for MMseqs taxonomy mapping files.

--skip-memory-check: Skip RAM preflight checks for MMseqs2 download/taxonomy.

Examples

# Download BLAST database to default location ($BLASTDB or ~/blastdb)
eplace download

# Download MMseqs2 NT database
eplace download --target mmseqs2 --mmseqs-db-dir /path/to/mmseqs_db

# Download MMseqs2 NT database and add taxonomy sidecar files
eplace download --target mmseqs2 --add-taxonomy --ncbi-taxonomy /path/to/ncbi/taxonomy/current

Notes

BLAST DB location: $BLASTDB or ~/blastdb
MMseqs2 DB location: $MMSEQS_DB_DIR, then $MMSEQS2DB, or ~/mmseqs2db
MMseqs2 NT download typically requires at least 64 GiB RAM
MMseqs2 taxonomy integration typically requires at least 128 GiB RAM

eplace search

Run BLAST search with individual taxonomy analysis. Creates one phylogenetic tree per query sequence.

Usage

eplace search QUERY_FASTA OUTPUT_DIR [OPTIONS]

Required Arguments

QUERY_FASTA: Path to query FASTA file containing sequences to search

OUTPUT_DIR: Output directory for results (will be created if it doesn’t exist)

Optional Arguments

Taxonomy Options

--rank {phylum,class,order,family,genus,species}

Taxonomic rank for representative selection

Default: genus

--tree-label-rank {phylum,class,order,family,genus,species}

Taxonomic rank for tree labeling

Default: genus

Filtering Options

--min-identity FLOAT

Minimum percent identity for BLAST hits

Default: 90.0

--min-coverage FLOAT

Minimum query coverage percentage

Default: 80.0

Database Options

--database NAME

BLAST database name

Default: core_nt

--blastdb-path PATH: Path to BLAST database directory

Performance Options

--num-threads INT

Number of threads for BLAST and alignment

Default: 1

Workflow Options

--overwrite-existing-blast: Overwrite existing BLAST results

--skip-alignment: Skip alignment and tree building steps

--output-classification PATH: Path to output classification TSV file

MMseqs2 Options

These options apply only when --search-tool mmseqs2 is specified.

--mmseqs-memory-limit LIMIT

Maximum RAM for the MMseqs2 prefilter/index step, passed as --split-memory-limit to mmseqs easy-search.

Default: 400G

On smaller hosts (laptops/small VMs), set a lower value explicitly, for example 16G or 32G.

Accepts MMseqs2-style memory strings: a positive integer (no leading zeros) immediately followed by a single-character unit K, M, G, or T (no extra suffix). Invalid values (e.g. 0G not positive, 01G with a leading zero, 400GB where B makes a double unit, or fourhundredG with a non-numeric prefix) will cause an error before the search starts.

--mmseqs-sensitivity FLOAT

MMseqs2 sensitivity setting (1–7.5)

Default: 5.7

--mmseqs-search-type INT

MMseqs2 search type (2 = translated, 3 = nucleotide, 4 = translated nucleotide backtrace)

Default: 3

--mmseqs-database NAME: MMseqs2 database name (default: same as --database)

--mmseqs-db-path PATH: Path to the MMseqs2 database directory

--mmseqs-db-source LABEL: Provenance label for the MMseqs2 database, recorded in search_metadata.json

Examples

# Basic usage with default parameters
eplace search query.fasta output_dir

# With custom parameters
eplace search query.fasta output_dir \
    --rank genus \
    --min-identity 95 \
    --min-coverage 85 \
    --num-threads 4

# Skip alignment and tree building (BLAST only)
eplace search query.fasta output_dir --skip-alignment

# Use custom BLAST database location
eplace search query.fasta output_dir --blastdb-path /path/to/blastdb

# Use MMseqs2 with memory limit for large NT database
eplace search query.fasta output_dir \
    --search-tool mmseqs2 \
    --mmseqs-db-path /path/to/mmseqs_db \
    --mmseqs-memory-limit 400G

Output Structure

output_dir/
├── blast_results.txt              # Raw BLAST results
├── blast_results_annotated.txt    # BLAST results with taxonomic annotations
├── query1_id/
│   ├── query1_id_representatives.fasta          # Representative sequences
│   ├── query1_id_with_query.fasta              # Query + representatives
│   ├── query1_id_trimmed.fasta                 # Trimmed to aligned regions
│   ├── query1_id_aligned.fasta                 # Multiple sequence alignment
│   ├── query1_id_tree.treefile                 # Phylogenetic tree
│   ├── query1_id_tree_labeled.treefile         # Tree with taxonomic labels
│   └── query1_id_tree.* (other IQTree files)
└── ...

eplace grouped

Run BLAST search with grouped taxonomy analysis. Groups queries by taxonomic rank and creates one phylogenetic tree per group.

Usage

eplace grouped QUERY_FASTA OUTPUT_DIR [OPTIONS]

Required Arguments

QUERY_FASTA: Path to query FASTA file containing sequences to search

OUTPUT_DIR: Output directory for results (will be created if it doesn’t exist)

Optional Arguments

Taxonomy Options

--rank {phylum,class,order,family,genus,species}

Taxonomic rank for representative selection

Default: genus

--group-rank {phylum,class,order,family,genus,species}

Taxonomic rank for grouping sequences

Default: class

--tree-label-rank {phylum,class,order,family,genus,species}

Taxonomic rank for tree labeling

Default: genus

--combined-tree-label-rank {phylum,class,order,family,genus,species}

Taxonomic rank for labeling the combined tree (optional)

Default: Not set (combined tree will not be built)

The grouped workflow can create a combined tree from all groups, but this is optional because it can be very time-consuming with large datasets. If you want to build the combined tree, specify this parameter with the desired taxonomic rank for labeling. If not provided, only individual group trees will be built.

Filtering Options

--min-identity FLOAT

Minimum percent identity for BLAST hits

Default: 90.0

--min-coverage FLOAT

Minimum query coverage percentage

Default: 80.0

Database Options

--database NAME

BLAST database name

Default: core_nt

--blastdb-path PATH: Path to BLAST database directory

Performance Options

--num-threads INT

Number of threads for BLAST and alignment

Default: 1

Workflow Options

--overwrite-existing-blast: Overwrite existing BLAST results

--skip-alignment: Skip alignment and tree building steps

--alignment-tolerance INT

Maximum coordinate difference for alignment consistency

Default: 50

--output-classification PATH: Path to output classification TSV file

MMseqs2 Options

These options apply only when --search-tool mmseqs2 is specified.

--mmseqs-memory-limit LIMIT

Maximum RAM for the MMseqs2 prefilter/index step, passed as --split-memory-limit to mmseqs easy-search.

Default: 400G

Accepts MMseqs2-style memory strings: a positive integer (no leading zeros) immediately followed by a single-character unit K, M, G, or T (no extra suffix, e.g. 64G, 400G, 1T). Invalid values (e.g. 0G not positive, 01G with leading zero, 400GB with extra B suffix, fourhundredG with non-numeric prefix) will cause an error before the search starts.

--mmseqs-sensitivity FLOAT

MMseqs2 sensitivity setting (1–7.5)

Default: 5.7

--mmseqs-search-type INT

MMseqs2 search type (2 = translated, 3 = nucleotide, 4 = translated nucleotide backtrace)

Default: 3

--mmseqs-database NAME: MMseqs2 database name (default: same as --database)

--mmseqs-db-path PATH: Path to the MMseqs2 database directory

--mmseqs-db-source LABEL: Provenance label for the MMseqs2 database, recorded in search_metadata.json

Examples

# Basic usage (group by class, default)
eplace grouped query.fasta output_dir

# Group by different taxonomic rank
eplace grouped query.fasta output_dir --group-rank order

# Specify both representative and grouping ranks
eplace grouped query.fasta output_dir --rank genus --group-rank family

# Skip alignment and tree building
eplace grouped query.fasta output_dir --skip-alignment

# Use MMseqs2 with memory limit for large NT database
eplace grouped query.fasta output_dir \
    --search-tool mmseqs2 \
    --mmseqs-db-path /path/to/mmseqs_db \
    --mmseqs-memory-limit 400G

Output Structure

output_dir/
├── blast_results.txt              # Raw BLAST results
├── blast_results_annotated.txt    # BLAST results with taxonomic annotations
├── query1_id/                     # Per-query representative sequences
│   └── query1_id_representatives.fasta
├── Taxonomic_Group_1/             # One directory per taxonomic group
│   ├── Taxonomic_Group_1_combined.fasta        # All queries + unique references
│   ├── Taxonomic_Group_1_trimmed.fasta         # Trimmed to aligned regions
│   ├── Taxonomic_Group_1_aligned.fasta         # Multiple sequence alignment
│   ├── Taxonomic_Group_1_tree.treefile         # Phylogenetic tree
│   ├── Taxonomic_Group_1_tree_labeled.treefile # Tree with taxonomic labels
│   └── Taxonomic_Group_1_tree.* (other IQTree files)
├── combined_all_groups_trimmed.fasta           # Combined alignment of all groups
├── combined_all_groups_aligned.fasta           # Multiple sequence alignment
├── combined_all_groups_tree.treefile           # Combined phylogenetic tree
├── combined_all_groups_tree_labeled.treefile   # Combined tree with taxonomic labels
└── ...

eplace relabel

Relabel a phylogenetic tree with taxonomic names from BLAST results. This command allows you to replace sequence IDs in an existing tree with taxonomic names at any specified rank.

Usage

eplace relabel BLAST_OUTPUT TREE_FILE OUTPUT_TREE [OPTIONS]

Required Arguments

BLAST_OUTPUT

Path to BLAST output file (tabular format with taxonomy)

The BLAST results file should contain taxonomic information for the sequences in the tree.

TREE_FILE

Path to input tree file (Newick format)

The phylogenetic tree to be relabeled with taxonomic names.

OUTPUT_TREE

Path to output relabeled tree file

The new tree file with taxonomic labels will be written to this path.

Optional Arguments

Taxonomy Options

--rank {phylum,class,order,family,genus,species}

Taxonomic rank for tree labeling

Default: genus

When using species, the output will use binomial nomenclature (genus + species).

Database Options

--blastdb-path PATH

Path to BLAST database directory

Optional parameter, not required for relabeling operation.

Examples

# Relabel tree with genus names (default)
eplace relabel blast_results.txt input.treefile output_labeled.treefile

# Relabel tree with species names (genus + species binomial)
eplace relabel blast_results.txt input.treefile output_species.treefile --rank species

# Relabel tree with family names
eplace relabel blast_results.txt input.treefile output_family.treefile --rank family

# Relabel tree with order names
eplace relabel blast_results.txt input.treefile output_order.treefile --rank order

Key Features

Flexible Taxonomic Ranks: Supports all standard taxonomic ranks from phylum to species
Binomial Nomenclature: Species rank uses “genus species” format for proper scientific names
Topology Preservation: Maintains the original tree structure while updating labels
Format Compatibility: Works with standard Newick format trees
Reversed Sequences: Handles sequences with _R_ prefix (from MAFFT orientation)
Label Cleaning: Automatically cleans labels for Newick format compatibility

Use Cases

The eplace relabel command is useful in several scenarios:

Re-labeling at Different Ranks

Generate multiple versions of the same tree with different taxonomic granularity without rebuilding the tree:

# Create genus-level tree
eplace relabel blast_results.txt tree.treefile tree_genus.treefile --rank genus

# Create family-level tree from same data
eplace relabel blast_results.txt tree.treefile tree_family.treefile --rank family

External Tree Tools

Add taxonomic labels to trees generated by external phylogenetic tools:

# After building tree with RAxML, IQTree, or FastTree
eplace relabel blast_results.txt external_tree.nwk labeled_tree.nwk --rank genus

Visualization Preparation

Create publication-ready trees with appropriate taxonomic labels:

# For species-level visualization
eplace relabel blast_results.txt tree.treefile publication_tree.treefile --rank species

Updating Taxonomy

Update tree labels when taxonomy databases are updated or corrected:

# Re-run BLAST with updated database, then relabel
eplace relabel new_blast_results.txt old_tree.treefile updated_tree.treefile

Notes

The BLAST results file must contain taxonomic information for the sequences
Tree file must be in Newick format (standard phylogenetic tree format)
Sequence IDs in the tree must match accession numbers in the BLAST results
Labels are automatically cleaned to comply with Newick format requirements (spaces, colons, parentheses are replaced)
If taxonomy information is missing for a sequence, it will be skipped with a warning

Output

The output is a Newick format tree file with sequence IDs replaced by taxonomic names at the specified rank.

Input tree:
((NZ_CP012345:0.01,NZ_CP067890:0.02):0.03,(NC_012345:0.01,NC_067890:0.02):0.03);

Output tree (--rank genus):
((Escherichia:0.01,Salmonella:0.02):0.03,(Bacillus:0.01,Staphylococcus:0.02):0.03);

Output tree (--rank species):
((Escherichia_coli:0.01,Salmonella_enterica:0.02):0.03,(Bacillus_subtilis:0.01,Staphylococcus_aureus:0.02):0.03);

Workflow Comparison

Individual Workflow (eplace search)

Best for: Analyzing each query sequence in its own phylogenetic context

Process:

Run BLAST search for all queries
Extract representative sequences for each query at specified rank
Create one directory per query
Build one alignment and tree per query

Output: Separate phylogenetic trees for each query sequence

Grouped Workflow (eplace grouped)

Best for: Analyzing multiple related queries together in a single phylogenetic context

Process:

Run BLAST search for all queries
Extract representative sequences for each query
Group queries by specified taxonomic rank (e.g., class, order)
Combine all queries in a group with unique reference sequences
Build one alignment and tree per group

Output: Phylogenetic trees with multiple queries grouped by taxonomy

Common Use Cases

Quick BLAST search without trees

eplace search query.fasta results --skip-alignment

Relabel existing trees

# Relabel tree with different taxonomic ranks
eplace relabel blast_results.txt input.treefile genus_tree.treefile --rank genus
eplace relabel blast_results.txt input.treefile family_tree.treefile --rank family

# Use with trees from external tools
eplace relabel blast_results.txt raxml_tree.nwk labeled_tree.nwk --rank species

High stringency search

eplace search query.fasta results \
    --min-identity 95 \
    --min-coverage 90

Multi-threaded analysis

eplace search query.fasta results --num-threads 8

Troubleshooting

MMseqs2 memory limits

When searching against large MMseqs2 databases such as NCBI NT, the MMseqs2 prefilter step may require hundreds of GB of RAM. If insufficient memory is available, MMseqs2 may fail with an error similar to:

Cannot fit databases into 22G. Please use a computer with more main memory.
Error: Prefilter died
Error: Search step died
Error: Search died

The workflow exposes the option:

--mmseqs-memory-limit 400G

which is passed to MMseqs2 as:

--split-memory-limit 400G

For full NCBI NT, we recommend running on a high-memory node. Suggested starting values are:

Node RAM	Suggested `--mmseqs-memory-limit`
128 GB	`90G`
256 GB	`200G`
512 GB	`400G`
1 TB	`800G`

For 16S amplicon/ZOTU workflows, consider using a smaller curated 16S database such as SILVA, GTDB rRNA, RDP, or NCBI 16S rather than full NT. These require far less RAM and often produce equally good taxonomic assignments for 16S data.

Command not found

If eplace command is not found after installation:

# Check if it's installed
pip show eplace

# Reinstall
pip install --force-reinstall .

# Or add to PATH
export PATH="$HOME/.local/bin:$PATH"

Dependencies missing

Some features require external tools:

BLAST+: Required for all workflows

sudo apt-get install ncbi-blast+  # Ubuntu/Debian
brew install blast                 # macOS

TaxonKit: Required for taxonomy lookups

conda install -c bioconda taxonkit

MAFFT: Required for alignment (unless –skip-alignment)

sudo apt-get install mafft  # Ubuntu/Debian
brew install mafft          # macOS

IQTree: Required for tree building (unless –skip-alignment)

sudo apt-get install iqtree  # Ubuntu/Debian
brew install iqtree          # macOS

Command-Line Interface

Installation Verification

Commands Overview

eplace download

Usage

Options

Examples

Notes

eplace search

Usage

Required Arguments

Optional Arguments

Taxonomy Options

Filtering Options

Database Options

Performance Options

Workflow Options

MMseqs2 Options

Examples

Output Structure

eplace grouped

Usage

Required Arguments

Optional Arguments

Taxonomy Options

Filtering Options

Database Options

Performance Options

Workflow Options

MMseqs2 Options

Examples

Output Structure

eplace relabel

Usage

Required Arguments

Optional Arguments

Taxonomy Options

Database Options

Examples

Key Features

Use Cases

Notes

Output

Workflow Comparison

Individual Workflow (eplace search)

Grouped Workflow (eplace grouped)

Common Use Cases

Quick BLAST search without trees

Relabel existing trees

High stringency search

Multi-threaded analysis

Group related sequences

Troubleshooting

MMseqs2 memory limits

Command not found

Dependencies missing

See Also