Phylogenetic Trees

ePLACE uses IQTree to build phylogenetic trees from multiple sequence alignments.

Overview

After aligning sequences, ePLACE builds phylogenetic trees to show the evolutionary relationships between query sequences and their taxonomic representatives. The tree building step:

  • Uses maximum likelihood methods via IQTree

  • Automatically selects the best-fit substitution model

  • Performs ultrafast bootstrap analysis

  • Labels tree tips with taxonomic information

IQTree Integration

ePLACE uses IQTree2 with automatic model selection and bootstrap support.

Model Selection

IQTree automatically selects the best-fit substitution model using ModelFinder:

iqtree2 -s alignment.fasta -m MFP -B 1000 -T AUTO

Where:

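  • -m MFP: ModelFinder Plus, which tests candidate models, selects the best fit, and then infers the tree with that model

  • -B 1000: 1000 ultrafast bootstrap replicates (the minimum IQTree accepts for -B)

  • -T AUTO: Lets IQTree determine the number of threads to use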
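The flags above can also be assembled programmatically. The following is a minimal sketch, not ePLACE's actual implementation: the helper name `build_iqtree_command` and its parameters are hypothetical, and it uses only the IQTree flags shown in the command above.

```python
from pathlib import Path


def build_iqtree_command(
    alignment: Path, bootstrap: int = 1000, threads: str = "AUTO"
) -> list[str]:
    """Return the argument list for the IQTree call shown above (hypothetical helper)."""
    # IQTree's ultrafast bootstrap (-B) expects at least 1000 replicates.
    if bootstrap < 1000:
        raise ValueError("ultrafast bootstrap needs at least 1000 replicates")
    return [
        "iqtree2",
        "-s", str(alignment),   # input alignment
        "-m", "MFP",            # ModelFinder Plus
        "-B", str(bootstrap),   # ultrafast bootstrap replicates
        "-T", str(threads),     # thread count, or AUTO
    ]


cmd = build_iqtree_command(Path("alignment.fasta"))
print(" ".join(cmd))
```

Building the command as a list (rather than a single string) lets it be passed to `subprocess.run` without shell quoting issues.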

Supported Models

IQTree tests various nucleotide substitution models including:

  • JC (Jukes-Cantor)

  • F81

  • K2P (Kimura 2-parameter)

  • HKY

  • TN (Tamura-Nei)

  • TNe (TN with equal base frequencies), plus rate-heterogeneity variants such as TNe+I+G

  • GTR and variants

Using the API

Basic Tree Building

from pathlib import Path
from eplace_lib.alignment import build_phylogenetic_tree

# Build tree
success = build_phylogenetic_tree(
    alignment_fasta=Path("aligned.fasta"),
    output_prefix=Path("tree"),
    num_threads=4
)

if success:
    print("Tree built successfully")
    print("Tree file: tree.treefile")
else:
    print("Tree building failed")

Checking IQTree Availability

from eplace_lib.alignment import check_iqtree_available

if check_iqtree_available():
    print("IQTree is available")
else:
    print("IQTree not found - install IQTree to enable tree building")
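For scripts that do not import eplace_lib, a similar check can be made with the standard library alone. This is a sketch under assumptions, not necessarily how `check_iqtree_available` works internally: the helper name `find_iqtree` and the candidate executable names are illustrative.

```python
import shutil


def find_iqtree(candidates=("iqtree2", "iqtree")):
    """Return the path of the first candidate executable found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


exe = find_iqtree()
print(exe if exe else "IQTree not found on PATH")
```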

Adding Taxonomic Labels

from pathlib import Path
from eplace_lib.alignment import label_tree_with_taxonomy

# Label tree tips with taxonomic information.
# filtered_hits is assumed to hold the BLAST hits from an earlier ePLACE step (not shown here).
success = label_tree_with_taxonomy(
    tree_file=Path("tree.treefile"),
    output_tree=Path("tree_labeled.treefile"),
    blast_hits=filtered_hits,
    rank="genus"
)

Tree Building in Workflows

Individual Workflow

In the individual workflow (eplace search):

  1. Each query gets its own tree

  2. Tree includes query + representative sequences

  3. Tips are labeled with taxonomic information

  4. Tree files are saved in query-specific directory

eplace search query.fasta output_dir --tree-label-rank genus

Grouped Workflow

In the grouped workflow (eplace grouped):

  1. One tree per taxonomic group

  2. Tree includes all queries in group + unique references

  3. Shows relationships between multiple queries

  4. Useful for comparative analysis

eplace grouped query.fasta output_dir \
    --group-rank family \
    --tree-label-rank genus

Output Files

IQTree produces several output files:

Primary Output

  • *.treefile - Best tree in Newick format (main output)

  • *_labeled.treefile - Tree with taxonomic labels (ePLACE addition)

Supporting Files

  • *.iqtree - Full IQTree report with model selection and statistics

  • *.log - Detailed log of tree building process

  • *.bionj - Initial tree from BioNJ

  • *.mldist - Maximum likelihood distance matrix

  • *.model.gz - Model parameters (if applicable)

  • *.splits.nex - Split support values in NEXUS format

  • *.contree - Consensus tree (if bootstrap performed)

  • *.ckp.gz - Checkpoint file (for resuming interrupted runs)
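The selected substitution model can be read back from the *.iqtree report. The sketch below assumes the report contains a line of the form "Best-fit model according to BIC: TNe+I+G4"; the helper name `best_fit_model` is hypothetical, and the pattern may need adjusting if your IQTree version words the line differently.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed report line, e.g. "Best-fit model according to BIC: TNe+I+G4"
BEST_MODEL_RE = re.compile(
    r"^Best-fit model according to (\w+):\s*(\S+)", re.MULTILINE
)


def best_fit_model(report_text: str) -> Optional[tuple[str, str]]:
    """Return (criterion, model) from an IQTree report, or None if not found."""
    match = BEST_MODEL_RE.search(report_text)
    if match is None:
        return None
    return match.group(1), match.group(2)


report = Path("tree.iqtree")
if report.exists():
    print(best_fit_model(report.read_text()))
```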

Tree File Format

Trees are in Newick format:

(query_1:0.05,(ref_1:0.02,ref_2:0.03):0.04);

Labeled trees include taxonomic information:

(query_1:0.05,(ref_1|Escherichia:0.02,ref_2|Salmonella:0.03):0.04);
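Tip labels in this format can be pulled apart with a few lines of Python. This is a minimal sketch that assumes simple Newick (no quoted labels or bracketed comments) and the "name|taxon" convention shown above; the function names are illustrative. For anything more complex, a real parser such as Bio.Phylo is the safer choice.

```python
import re
from typing import Optional


def tip_labels(newick: str) -> list[str]:
    """Extract tip labels from a simple Newick string."""
    # A tip label directly follows "(" or "," and ends at ":", ",", ")" or ";".
    # Internal node labels follow ")" and are therefore not captured.
    return re.findall(r"[(,]\s*([^\s(),:;]+)", newick)


def split_label(label: str) -> tuple[str, Optional[str]]:
    """Split 'name|taxon' into (name, taxon); taxon is None for unlabeled tips."""
    name, sep, taxon = label.partition("|")
    return name, (taxon if sep else None)


labeled = "(query_1:0.05,(ref_1|Escherichia:0.02,ref_2|Salmonella:0.03):0.04);"
for label in tip_labels(labeled):
    print(split_label(label))
```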

Visualizing Trees

Using Python

from Bio import Phylo
import matplotlib.pyplot as plt

# Read tree
tree = Phylo.read("tree.treefile", "newick")

# Draw tree on explicit axes so the requested figure size is used
fig, ax = plt.subplots(figsize=(10, 8))
Phylo.draw(tree, axes=ax, do_show=False)
plt.tight_layout()
plt.savefig("tree.png", dpi=300)
plt.show()

Using External Tools

  • FigTree: GUI application for viewing and annotating trees

  • iTOL: Interactive Tree Of Life (web-based)

  • ggtree: R package for tree visualization

  • ETE Toolkit: Python framework for tree analysis and visualization

Example with ETE3:

from ete3 import Tree, TreeStyle

# Read tree
t = Tree("tree.treefile")

# Style
ts = TreeStyle()
ts.show_leaf_name = True
ts.show_branch_length = True
ts.show_branch_support = True

# Render
t.render("tree.pdf", tree_style=ts)

Interpreting Trees

Branch Lengths

  • Represent evolutionary distance (substitutions per site)

  • Longer branches = more evolutionary change

  • Scale bar shows units
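Because branch lengths are expressed in substitutions per site, multiplying a branch length by the alignment length gives the expected number of substitutions along that branch. The helper below is an illustrative sketch, and the numbers are example values rather than results from a real run.

```python
def expected_substitutions(branch_length: float, alignment_length: int) -> float:
    """Convert a branch length (substitutions per site) to expected substitutions."""
    if branch_length < 0 or alignment_length < 0:
        raise ValueError("branch length and alignment length must be non-negative")
    return branch_length * alignment_length


# Example: a branch of length 0.05 on a 500 bp alignment
# corresponds to roughly 25 expected substitutions.
print(round(expected_substitutions(0.05, 500), 2))
```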

Bootstrap Support

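  • Numbers at nodes indicate support (0-100)

  • ≥95: Strong support

  • 70-94: Moderate support

  • <70: Weak support

  • These cutoffs come from different traditions: for ultrafast bootstrap (the ePLACE default), IQTree's documentation advises relying on a clade only at ≥95, while 70 is a convention from the standard bootstrap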
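The support categories above can be applied to node values in a script. This sketch treats a value of exactly 95 as strong, which is an assumption about where the boundary falls; the function name `classify_support` is illustrative.

```python
def classify_support(value: float) -> str:
    """Map a bootstrap support value (0-100) to a qualitative category."""
    if not 0 <= value <= 100:
        raise ValueError(f"support must be between 0 and 100, got {value}")
    if value >= 95:
        return "strong"
    if value >= 70:
        return "moderate"
    return "weak"


for support in (99, 82, 41):
    print(support, classify_support(support))
```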

Tree Topology

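  • Sister taxa are each other's closest relatives in the tree

  • Deeper nodes = older divergence (meaningful only once the tree is rooted, since IQTree infers unrooted trees)

  • Monophyletic groups contain a common ancestor and all of its descendants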

Troubleshooting

IQTree not found

If you get “IQTree is not available”:

# Ubuntu/Debian
sudo apt-get install iqtree

# macOS
brew install iqtree

# Conda
conda install -c bioconda iqtree

Tree building fails

Common causes:

  1. Insufficient sequences: Need ≥3 sequences for a tree, and ≥4 when bootstrap is enabled (as in the default command)

  2. Poor alignment: Check alignment quality first

  3. Identical sequences: Remove duplicates

  4. No variation: All sequences too similar
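Several of these causes can be checked before IQTree is started. The following is a rough pre-flight sketch using only the standard library; the function names and the `min_sequences` parameter are hypothetical and are not part of the ePLACE API.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence} (later duplicates of a header win)."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def alignment_problems(records: dict[str, str], min_sequences: int = 3) -> list[str]:
    """Return human-readable problems that commonly make tree building fail."""
    problems = []
    seqs = list(records.values())
    if len(seqs) < min_sequences:
        problems.append(f"only {len(seqs)} sequences (need at least {min_sequences})")
    if len({len(s) for s in seqs}) > 1:
        problems.append("sequences differ in length, so this is not an alignment")
    if len(set(seqs)) < len(seqs):
        problems.append("duplicate sequences present")
    if seqs and len(set(seqs)) == 1:
        problems.append("no variation between sequences")
    return problems


example = ">a\nACGT\n>b\nACGA\n>c\nACTT\n"
print(alignment_problems(read_fasta(example)))
```

An empty list means none of the listed problems were detected; it does not guarantee that the alignment is of good quality.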

Tree building too slow

For faster tree building:

  1. Increase --num-threads

  2. Reduce bootstrap replicates if you raised them above 1000 (not recommended for publication; ultrafast bootstrap does not accept fewer than 1000)

  3. Use simpler models

  4. Reduce number of sequences

Strange tree topology

If tree structure seems incorrect:

  1. Check alignment quality

  2. Verify sequences are homologous

  3. Check for contamination or misidentification

  4. Consider longer sequences for better resolution

  5. Try different taxonomic ranks for representatives

Advanced Usage

Custom IQTree Parameters

Modify tree building parameters:

import subprocess
from pathlib import Path

def custom_tree_build(alignment: Path, prefix: Path):
    """Build tree with custom IQTree parameters."""
    cmd = [
        "iqtree2",
        "-s", str(alignment),
        "-pre", str(prefix),
        "-m", "GTR+I+G",  # Specific model
        "-B", "2000",      # More bootstrap replicates
        "-alrt", "1000",   # SH-aLRT test
        "-T", "8"
    ]
    subprocess.run(cmd, check=True)

Parsing Tree Files

Extract information from trees:

from Bio import Phylo

# Read tree
tree = Phylo.read("tree.treefile", "newick")

# Get all tips
tips = tree.get_terminals()
print(f"Number of tips: {len(tips)}")

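# Calculate total tree length (sum of all branch lengths)
length = tree.total_branch_length()
print(f"Tree length: {length:.4f}")

# Calculate tree height (largest root-to-tip distance)
height = max(tree.depths().values())
print(f"Tree height: {height:.4f}")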

# Find specific clade
for clade in tree.find_clades():
    if clade.name and "query" in clade.name:
        print(f"Found query: {clade.name}")

Tree Comparison

Compare multiple trees:

from Bio import Phylo
from Bio.Phylo.Consensus import majority_consensus

# Read multiple trees
trees = list(Phylo.parse("trees.nexus", "nexus"))

# Build consensus tree
consensus = majority_consensus(trees, cutoff=0.5)

# Write consensus
Phylo.write(consensus, "consensus.tree", "newick")

Rerooting Trees

Change tree root:

from Bio import Phylo

# Read tree
tree = Phylo.read("tree.treefile", "newick")

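# Reroot on outgroup (replace "outgroup_name" with a tip label from your tree)
outgroup = tree.find_any(name="outgroup_name")
if outgroup is None:
    raise ValueError("Outgroup tip not found in tree")
tree.root_with_outgroup(outgroup)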

# Write rerooted tree
Phylo.write(tree, "rerooted_tree.treefile", "newick")

Best Practices

  1. Alignment Quality: Always check alignment before tree building

  2. Bootstrap Support: Use bootstrap to assess confidence

  3. Model Selection: Let IQTree select best model automatically

  4. Outgroups: Include outgroup if possible for rooting

  5. Visualization: View trees to catch obvious errors

  6. Documentation: Record all parameters for reproducibility

Performance Considerations

Tree building time depends on:

  • Number of sequences

  • Sequence length

  • Model complexity

  • Number of bootstrap replicates

  • Number of threads

  • CPU speed

Typical timings:

  • 10 sequences × 500bp: ~10-30 seconds

  • 50 sequences × 1000bp: ~1-5 minutes

  • 100 sequences × 2000bp: ~10-30 minutes

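Adding threads helps most on long alignments. On short alignments, extra threads bring little benefit and can even slow the run, which is why -T AUTO lets IQTree choose the thread count.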

Statistical Considerations

Branch Support

  • UFBoot: Ultrafast bootstrap approximation (default)

  • Standard Bootstrap: Classic but slower

  • SH-aLRT: Shimodaira-Hasegawa-like approximate likelihood ratio test

All methods assess confidence in tree topology.
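When IQTree is run with both -B and -alrt, node labels in the tree file take the form "SH-aLRT/UFBoot", for example "85.3/97". The sketch below parses such a label and applies the guideline from the IQTree documentation of relying on a clade when SH-aLRT is at least 80 and UFBoot is at least 95. The function names are illustrative, and the thresholds are exposed as parameters so they can be changed.

```python
def parse_support(label: str) -> tuple[float, float]:
    """Split an 'SH-aLRT/UFBoot' node label such as '85.3/97' into two floats."""
    alrt, sep, ufboot = label.partition("/")
    if not sep:
        raise ValueError(f"expected 'SH-aLRT/UFBoot', got {label!r}")
    return float(alrt), float(ufboot)


def well_supported(label: str, alrt_min: float = 80.0, ufboot_min: float = 95.0) -> bool:
    """True when both support values reach their thresholds."""
    alrt, ufboot = parse_support(label)
    return alrt >= alrt_min and ufboot >= ufboot_min


for node_label in ("85.3/97", "79/99", "92/88"):
    print(node_label, well_supported(node_label))
```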

Model Selection Criteria

IQTree uses:

  • BIC: Bayesian Information Criterion (default)

  • AIC: Akaike Information Criterion

  • AICc: Corrected AIC

BIC is generally preferred because it penalizes additional parameters more strongly, which guards against over-parameterization.
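The three criteria follow standard formulas: with log-likelihood lnL, k free parameters, and sample size n (here taken to be the number of alignment sites), AIC = 2k - 2 lnL, AICc = AIC + 2k(k+1)/(n-k-1), and BIC = k ln(n) - 2 lnL. The sketch below evaluates them for made-up example values; lower scores indicate a better trade-off between fit and complexity.

```python
import math


def aic(log_likelihood: float, k: int) -> float:
    """Akaike Information Criterion."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood: float, k: int, n: int) -> float:
    """AIC with the small-sample correction; requires n > k + 1."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic(log_likelihood, k) + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian Information Criterion."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical values: lnL = -2500, 25 free parameters, 500 alignment sites.
lnL, k, n = -2500.0, 25, 500
print(f"AIC={aic(lnL, k):.2f}  AICc={aicc(lnL, k, n):.2f}  BIC={bic(lnL, k, n):.2f}")
```

Since ln(n) exceeds 2 once n is 8 or more, BIC charges more per parameter than AIC for any realistic alignment length.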

See Also