Phylogenetic Trees
==================
ePLACE uses IQTree to build phylogenetic trees from multiple sequence alignments.
Overview
--------
After aligning sequences, ePLACE builds phylogenetic trees to show the evolutionary relationships between query sequences and their taxonomic representatives. The tree building step:
* Uses maximum likelihood methods via IQTree
* Automatically selects the best-fit substitution model
* Performs ultrafast bootstrap analysis
* Labels tree tips with taxonomic information
IQTree Integration
------------------
ePLACE uses IQTree2 with automatic model selection and bootstrap support.
Model Selection
~~~~~~~~~~~~~~~
IQTree automatically selects the best-fit substitution model using ModelFinder:
.. code-block:: bash
iqtree2 -s alignment.fasta -m MFP -B 1000 -T AUTO
Where:
* ``-m MFP``: ModelFinder Plus - tests and selects best model
* ``-B 1000``: 1000 ultrafast bootstrap replicates
* ``-T AUTO``: Automatic thread detection
Supported Models
~~~~~~~~~~~~~~~~
IQTree tests various nucleotide substitution models including:
* JC (Jukes-Cantor)
* F81
* K2P (Kimura 2-parameter)
* HKY
* TN (Tamura-Nei)
* TNe+I+G and variants
* GTR and variants
Using the API
-------------
Basic Tree Building
~~~~~~~~~~~~~~~~~~~
.. code-block:: python
from pathlib import Path
from eplace_lib.alignment import build_phylogenetic_tree
# Build tree
success = build_phylogenetic_tree(
alignment_fasta=Path("aligned.fasta"),
output_prefix=Path("tree"),
num_threads=4
)
if success:
print("Tree built successfully")
print("Tree file: tree.treefile")
else:
print("Tree building failed")
Checking IQTree Availability
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: python
from eplace_lib.alignment import check_iqtree_available
if check_iqtree_available():
print("IQTree is available")
else:
print("IQTree not found - install IQTree to enable tree building")
Adding Taxonomic Labels
~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: python
from pathlib import Path
from eplace_lib.alignment import label_tree_with_taxonomy
# Label tree tips with taxonomic information
success = label_tree_with_taxonomy(
tree_file=Path("tree.treefile"),
output_tree=Path("tree_labeled.treefile"),
blast_hits=filtered_hits,
rank="genus"
)
Tree Building in Workflows
---------------------------
Individual Workflow
~~~~~~~~~~~~~~~~~~~
In the individual workflow (``eplace search``):
1. Each query gets its own tree
2. Tree includes query + representative sequences
3. Tips are labeled with taxonomic information
4. Tree files are saved in query-specific directory
.. code-block:: bash
eplace search query.fasta output_dir --tree-label-rank genus
Grouped Workflow
~~~~~~~~~~~~~~~~
In the grouped workflow (``eplace grouped``):
1. One tree per taxonomic group
2. Tree includes all queries in group + unique references
3. Shows relationships between multiple queries
4. Useful for comparative analysis
.. code-block:: bash
eplace grouped query.fasta output_dir \
--group-rank family \
--tree-label-rank genus
Output Files
------------
IQTree produces several output files:
Primary Output
~~~~~~~~~~~~~~
* ``*.treefile`` - Best tree in Newick format (main output)
* ``*_labeled.treefile`` - Tree with taxonomic labels (ePLACE addition)
Supporting Files
~~~~~~~~~~~~~~~~
* ``*.iqtree`` - Full IQTree report with model selection and statistics
* ``*.log`` - Detailed log of tree building process
* ``*.bionj`` - Initial tree from BioNJ
* ``*.mldist`` - Maximum likelihood distance matrix
* ``*.model.gz`` - Model parameters (if applicable)
* ``*.splits.nex`` - Split support values in NEXUS format
* ``*.contree`` - Consensus tree (if bootstrap performed)
* ``*.ckp.gz`` - Checkpoint file (for resuming interrupted runs)
Tree File Format
~~~~~~~~~~~~~~~~
Trees are in Newick format:
.. code-block:: text
(query_1:0.05,(ref_1:0.02,ref_2:0.03):0.04);
Labeled trees include taxonomic information:
.. code-block:: text
(query_1:0.05,(ref_1|Escherichia:0.02,ref_2|Salmonella:0.03):0.04);
Visualizing Trees
-----------------
Using Python
~~~~~~~~~~~~
.. code-block:: python
from Bio import Phylo
import matplotlib.pyplot as plt
# Read tree
tree = Phylo.read("tree.treefile", "newick")
# Draw tree
fig = plt.figure(figsize=(10, 8))
Phylo.draw(tree, do_show=False)
plt.tight_layout()
plt.savefig("tree.png", dpi=300)
plt.show()
Using External Tools
~~~~~~~~~~~~~~~~~~~~
* **FigTree**: GUI application for viewing and annotating trees
* **iTOL**: Interactive Tree Of Life (web-based)
* **ggtree**: R package for tree visualization
* **ETE Toolkit**: Python framework for tree analysis and visualization
Example with ETE3:
.. code-block:: python
from ete3 import Tree, TreeStyle
# Read tree
t = Tree("tree.treefile")
# Style
ts = TreeStyle()
ts.show_leaf_name = True
ts.show_branch_length = True
ts.show_branch_support = True
# Render
t.render("tree.pdf", tree_style=ts)
Interpreting Trees
------------------
Branch Lengths
~~~~~~~~~~~~~~
* Represent evolutionary distance (substitutions per site)
* Longer branches = more evolutionary change
* Scale bar shows units
Bootstrap Support
~~~~~~~~~~~~~~~~~
* Numbers at nodes indicate support (0-100)
* >95: Strong support
* 70-95: Moderate support
* <70: Weak support
Tree Topology
~~~~~~~~~~~~~
* Sister taxa are more closely related
* Deeper nodes = older divergence
* Monophyletic groups share common ancestor
Troubleshooting
---------------
IQTree not found
~~~~~~~~~~~~~~~~
If you get "IQTree is not available":
.. code-block:: bash
# Ubuntu/Debian
sudo apt-get install iqtree
# macOS
brew install iqtree
# Conda
conda install -c bioconda iqtree
Tree building fails
~~~~~~~~~~~~~~~~~~~
Common causes:
1. **Insufficient sequences**: Need ≥3 sequences for tree
2. **Poor alignment**: Check alignment quality first
3. **Identical sequences**: Remove duplicates
4. **No variation**: All sequences too similar
Tree building too slow
~~~~~~~~~~~~~~~~~~~~~~
For faster tree building:
1. Increase ``--num-threads``
2. Reduce bootstrap replicates (not recommended for publication)
3. Use simpler models
4. Reduce number of sequences
Strange tree topology
~~~~~~~~~~~~~~~~~~~~~
If tree structure seems incorrect:
1. Check alignment quality
2. Verify sequences are homologous
3. Check for contamination or misidentification
4. Consider longer sequences for better resolution
5. Try different taxonomic ranks for representatives
Advanced Usage
--------------
Custom IQTree Parameters
~~~~~~~~~~~~~~~~~~~~~~~~~
Modify tree building parameters:
.. code-block:: python
import subprocess
from pathlib import Path
def custom_tree_build(alignment: Path, prefix: Path):
"""Build tree with custom IQTree parameters."""
cmd = [
"iqtree2",
"-s", str(alignment),
"-pre", str(prefix),
"-m", "GTR+I+G", # Specific model
"-B", "2000", # More bootstrap replicates
"-alrt", "1000", # SH-aLRT test
"-T", "8"
]
subprocess.run(cmd, check=True)
Parsing Tree Files
~~~~~~~~~~~~~~~~~~
Extract information from trees:
.. code-block:: python
from Bio import Phylo
# Read tree
tree = Phylo.read("tree.treefile", "newick")
# Get all tips
tips = tree.get_terminals()
print(f"Number of tips: {len(tips)}")
# Calculate tree height
height = tree.total_branch_length()
print(f"Tree height: {height:.4f}")
# Find specific clade
for clade in tree.find_clades():
if clade.name and "query" in clade.name:
print(f"Found query: {clade.name}")
Tree Comparison
~~~~~~~~~~~~~~~
Compare multiple trees:
.. code-block:: python
from Bio import Phylo
from Bio.Phylo.Consensus import majority_consensus
# Read multiple trees
trees = list(Phylo.parse("trees.nexus", "nexus"))
# Build consensus tree
consensus = majority_consensus(trees, cutoff=0.5)
# Write consensus
Phylo.write(consensus, "consensus.tree", "newick")
Rerooting Trees
~~~~~~~~~~~~~~~
Change tree root:
.. code-block:: python
from Bio import Phylo
# Read tree
tree = Phylo.read("tree.treefile", "newick")
# Reroot on outgroup
outgroup = tree.find_any(name="outgroup_name")
tree.root_with_outgroup(outgroup)
# Write rerooted tree
Phylo.write(tree, "rerooted_tree.treefile", "newick")
Best Practices
--------------
1. **Alignment Quality**: Always check alignment before tree building
2. **Bootstrap Support**: Use bootstrap to assess confidence
3. **Model Selection**: Let IQTree select best model automatically
4. **Outgroups**: Include outgroup if possible for rooting
5. **Visualization**: View trees to catch obvious errors
6. **Documentation**: Record all parameters for reproducibility
Performance Considerations
--------------------------
Tree building time depends on:
* Number of sequences
* Sequence length
* Model complexity
* Number of bootstrap replicates
* Number of threads
* CPU speed
Typical timings:
* 10 sequences × 500bp: ~10-30 seconds
* 50 sequences × 1000bp: ~1-5 minutes
* 100 sequences × 2000bp: ~10-30 minutes
Using maximum threads significantly improves speed.
Statistical Considerations
--------------------------
Branch Support
~~~~~~~~~~~~~~
* **UFBoot**: Ultrafast bootstrap approximation (default)
* **Standard Bootstrap**: Classic but slower
* **SH-aLRT**: Shimodaira-Hasegawa-like approximate likelihood ratio test
All methods assess confidence in tree topology.
Model Selection Criteria
~~~~~~~~~~~~~~~~~~~~~~~~
IQTree uses:
* **BIC**: Bayesian Information Criterion (default)
* **AIC**: Akaike Information Criterion
* **AICc**: Corrected AIC
BIC generally preferred to avoid over-parameterization.
See Also
--------
* :doc:`alignment` - Sequence alignment documentation
* :doc:`workflows` - Complete workflow documentation
* :doc:`blast_workflow` - Full pipeline from BLAST to trees
* `IQTree documentation `_
* `Phylogenetics guide `_