API Reference
=============

This page documents the ePLACE Python API for programmatic access.

Core Modules
------------

eplace_lib.blast_analysis
~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: eplace_lib.blast_analysis
   :members:
   :undoc-members:
   :show-inheritance:

eplace_lib.taxonomy
~~~~~~~~~~~~~~~~~~~

.. automodule:: eplace_lib.taxonomy
   :members:
   :undoc-members:
   :show-inheritance:

eplace_lib.sequences
~~~~~~~~~~~~~~~~~~~~

.. automodule:: eplace_lib.sequences
   :members:
   :undoc-members:
   :show-inheritance:

eplace_lib.alignment
~~~~~~~~~~~~~~~~~~~~

.. automodule:: eplace_lib.alignment
   :members:
   :undoc-members:
   :show-inheritance:

eplace_lib.ncbi_download
~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: eplace_lib.ncbi_download
   :members:
   :undoc-members:
   :show-inheritance:

eplace_lib.cli
~~~~~~~~~~~~~~

.. automodule:: eplace_lib.cli
   :members:
   :undoc-members:
   :show-inheritance:

Quick Examples
--------------

BLAST Analysis
~~~~~~~~~~~~~~

.. code-block:: python

   from pathlib import Path
   from eplace_lib import run_blast_search, process_blast_results_for_taxonomy

   # Run BLAST search with filtering
   success, filtered_hits = run_blast_search(
       query_fasta=Path("query.fasta"),
       output_file=Path("blast_results.txt"),
       min_identity=90.0,
       min_coverage=80.0
   )

   # Extract representative sequences
   results = process_blast_results_for_taxonomy(
       blast_hits=filtered_hits,
       output_dir=Path("output"),
       rank="genus"
   )

Database Download
~~~~~~~~~~~~~~~~~

.. code-block:: python

   from eplace_lib import setup_ncbi_database

   # Download the core_nt database
   success, message = setup_ncbi_database()
   print(f"Success: {success}, Message: {message}")

FASTA Reading
~~~~~~~~~~~~~

.. code-block:: python

   from pathlib import Path
   from eplace_lib.blast_analysis import FastaReader

   # Read sequences
   sequences = FastaReader.read_fasta(Path("input.fasta"))

   # Get sequence lengths
   lengths = FastaReader.get_sequence_lengths(Path("input.fasta"))

Sequence Alignment
~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from pathlib import Path
   from eplace_lib.alignment import align_sequences, build_phylogenetic_tree

   # Align sequences
   success = align_sequences(
       input_fasta=Path("sequences.fasta"),
       output_fasta=Path("aligned.fasta"),
       num_threads=4
   )

   # Build tree
   success = build_phylogenetic_tree(
       alignment_fasta=Path("aligned.fasta"),
       output_prefix=Path("tree"),
       num_threads=4
   )

Data Structures
---------------

BlastHit
~~~~~~~~

Represents a single BLAST hit with the following attributes:

* ``query_id``: Query sequence identifier
* ``subject_id``: Subject (database) sequence identifier
* ``percent_identity``: Percentage of identical matches
* ``alignment_length``: Length of alignment
* ``query_length``: Length of query sequence
* ``subject_length``: Length of subject sequence
* ``query_start``: Start position in query
* ``query_end``: End position in query
* ``subject_start``: Start position in subject
* ``subject_end``: End position in subject
* ``evalue``: Expectation value
* ``bit_score``: Bit score
* ``query_coverage``: Percentage of query covered by alignment
* ``subject_taxonomy``: Dictionary containing taxonomic information
  (phylum, class, order, family, genus, species)

Example usage:

.. code-block:: python

   from eplace_lib.blast_analysis import BlastHit

   # Create a BlastHit
   hit = BlastHit(
       query_id="query1",
       subject_id="NC_001234.5",
       percent_identity=95.5,
       alignment_length=500,
       query_length=550,
       subject_length=5000,
       query_start=1,
       query_end=500,
       subject_start=100,
       subject_end=599,
       evalue=1e-100,
       bit_score=900,
       query_coverage=90.9,
       subject_taxonomy={"genus": "Escherichia", "species": "coli"}
   )

Common Workflows
----------------

Complete BLAST to Tree Workflow
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from pathlib import Path
   from eplace_lib import (
       run_blast_search,
       process_blast_results_for_taxonomy,
   )
   from eplace_lib.sequences import trim_sequences_to_blast_coordinates
   from eplace_lib.alignment import align_sequences, build_phylogenetic_tree

   # Step 1: BLAST search
   success, filtered_hits = run_blast_search(
       query_fasta=Path("query.fasta"),
       output_file=Path("blast_results.txt"),
       min_identity=90.0,
       min_coverage=80.0,
       num_threads=4
   )

   # Step 2: Extract representatives
   results = process_blast_results_for_taxonomy(
       blast_hits=filtered_hits,
       output_dir=Path("output"),
       rank="genus"
   )

   # Step 3: Process each query
   for query_id, fasta_path in results.items():
       # Trim sequences
       trimmed_path = fasta_path.parent / f"{query_id}_trimmed.fasta"
       trim_sequences_to_blast_coordinates(
           input_fasta=fasta_path,
           output_fasta=trimmed_path,
           blast_hits=filtered_hits
       )

       # Align sequences
       aligned_path = fasta_path.parent / f"{query_id}_aligned.fasta"
       align_sequences(
           input_fasta=trimmed_path,
           output_fasta=aligned_path,
           num_threads=4
       )

       # Build tree
       tree_prefix = fasta_path.parent / f"{query_id}_tree"
       build_phylogenetic_tree(
           alignment_fasta=aligned_path,
           output_prefix=tree_prefix,
           num_threads=4
       )

Custom BLAST Parameters
~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from pathlib import Path
   from eplace_lib.blast_analysis import BlastRunner

   runner = BlastRunner()

   # Run BLAST with custom parameters
   success = runner.run_blastn(
       query_fasta=Path("query.fasta"),
       output_file=Path("blast_results.txt"),
       database="core_nt",
       num_threads=8,
       max_target_seqs=500,
       evalue=1e-10,
       word_size=11
   )

   # Parse and filter results
   hits = runner.parse_blast_results(Path("blast_results.txt"))
   filtered_hits = runner.filter_blast_hits(
       hits,
       min_identity=95.0,
       min_coverage=90.0
   )

Working with Taxonomic Data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from eplace_lib.taxonomy import TaxonomyExtractor

   extractor = TaxonomyExtractor()

   # Group hits by query
   grouped_hits = extractor.group_hits_by_query(blast_hits)

   # Select representatives at different ranks
   for query_id, query_hits in grouped_hits.items():
       # At genus level
       genus_reps = extractor.select_representatives_by_rank(
           hits=query_hits,
           rank="genus",
           max_per_rank=1
       )

       # At species level
       species_reps = extractor.select_representatives_by_rank(
           hits=query_hits,
           rank="species",
           max_per_rank=2
       )

Error Handling
--------------

Most functions return success indicators and provide error messages:

.. code-block:: python

   from pathlib import Path
   from eplace_lib import run_blast_search

   success, result = run_blast_search(
       query_fasta=Path("query.fasta"),
       output_file=Path("output.txt"),
       min_identity=90.0,
       min_coverage=80.0
   )

   if not success:
       print(f"BLAST failed: {result}")
   else:
       print(f"Found {len(result)} hits")

For functions that don't return tuples, check return values:

.. code-block:: python

   from pathlib import Path
   from eplace_lib.alignment import align_sequences

   success = align_sequences(
       input_fasta=Path("sequences.fasta"),
       output_fasta=Path("aligned.fasta")
   )

   if not success:
       print("Alignment failed")

Type Hints
----------

ePLACE uses type hints throughout the codebase for better IDE support:

.. code-block:: python

   from pathlib import Path
   from typing import List, Dict, Tuple
   from eplace_lib.blast_analysis import BlastHit

   def process_hits(
       hits: List[BlastHit],
       min_identity: float = 90.0
   ) -> Tuple[bool, List[BlastHit]]:
       """Process BLAST hits with type hints."""
       filtered = [h for h in hits if h.percent_identity >= min_identity]
       return True, filtered

Logging
-------

ePLACE uses Python's logging module. Configure logging in your scripts:

.. code-block:: python

   import logging

   # Configure logging
   logging.basicConfig(
       level=logging.INFO,
       format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
   )

   # Now run ePLACE functions
   from eplace_lib import run_blast_search

Advanced Usage
--------------

Custom Database Management
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from eplace_lib.ncbi_download import NCBIDownloader

   downloader = NCBIDownloader()

   # Get database directory
   db_dir = downloader.get_blastdb_directory()

   # Check if database exists
   exists = downloader.check_database_exists()

   # Get available files
   files = downloader.get_available_files()

   # Download specific file
   downloader.download_file('core_nt.00.tar.gz', db_dir)

Sequence Extraction
~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from pathlib import Path
   from eplace_lib.taxonomy import SequenceExtractor

   extractor = SequenceExtractor()

   # Extract specific sequences
   success = extractor.extract_sequences(
       sequence_ids=["NC_001234.5", "NC_005678.9"],
       output_fasta=Path("extracted.fasta"),
       database="core_nt"
   )

See Also
--------

* :doc:`quickstart` - Quick start guide with examples
* :doc:`workflows` - Workflow documentation
* :doc:`blast_workflow` - Detailed BLAST workflow guide
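
Worked Example: Query Coverage
------------------------------

The ``query_coverage`` attribute of ``BlastHit`` is described as the percentage of the query covered by the alignment. The sketch below shows one plausible way to derive that value from ``query_start``, ``query_end``, and ``query_length``, and to apply ``min_identity`` / ``min_coverage`` thresholds like those used in the examples on this page. It does not import ``eplace_lib``: ``HitSketch`` and ``filter_hits`` are hypothetical stand-ins, and the coverage formula is an assumption, not necessarily how ePLACE computes it.

.. code-block:: python

   from dataclasses import dataclass
   from typing import List


   @dataclass
   class HitSketch:
       """Hypothetical stand-in for BlastHit; not the eplace_lib class."""
       query_id: str
       subject_id: str
       percent_identity: float
       query_length: int
       query_start: int
       query_end: int

       @property
       def query_coverage(self) -> float:
           # Assumption: coverage is the aligned query span (1-based,
           # inclusive coordinates) divided by the query length.
           span = abs(self.query_end - self.query_start) + 1
           return 100.0 * span / self.query_length


   def filter_hits(
       hits: List[HitSketch],
       min_identity: float = 90.0,
       min_coverage: float = 80.0,
   ) -> List[HitSketch]:
       """Keep hits that meet both thresholds (hypothetical helper)."""
       return [
           h for h in hits
           if h.percent_identity >= min_identity
           and h.query_coverage >= min_coverage
       ]


   hits = [
       HitSketch("query1", "NC_001234.5", 95.5, 550, 1, 500),
       HitSketch("query1", "NC_005678.9", 88.0, 550, 1, 540),    # identity too low
       HitSketch("query1", "made_up_subject", 97.0, 550, 1, 300),  # coverage too low
   ]
   kept = filter_hits(hits)

   print(round(hits[0].query_coverage, 1))  # 90.9
   print([h.subject_id for h in kept])      # ['NC_001234.5']

With the coordinates from the ``BlastHit`` example above (positions 1 to 500 of a 550-base query), this formula gives 90.9, which matches the ``query_coverage`` value shown there.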