Aggregator Module
The aggregator module is the core component of promptprep that handles file scanning, content extraction, and aggregation.
Module Overview
The aggregator module provides functionality for:
Scanning directories for code files
Filtering files based on various criteria
Extracting content from files
Generating directory trees
Processing files incrementally
Generating diffs between versions
Key Classes and Functions
CodeAggregator
- class promptprep.aggregator.CodeAggregator(directory='.', output_file='full_code.txt', include_files=None, exclude_dirs=None, extensions=None, max_file_size=100.0, include_comments=True, summary_mode=False, line_numbers=False, incremental=False, last_run_timestamp=None)
The main class responsible for aggregating code files.
- Parameters:
directory (str) – The directory to scan for code files (default: current directory)
output_file (str) – The file to save the output to (default: 'full_code.txt')
include_files (list) – List of specific files to include (default: None)
exclude_dirs (list) – List of directories to exclude (default: None)
extensions (list) – List of file extensions to include (default: None)
max_file_size (float) – Maximum file size in MB to include (default: 100.0)
include_comments (bool) – Whether to include comments in the output (default: True)
summary_mode (bool) – Whether to extract only signatures and docstrings (default: False)
line_numbers (bool) – Whether to add line numbers to the output (default: False)
incremental (bool) – Whether to process files incrementally (default: False)
last_run_timestamp (float) – Timestamp of the last run for incremental processing (default: None)
- scan_directory()
Scan the directory for code files based on the configured filters.
- Returns:
A list of file paths that match the criteria
- Return type:
list
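The filtering that scan_directory() describes can be illustrated with a minimal sketch. This is not the promptprep implementation: the helper name scan_for_files and its max_file_size_mb parameter are hypothetical, and it assumes that extensions are compared with their leading dot and that excluded directories are matched by name.

```python
import os


def scan_for_files(directory, extensions=None, exclude_dirs=None, max_file_size_mb=100.0):
    """Return sorted paths under `directory` that pass the filters (hypothetical helper)."""
    exclude = set(exclude_dirs or [])
    max_bytes = max_file_size_mb * 1024 * 1024
    matches = []
    for root, dirs, files in os.walk(directory):
        # Prune excluded directories in place so os.walk does not descend into them.
        dirs[:] = [d for d in dirs if d not in exclude]
        for name in files:
            path = os.path.join(root, name)
            # Skip files whose extension is not in the allowed list (if one is given).
            if extensions and os.path.splitext(name)[1] not in extensions:
                continue
            # Skip files larger than the configured limit.
            if os.path.getsize(path) > max_bytes:
                continue
            matches.append(path)
    return sorted(matches)
```

Pruning dirs in place keeps the walk from entering large directories such as venv or node_modules at all, which is cheaper than filtering their files afterwards.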
- generate_directory_tree()
Generate an ASCII representation of the directory structure.
- Returns:
ASCII directory tree
- Return type:
str
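As a rough illustration of what an ASCII directory tree generator might do, the sketch below walks a directory recursively and draws one line per entry. The function name ascii_tree and the connector characters are assumptions made for this example; the actual output format of generate_directory_tree() may differ.

```python
import os


def ascii_tree(directory, exclude_dirs=None, prefix=""):
    """Return an ASCII tree of `directory` as one newline-joined string (hypothetical helper)."""
    exclude = set(exclude_dirs or [])
    entries = sorted(e for e in os.listdir(directory) if e not in exclude)
    lines = []
    for index, entry in enumerate(entries):
        last = index == len(entries) - 1
        # "+-- " marks the last entry at a level, "|-- " every other entry.
        lines.append(prefix + ("+-- " if last else "|-- ") + entry)
        path = os.path.join(directory, entry)
        if os.path.isdir(path):
            # Children of a non-last entry keep a vertical bar to show the level continues.
            child_prefix = prefix + ("    " if last else "|   ")
            subtree = ascii_tree(path, exclude_dirs, child_prefix)
            if subtree:
                lines.append(subtree)
    return "\n".join(lines)
```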
- process_file(file_path)
Process a single file and extract its content.
- aggregate_code()
Aggregate code from all matching files.
- Returns:
Aggregated code with directory tree and file headers
- Return type:
str
- save_output(content)
Save the aggregated content to the output file.
- Parameters:
content (str) – The content to save
- Returns:
None
- generate_metadata()
Generate metadata about the processed files.
- Returns:
Metadata as a formatted string
- Return type:
str
- count_tokens(content, model='cl100k_base')
Count the number of tokens in the content.
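The default value cl100k_base matches the name of an encoding in the third-party tiktoken package, so the sketch below assumes tiktoken is the tokenizer; the source does not confirm this. The helper name estimate_tokens and the four-characters-per-token fallback are assumptions made for illustration only.

```python
def estimate_tokens(content, encoding_name="cl100k_base"):
    """Count tokens with tiktoken when usable, else return a rough estimate (hypothetical helper)."""
    try:
        import tiktoken  # third-party and optional in this sketch

        encoding = tiktoken.get_encoding(encoding_name)
        return len(encoding.encode(content))
    except Exception:
        # Broad catch on purpose: the package may be missing, the encoding name
        # unknown, or the encoding data unavailable offline.
        # Fallback heuristic (an assumption): about four characters per token, rounded up.
        return (len(content) + 3) // 4
```

The fallback keeps the sketch usable without the optional dependency, at the cost of returning only an approximation.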
- generate_diff(prev_file, context_lines=3)
Generate a diff between the current output and a previous output file.
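A diff with a configurable number of context lines can be produced with the standard library's difflib. The sketch below shows the general technique on two strings; whether generate_diff() uses difflib or a unified diff format is not stated in the source, and the name unified_diff_text is hypothetical.

```python
import difflib


def unified_diff_text(previous_text, current_text, context_lines=3):
    """Return a unified diff between two texts as a single string (hypothetical helper)."""
    previous_lines = previous_text.splitlines(keepends=True)
    current_lines = current_text.splitlines(keepends=True)
    diff = difflib.unified_diff(
        previous_lines,
        current_lines,
        fromfile="previous",
        tofile="current",
        n=context_lines,  # unchanged lines shown around each change
    )
    # unified_diff yields nothing when the inputs are identical.
    return "".join(diff)
```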
FileProcessor
- class promptprep.aggregator.FileProcessor(include_comments=True, summary_mode=False, line_numbers=False)
Class responsible for processing individual files.
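- Parameters:
include_comments (bool) – Whether to include comments in the output (default: True)
summary_mode (bool) – Whether to extract only signatures and docstrings (default: False)
line_numbers (bool) – Whether to add line numbers to the output (default: False)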
- process_file(file_path)
Process a file and extract its content based on the configured options.
- extract_summary(content, file_ext)
Extract function/class signatures and docstrings from the content.
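For Python sources, signatures and docstrings can be pulled out with the standard library's ast module. The sketch below is one possible approach under that assumption, not the method extract_summary() is known to use. The name summarize_python is hypothetical, only the first docstring line is kept, and ast.unparse requires Python 3.9 or later.

```python
import ast


def summarize_python(source):
    """Return signatures and first docstring lines for functions and classes (hypothetical helper)."""
    summary = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            header = f"class {node.name}:"
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            keyword = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
            # ast.unparse turns the arguments node back into source text.
            header = f"{keyword} {node.name}({ast.unparse(node.args)}):"
        else:
            continue
        summary.append(header)
        docstring = ast.get_docstring(node)
        if docstring:
            summary.append(f'    """{docstring.splitlines()[0]}"""')
    return "\n".join(summary)
```

Other languages would need their own parser or a pattern-based fallback, which is presumably why the documented method also takes a file_ext argument.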
DirectoryTreeGenerator
IncrementalProcessor
- class promptprep.aggregator.IncrementalProcessor(last_run_timestamp=None)
Class responsible for incremental processing.
- Parameters:
last_run_timestamp (float) – Timestamp of the last run (default: None)
- should_process_file(file_path, prev_output_file=None)
Determine if a file should be processed based on its modification time.
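The modification-time check described above can be sketched as a comparison between a file's mtime and the previous run's timestamp. The helper name modified_since is hypothetical, the role of prev_output_file is not described in the source so the sketch ignores it, and treating a missing timestamp as "process everything" is an assumption.

```python
import os


def modified_since(file_path, last_run_timestamp=None):
    """Return True if the file should be (re)processed (hypothetical helper)."""
    if last_run_timestamp is None:
        # No previous run recorded, so every file is processed (an assumption).
        return True
    # Process only files changed after the previous run.
    return os.path.getmtime(file_path) > last_run_timestamp
```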
DiffGenerator
Usage Examples
Basic Usage
from promptprep.aggregator import CodeAggregator
# Create an aggregator
aggregator = CodeAggregator(
    directory='./my_project',
    output_file='output.txt',
    exclude_dirs=['venv', 'node_modules'],
    extensions=['.py', '.js']
)
# Aggregate code
content = aggregator.aggregate_code()
# Save output
aggregator.save_output(content)
With Metadata
from promptprep.aggregator import CodeAggregator
aggregator = CodeAggregator(directory='./my_project')
# Generate metadata
metadata = aggregator.generate_metadata()
# Aggregate code
content = aggregator.aggregate_code()
# Combine metadata and content
full_content = metadata + '\n\n' + content
# Save output
aggregator.save_output(full_content)
Incremental Processing
from promptprep.aggregator import CodeAggregator
import time
# First run
aggregator = CodeAggregator(
    directory='./my_project',
    output_file='baseline.txt'
)
content = aggregator.aggregate_code()
aggregator.save_output(content)
# Record when the first run finished, before making any changes
timestamp = time.time()
# Later, after making changes, pass the recorded time of the last run
incremental_aggregator = CodeAggregator(
    directory='./my_project',
    output_file='updated.txt',
    incremental=True,
    last_run_timestamp=timestamp
)
updated_content = incremental_aggregator.aggregate_code()
incremental_aggregator.save_output(updated_content)
Generating Diffs
from promptprep.aggregator import CodeAggregator
aggregator = CodeAggregator(
    directory='./my_project',
    output_file='current.txt'
)
content = aggregator.aggregate_code()
# Generate diff with a previous version
diff = aggregator.generate_diff('previous.txt', context_lines=5)
# Save diff to a file
with open('diff.txt', 'w') as f:
    f.write(diff)