Aggregator Module

The aggregator module is the core component of promptprep that handles file scanning, content extraction, and aggregation.

Module Overview

The aggregator module provides functionality for:

Scanning directories for code files
Filtering files based on various criteria
Extracting content from files
Generating directory trees
Processing files incrementally
Generating diffs between versions

Key Classes and Functions

CodeAggregator

class promptprep.aggregator.CodeAggregator(directory='.', output_file='full_code.txt', include_files=None, exclude_dirs=None, extensions=None, max_file_size=100.0, include_comments=True, summary_mode=False, line_numbers=False, incremental=False, last_run_timestamp=None)

The main class responsible for aggregating code files.

Parameters:

directory (str) – The directory to scan for code files (default: current directory)
output_file (str) – The file to save the output to (default: 'full_code.txt')
include_files (list) – List of specific files to include (default: None)
exclude_dirs (list) – List of directories to exclude (default: None)
extensions (list) – List of file extensions to include (default: None)
max_file_size (float) – Maximum file size in MB to include (default: 100.0)
include_comments (bool) – Whether to include comments in the output (default: True)
summary_mode (bool) – Whether to extract only signatures and docstrings (default: False)
line_numbers (bool) – Whether to add line numbers to the output (default: False)
incremental (bool) – Whether to process files incrementally (default: False)
last_run_timestamp (float) – Timestamp of the last run for incremental processing (default: None)

scan_directory()

Scan the directory for code files based on the configured filters.

Returns: A list of file paths that match the criteria
Return type: list
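
The filtering described above can be illustrated with a minimal standalone sketch. It is not promptprep's implementation: the function name scan_for_code_files and its exact matching rules (directory names compared exactly, extensions compared case-insensitively) are assumptions made for illustration, using only the standard library.

```python
import os
import tempfile


def scan_for_code_files(directory, exclude_dirs=None, extensions=None,
                        max_file_size_mb=100.0):
    """Hypothetical helper: return sorted file paths that pass all filters."""
    exclude = set(exclude_dirs or [])
    exts = {e.lower() for e in extensions} if extensions else None
    max_bytes = max_file_size_mb * 1024 * 1024
    matches = []
    for root, dirs, files in os.walk(directory):
        # Prune excluded directories in place so os.walk never descends into them.
        dirs[:] = [d for d in dirs if d not in exclude]
        for name in files:
            path = os.path.join(root, name)
            # Skip files whose extension is not in the allowed set.
            if exts is not None and os.path.splitext(name)[1].lower() not in exts:
                continue
            try:
                too_large = os.path.getsize(path) > max_bytes
            except OSError:
                continue  # unreadable entry (for example a broken symlink)
            if not too_large:
                matches.append(path)
    return sorted(matches)


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "venv"))
    for rel in ("main.py", "notes.md", os.path.join("venv", "lib.py")):
        with open(os.path.join(tmp, rel), "w", encoding="utf-8") as fh:
            fh.write("x = 1\n")
    found = scan_for_code_files(tmp, exclude_dirs=["venv"], extensions=[".py"])
    print([os.path.relpath(p, tmp) for p in found])
```

Pruning dirs in place is what keeps the walk from entering excluded directories at all, which matters for large folders such as venv or node_modules.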

generate_directory_tree()

Generate an ASCII representation of the directory structure.

Returns: ASCII directory tree
Return type: str

process_file(file_path)

Process a single file and extract its content.

Parameters: file_path (str) – Path to the file to process
Returns: Processed content of the file
Return type: str

aggregate_code()

Aggregate code from all matching files.

Returns: Aggregated code with directory tree and file headers
Return type: str

save_output(content)

Save the aggregated content to the output file.

Parameters: content (str) – The content to save
Returns: None

generate_metadata()

Generate metadata about the processed files.

Returns: Metadata as a formatted string
Return type: str

count_tokens(content, model='cl100k_base')

Count the number of tokens in the content.

Parameters:

content (str) – The content to count tokens in
model (str) – The tokenizer model to use (default: 'cl100k_base')

Returns:

Number of tokens

Return type:

int

generate_diff(prev_file, context_lines=3)

Generate a diff between the current output and a previous output file.

Parameters:

prev_file (str) – Path to the previous output file
context_lines (int) – Number of context lines to include in the diff (default: 3)

Returns:

Diff as a formatted string

Return type:

str

FileProcessor

class promptprep.aggregator.FileProcessor(include_comments=True, summary_mode=False, line_numbers=False)

Class responsible for processing individual files.

Parameters:

include_comments (bool) – Whether to include comments in the output (default: True)
summary_mode (bool) – Whether to extract only signatures and docstrings (default: False)
line_numbers (bool) – Whether to add line numbers to the output (default: False)

process_file(file_path)

Process a file and extract its content based on the configured options.

Parameters: file_path (str) – Path to the file to process
Returns: Processed content of the file
Return type: str

extract_summary(content, file_ext)

Extract function/class signatures and docstrings from the content.

Parameters:

content (str) – The file content
file_ext (str) – The file extension

Returns:

Extracted summary

Return type:

str
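
For Python sources, one way to extract signatures and docstrings is the standard-library ast module. The sketch below is an assumption about how such a summary could be built, not the method promptprep uses: summarize_python is a hypothetical name, it handles only Python, it keeps just the first docstring line, and it ignores definitions nested inside blocks such as if or try. It relies on ast.unparse, which requires Python 3.9 or later.

```python
import ast


def summarize_python(source):
    """Hypothetical helper: list class/function headers with first docstring lines."""
    lines = []

    def visit(node, depth):
        for child in ast.iter_child_nodes(node):
            if not isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                continue
            indent = "    " * depth
            if isinstance(child, ast.ClassDef):
                bases = ", ".join(ast.unparse(base) for base in child.bases)
                header = f"class {child.name}({bases}):" if bases else f"class {child.name}:"
            else:
                keyword = "async def" if isinstance(child, ast.AsyncFunctionDef) else "def"
                header = f"{keyword} {child.name}({ast.unparse(child.args)}):"
            lines.append(indent + header)
            doc = ast.get_docstring(child)
            if doc:
                # Keep only the first line of the docstring as the summary.
                lines.append(f'{indent}    """{doc.splitlines()[0]}"""')
            visit(child, depth + 1)  # descend into methods and nested functions

    visit(ast.parse(source), 0)
    return "\n".join(lines)


example = (
    "class Greeter:\n"
    '    """Say hello."""\n'
    "    def greet(self, name='world'):\n"
    '        """Return a greeting."""\n'
    "        return 'hello ' + name\n"
)
print(summarize_python(example))
```

Parsing with ast rather than regular expressions avoids false matches inside strings and comments, at the cost of requiring syntactically valid input.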

add_line_numbers(content)

Add line numbers to the content.

Parameters: content (str) – The content to add line numbers to
Returns: Content with line numbers
Return type: str
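
As an illustration of the idea, a line-numbering pass can be written in a few lines. The output format here (right-aligned number, then " | ") is an assumption for the sketch and may differ from what promptprep emits; number_lines is a hypothetical name.

```python
def number_lines(content):
    """Hypothetical helper: prefix each line with a right-aligned line number."""
    lines = content.splitlines()
    # Pad numbers to the width of the largest one so the separators line up.
    width = len(str(len(lines)))
    return "\n".join(
        f"{number:>{width}} | {line}"
        for number, line in enumerate(lines, start=1)
    )


print(number_lines("import os\n\nprint(os.getcwd())"))
```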

DirectoryTreeGenerator

class promptprep.aggregator.DirectoryTreeGenerator(root_dir, exclude_dirs=None, include_files=None)

Class responsible for generating ASCII directory trees.

Parameters:

root_dir (str) – The root directory to generate the tree for
exclude_dirs (list) – List of directories to exclude (default: None)
include_files (list) – List of specific files to include (default: None)

generate_tree()

Generate an ASCII representation of the directory structure.

Returns: ASCII directory tree
Return type: str
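
To make the idea of an ASCII directory tree concrete, here is a small standalone sketch. The connector characters, the sort order, and the name build_ascii_tree are all assumptions chosen for illustration; the tree promptprep produces may look different, and the include_files filter is not modeled here.

```python
import os
import tempfile


def build_ascii_tree(root_dir, exclude_dirs=None):
    """Hypothetical helper: render root_dir as an ASCII tree, skipping excluded directories."""
    exclude = set(exclude_dirs or [])
    lines = [os.path.basename(os.path.abspath(root_dir)) + "/"]

    def walk(path, prefix):
        entries = sorted(
            name for name in os.listdir(path)
            if not (name in exclude and os.path.isdir(os.path.join(path, name)))
        )
        for index, name in enumerate(entries):
            is_last = index == len(entries) - 1
            full_path = os.path.join(path, name)
            is_dir = os.path.isdir(full_path)
            connector = "`-- " if is_last else "|-- "
            lines.append(prefix + connector + name + ("/" if is_dir else ""))
            if is_dir:
                # Children of the last entry need no vertical bar in their prefix.
                walk(full_path, prefix + ("    " if is_last else "|   "))

    walk(root_dir, "")
    return "\n".join(lines)


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "src"))
    os.makedirs(os.path.join(tmp, "venv"))
    for rel in ("README.md", os.path.join("src", "app.py")):
        with open(os.path.join(tmp, rel), "w", encoding="utf-8") as fh:
            fh.write("")
    print(build_ascii_tree(tmp, exclude_dirs=["venv"]))
```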

IncrementalProcessor

class promptprep.aggregator.IncrementalProcessor(last_run_timestamp=None)

Class responsible for incremental processing.

Parameters: last_run_timestamp (float) – Timestamp of the last run (default: None)

should_process_file(file_path, prev_output_file=None)

Determine if a file should be processed based on its modification time.

Parameters:

file_path (str) – Path to the file to check
prev_output_file (str) – Path to the previous output file (default: None)

Returns:

Whether the file should be processed

Return type:

bool
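
The modification-time check can be sketched with os.path.getmtime. This is only an illustration of the comparison: modified_since is a hypothetical name, the prev_output_file fallback is not modeled, and treating a missing timestamp as "process everything" is an assumption.

```python
import os
import tempfile


def modified_since(file_path, last_run_timestamp=None):
    """Hypothetical helper: True if the file changed after the last run."""
    if last_run_timestamp is None:
        # Assumption: with no previous run recorded, every file is processed.
        return True
    return os.path.getmtime(file_path) > last_run_timestamp


# Demo with a file whose modification time is pinned to a known value.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.py")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("print('hi')\n")
    os.utime(path, (1_000, 1_000))  # set access and modification time
    print(modified_since(path, last_run_timestamp=500))    # file is newer than the last run
    print(modified_since(path, last_run_timestamp=2_000))  # file is older than the last run
```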

extract_timestamp_from_file(file_path)

Extract the timestamp from a previous output file.

Parameters: file_path (str) – Path to the file to extract the timestamp from
Returns: Extracted timestamp or None if not found
Return type: float or None

DiffGenerator

class promptprep.aggregator.DiffGenerator(context_lines=3)

Class responsible for generating diffs between versions.

Parameters: context_lines (int) – Number of context lines to include in the diff (default: 3)

generate_diff(current_content, prev_file)

Generate a diff between the current content and a previous output file.

Parameters:

current_content (str) – The current content
prev_file (str) – Path to the previous output file

Returns:

Diff as a formatted string

Return type:

str
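
A diff with a configurable number of context lines can be produced with difflib.unified_diff from the standard library. Whether promptprep uses difflib or the unified format is not stated here, so treat the following as a sketch under that assumption; diff_against_previous is a hypothetical name.

```python
import difflib
import os
import tempfile


def diff_against_previous(current_content, prev_file, context_lines=3):
    """Hypothetical helper: unified diff from the previous output file to the current content."""
    with open(prev_file, "r", encoding="utf-8") as fh:
        previous_lines = fh.read().splitlines(keepends=True)
    current_lines = current_content.splitlines(keepends=True)
    diff = difflib.unified_diff(
        previous_lines,
        current_lines,
        fromfile=prev_file,
        tofile="current",
        n=context_lines,  # unchanged lines shown around each change
    )
    return "".join(diff)


# Demo: compare new content against a previous output written to a temp file.
with tempfile.TemporaryDirectory() as tmp:
    previous_path = os.path.join(tmp, "previous.txt")
    with open(previous_path, "w", encoding="utf-8") as fh:
        fh.write("alpha\nbeta\n")
    print(diff_against_previous("alpha\ngamma\n", previous_path, context_lines=1))
```

An empty string is returned when the two versions are identical, which makes it easy to skip writing a diff file when nothing changed.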

Usage Examples

Basic Usage

from promptprep.aggregator import CodeAggregator

# Create an aggregator
aggregator = CodeAggregator(
    directory='./my_project',
    output_file='output.txt',
    exclude_dirs=['venv', 'node_modules'],
    extensions=['.py', '.js']
)

# Aggregate code
content = aggregator.aggregate_code()

# Save output
aggregator.save_output(content)

With Metadata

from promptprep.aggregator import CodeAggregator

aggregator = CodeAggregator(directory='./my_project')

# Generate metadata
metadata = aggregator.generate_metadata()

# Aggregate code
content = aggregator.aggregate_code()

# Combine metadata and content
full_content = metadata + '\n\n' + content

# Save output
aggregator.save_output(full_content)

Incremental Processing

from promptprep.aggregator import CodeAggregator
import time

# First run: record when it started so the next run knows what counts as new
last_run = time.time()
aggregator = CodeAggregator(
    directory='./my_project',
    output_file='baseline.txt'
)
content = aggregator.aggregate_code()
aggregator.save_output(content)

# Later, after making changes: pass the timestamp recorded at the first run,
# not the current time, so files modified since then are picked up
incremental_aggregator = CodeAggregator(
    directory='./my_project',
    output_file='updated.txt',
    incremental=True,
    last_run_timestamp=last_run
)
updated_content = incremental_aggregator.aggregate_code()
incremental_aggregator.save_output(updated_content)

Generating Diffs

from promptprep.aggregator import CodeAggregator

aggregator = CodeAggregator(
    directory='./my_project',
    output_file='current.txt'
)
content = aggregator.aggregate_code()

# Generate diff with a previous version
diff = aggregator.generate_diff('previous.txt', context_lines=5)

# Save diff to a file (explicit encoding avoids platform-dependent defaults)
with open('diff.txt', 'w', encoding='utf-8') as f:
    f.write(diff)