promptprep.aggregator module

class promptprep.aggregator.CodeAggregator(directory: str | None = None, output_file: str = 'full_code.txt', include_files: Set[str] | None = None, programming_extensions: Set[str] | None = None, exclude_dirs: Set[str] | None = None, exclude_files: Set[str] | None = None, max_file_size_mb: float | None = None, summary_mode: bool = False, include_comments: bool = True, collect_metadata: bool = False, count_tokens: bool = False, token_model: str = 'cl100k_base', output_format: str = 'plain', line_numbers: bool = False, template_file: str | None = None, incremental: bool = False, last_run_timestamp: float | None = None)

Bases: object

DEFAULT_EXCLUDE_DIRS = {'.git', '__pycache__', 'build', 'dist', 'flask_session', 'node_modules', 'old_files', 'temp', 'venv'}

DEFAULT_EXCLUDE_FILES = {'full_code.txt'}

DEFAULT_MAX_FILE_SIZE_MB = 100.0

DEFAULT_PROGRAMMING_EXTENSIONS = {'.Makefile', '.bat', '.c', '.cmake', '.cmd', '.cpp', '.cs', '.css', '.db', '.fish', '.go', '.gradle', '.h', '.hpp', '.html', '.ini', '.java', '.js', '.json', '.jsx', '.kt', '.less', '.lua', '.md', '.ninja', '.php', '.pl', '.pq', '.pqm', '.ps1', '.psql', '.py', '.r', '.rb', '.rs', '.rst', '.sass', '.scala', '.scss', '.sh', '.sql', '.sqlite', '.swift', '.toml', '.ts', '.tsx', '.vb', '.xml', '.yaml', '.yml', '.zsh'}

DEFAULT_TOKEN_MODEL = 'cl100k_base'
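
The defaults above suggest a filter over directory names, file names, and extensions. The following is a minimal sketch of that idea using only the standard library; it is not the library's implementation. The function name iter_candidate_files and the shortened sets are hypothetical, and exact (case-sensitive) extension matching is an assumption.

```python
import os
from typing import Iterator, Set

# Hypothetical, shortened versions of the default sets listed above.
EXCLUDE_DIRS: Set[str] = {".git", "__pycache__", "node_modules", "venv"}
EXCLUDE_FILES: Set[str] = {"full_code.txt"}
EXTENSIONS: Set[str] = {".py", ".js", ".md"}


def iter_candidate_files(root: str) -> Iterator[str]:
    """Yield paths under root that pass the directory, file, and extension filters."""
    for current_dir, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if d not in EXCLUDE_DIRS)
        for name in sorted(filenames):
            if name in EXCLUDE_FILES:
                continue
            # Assumption: the extension must match an entry exactly.
            if os.path.splitext(name)[1] in EXTENSIONS:
                yield os.path.join(current_dir, name)
```

Pruning dirnames in place avoids walking large trees such as node_modules at all, which is usually cheaper than filtering the paths afterwards.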

aggregate()

aggregate_code() → str: Combines the directory tree and the contents of the programming files into a single document.

collect_metadata() → dict: Gathers statistics about the codebase, such as lines of code and comment ratio.
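
To make "lines of code and comment ratio" concrete, here is one possible way to compute such figures for a single source string. This is a sketch under stated assumptions, not the method's actual logic: the helper simple_metadata, the dictionary keys, and the rule that only full-line comments count are all hypothetical.

```python
from typing import Dict


def simple_metadata(source: str, comment_prefix: str = "#") -> Dict[str, float]:
    """Count non-blank lines and the share of them that are full-line comments."""
    lines = [line.strip() for line in source.splitlines()]
    non_blank = [line for line in lines if line]
    # Assumption: only lines that start with the prefix count as comments;
    # trailing inline comments are treated as code.
    comments = [line for line in non_blank if line.startswith(comment_prefix)]
    total = len(non_blank)
    return {
        "total_lines": len(lines),
        "code_lines": total - len(comments),
        "comment_lines": len(comments),
        "comment_ratio": len(comments) / total if total else 0.0,
    }
```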

compare_files(file1: str, file2: str, output_file: str | None = None, context_lines: int = 3) → str

Compares two code files and shows their differences with clear formatting.

Parameters:

file1 – Path to the first file
file2 – Path to the second file
output_file – Optional path to write the diff results to
context_lines – Number of context lines to include in the diff (default: 3)

Returns:

String containing the formatted differences
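
The effect of context_lines can be illustrated with the standard library's difflib. This sketch assumes a unified-diff style output; the documentation above does not state which diff format compare_files produces, and the helper name unified_diff_text is hypothetical.

```python
import difflib
from typing import List


def unified_diff_text(old_lines: List[str], new_lines: List[str],
                      old_name: str = "file1", new_name: str = "file2",
                      context_lines: int = 3) -> str:
    """Return a unified diff of two line lists (lines given without newlines)."""
    diff = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile=old_name,
        tofile=new_name,
        n=context_lines,  # number of unchanged lines shown around each change
        lineterm="",      # inputs carry no newlines, so headers should not either
    )
    return "\n".join(diff)


if __name__ == "__main__":
    print(unified_diff_text(["a", "b", "c"], ["a", "B", "c"], context_lines=1))
```

With context_lines=0 only the changed lines appear in each hunk; larger values add the surrounding unchanged lines. Identical inputs yield an empty string.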

compare_runs(prev_output: str, current_output: str | None = None, output_file: str | None = None, context_lines: int = 3) → str

Compares the current aggregation run with a previous one.

Parameters:

prev_output – Path to the previous aggregation output file
current_output – Path to the current output file (defaults to self.output_file)
output_file – Optional path to write the diff results to
context_lines – Number of context lines to include in the diff (default: 3)

Returns:

String containing the formatted differences

copy_to_clipboard(content: str | None = None) → bool: Copies the content to the clipboard, with platform-specific handling.
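
"Platform-specific handling" is not described further here. One common approach, shown purely as an assumption about how such a method could work, is to pipe the text to a system clipboard command chosen by platform. The helpers clipboard_command and copy_text are hypothetical; pbcopy, clip, and xclip are real system tools but may not be installed.

```python
import platform
import subprocess
from typing import List, Optional


def clipboard_command(system: Optional[str] = None) -> List[str]:
    """Pick a clipboard command line for the given platform.system() value."""
    system = system or platform.system()
    if system == "Darwin":
        return ["pbcopy"]
    if system == "Windows":
        return ["clip"]
    # Assumption: other systems are treated as Linux/X11 with xclip available.
    return ["xclip", "-selection", "clipboard"]


def copy_text(text: str) -> bool:
    """Send text to the clipboard; return False if the command is missing or fails."""
    try:
        subprocess.run(clipboard_command(), input=text.encode("utf-8"), check=True)
        return True
    except (OSError, subprocess.CalledProcessError):
        return False
```

Returning a bool rather than raising mirrors the documented return type, so a caller can fall back to writing a file when no clipboard tool is present.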

count_text_tokens(text: str) → int: Counts the number of tokens in a text string using the configured tokenizer.

file_mod_times: Dict[str, float]

is_file_size_within_limit(file_path: str) → bool: Checks whether the file size is within the configured limit.
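
A size check of this kind can be sketched as follows. The helper within_size_limit is hypothetical, and two details are assumptions rather than documented behavior: that one megabyte means 1024 * 1024 bytes, and that an unreadable or missing file is reported as outside the limit.

```python
import os


def within_size_limit(file_path: str, max_file_size_mb: float = 100.0) -> bool:
    """Return True if the file exists and is no larger than the limit in MB."""
    # Assumption: 1 MB is taken as 1024 * 1024 bytes.
    limit_bytes = max_file_size_mb * 1024 * 1024
    try:
        return os.path.getsize(file_path) <= limit_bytes
    except OSError:
        # Assumption: files that cannot be inspected are treated as over the limit.
        return False
```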

is_programming_file(filename: str) → bool

should_exclude(path: str) → bool

should_include(file_path: str) → bool

write_to_file(content: str | None = None, filename: str | None = None) → None: Writes the aggregated content to a file, choosing the file extension from the output format.

class promptprep.aggregator.DirectoryTreeGenerator(exclude_dirs: Set[str] | None = None, include_files: Set[str] | None = None, exclude_files: Set[str] | None = None, programming_extensions: Set[str] | None = None)

Bases: object

generate(start_path: str) → str: Creates an ASCII representation of the directory structure starting from the given path.
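
An ASCII directory tree can be built with a short recursive walk. The sketch below is illustrative only: the function ascii_tree and its connector characters are assumptions, the real output format may differ, and the exclusion and extension filters accepted by the constructor are left out for brevity.

```python
import os
from typing import List


def ascii_tree(start_path: str, prefix: str = "") -> List[str]:
    """Return one line per entry under start_path, indented by depth."""
    entries = sorted(os.listdir(start_path))
    lines: List[str] = []
    for index, name in enumerate(entries):
        is_last = index == len(entries) - 1
        # The last entry in a directory gets a corner connector.
        connector = "`-- " if is_last else "|-- "
        lines.append(prefix + connector + name)
        full_path = os.path.join(start_path, name)
        if os.path.isdir(full_path):
            # Keep the vertical bar only while siblings remain below this entry.
            child_prefix = prefix + ("    " if is_last else "|   ")
            lines.extend(ascii_tree(full_path, child_prefix))
    return lines


if __name__ == "__main__":
    print("\n".join(ascii_tree(".")))
```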