promptprep.aggregator module

class promptprep.aggregator.CodeAggregator(directory: str | None = None, output_file: str = 'full_code.txt', include_files: Set[str] | None = None, programming_extensions: Set[str] | None = None, exclude_dirs: Set[str] | None = None, exclude_files: Set[str] | None = None, max_file_size_mb: float | None = None, summary_mode: bool = False, include_comments: bool = True, collect_metadata: bool = False, count_tokens: bool = False, token_model: str = 'cl100k_base', output_format: str = 'plain', line_numbers: bool = False, template_file: str | None = None, incremental: bool = False, last_run_timestamp: float | None = None)

Bases: object

DEFAULT_EXCLUDE_DIRS = {'.git', '__pycache__', 'build', 'dist', 'flask_session', 'node_modules', 'old_files', 'temp', 'venv'}
DEFAULT_EXCLUDE_FILES = {'full_code.txt'}
DEFAULT_MAX_FILE_SIZE_MB = 100.0
DEFAULT_PROGRAMMING_EXTENSIONS = {'.Makefile', '.bat', '.c', '.cmake', '.cmd', '.cpp', '.cs', '.css', '.db', '.fish', '.go', '.gradle', '.h', '.hpp', '.html', '.ini', '.java', '.js', '.json', '.jsx', '.kt', '.less', '.lua', '.md', '.ninja', '.php', '.pl', '.pq', '.pqm', '.ps1', '.psql', '.py', '.r', '.rb', '.rs', '.rst', '.sass', '.scala', '.scss', '.sh', '.sql', '.sqlite', '.swift', '.toml', '.ts', '.tsx', '.vb', '.xml', '.yaml', '.yml', '.zsh'}
DEFAULT_TOKEN_MODEL = 'cl100k_base'
aggregate()
aggregate_code() → str

Combines the directory tree and the contents of programming files into a single document.

collect_metadata() → dict

Gathers statistics about the codebase, such as lines of code and comment ratio.
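
The kind of statistics described above can be sketched for a single source string as follows. This is only an illustration: the helper name comment_stats, the dictionary keys, and the rule that a comment line is one starting with "#" are assumptions, not the structure that collect_metadata() actually returns.

```python
def comment_stats(source: str, comment_prefix: str = "#") -> dict:
    """Count total, blank, comment, and code lines in a source string.

    Hypothetical helper: only whole-line comments are counted, so a
    trailing comment after code still counts as a code line.
    """
    total = blank = comment = 0
    for line in source.splitlines():
        total += 1
        stripped = line.strip()
        if not stripped:
            blank += 1
        elif stripped.startswith(comment_prefix):
            comment += 1
    non_blank = total - blank
    return {
        "total_lines": total,
        "blank_lines": blank,
        "comment_lines": comment,
        "code_lines": non_blank - comment,
        # Ratio of comment lines to all non-blank lines (0.0 for empty input).
        "comment_ratio": comment / non_blank if non_blank else 0.0,
    }


sample = "# header\nx = 1\n\n# note\ny = x + 1  # trailing\n"
stats = comment_stats(sample)
print(stats)
```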

compare_files(file1: str, file2: str, output_file: str | None = None, context_lines: int = 3) → str

Compares two code files and shows their differences with clear formatting.

Parameters:
  • file1 – Path to the first file

  • file2 – Path to the second file

  • output_file – Optional path to write the diff results to

  • context_lines – Number of context lines to include in the diff (default: 3)

Returns:

String containing the formatted differences
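
To illustrate what context_lines controls, here is a minimal sketch built on the standard library's difflib. It assumes a unified diff format and works on in-memory strings instead of file paths; the function name unified_diff_text is hypothetical, and the actual output format of compare_files() may differ.

```python
import difflib


def unified_diff_text(old_text: str, new_text: str,
                      old_name: str = "file1", new_name: str = "file2",
                      context_lines: int = 3) -> str:
    """Return a unified diff of two strings, or "" if they are identical."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    diff = difflib.unified_diff(
        old_lines, new_lines,
        fromfile=old_name, tofile=new_name,
        n=context_lines,  # unchanged lines shown around each change
    )
    return "".join(diff)


before = "a = 1\nb = 2\nc = 3\n"
after = "a = 1\nb = 20\nc = 3\n"
diff_text = unified_diff_text(before, after, context_lines=1)
print(diff_text)
```

A larger context_lines value shows more unchanged lines around each change, which makes hunks easier to locate at the cost of a longer diff.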

compare_runs(prev_output: str, current_output: str | None = None, output_file: str | None = None, context_lines: int = 3) → str

Compares the current aggregation run with a previous one.

Parameters:
  • prev_output – Path to the previous aggregation output file

  • current_output – Path to the current output file (defaults to self.output_file)

  • output_file – Optional path to write the diff results to

  • context_lines – Number of context lines to include in the diff (default: 3)

Returns:

String containing the formatted differences

copy_to_clipboard(content: str | None = None) → bool

Copies the content to the clipboard, with platform-specific handling.

count_text_tokens(text: str) → int

Counts the number of tokens in a text string using the configured tokenizer.

file_mod_times: Dict[str, float]
is_file_size_within_limit(file_path: str) → bool

Checks whether the file size is within the configured limit.
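
A size check of this kind could look like the sketch below. The helper name within_size_limit is hypothetical, and two details are assumptions, not documented behaviour: that one megabyte is 1024 * 1024 bytes, and that a file exactly at the limit is accepted.

```python
import os
import tempfile


def within_size_limit(file_path: str, max_size_mb: float = 100.0) -> bool:
    """Return True if the file is no larger than max_size_mb megabytes."""
    max_bytes = max_size_mb * 1024 * 1024  # assumed MB-to-bytes conversion
    return os.path.getsize(file_path) <= max_bytes


# Write a 2048-byte temporary file and test it against two limits.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 2048)
    tmp_path = tmp.name

small_ok = within_size_limit(tmp_path, max_size_mb=1.0)
tiny_limit_ok = within_size_limit(tmp_path, max_size_mb=0.001)
os.remove(tmp_path)
print(small_ok, tiny_limit_ok)
```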

is_programming_file(filename: str) → bool
should_exclude(path: str) → bool
should_include(file_path: str) → bool
write_to_file(content: str | None = None, filename: str | None = None) → None

Writes the aggregated content to a file, with an appropriate extension based on the output format.

class promptprep.aggregator.DirectoryTreeGenerator(exclude_dirs: Set[str] | None = None, include_files: Set[str] | None = None, exclude_files: Set[str] | None = None, programming_extensions: Set[str] | None = None)

Bases: object

generate(start_path: str) → str

Creates an ASCII representation of the directory structure starting from the given path.
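
As a rough illustration of an ASCII directory tree, the sketch below walks a directory recursively and skips excluded directory names. The function name ascii_tree, the connector characters, and the alphabetical ordering are assumptions; the exact layout produced by generate() is not specified here.

```python
import os
import tempfile


def ascii_tree(start_path: str, exclude_dirs=frozenset()) -> str:
    """Render the directory structure under start_path as ASCII text."""
    lines = [os.path.basename(os.path.abspath(start_path)) + "/"]

    def walk(path: str, prefix: str) -> None:
        # Drop excluded directories, then list the rest in sorted order.
        entries = sorted(
            name for name in os.listdir(path)
            if not (os.path.isdir(os.path.join(path, name))
                    and name in exclude_dirs)
        )
        for index, name in enumerate(entries):
            is_last = index == len(entries) - 1
            full = os.path.join(path, name)
            is_dir = os.path.isdir(full)
            connector = "`-- " if is_last else "|-- "
            lines.append(prefix + connector + name + ("/" if is_dir else ""))
            if is_dir:
                # Keep the vertical bar only while siblings remain below.
                walk(full, prefix + ("    " if is_last else "|   "))

    walk(start_path, "")
    return "\n".join(lines)


# Build a small throwaway project and render it, skipping ".git".
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "src"))
    os.makedirs(os.path.join(root, ".git"))
    open(os.path.join(root, "src", "main.py"), "w").close()
    open(os.path.join(root, "README.md"), "w").close()
    tree = ascii_tree(root, exclude_dirs={".git"})

print(tree)
```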