promptprep.aggregator module
- class promptprep.aggregator.CodeAggregator(directory: str | None = None, output_file: str = 'full_code.txt', include_files: Set[str] | None = None, programming_extensions: Set[str] | None = None, exclude_dirs: Set[str] | None = None, exclude_files: Set[str] | None = None, max_file_size_mb: float | None = None, summary_mode: bool = False, include_comments: bool = True, collect_metadata: bool = False, count_tokens: bool = False, token_model: str = 'cl100k_base', output_format: str = 'plain', line_numbers: bool = False, template_file: str | None = None, incremental: bool = False, last_run_timestamp: float | None = None)
Bases: object
- DEFAULT_EXCLUDE_DIRS = {'.git', '__pycache__', 'build', 'dist', 'flask_session', 'node_modules', 'old_files', 'temp', 'venv'}
- DEFAULT_EXCLUDE_FILES = {'full_code.txt'}
- DEFAULT_MAX_FILE_SIZE_MB = 100.0
- DEFAULT_PROGRAMMING_EXTENSIONS = {'.Makefile', '.bat', '.c', '.cmake', '.cmd', '.cpp', '.cs', '.css', '.db', '.fish', '.go', '.gradle', '.h', '.hpp', '.html', '.ini', '.java', '.js', '.json', '.jsx', '.kt', '.less', '.lua', '.md', '.ninja', '.php', '.pl', '.pq', '.pqm', '.ps1', '.psql', '.py', '.r', '.rb', '.rs', '.rst', '.sass', '.scala', '.scss', '.sh', '.sql', '.sqlite', '.swift', '.toml', '.ts', '.tsx', '.vb', '.xml', '.yaml', '.yml', '.zsh'}
- DEFAULT_TOKEN_MODEL = 'cl100k_base'
- aggregate_code() -> str
Brings together the directory tree and content of programming files into a single document.
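As a rough illustration of this kind of aggregation, the sketch below walks a directory, skips excluded directories and files, keeps files with selected extensions, and joins their contents under per-file headers. It is not the library's implementation: the function name `aggregate`, the `# File:` header format, and the shortened constant sets are assumptions made for the example, and the directory tree portion of the output is omitted.

```python
import os
from pathlib import Path

# Hypothetical, shortened stand-ins for the class defaults (illustration only).
EXTENSIONS = {".py", ".md", ".toml"}
EXCLUDE_DIRS = {".git", "__pycache__", "node_modules", "venv"}
EXCLUDE_FILES = {"full_code.txt"}


def aggregate(directory: str, max_file_size_mb: float = 100.0) -> str:
    """Concatenate matching files under `directory`, each preceded by a header."""
    max_bytes = max_file_size_mb * 1024 * 1024
    parts = []
    for root, dirs, files in os.walk(directory):
        # Prune excluded directories in place so os.walk never descends into them.
        dirs[:] = sorted(d for d in dirs if d not in EXCLUDE_DIRS)
        for name in sorted(files):
            path = Path(root) / name
            if name in EXCLUDE_FILES or path.suffix not in EXTENSIONS:
                continue
            if path.stat().st_size > max_bytes:
                continue  # skip files over the size limit
            rel = path.relative_to(directory).as_posix()
            text = path.read_text(encoding="utf-8", errors="replace")
            parts.append(f"# File: {rel}\n{text}")
    return "\n\n".join(parts)
```

Assigning to `dirs[:]` rather than rebinding `dirs` is what makes `os.walk` skip the excluded subtrees entirely instead of filtering their files afterwards.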
- collect_metadata() -> dict
Gathers stats about the codebase like lines of code and comment ratio.
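The reference does not specify which keys the returned dict contains or how the comment ratio is defined. As a minimal sketch under assumed definitions (comment lines divided by code lines, with `#` as the only comment marker), such statistics could be computed like this. The function name `summarize` and every key name are hypothetical.

```python
def summarize(source: str, comment_prefix: str = "#") -> dict:
    """Count code, comment, and blank lines in a source string."""
    lines = source.splitlines()
    blank = sum(1 for ln in lines if not ln.strip())
    # Only whole-line comments are counted; trailing comments count as code.
    comments = sum(1 for ln in lines if ln.strip().startswith(comment_prefix))
    code = len(lines) - blank - comments
    return {
        "total_lines": len(lines),
        "code_lines": code,
        "comment_lines": comments,
        "blank_lines": blank,
        # Assumed definition: comment lines per code line, 0.0 when no code.
        "comment_ratio": comments / code if code else 0.0,
    }
```

A real implementation would need language-aware comment detection (block comments, docstrings, `//` markers), which this sketch deliberately leaves out.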
- compare_files(file1: str, file2: str, output_file: str | None = None, context_lines: int = 3) -> str
Compares two code files and shows their differences with clear formatting.
- Parameters:
file1 – Path to the first file
file2 – Path to the second file
output_file – Optional path to write the diff results to
context_lines – Number of context lines to include in the diff (default: 3)
- Returns:
String containing the formatted differences
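To show what a `context_lines` setting typically controls, here is a small sketch built on the standard library's `difflib.unified_diff`. Whether the method itself uses `difflib` is an assumption, as is the helper name `diff_text`. The sketch takes strings rather than paths, so reading the files and writing `output_file` are left out.

```python
import difflib


def diff_text(old: str, new: str, old_name: str = "file1",
              new_name: str = "file2", context_lines: int = 3) -> str:
    """Return a unified diff of two strings, or '' if they are identical."""
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
        n=context_lines,  # unchanged lines shown around each change
    )
    return "".join(diff)


if __name__ == "__main__":
    print(diff_text("a\nb\nc\n", "a\nB\nc\n", context_lines=1))
```

A larger `context_lines` value makes each hunk easier to locate in the file at the cost of a longer diff; `0` shows only the changed lines.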
- compare_runs(prev_output: str, current_output: str | None = None, output_file: str | None = None, context_lines: int = 3) -> str
Compares the current aggregation run with a previous one.
- Parameters:
prev_output – Path to the previous aggregation output file
current_output – Path to the current output file (defaults to self.output_file)
output_file – Optional path to write the diff results to
context_lines – Number of context lines to include in the diff
- Returns:
String containing the formatted differences
- copy_to_clipboard(content: str | None = None) -> bool
Copies the content to clipboard, with platform-specific handling.
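The reference does not say which tools the platform-specific handling relies on. One common approach, sketched below as an assumption, is to pipe the text to a system clipboard utility chosen by `platform.system()`. The helper names `clipboard_command` and `copy_text`, and the choice of `pbcopy`, `clip`, and `xclip`, are illustrative only.

```python
import platform
import shutil
import subprocess
from typing import List, Optional


def clipboard_command(system: str) -> Optional[List[str]]:
    """Map a platform.system() value to a clipboard command, if one is known."""
    if system == "Darwin":
        return ["pbcopy"]
    if system == "Windows":
        return ["clip"]
    if system == "Linux":
        return ["xclip", "-selection", "clipboard"]
    return None


def copy_text(text: str) -> bool:
    """Pipe text to the platform's clipboard tool; return True on success."""
    cmd = clipboard_command(platform.system())
    # Bail out if the platform is unknown or the tool is not installed.
    if cmd is None or shutil.which(cmd[0]) is None:
        return False
    try:
        subprocess.run(cmd, input=text.encode("utf-8"), check=True)
    except (OSError, subprocess.CalledProcessError):
        return False
    return True
```

Returning `False` instead of raising mirrors the `bool` return type in the signature above, so callers can fall back to writing a file when no clipboard is available.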
- count_text_tokens(text: str) -> int
Counts the number of tokens in a text string using the configured tokenizer (see token_model).
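The default `'cl100k_base'` is the name of an encoding in the `tiktoken` package, so the sketch below assumes `tiktoken` is the tokenizer in use; the reference does not confirm this. The function name `count_tokens` and the whitespace fallback are hypothetical additions that keep the example usable when `tiktoken` is not installed or its encoding data cannot be loaded.

```python
def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Count tokens with tiktoken, falling back to a whitespace split."""
    try:
        import tiktoken  # third-party; may be missing

        # May fetch and cache encoding data the first time it is called.
        encoding = tiktoken.get_encoding(encoding_name)
    except Exception:
        # Assumed fallback: a crude approximation, not a real token count.
        return len(text.split())
    return len(encoding.encode(text))


if __name__ == "__main__":
    print(count_tokens("Aggregated code can be long."))
```

Token counts depend on the encoding, so a count taken with one encoding is only an estimate for models that use a different one.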