This document describes the terminology dictionaries for maintaining translation consistency across the Python documentation project.
The translation dictionary project provides curated key terms and their translations to help translators maintain consistent terminology usage across different documents. The dictionaries are maintained using LLM knowledge to identify and categorize important Python terminology.
The complete terminology dictionary contains the important terms identified from the Python documentation. Each entry has the following columns:
- source_term: The original English term
- translated_term: The corresponding Chinese (Traditional) translation
- frequency: Number of occurrences across all files
- files_count: Number of different files containing this term
- source_file: Example file where this term was found
- directory: Directory of the source file
- example_files: List of up to 5 files containing this term
Total entries: ~14,700 unique terms
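As a rough illustration of how the columns above might be consumed, the following sketch loads rows into a lookup keyed by source_term. The sample rows, their counts and translations, and the `; ` separator inside example_files are assumptions made for the example, not values taken from the real file. `load_terms` is a hypothetical helper.

```python
import csv
import io

# Hypothetical sample rows that follow the column layout described above.
# Values are illustrative only; the "; " separator in example_files is assumed.
SAMPLE_CSV = """source_term,translated_term,frequency,files_count,source_file,directory,example_files
class,類別,120,45,tutorial/classes.po,tutorial,tutorial/classes.po; reference/datamodel.po
module,模組,98,60,tutorial/modules.po,tutorial,tutorial/modules.po
"""


def load_terms(handle):
    """Return a dict mapping each source_term to its full CSV row."""
    reader = csv.DictReader(handle)
    return {row["source_term"]: row for row in reader}


terms = load_terms(io.StringIO(SAMPLE_CSV))
print(terms["class"]["translated_term"])
```

With a real file, the same helper could be given a handle from `open(path, encoding="utf-8", newline="")` instead of the in-memory sample.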
The focused dictionary (focused_terminology_dictionary.csv) is a curated subset of ~2,900 terms covering the most important Python terminology. It includes two additional columns:
- priority: High/Medium priority classification
- category: Term classification, one of:
  - Core Concepts (7 terms): class, function, method, module, package, object, type
  - Built-in Types (9 terms): int, str, list, dict, tuple, set, float, bool, complex
  - Keywords/Constants (8 terms): None, True, False, return, import, def, async, await
  - Exceptions (690 terms): All *Error and *Exception terms
  - Code Elements (825 terms): Terms in backticks, magic methods
  - Common Terms (1,365 terms): Frequently used technical terms
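The category rules listed here could be approximated with a small classifier like the one below. This is a sketch under assumptions: the function name `categorize`, the order in which rules are checked, and the exact tests for backticks and magic methods are hypothetical and may differ from what the project actually does.

```python
import re

CORE_CONCEPTS = {"class", "function", "method", "module", "package", "object", "type"}
BUILTIN_TYPES = {"int", "str", "list", "dict", "tuple", "set", "float", "bool", "complex"}
KEYWORDS_CONSTANTS = {"None", "True", "False", "return", "import", "def", "async", "await"}

# Assumed shape of a magic method name, e.g. __init__ or __getitem__.
MAGIC_METHOD = re.compile(r"__\w+__")


def categorize(term):
    """Assign a term to one category; fixed sets are checked first (assumed order)."""
    if term in CORE_CONCEPTS:
        return "Core Concepts"
    if term in BUILTIN_TYPES:
        return "Built-in Types"
    if term in KEYWORDS_CONSTANTS:
        return "Keywords/Constants"
    if term.endswith(("Error", "Exception")):
        return "Exceptions"
    in_backticks = len(term) > 2 and term.startswith("`") and term.endswith("`")
    if in_backticks or MAGIC_METHOD.fullmatch(term):
        return "Code Elements"
    return "Common Terms"


for sample in ["class", "ValueError", "__init__", "`len()`", "iterator"]:
    print(sample, "->", categorize(sample))
```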
The terminology dictionaries are maintained using LLM knowledge to identify important Python terms and their translations. The dictionaries can be updated as needed to reflect new terminology or improved translations.
- Start with focused_terminology_dictionary.csv to learn standard translations for core Python concepts
- Reference high-frequency terms for consistency
- Check new translations against the dictionary
- Verify consistent terminology usage
- Update dictionary when establishing new standard translations
- Track translation progress for key technical terms
- Identify terminology needing standardization
- Prioritize translation efforts using frequency data
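Two of the tasks above, checking a translation against the dictionary and prioritizing by frequency, can be sketched as follows. The glossary entries, the helper names `find_inconsistencies` and `by_frequency`, and the simple substring check are assumptions for illustration. A real check would likely need to handle inflected forms and terms with more than one accepted translation.

```python
import re

# Hypothetical glossary excerpt: source term -> standard translation.
GLOSSARY = {"module": "模組", "class": "類別"}


def find_inconsistencies(source, translation, glossary):
    """List (term, expected) pairs where the source uses a glossary term
    but the translation does not contain its standard rendering."""
    missing = []
    for term, expected in glossary.items():
        used = re.search(rf"\b{re.escape(term)}\b", source, re.IGNORECASE)
        if used and expected not in translation:
            missing.append((term, expected))
    return missing


def by_frequency(rows):
    """Sort dictionary rows so the most frequent terms come first."""
    return sorted(rows, key=lambda row: int(row["frequency"]), reverse=True)


# The translation below uses a non-standard rendering of "module".
issues = find_inconsistencies(
    "Import the module before use.", "使用前請先引入模塊。", GLOSSARY
)
print(issues)

ranked = by_frequency([
    {"source_term": "class", "frequency": "120"},
    {"source_term": "module", "frequency": "98"},
    {"source_term": "object", "frequency": "310"},
])
print([row["source_term"] for row in ranked])
```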
The CSV files use UTF-8 encoding so that Chinese characters are handled correctly. They are compatible with Excel, Google Sheets, and other spreadsheet applications.
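When reading or writing these files from Python, passing the encoding explicitly avoids depending on the platform default. The sketch below round-trips one illustrative row through a temporary file. The file name and row are hypothetical. Reading with `utf-8-sig` is a defensive choice: it accepts plain UTF-8 and also strips a byte order mark if a spreadsheet application added one on export.

```python
import csv
import os
import tempfile

# Illustrative row only; not taken from the real dictionary.
rows = [{"source_term": "dict", "translated_term": "字典"}]

path = os.path.join(tempfile.mkdtemp(), "sample_terms.csv")

# Write as UTF-8; newline="" lets the csv module control line endings.
with open(path, "w", encoding="utf-8", newline="") as handle:
    writer = csv.DictWriter(handle, fieldnames=["source_term", "translated_term"])
    writer.writeheader()
    writer.writerows(rows)

# Read back; "utf-8-sig" tolerates an optional byte order mark.
with open(path, encoding="utf-8-sig", newline="") as handle:
    loaded = [dict(row) for row in csv.DictReader(handle)]

print(loaded)
```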
To extend pattern recognition, modify the extract_key_terms() function in extract_terminology.py:

    # Add new technical patterns
    tech_patterns = [
        r'\b(?:new_pattern_here)\b',
        # existing patterns...
    ]

To adjust the filtering criteria, modify the is_significant_term() and create_focused_dictionary() functions.
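Before adding a pattern to tech_patterns, it can help to try it against sample text in isolation. The sketch below is a stand-alone test harness, not the body of extract_key_terms(): both patterns and the `match_terms` helper are hypothetical, since the internals of extract_terminology.py are not shown here.

```python
import re

tech_patterns = [
    r'\b(?:context manager|coroutine)s?\b',  # hypothetical new pattern
    r'\b\w+(?:Error|Exception)\b',           # stand-in for an existing pattern
]


def match_terms(text, patterns):
    """Return the unique matches of all patterns, in order of discovery."""
    found = []
    for pattern in patterns:
        for match in re.finditer(pattern, text):
            if match.group(0) not in found:
                found.append(match.group(0))
    return found


sample = "A coroutine may raise ValueError inside a context manager."
print(match_terms(sample, tech_patterns))
```

Using non-capturing groups `(?:...)`, as the original pattern does, keeps `match.group(0)` equal to the whole matched term.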
- Current processing: ~509 files in 2-3 minutes
- Memory usage: ~50MB peak
- Scalable to larger repositories
This documentation provides comprehensive guidance for maintaining and using the translation dictionary system to ensure consistent, high-quality Python documentation translation.