[ContentUnderstanding] Add to_llm_input helper for converting analysis results to LLM-friendly text #46386

Open
chienyuanchang wants to merge 13 commits into main from cu_sdk/llm_input_format_helper

@chienyuanchang
Member

Description

Adds the to_llm_input() public helper function to the azure-ai-contentunderstanding package. This function converts a CU AnalysisResult into a formatted text string (YAML front matter + markdown body) suitable for injecting into LLM prompts, storing in vector databases, or returning as tool output in agentic workflows.
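For readers unfamiliar with the output shape, here is a minimal sketch of "YAML front matter + markdown body". It is not the SDK implementation: `render_llm_text`, `quote_scalar`, and the metadata keys used with them are hypothetical names chosen for illustration, and the quoting rules are deliberately not exhaustive.

```python
import re

# Hypothetical stand-in for illustration only; the real helper may differ.
# Strings that a YAML parser could read as a boolean, null, date, or number.
_YAML_AMBIGUOUS = re.compile(
    r"^(true|false|yes|no|on|off|null|~)$|^\d{4}-\d{2}-\d{2}|^[-+]?\d",
    re.IGNORECASE,
)
# Characters with special meaning in YAML flow or block syntax.
_YAML_SPECIAL = set(":#{}[],&*!|>'\"%@`")


def quote_scalar(value):
    """Render one scalar as YAML text, quoting strings that could be misread."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if value is None:
        return "null"
    text = str(value)
    needs_quotes = (
        text == ""
        or text != text.strip()
        or "\n" in text
        or _YAML_AMBIGUOUS.search(text) is not None
        or any(ch in _YAML_SPECIAL for ch in text)
    )
    if not needs_quotes:
        return text
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def render_llm_text(metadata, markdown):
    """Join a flat metadata dict (as front matter) with a Markdown body."""
    lines = ["---"]
    for key, value in metadata.items():
        lines.append(f"{key}: {quote_scalar(value)}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + markdown.strip() + "\n"
```

Quoting date-like and boolean-like strings keeps them as strings when a downstream consumer parses the front matter, which is the concern the description raises about the built-in serializer.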

Key features:

  • Renders all content types: documents (with page markers), audio/video (with time ranges for multi-segment), and classification hierarchies (parent auto-skipped, children rendered with category labels)
  • _resolve_fields() recursively flattens all 9 ContentField subtypes (StringField, NumberField, ObjectField, ArrayField, etc.) into plain Python dicts
  • Span-based page markers from pages[].spans offsets, with <!-- PageBreak --> fallback for older content
  • Minimal built-in YAML serializer (no external dependency) with proper quoting for dates, booleans, and YAML-special characters
  • RAI warnings always included in output regardless of include_fields/include_markdown flags
  • Single AV content omits timeRange; only multi-segment AV includes timeRange per segment (per design spec)
  • Configurable via include_fields, include_markdown, and metadata keyword arguments
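The recursive flattening mentioned for `_resolve_fields()` can be sketched as follows. This is an assumption-laden illustration, not the PR's code: fields are modeled as plain dicts with hypothetical `type` and `value` keys rather than the SDK's ContentField classes, and only the object, array, and scalar cases are distinguished.

```python
def resolve_field(field):
    """Flatten one field (hypothetical dict shape) into plain Python data."""
    kind = field.get("type")
    value = field.get("value")
    if kind == "object":
        # Recurse into each named child field.
        return {name: resolve_field(child) for name, child in (value or {}).items()}
    if kind == "array":
        # Recurse into each element, preserving order.
        return [resolve_field(item) for item in (value or [])]
    # Scalars (string, number, date, ...) are returned unchanged.
    return value


def resolve_fields(fields):
    """Flatten a mapping of field name to field into a plain dict."""
    return {name: resolve_field(field) for name, field in fields.items()}


# Example input using the hypothetical dict shape.
invoice_fields = {
    "vendor": {"type": "string", "value": "Contoso"},
    "total": {"type": "number", "value": 12.5},
    "items": {
        "type": "array",
        "value": [
            {"type": "object", "value": {"sku": {"type": "string", "value": "A1"}}}
        ],
    },
}
flat = resolve_fields(invoice_fields)
```

The result is ordinary dicts, lists, and scalars, which is what makes it straightforward to hand to a YAML serializer for the front matter.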

Files changed

| File | Change |
| --- | --- |
| azure/ai/contentunderstanding/_helpers.py | New file: to_llm_input(), _resolve_fields(), and supporting internal functions |
| azure/ai/contentunderstanding/_patch.py | Import and re-export to_llm_input in __all__ |
| tests/test_to_llm_input.py | New file: 60 unit tests across 10 categories (public API, error handling, field resolution, documents, AV, multi-segment, classification, parameters, front matter, edge cases, real CU patterns) |

How to verify

cd sdk/contentunderstanding/azure-ai-contentunderstanding
pip install -e .
python -m pytest tests/test_to_llm_input.py -v

@chienyuanchang chienyuanchang marked this pull request as ready for review April 20, 2026 19:03
Copilot AI review requested due to automatic review settings April 20, 2026 19:03
Contributor

Copilot AI left a comment

Pull request overview

Adds a public to_llm_input() helper to azure-ai-contentunderstanding to convert AnalysisResult objects into LLM-friendly text (YAML front matter + Markdown), along with unit tests and version/docs updates.

Changes:

  • Introduces to_llm_input() and supporting field/page/YAML rendering helpers.
  • Re-exports to_llm_input from the package public surface and bumps version to 1.1.0.
  • Adds a comprehensive test suite covering content types, rendering rules, and edge cases.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| azure/ai/contentunderstanding/_helpers.py | New helper implementation, including minimal YAML serializer and content rendering logic. |
| azure/ai/contentunderstanding/_patch.py | Re-exports to_llm_input via __all__ for package-level import. |
| tests/test_to_llm_input.py | New unit tests validating helper behavior across documents, AV, classification, and flags. |
| azure/ai/contentunderstanding/_version.py | Version bump to 1.1.0. |
| README.md | Adds 1.1.0 to the SDK-to-service-version table. |
| CHANGELOG.md | Adds release notes entry for 1.1.0. |

Comment thread sdk/contentunderstanding/azure-ai-contentunderstanding/tests/test_to_llm_input.py Outdated
…at_helper

# Conflicts:
#	sdk/contentunderstanding/azure-ai-contentunderstanding/CHANGELOG.md
#	sdk/contentunderstanding/azure-ai-contentunderstanding/README.md
#	sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/_version.py