PowerShell modules for text-to-speech (TTS) and speech-to-text (STT) across multiple providers.
| Module | TTS | STT | Requires |
|---|---|---|---|
| Speech.Windows | Offline SAPI | Offline SAPI | Windows 10/11 |
| Speech.Azure | 400+ neural voices | Real-time streaming | Azure Speech key |
| Speech.OpenAI | 11 multilingual voices | Whisper (batch) | OpenAI API key |
| Speech.Google | Standard/WaveNet/Neural2 | Batch | Google Cloud credential JSON |
| Speech.Amazon | Neural/standard voices | Real-time streaming | AWS access key + secret key |
| Speech.Core | — | — | (shared config, microphone, output device) |

| Cmdlet | Windows | Linux/macOS |
|---|---|---|
| Out-*Speech (all providers) | Yes | Yes |
| Read-AzureSpeech | Yes | Yes |
| Read-GoogleSpeech | Yes | Yes |
| Read-AmazonSpeech | Yes | Yes |
| Read-WindowsSpeech | Yes | No (SAPI) |
| Read-OpenAISpeech | Yes | No (NAudio WinMM) |
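
Because some Read-*Speech cmdlets are Windows-only, a cross-platform script may want to choose its recognition cmdlet at run time. The following is a minimal sketch, assuming PowerShell 7 (where the automatic variable `$IsWindows` exists). `Select-SpeechReader` is a hypothetical helper name used for illustration; it is not part of the Speech modules.

```powershell
# Hypothetical helper (not part of the Speech modules): returns the name of a
# Read-*Speech cmdlet that the platform table lists as supported here.
function Select-SpeechReader {
    param(
        # Defaults to the PowerShell 7 automatic variable; pass a value to test.
        [bool]$OnWindows = $IsWindows
    )
    if ($OnWindows) {
        'Read-WindowsSpeech'   # offline SAPI, Windows only
    }
    else {
        'Read-AzureSpeech'     # listed as supported on Linux/macOS; needs an Azure key
    }
}

$reader = Select-SpeechReader
# $text = & $reader            # uncomment once the chosen provider is configured
```

Returning the cmdlet name rather than calling it keeps the choice separate from the recognition call, so the same script can swap in Read-GoogleSpeech or Read-AmazonSpeech on non-Windows hosts.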
# Windows — no setup needed
Out-WindowsSpeech "Hello, world!"
# Azure
Set-AzureSpeechConfig -Key "your-key" -Region "eastus"
Out-AzureSpeech "Hello" -Language en-US
# OpenAI
Set-OpenAISpeechConfig -Key "sk-..."
Out-OpenAISpeech "Hello" -Voice nova
# Google
Set-GoogleSpeechConfig -Credential "path/to/key.json"
Out-GoogleSpeech "Hello"
# Amazon
Set-AmazonSpeechConfig -AccessKey "AKIA..." -SecretKey "..." -Region "ap-northeast-1"
Out-AmazonSpeech "Hello" -Voice Joanna
# Speech recognition (all providers)
$text = Read-WindowsSpeech
$text = Read-AzureSpeech -Language ja-JP
$text = Read-OpenAISpeech -Language ja
$text = Read-GoogleSpeech -Language ja-JP
$text = Read-AmazonSpeech -Language ja-JP

Install-PSResource Speech

With PowerShell.MCP, AI can configure everything for you:
Install-PSResource PowerShell.MCP
claude mcp add PowerShell -s user -- "$(Get-MCPProxyPath)"

Then just ask:
Install the Az module and help me create an Azure Speech resource.
Help me set up OpenAI Speech. I don't have an API key yet.
Guide me through setting up Google Cloud Speech.
Help me set up Amazon Polly with my AWS credentials.
Say 'Hello world' using Windows Speech.
Windows SAPI works offline with zero configuration — the quickest way to get started.
Settings are stored in ~/Documents/PowerShell/Modules/Speech/SpeechConfig.json. API keys are masked when displayed.
Get-SpeechConfig # View all settings
Get-SpeechConfig -Path # Get config file path

Provider setup
# Get key: Azure Portal > Create "Speech" resource > Keys and Endpoint
# Free tier (F0): 0.5M chars TTS + 5h STT / month
Set-AzureSpeechConfig -Key "your-key" -Region "eastus"
Get-AzureSpeech -Locale ja
Set-AzureSpeechConfig -Voice "ja-JP-NanamiNeural"

# Get key: https://platform.openai.com/api-keys
Set-OpenAISpeechConfig -Key "sk-..."
Set-OpenAISpeechConfig -Voice nova -Model tts-1

# Get credential: Google Cloud Console > IAM > Service Accounts > Create key (JSON)
Set-GoogleSpeechConfig -Credential "C:\path\to\service-account.json"
Get-GoogleSpeech -Language ja-JP
Set-GoogleSpeechConfig -Voice "ja-JP-Neural2-B"

# Get credentials: AWS Console > IAM > Users > Create access key
# Free tier: 5M chars TTS + 60 min STT / month (first 12 months)
Set-AmazonSpeechConfig -AccessKey "AKIA..." -SecretKey "..." -Region "ap-northeast-1"
Get-AmazonSpeech -Language ja-JP
Set-AmazonSpeechConfig -Voice "Mizuki"

# No API key needed. Add voices: Settings > Time & language > Speech
Get-WindowsSpeech
Set-WindowsSpeechConfig -Voice "Microsoft Haruka Desktop"

Common options
All Out-*Speech cmdlets accept pipeline input and share these patterns:
# Pipeline
"Line 1", "Line 2" | Out-AzureSpeech
# Output device selection (Tab completion available)
Out-AzureSpeech "Hello" -OutputDevice "Speakers (Realtek)"
Set-SpeechConfig -OutputDevice "Speakers (Realtek)" # persist
# Microphone selection
Read-AzureSpeech -Microphone "Headset Microphone"
Set-SpeechConfig -Microphone "Headset Microphone" # persist
# Parameters take priority over saved config, for all settings
Out-AzureSpeech "Hello" -Key "temp-key" -Region "westus" # one-time override

With PowerShell.MCP configured, AI can speak and listen through your speakers and microphone:
Let's have a voice conversation in English.
When I type 't', start listening and respond by voice.
Find me a good English voice and play a sample.
Any MCP-compatible client that supports PowerShell.MCP can use Speech modules:
- Claude Code (CLI)
- Claude Desktop
- GitHub Copilot (VS Code)
- Any other MCP-compatible client
Each provider has 4 cmdlets following a consistent pattern:
| Verb | Purpose | Example |
|---|---|---|
| Out-*Speech | Text-to-speech | Out-AzureSpeech "Hello" |
| Read-*Speech | Speech-to-text | $text = Read-AzureSpeech |
| Get-*Speech | List voices | Get-AzureSpeech -Locale ja |
| Set-*SpeechConfig | Configure provider | Set-AzureSpeechConfig -Voice "..." |
Plus shared cmdlets in Speech.Core: Get-SpeechConfig, Set-SpeechConfig, Get-Microphone, Test-Microphone.
Use Get-Help <cmdlet> -Full for detailed documentation.
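
Since the provider cmdlets follow one naming pattern, a script can build the cmdlet name from a provider name instead of hard-coding it. The sketch below assumes only the pattern described above. `Get-SpeechCmdletName` is a hypothetical helper for illustration and is not shipped with the modules.

```powershell
# Hypothetical helper (not part of the Speech modules): builds a cmdlet name
# from the Verb-<Provider>Speech pattern used by every provider module.
function Get-SpeechCmdletName {
    param(
        [Parameter(Mandatory)]
        [ValidateSet('Windows', 'Azure', 'OpenAI', 'Google', 'Amazon')]
        [string]$Provider,

        [ValidateSet('Out', 'Read', 'Get', 'Set')]
        [string]$Verb = 'Out'
    )
    # Set-* cmdlets carry a "Config" suffix; the other three verbs do not.
    $suffix = if ($Verb -eq 'Set') { 'SpeechConfig' } else { 'Speech' }
    '{0}-{1}{2}' -f $Verb, $Provider, $suffix
}

Get-SpeechCmdletName -Provider Azure              # Out-AzureSpeech
Get-SpeechCmdletName -Provider Google -Verb Set   # Set-GoogleSpeechConfig
```

With the modules installed, the result can be invoked through the call operator, for example `& (Get-SpeechCmdletName -Provider Windows) "Hello"`.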
All 24 cmdlets
Speech.Core — Shared configuration and audio devices
- Get-SpeechConfig: Display current configuration (-Path for file location)
- Set-SpeechConfig: Set common settings: -Rate, -Volume, -Language, -Microphone, -OutputDevice
- Get-Microphone: List audio input devices
- Test-Microphone: Test microphone input level
Speech.Azure — Azure Cognitive Services
- Out-AzureSpeech: TTS with SSML prosody (-Rate, -Volume, -Pitch, -Language, -Voice)
- Read-AzureSpeech: Real-time streaming STT (-Language, -Detailed)
- Get-AzureSpeech: List 400+ neural voices (-Locale to filter)
- Set-AzureSpeechConfig: Set -Key, -Region, -Voice, -Pitch
Speech.OpenAI — OpenAI Audio API
- Out-OpenAISpeech: TTS with 11 voices (-Voice, -Model, -Speed)
- Read-OpenAISpeech: Whisper batch STT (-Language, -Model)
- Get-OpenAISpeech: List available voices
- Set-OpenAISpeechConfig: Set -Key, -Voice, -Model, -STTModel
Speech.Google — Google Cloud Speech
- Out-GoogleSpeech: TTS with Standard/WaveNet/Neural2 (-Voice, -Language, -Speed)
- Read-GoogleSpeech: Batch STT (-Language)
- Get-GoogleSpeech: List available voices (-Language to filter)
- Set-GoogleSpeechConfig: Set -Voice, -Credential
Speech.Amazon — Amazon Polly / Transcribe
- Out-AmazonSpeech: TTS with neural/standard voices (-Voice, -Language, -Rate)
- Read-AmazonSpeech: Real-time streaming STT (-Language)
- Get-AmazonSpeech: List available voices (-Language to filter)
- Set-AmazonSpeechConfig: Set -AccessKey, -SecretKey, -Region, -Voice
Speech.Windows — Windows SAPI
- Out-WindowsSpeech: Offline TTS (-Voice, -Rate, -Volume)
- Read-WindowsSpeech: Offline STT (-Language, -Confidence, -Detailed)
- Get-WindowsSpeech: List installed SAPI voices (-Culture to filter)
- Set-WindowsSpeechConfig: Set -Voice
Most parameters support Tab or Ctrl+Space completion. Voice and language lists are fetched from each provider's API and cached for the session.
| Cmdlet | Tab-completable Parameters |
|---|---|
| Out-WindowsSpeech | -Voice, -OutputDevice |
| Out-AzureSpeech | -Language, -Voice, -OutputDevice |
| Out-OpenAISpeech | -Model, -Voice, -OutputDevice |
| Out-GoogleSpeech | -Language, -Voice, -OutputDevice |
| Out-AmazonSpeech | -Language, -Voice, -OutputDevice |
| Read-WindowsSpeech | -Culture, -Microphone |
| Read-AzureSpeech | -Language, -Microphone |
| Read-OpenAISpeech | -Language, -Model, -Microphone |
| Read-GoogleSpeech | -Language, -Microphone |
| Read-AmazonSpeech | -Language, -Microphone |
| Get-WindowsSpeech | -Culture |
| Get-AzureSpeech | -Locale |
| Get-GoogleSpeech | -Language |
| Get-AmazonSpeech | -Language |
| Set-*SpeechConfig | -Voice, -Microphone, -OutputDevice |
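
For readers curious how session-cached completion can work in general, here is a rough sketch built on the standard `Register-ArgumentCompleter` cmdlet. It is not the modules' actual implementation. `Get-CachedCompletion`, `Invoke-DemoVoice`, and the hard-coded voice list are all hypothetical stand-ins, and the fetch script block takes the place of a real provider API call.

```powershell
# Session-wide cache: key -> list of completion values (kept until the session ends).
$script:CompletionCache = @{}

# Hypothetical helper: runs $Fetch once per key, then serves the cached result.
function Get-CachedCompletion {
    param(
        [Parameter(Mandatory)][string]$Key,
        [Parameter(Mandatory)][scriptblock]$Fetch
    )
    if (-not $script:CompletionCache.ContainsKey($Key)) {
        $script:CompletionCache[$Key] = @(& $Fetch)
    }
    $script:CompletionCache[$Key]
}

# Stand-in command so the completer has a parameter to attach to.
function Invoke-DemoVoice {
    param([string]$Text, [string]$Voice)
    "$Voice says: $Text"
}

Register-ArgumentCompleter -CommandName Invoke-DemoVoice -ParameterName Voice -ScriptBlock {
    param($commandName, $parameterName, $wordToComplete, $commandAst, $fakeBoundParameters)
    # The fetch block below is a placeholder for a provider API call.
    Get-CachedCompletion -Key 'demo-voices' -Fetch { 'alloy', 'echo', 'nova' } |
        Where-Object { $_ -like "$wordToComplete*" } |
        ForEach-Object { [System.Management.Automation.CompletionResult]::new($_) }
}
```

Paste or dot-source the sketch into an interactive session so the helper functions stay in scope, then type `Invoke-DemoVoice -Voice ` and press Tab. The fetch block runs only on the first completion; later completions read from the hashtable.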
# Language narrows the voice list
Out-AzureSpeech "Hello" -Language <Tab> -Voice <Tab>
# → en-US-JennyNeural, en-US-GuyNeural, ...
Out-OpenAISpeech "Hello" -Voice <Tab>
# → alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer, verse
Read-AzureSpeech -Language <Tab>
# → en-US, ja-JP, zh-CN, ...
Read-OpenAISpeech -Microphone <Tab>
# → Headset Microphone, Microphone Array, ...

Common issues
"key not configured" / "credential not configured"
Run the provider's Set-*SpeechConfig cmdlet (for example, Set-AzureSpeechConfig). See Get-Help Set-AzureSpeechConfig -Full.
No microphone input
Get-Microphone # List devices
Test-Microphone # Check input level (> 30 = OK)

Windows STT not recognizing language
Install the language pack: Settings > Time & language > Language & region > Add language > "Speech" feature.
Third-party: NAudio (MIT), Azure Speech SDK (MIT), AWS SDK for .NET (Apache-2.0).