Whisper Dictation by Firsh
Press a hotkey, speak, and your words appear at the cursor. Free and offline after setup.
What it does
- Press a hotkey and start dictating
- Speech is transcribed locally using OpenAI's Whisper (large-v3-turbo model)
- Text is pasted at your cursor position and is available on the clipboard
- Works in any application
- Private: no cloud, no API keys. All processing is local, and no data is sent to any server after the initial model download
- Any language Whisper recognises: primary and secondary languages are configurable during setup
- Batch transcription: drag and drop audio files onto a GUI window
- All your recordings are kept along with their transcriptions, available for retry, and can be converted to Opus for archival in monthly packages
Why
I started out with Wispr Flow, then tried the then-free alternative, WhisperTyping. But those tools went paid, and I was fed up when they failed and lost dictations, or with the fact that my voice was sent to the cloud and transcribed there. Even if inference is faster with a provider, the latency and privacy concerns made me want to write a local version without any fancy graphical interface: the most minimal, performant thing that just gets the job done. I have used it extensively for over a month with Claude Code, writing this tool with itself and polishing it with that mindset. It has become my daily driver because it is simple but effective.
System requirements
- Windows 10 or 11, 64-bit
- NVIDIA GPU recommended (CUDA 12.4 or 11.8, auto-detected) with any reasonable amount of VRAM (like 8 GB), as the model isn't big at all
- Fast SSD, preferably NVMe
- Microphone obviously
- ~2 GB disk space for the AI model and tooling
- Internet connection only for first-time installation
Installation
- Extract the ZIP to any folder (this is portable)
- Right-click install.ps1 → Run with PowerShell (or the executable also runs this for you)
- The installer detects your GPU, installs aria2c for fast downloads, procures the right Whisper.cpp build, FFmpeg, and the AI model, asks for your primary and secondary languages (ISO codes, e.g. en, hu, de), detects your microphone, and writes the config file for you
- Double-click firsh-whisper-dictation.exe to start (it is a compiled AHK script)
- If you like it, set Run at Startup from the tray icon; this copies a shortcut to your shell:startup folder (see the sketch below)
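Under the hood, creating a startup shortcut like this is a one-liner in AutoHotkey v2. The snippet below is a minimal sketch of the idea, not necessarily the tool's exact code, and the shortcut name is made up:

```autohotkey
#Requires AutoHotkey v2.0

; Copy a shortcut to the user's Startup folder so the script runs at login.
; A_Startup resolves to the same shell:startup folder mentioned above.
FileCreateShortcut(A_ScriptFullPath, A_Startup "\Whisper Dictation.lnk")
```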
How to use
- Press AppsKey (the context-menu key, between right Alt and right Ctrl)
- Speak: the tray icon shows Listening... while recording
- Press AppsKey again to stop
- The transcription is pasted at your cursor and gets copied to the clipboard
Hotkeys
All of these are toggles, not push-to-talk (a minimal AutoHotkey sketch of the toggle pattern follows the notes below):
| Key | Action |
|---|---|
| AppsKey | Record in primary language / stop |
| Ctrl+AppsKey | Record in secondary language / stop |
| Win+AppsKey | Retry last recording with auto-detect language |
Note:
- There is no separate cancel key
- Retry is useful if you accidentally spoke in the other language and the result is weird or unexpected
- If you speak your secondary language after starting a recording in the primary one, the speech gets translated on the fly, as a happy little accident
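For the curious, the toggle pattern behind these hotkeys is plain AutoHotkey v2. The sketch below only illustrates the idea; the recording and transcription steps are hypothetical placeholders, not this tool's actual functions:

```autohotkey
#Requires AutoHotkey v2.0

recording := false

; Toggle-style hotkey: first press starts listening, second press stops and pastes.
AppsKey:: {
    global recording
    recording := !recording
    if recording {
        A_IconTip := "Listening..."        ; tooltip shown on the tray icon
        ; (hypothetical) start capturing microphone audio here
    } else {
        A_IconTip := "Whisper Dictation"
        ; (hypothetical) stop capture, run whisper.cpp on the WAV, get the text:
        transcript := "example transcription"
        A_Clipboard := transcript          ; keep the result on the clipboard
        Send("^v")                         ; paste at the cursor position
    }
}
```

The real script obviously does more (language selection, retry, tray menu), but the toggle itself is this simple.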
Batch transcription
- Right-click the tray icon and choose Show Transcribe Window
- Drag and drop audio files (MP3, WAV, M4A, OGG, FLAC) onto it
- Wait a bit. Each file is transcribed and saved as a .txt next to the original.
Note: Text is automatically split into paragraphs deterministically, but additional processing by an LLM may be required for best results.
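If you're wondering how a drop target like this works in AutoHotkey v2, the sketch below shows the general pattern. Transcribe() is a made-up stub standing in for the whisper.cpp call; this is not the tool's actual code:

```autohotkey
#Requires AutoHotkey v2.0

; Minimal drag-and-drop window: each dropped file is "transcribed" and a
; .txt is written next to the original, mirroring the behaviour described above.
dropGui := Gui("+Resize", "Transcribe")
dropGui.Add("Text", "w280 h100 Center", "Drop audio files here")
dropGui.OnEvent("DropFiles", OnDrop)
dropGui.Show("w300 h140")

OnDrop(guiObj, ctrl, files, x, y) {
    for path in files {
        text := Transcribe(path)                        ; hypothetical stub
        SplitPath(path, , &dir, , &base)
        FileAppend(text, dir "\" base ".txt", "UTF-8")  ; save the .txt next to the original
    }
}

; Placeholder: the real tool shells out to whisper.cpp here.
Transcribe(path) {
    return "transcription of " path
}
```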
Configuration
Editing config.ini is optional; it is auto-created on first run:
- RecordingsPath is where audio files are saved (default: Documents\Voice Typing\)
- MicrophoneName is set manually if auto-detection picks the wrong device
- EnableLogging writes to stt.log for troubleshooting
- DevMode saves raw and processed output files side-by-side
- PrimaryLanguage / SecondaryLanguage are the ISO codes for your two languages (e.g. en, hu, de, fr)
- VolumeFadeDuration adjusts the audio-ducking duration in milliseconds while recording
- PrimaryPrompt / SecondaryPrompt is the Whisper punctuation hint; built-in for English and Hungarian, empty for others (write your own if needed)
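For reference, a config.ini using these keys might look roughly like the example below. The section name and the sample values are illustrative assumptions; the installer-generated file is the authority:

```ini
[Settings]
RecordingsPath=C:\Users\you\Documents\Voice Typing\
MicrophoneName=
PrimaryLanguage=en
SecondaryLanguage=hu
PrimaryPrompt=
SecondaryPrompt=
VolumeFadeDuration=300
EnableLogging=false
DevMode=false
```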
Use Reload Script from the tray menu after making changes.
Archive recordings
The tray menu's Archive Recordings converts old WAV recordings to 32 kbps Opus (16 kHz mono), saving roughly 90% of the disk space and making long-term storage very cheap. Files are packed into monthly archives (RAR with a recovery record, or ZIP as a fallback). Transcription text files are kept as-is. You can also delete recordings at any time; the Open Recordings Folder option in the tray menu gets you there.
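The tray action handles the conversion for you, but for reference, the same kind of WAV-to-Opus conversion can be done manually with FFmpeg. This is a generic command, not necessarily the tool's exact invocation:

```
ffmpeg -i recording.wav -ac 1 -ar 16000 -c:a libopus -b:a 32k recording.opus
```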
Troubleshooting
- Microphone not found → set MicrophoneName in config.ini, run the installer again, or simply connect (turn on) the microphone you used during installation
- No speech detected → check microphone volume in Windows settings
- Slow → the AI model takes a moment to load into GPU memory for every dictation job (then it unloads so it doesn't hog VRAM). This is where a fast SSD shines!
- GPU not used → the installer falls back to a CPU-optimized build automatically; check stt.log if EnableLogging=true
Support
Feedback? Chat with me on bsky: @firsh.dev
Enjoying my Whisper Dictation tool? If you find it useful, buy me a tea 🍵 via Stripe or support directly using crypto. Completely optional, but appreciated!
- BTC: bc1qwm8q89vl3hxzfugztf7p744hnuky7f7vk2s0vt
- LTC: ltc1qehgstdq043fhy7th4gf8rs76gprd6vw54y3dqa
- DOGE: DLPbYnt5afGzK9GQEYkKVt8qGpq5ESCu3M
Credits
This project stands on the shoulders of excellent open-source work:
- OpenAI Whisper — the speech recognition model behind everything
- whisper.cpp by ggml-org — the fast C++ inference engine with CUDA support used here
- ggml-large-v3-turbo model on HuggingFace — the GGML-quantised model weights
- FFmpeg (BtbN builds) — audio conversion and processing
- AutoHotkey v2 — the scripting runtime the dictation tool is built on
- aria2c — multi-threaded downloader used by the installer for fast dependency downloads