Under the hood · Transcription
How Korely turns audio into notes
Free uses Whisper running locally on your CPU for recorded meetings and dropped audio. Pro adds Deepgram in the cloud for live streaming with speaker names. This page covers what each one does, the family of speech tools they belong to, and what stays on your machine.
Concept
What transcription is
Imagine sitting in a meeting with a fast stenographer at your side. They listen to the room, type as people speak, put the right name in front of each line ("Sarah:", "John:"), and hand you a clean transcript at the end. That is what transcription software does, except the stenographer is a small AI model and the typing happens while the audio plays.
Modern transcription tools also do two things a human stenographer cannot do as easily. They can tell different speakers apart from the sound of the voice (called diarization) and they can capitalise named entities like company names, places, and product names automatically. The good ones do both while the conversation is still going on.
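To make diarization concrete, here is a minimal sketch of what happens after a model has tagged every word with a speaker number: consecutive words from the same voice are merged into one transcript line. The `Word` type and `to_turns` function are made up for this illustration and are not Korely's actual code.

```python
from dataclasses import dataclass


@dataclass
class Word:
    speaker: int  # diarization label assigned by the model: 0, 1, 2, ...
    text: str


def to_turns(words: list[Word]) -> list[str]:
    """Merge consecutive words from the same speaker into one transcript line."""
    turns: list[tuple[int, list[str]]] = []
    for word in words:
        if turns and turns[-1][0] == word.speaker:
            # Same voice as the previous word: extend the current line.
            turns[-1][1].append(word.text)
        else:
            # A new voice took over: start a new line.
            turns.append((word.speaker, [word.text]))
    return [f"Speaker {speaker + 1}: {' '.join(parts)}" for speaker, parts in turns]


words = [
    Word(0, "Shall"), Word(0, "we"), Word(0, "start?"),
    Word(1, "Yes,"), Word(1, "go"), Word(1, "ahead."),
]
print("\n".join(to_turns(words)))
```

For the six sample words, this prints one line for "Speaker 1" and one for "Speaker 2". Real engines also attach timestamps and confidence scores to each word, which this sketch leaves out.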
Korely uses transcription in two places: live meetings recorded inside the app, and audio or video files you drop in afterwards. Which engine handles each one depends on your tier.
Free, on your CPU
Free: Whisper, running on your laptop
On the Free tier the stenographer is Whisper, an open-weights speech-to-text model released by OpenAI under the MIT license. It ships inside the Korely desktop bundle and runs entirely on your CPU.
- License: MIT, open weights.
- Where it runs: on your CPU, inside the Korely process. The audio never leaves your machine.
- Speed: short recordings (a few minutes) finish in seconds. Long ones can take several minutes depending on your laptop.
- Languages: dozens, including English, Italian, French, Spanish, German.
- Use case: recorded meetings and dropped audio files. Live streaming is not available on Free, because CPU-side transcription cannot keep up with a live conversation in real time.
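The split between tiers can be sketched as a small routing function. This is one reading of the page, not Korely's real code: it assumes that a meeting on Free is transcribed by Whisper once the recording exists, and that dropped files stay on Whisper on Pro as well. The names `Plan` and `plan_transcription` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    engine: str      # "whisper-local" or "deepgram"
    streaming: bool  # True when text appears while people are still talking


def plan_transcription(tier: str, live_meeting: bool) -> Plan:
    """Pick an engine for a job, following the Free / Pro split described above."""
    if tier not in ("free", "pro"):
        raise ValueError(f"unknown tier: {tier!r}")
    if tier == "pro" and live_meeting:
        # Pro streams a live meeting to the cloud engine.
        return Plan(engine="deepgram", streaming=True)
    # Everything else is transcribed locally once the audio is on disk.
    return Plan(engine="whisper-local", streaming=False)


print(plan_transcription("free", live_meeting=True))
print(plan_transcription("pro", live_meeting=True))
```

The point of the sketch is that only one combination, Pro with a live meeting, ever selects the cloud engine.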
Picture buying a portable typewriter you keep in a drawer. Whenever you need a transcript you take it out, feed it the recording, and a few minutes later you have a typed page. It is yours, it is offline, and you never have to ask anyone's permission to use it.
Pro, in the cloud
Pro: Deepgram, with speaker names
The Pro tier swaps the portable typewriter for a professional stenographer on the other end of a phone line. The audio of a live meeting streams in real time to Deepgram, one of the leading commercial speech-to-text providers. The transcript comes back in under a second, word by word, already labelled with who is speaking.
- Live streaming: sub-second latency from speech to text. You can watch the transcript appear on screen while the meeting is still going.
- Speaker diarization: the model tells different voices apart and labels each line with the right speaker.
- Speaker names: Korely also calls Llama 3.3 70B, hosted on Groq, in parallel to map "Speaker 1, 2, 3" to real names when people introduce themselves ("Hi, I'm Sarah"). Groq runs that job with sub-second latency so the names appear in real time too.
- Where it runs: in the cloud, on Deepgram's servers, during the call. The audio file of the meeting still lives on your disk.
- Languages: over fifty, with strong support for English.
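The speaker-name step can be pictured with a toy version. Korely hands this job to a language model; the sketch below swaps in a simple regular expression so that it runs anywhere, which means it only catches the most obvious introductions. The function names and sample lines are invented for the illustration.

```python
import re

# Matches "I'm Sarah", "I am Sarah", "my name is Sarah".
# The intro phrase is case-insensitive; the name must start with a capital letter.
INTRO = re.compile(r"\b(?i:I'm|I am|my name is)\s+([A-Z][a-z]+)")


def infer_names(segments: list[tuple[str, str]]) -> dict[str, str]:
    """Map diarization labels to names, using the first introduction per speaker."""
    names: dict[str, str] = {}
    for speaker, text in segments:
        match = INTRO.search(text)
        if match and speaker not in names:
            names[speaker] = match.group(1)
    return names


def relabel(segments: list[tuple[str, str]], names: dict[str, str]) -> list[str]:
    """Swap in the real name where one is known; keep the generic label otherwise."""
    return [f"{names.get(speaker, speaker)}: {text}" for speaker, text in segments]


segments = [
    ("Speaker 1", "Hi, I'm Sarah, thanks for joining."),
    ("Speaker 2", "Hello, my name is John."),
    ("Speaker 1", "Let's look at the roadmap."),
    ("Speaker 3", "Sounds good."),
]
names = infer_names(segments)
print("\n".join(relabel(segments, names)))
```

Speakers who never introduce themselves keep their generic label, as "Speaker 3" does here. A regular expression is brittle ("I'm Happy to help" would fool it), which is a good reason to give the real job to a language model.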
You pay one Pro subscription. Korely covers the Deepgram and Groq bills as part of the plan, the same way a phone plan covers every call you make without a separate bill from each cell tower.
In the same family
Other transcription tools in the same family
Picture a row of stenographers sitting on a bench. Some work from a small office in your house, some take phone calls from anywhere. Whisper and Deepgram are the two Korely currently uses. Here is the rest of the bench, in case the slot ever changes.
Local / open weights
- OpenAI Whisper (MIT, the model Korely runs today). The baseline every other open speech model is compared against. Ships in sizes from tiny to large-v3.
- Distil-Whisper (Hugging Face, MIT). A slimmed-down student of Whisper large-v3, with about half the parameters and several times the throughput. Picture the same stenographer after a few weeks of speed training: nearly the same accuracy on long recordings, much faster typing.
- NVIDIA Parakeet TDT 0.6B v2 (NVIDIA, CC BY 4.0, commercial use allowed). A throughput-optimised open ASR model. Good fit when you want to transcribe a very large archive quickly on modest hardware.
Cloud / commercial
- Deepgram Nova-3 (the cloud option Korely Pro uses today). Commercial API, more than fifty languages, strong on noisy production audio.
- AssemblyAI Universal-3 (commercial API). Wider language coverage and natural language prompting for domain control. Picture a stenographer you can brief in plain English ("this is a medical interview, prefer Latin terms").
- Soniox (commercial API). Strong on multilingual conversations and translation. The previous live transcription provider used by Korely before the move to Deepgram.
What leaves the machine
On Free, nothing. The Whisper model and the audio file both stay on the device. The transcript ends up next to your meeting note in the same vault folder.
On Pro, two things leave the machine during a live meeting: the audio stream goes to Deepgram, and a short text snippet goes to Groq for the speaker name guess. The recording file itself stays on your disk. After the meeting, all further work (recap, search, knowledge graph) runs either locally or through the cloud AI model you have already opted into.
The vault itself only goes to the Korely cloud if you turn on cloud sync, which is a separate Pro setting. Transcription does not require cloud sync.
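The rules in this section fit in a few lines of code. The sketch below is a plain restatement of the text, with hypothetical names, and is not taken from the Korely codebase. It covers transcription and cloud sync only, and leaves out the post-meeting cloud AI model you may have opted into.

```python
def outbound_data(tier: str, live_meeting: bool, cloud_sync: bool = False) -> dict[str, str]:
    """Return {destination: what is sent}. An empty dict means nothing leaves the machine."""
    if tier not in ("free", "pro"):
        raise ValueError(f"unknown tier: {tier!r}")
    sent: dict[str, str] = {}
    if tier == "pro" and live_meeting:
        # Only during the call; the recording file itself stays on disk.
        sent["Deepgram"] = "live audio stream"
        sent["Groq"] = "short text snippet for the speaker name guess"
    if tier == "pro" and cloud_sync:
        # Cloud sync is a separate Pro setting, independent of transcription.
        sent["Korely cloud"] = "vault contents"
    return sent


print(outbound_data("free", live_meeting=True))
print(outbound_data("pro", live_meeting=True))
```

On Free the result is always an empty dictionary, whatever the other arguments are.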
Frequently asked
Which transcription tool does Korely use today?
Free uses Whisper running locally on your CPU for recorded meetings and dropped audio files. Pro adds Deepgram in the cloud for live streaming with speaker diarization. Live speaker name inference goes through Llama 3.3 70B on Groq for sub-second latency.
Does my audio leave the machine on Free?
No. Free transcription runs through Whisper on your CPU. The audio file stays on disk, the transcript is written next to the meeting note, and nothing is uploaded.
What changes when I turn on Pro?
Live streaming transcription routes to Deepgram in the cloud, which adds speaker labels, real-time punctuation, and noise-robust handling. The recording itself still lives on your disk; only the audio stream is sent for the live conversion.
How long is a typical recording before Whisper feels slow?
Whisper on CPU runs comfortably on short recordings (under fifteen minutes). For longer ones the transcription can take several minutes, and the exact number depends on your laptop and the recording length. On Pro, Deepgram returns text with sub-second latency during live streaming.
Can I swap to a different transcription engine?
Free is locked to Whisper today, but the architecture is modular so a swap is mostly a configuration change. Pro routing between Deepgram and other cloud providers (AssemblyAI, Soniox) is a roadmap item.
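As a rough picture of what "modular" can mean here, the sketch below puts every engine behind the same small interface and selects one by name from a settings dictionary. All names in it are invented, including the `transcription_engine` key, and the two engines are stand-ins that return placeholder text instead of calling Whisper or Deepgram.

```python
from typing import Callable, Protocol


class TranscriptionEngine(Protocol):
    name: str

    def transcribe(self, audio_path: str) -> str: ...


class StubEngine:
    """Stand-in engine: returns placeholder text, does no real transcription."""

    def __init__(self, name: str) -> None:
        self.name = name

    def transcribe(self, audio_path: str) -> str:
        return f"[{self.name}] transcript of {audio_path}"


# Registry of available engines, keyed by the name used in the settings.
ENGINES: dict[str, Callable[[], TranscriptionEngine]] = {
    "whisper-local": lambda: StubEngine("whisper-local"),
    "deepgram": lambda: StubEngine("deepgram"),
}


def engine_from_config(config: dict[str, str]) -> TranscriptionEngine:
    """Build the engine named in the config, defaulting to local Whisper."""
    name = config.get("transcription_engine", "whisper-local")
    if name not in ENGINES:
        raise ValueError(f"unknown transcription engine: {name!r}")
    return ENGINES[name]()


engine = engine_from_config({"transcription_engine": "deepgram"})
print(engine.transcribe("standup.wav"))
```

With this shape, adding a provider means registering one more entry, and the rest of the app keeps calling `transcribe` without knowing which engine is behind it.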
Record on your laptop. Transcribe on your terms.
Free is fully local with Whisper. Pro adds Deepgram for live streaming with speaker names.