Under the hood · Embeddings

How Korely finds notes by meaning, on your machine

Korely uses a small open-weights embedding model running locally on your CPU to read every note you write. Today the model in the slot is Nomic embed v1.5, one of a wider family we compare further down. Combined with keyword search, it finds notes by both the words you typed and the meaning you had in mind. No API key, no upload, no cost per query.


Concept

What an embedding is

An embedding is a list of numbers that represents the meaning of a piece of text.

Think of every note as a tiny pin you stick on a giant magnetic board. Notes about pricing all cluster together in one corner. Notes about cooking sit somewhere else. Notes that mention the same person hang around that person. The embedding is just the address of where each pin goes on the board. A note titled "team agreed to lower the entry tier by 20%" ends up right next to one titled "decision on pricing rework", even though they share almost no words. That closeness is what powers search by meaning.

One twist: this board is not a normal 2D wall. It has 768 different directions you can move in. A regular map has two, latitude and longitude. A globe has three. An embedding is an address with 768 numbers instead of 2 or 3. We can't picture that many dimensions in our heads, but the math handles them the same way.

The numbers come out of a small neural network called an embedding model. It reads text on one end and emits a fixed length vector on the other. Korely runs one of these models every time you save a note or type a search query, so every note ends up with its own numeric fingerprint, and the search bar can compare fingerprints instead of just matching letters.
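To make "comparing fingerprints" concrete, here is a minimal sketch using cosine similarity, a common way to measure how close two embeddings are. The three-number vectors are made up for illustration (real ones have 768 numbers), and the choice of cosine similarity is an assumption, not a statement about Korely's internals.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-number "embeddings"; a real model would produce these.
pricing_decision = [0.9, 0.1, 0.0]
lower_entry_tier = [0.8, 0.2, 0.1]
pasta_recipe = [0.0, 0.1, 0.9]

# Notes about the same idea score close to 1, unrelated notes close to 0.
print(cosine_similarity(pricing_decision, lower_entry_tier))
print(cosine_similarity(pricing_decision, pasta_recipe))
```

The two pricing notes score far higher with each other than either does with the recipe, which is the "pins close together on the board" idea expressed as a number.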

The model today

The model in the slot today

Korely today runs Nomic embed v1.5, one option in the family of small open weights embedding models. It ships inside the Korely desktop bundle and is downloaded once on first run.

Vector size: 768 dimensions per note.
Quantization: int8 (q8), so the model runs comfortably on CPU.
License: Apache 2.0, released by Nomic AI.
Disk: around 140 MB for the weights, another ~3 KB per note for the stored vector.
Languages: English, Italian, French, Spanish, German, and a long tail of other languages. A query in one language matches notes written in another.
Retrieval quality on our internal test corpus: NDCG@10 of 0.935 (1.0 is a perfect ranking). NDCG@10 is a standard metric that scores whether the right notes show up in the top 10 results and how close to the top they rank.
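For readers curious how a score like NDCG@10 is computed, here is a minimal sketch. The relevance labels are hypothetical, and for brevity the ideal ranking is built from the returned list only, which is a simplification of a full evaluation.

```python
import math

def dcg(relevances: list[float]) -> float:
    """Discounted cumulative gain: hits lower in the list count for less."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )

def ndcg_at_k(relevances: list[float], k: int = 10) -> float:
    """DCG of the actual ranking divided by DCG of the best possible ranking."""
    ideal = sorted(relevances, reverse=True)
    best = dcg(ideal[:k])
    return dcg(relevances[:k]) / best if best > 0 else 0.0

# Hypothetical query with one relevant note (1 = relevant, 0 = not).
print(ndcg_at_k([1, 0, 0, 0]))  # relevant note ranked first -> 1.0
print(ndcg_at_k([0, 0, 1, 0]))  # relevant note ranked third -> 0.5
```

A score is averaged over many test queries, so 0.935 means the right notes usually land at or very near the top.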

The architecture is a function: text goes in on one side, a fixed length list of numbers comes out on the other. That function can be swapped for a different one without changing anything else in Korely. The next section walks through the siblings we would pick from if we ever swapped.
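The "swappable function" idea can be sketched as a small interface. Everything here is hypothetical: `Embedder`, `ToyEmbedder`, and `index_note` are illustrative names, and the toy model just counts characters so the example runs without downloading anything.

```python
from typing import Protocol

class Embedder(Protocol):
    """Anything that turns text into a fixed length list of numbers."""
    dimensions: int

    def embed(self, text: str) -> list[float]: ...

class ToyEmbedder:
    """Stand-in model: counts characters into a fixed number of buckets."""

    def __init__(self, dimensions: int) -> None:
        self.dimensions = dimensions

    def embed(self, text: str) -> list[float]:
        vector = [0.0] * self.dimensions
        for ch in text.lower():
            vector[ord(ch) % self.dimensions] += 1.0
        return vector

def index_note(note: str, model: Embedder) -> list[float]:
    """The rest of the pipeline depends only on the Embedder interface."""
    vector = model.embed(note)
    assert len(vector) == model.dimensions
    return vector

current = ToyEmbedder(dimensions=768)     # same size as the model in the slot
candidate = ToyEmbedder(dimensions=1024)  # a hypothetical replacement

print(len(index_note("decision on pricing rework", current)))    # 768
print(len(index_note("decision on pricing rework", candidate)))  # 1024
```

Because `index_note` never looks inside the model, swapping one embedder for another changes the vector length but nothing else in the calling code.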

Hybrid search

Hybrid search: keyword plus meaning

Imagine looking for a book in a library with two helpers working at the same time. One helper checks the exact title against the catalog. The other helper knows what each book is about, and finds the ones that match the topic. You get two shortlists, and you trust the books that show up high on both. That's hybrid search.

Inside Korely:

Keyword side. SQLite FTS5, the open source full text search engine built into SQLite. It indexes every word in every note, with prefix matching on, so typing "graph" already matches "GraphRAG" the moment you press space.
Semantic side. Nomic embed v1.5 produces a 768 dimension vector for every note and for the live query. The sqlite-vec extension finds the nearest neighbours in milliseconds.
Fusion. Both rankings are merged using Reciprocal Rank Fusion, a simple algorithm that gives each note a score of 1 / (k + rank) for every list it appears in, adds those scores up, and ranks the union. Notes that sit high in both lists come out on top.
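Here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists. The note ids are made up, and `k = 60` is the constant commonly used in the literature, not necessarily the value Korely uses.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each appearance adds 1 / (k + rank) to a note's score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, note_id in enumerate(ranking, start=1):
            scores[note_id] = scores.get(note_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first.
    return sorted(scores, key=lambda note_id: scores[note_id], reverse=True)

# Hypothetical results from the two helpers, best match first.
keyword_hits = ["q3-pricing", "pricing-faq", "graph-notes"]
semantic_hits = ["entry-tier", "q3-pricing", "pricing-faq"]

print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
# → ['q3-pricing', 'pricing-faq', 'entry-tier', 'graph-notes']
```

Notes found by both helpers beat notes found by only one, even when the single-list note was ranked first in its own list.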

A note titled "Q3 pricing rework decisions" surfaces when you type the literal words, and also when you type "what did we agree on pricing last quarter", and also when you type the Italian equivalent "decisione su prezzi terzo trimestre". The keyword side handles the first, the semantic side handles the other two.
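The keyword side is easy to try on its own with Python's built-in `sqlite3` module. This is a sketch under assumptions: the `notes` table, its single `title` column, and the `keyword_search` helper are hypothetical, and it assumes your Python's SQLite build includes FTS5 (most do).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE notes USING fts5(title)")
conn.executemany(
    "INSERT INTO notes (title) VALUES (?)",
    [
        ("GraphRAG reading list",),
        ("Q3 pricing rework decisions",),
        ("Pasta recipes",),
    ],
)

def keyword_search(typed: str) -> list[str]:
    """Treat what has been typed so far as the prefix of a word."""
    escaped = typed.replace('"', '""')  # FTS5 escapes quotes by doubling them
    query = f'"{escaped}"*'             # trailing * makes it a prefix query
    rows = conn.execute(
        "SELECT title FROM notes WHERE notes MATCH ? ORDER BY rank",
        (query,),
    )
    return [title for (title,) in rows]

print(keyword_search("graph"))  # ['GraphRAG reading list']
print(keyword_search("pric"))   # ['Q3 pricing rework decisions']
```

Prefix queries like these only match on letters, which is why the paraphrased and Italian queries need the semantic side.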

In the same family

Other embedding models in the same family

Picture a small library of pocket dictionaries that all do the same job (turn text into a list of numbers) with different tradeoffs (size, languages covered, special tricks). Nomic embed v1.5 is the one currently sitting on Korely's desk. Here are the siblings we would consider if the slot ever changed. All of them are open weights, all of them can run on a normal laptop CPU.

BGE-M3 (BAAI, MIT license). The multilingual all-rounder. One model produces dense, sparse, and multi-vector representations at once, and covers more than 100 languages. The Swiss army knife of the embedding shelf.
mxbai-embed-large-v1 (Mixedbread AI, Apache 2.0). 1024 dimensions with a built in compression trick called Matryoshka, plus support for binary and int8 storage. Picture compressing a photo from JPEG to thumbnail: the index gets up to thirty times smaller while staying useful.
Snowflake Arctic Embed L v2.0 (Snowflake, Apache 2.0). Strong on 74 languages, built on a multilingual base. Good pick if your vault mixes English with less common languages.
GTE-large-en-v1.5 (Alibaba NLP, Apache 2.0). 1024 dimensions and an 8192 token context window, which lets you embed a whole long note in a single pass instead of slicing it first.
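The storage tricks mentioned for mxbai-embed-large-v1 can be sketched in a few lines. This is an illustration, not that model's actual code: the vector is made up, float32 storage for the full vector is an assumption, and truncation only makes sense for models trained the Matryoshka way (truncated vectors are usually re-normalized before use).

```python
def truncate_matryoshka(vector: list[float], dims: int) -> list[float]:
    """Keep only the first `dims` numbers of a Matryoshka-trained embedding."""
    return vector[:dims]

def binarize(vector: list[float]) -> bytes:
    """Store one bit per dimension: 1 if the value is positive, else 0."""
    bits = 0
    for value in vector:
        bits = (bits << 1) | (1 if value > 0 else 0)
    return bits.to_bytes((len(vector) + 7) // 8, "big")

FLOAT32_BYTES = 4  # assumed size of one stored number
full = [0.5, -0.25] * 512  # hypothetical 1024-dimension vector

full_size = len(full) * FLOAT32_BYTES
binary_size = len(binarize(full))

print(full_size, binary_size, full_size // binary_size)  # 4096 128 32
print(len(truncate_matryoshka(full, 256)))               # 256
```

Going from 4 bytes to 1 bit per dimension is where the roughly thirty-fold saving comes from; truncation shrinks the vector further at some cost in precision.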

There is also a closed-source cloud option from Google, the Gemini embedding family (currently gemini-embedding-001 for text). Picture a courier service: you send the text to Google, you get the numbers back. Higher quality on some benchmarks, but the text leaves your machine and you pay per call. Korely uses the local family by default for that reason.

We did evaluate Google embeddings against the open-weights options before shipping. They are measurably more precise on our test corpus, and the latency is low. But the price per query is high enough that running them on every save of every note would make Free unviable, so we kept Nomic embed running locally as the default. Google embeddings stay on the roadmap as a possible opt-in for Pro users who want maximum recall and accept the cost.

On your CPU

Everything runs on your CPU

The embedding model and the search index both live on disk inside the Korely app folder, next to the SQLite vault. The flow looks like this:

You save or edit a note. Korely runs the text through Nomic embed v1.5 on the CPU, gets a 768 number vector, and stores it in the SQLite file.
You type into the search bar. Korely embeds the live query the same way, queries FTS5 and sqlite-vec in parallel, fuses the rankings, and shows the top results.
A typical embedding takes tens of milliseconds on a modern laptop CPU. Search updates as you type.
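The semantic half of that flow boils down to a nearest-neighbour lookup. Below is a brute-force sketch standing in for what sqlite-vec does far more efficiently. The three-number vectors, the note ids, and the choice of Euclidean distance are assumptions made for illustration.

```python
import heapq
import math

def nearest_notes(
    query_vector: list[float],
    note_vectors: dict[str, list[float]],
    top_k: int = 2,
) -> list[str]:
    """Return the ids of the top_k notes closest to the query vector."""
    return heapq.nsmallest(
        top_k,
        note_vectors,
        key=lambda note_id: math.dist(query_vector, note_vectors[note_id]),
    )

# Hypothetical stored vectors (real ones have 768 numbers each).
note_vectors = {
    "pricing-decision": [0.9, 0.1, 0.0],
    "entry-tier": [0.8, 0.2, 0.1],
    "pasta": [0.0, 0.1, 0.9],
}
query_vector = [0.9, 0.1, 0.05]  # what embedding the live query might produce

print(nearest_notes(query_vector, note_vectors))
# → ['pricing-decision', 'entry-tier']
```

Comparing the query against every stored vector is fine for a toy example; a vector index exists so the same answer comes back in milliseconds on a large vault.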

In Free mode, with cloud sync off, the text of your notes never touches a network. The model is on disk, the index is on disk, the answer is on disk. Search keeps working on a plane, on a train, on a coffee shop wifi that drops every 30 seconds.

Frequently asked

What is an embedding?

An embedding is a list of numbers that represents the meaning of a piece of text. Two notes about the same idea, written with different words, end up with similar embeddings. That lets search find a note by its meaning instead of only by the exact words you typed.

Which embedding model does Korely use today?

Nomic embed v1.5, an open source model under Apache 2.0, released by Nomic AI. It produces 768 dimension vectors, runs on CPU in int8 quantization, and is multilingual. The slot can be swapped for other models in the same family, compared on this page.

Will my embeddings sync if I turn on Pro cloud sync?

Only the embedding values, never the model. Each note ships with its 768 number vector so the cloud can rank search results, but the Nomic model itself stays on every device. The math runs locally, the ranking just needs the numbers.

How big is the model and how much disk does it use?

The Nomic embed v1.5 weights, quantized to int8 (q8), are around 140 MB. Each note adds roughly 3 KB of vector data to the SQLite file. A vault of 10,000 notes adds about 30 MB on top of the Markdown itself.
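The per-note figure can be checked with quick arithmetic. The sketch assumes each of the 768 numbers is stored as a 4-byte float, which matches the roughly 3 KB quoted above but is an assumption about the storage format.

```python
DIMENSIONS = 768
BYTES_PER_NUMBER = 4  # assumption: vectors stored as 32-bit floats

def vector_bytes(dimensions: int = DIMENSIONS) -> int:
    """Raw size of one stored embedding, ignoring any index overhead."""
    return dimensions * BYTES_PER_NUMBER

def vault_overhead_mb(note_count: int) -> float:
    """Total vector storage for a vault, in megabytes (1 MB = 1,000,000 bytes)."""
    return note_count * vector_bytes() / 1_000_000

print(vector_bytes())             # 3072 bytes, about 3 KB per note
print(vault_overhead_mb(10_000))  # 30.72, about 30 MB for 10,000 notes
```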

Can I swap the embedding model later?

In principle yes, since the pipeline is just a function that turns text into a vector. In practice you would need to re-embed the whole vault, which Korely can do in the background on next launch. We have not exposed a model picker in the UI yet, but the architecture supports it.

Try the search yourself

Free forever for the local vault. The model downloads once, then your search runs offline for the rest of its life.
