A chatbot that remembers every customer
Giulia opens your support chat for the third time this month. A stateless bot asks for her email again, asks what plan she is on again, and has no idea she already reported the same shipping problem twice. She types the whole story a third time.
With Korely in the loop, the same bot opens with: "Hi Giulia, I can see your Advanced plan and the open shipping issue from last week. I'll follow up by email as usual, unless you'd prefer otherwise." Same LLM, same prompt template. The difference is four API calls. This cookbook walks through all four at the function level.
The snippets below use the Python SDK (pip install korely-memory). The same four calls work over the REST API and the Node SDK (npm install korely-memory).
The four calls
1. On chat open: load who she is
Before the first LLM turn, fetch the customer's active facts. This is a deterministic read, pure SQL and graph lookups with no model in the path, typically under 50 ms. It fits inside your time-to-first-token budget.
from korely_memory import Korely
korely = Korely(api_key="kor_live_...", region="eu")
facts = korely.get_facts(user_id="customer-giulia-4812")
The response is a flat list of her active facts: typed
(subject, predicate, object) triples the graph extracts
automatically. A fact is live while its invalid_at is
null; predicate is the normalized verb (the raw
phrasing she used is kept in predicate_raw), and each fact
carries a predicate_family for grouping. (Need them grouped
by family instead? Call get_profile.)
{
  "facts": [
    {
      "id": "fct_a1",
      "subject": "customer-giulia-4812",
      "predicate": "has_plan",
      "object": "Advanced",
      "predicate_family": "other",
      "predicate_raw": "has_plan",
      "valid_from": "2026-05-02T09:14:00Z",
      "invalid_at": null,
      "invalidated_by": null,
      "source_memory_id": "mem_91a2"
    },
    {
      "id": "fct_a2",
      "subject": "customer-giulia-4812",
      "predicate": "has_open_issue",
      "object": "shipping delay, order #88412",
      "predicate_family": "events",
      "predicate_raw": "has_open_issue",
      "valid_from": "2026-06-03T16:40:00Z",
      "invalid_at": null,
      "invalidated_by": null,
      "source_memory_id": "mem_2b91d4"
    },
    {
      "id": "fct_a3",
      "subject": "customer-giulia-4812",
      "predicate": "likes",
      "object": "email follow-ups",
      "predicate_family": "preferences",
      "predicate_raw": "prefers",
      "valid_from": "2026-05-19T11:02:00Z",
      "invalid_at": null,
      "invalidated_by": null,
      "source_memory_id": "mem_5c0d"
    }
  ],
  "total": 3
}
Render that into your system prompt as a compact context block and the bot greets her like it knows her, because it does.
Turn 0: Giulia opens the chat
- Third visit this month
- Your bot resolves her to user_id "customer-giulia-4812"
Korely: get_facts(user_id=...)
- Active typed facts only (invalid_at = null)
- Deterministic read, typically under 50 ms
Turn 1: Personalized greeting
- "I can see your Advanced plan and the open shipping issue"
- Zero questions she already answered
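One way to turn the get_facts response into a system-prompt block is sketched below. The function name render_facts_block and the block's wording are illustrative assumptions, not part of the SDK; the sketch works on plain dictionaries shaped like the response above, so it runs without an API key.

```python
from typing import Any


def render_facts_block(facts: list[dict[str, Any]]) -> str:
    """Turn a get_facts-style list into a compact system-prompt block.

    Hypothetical helper: the name and output wording are assumptions.
    """
    lines = ["Known facts about this customer:"]
    for fact in facts:
        # Defensive check: skip anything already invalidated, even though
        # get_facts is described as returning active facts only.
        if fact.get("invalid_at") is not None:
            continue
        predicate = fact["predicate"].replace("_", " ")
        lines.append(f"- {predicate}: {fact['object']}")
    return "\n".join(lines)


# Sample data trimmed from the response shown above.
sample_facts = [
    {"predicate": "has_plan", "object": "Advanced", "invalid_at": None},
    {
        "predicate": "has_open_issue",
        "object": "shipping delay, order #88412",
        "invalid_at": None,
    },
    {"predicate": "likes", "object": "email follow-ups", "invalid_at": None},
]

context_block = render_facts_block(sample_facts)
print(context_block)
```

Prepend the resulting string to your existing system prompt. Keeping the rendering in your own code means you decide which predicates the model sees and how they are phrased.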
2. During the chat: search her history
When she mentions the shipping problem, don't make the LLM guess. Search her memories, scoped to her and only her:
hits = korely.search(
    "shipping complaint",
    user_id="customer-giulia-4812",
    limit=5,
)
What comes back is ranked retrieval, not generated text — semantic vector
search (cosine over embeddings) scoped to her user_id. The
only model call on the read path is the query embedding, a fraction of a
hundredth of a cent. Your bot's own model does the reasoning over the
results. Each hit is {id, score, snippet, user_id, agent_id,
metadata} — the snippet is a short excerpt
(≤280 chars), not the full memory:
{
  "results": [
    {
      "id": "mem_2b91d4",
      "score": 0.93,
      "snippet": "Order #88412 delayed at the Bologna hub, second report. Promised an email update within 48h.",
      "user_id": "customer-giulia-4812",
      "agent_id": "support-bot",
      "metadata": {}
    },
    {
      "id": "mem_77c0ae",
      "score": 0.81,
      "snippet": "First shipping complaint for order #88412. Courier marked the address as incomplete.",
      "user_id": "customer-giulia-4812",
      "agent_id": "support-bot",
      "metadata": {}
    }
  ]
}
Filters are additive (AND): user_id alone searches everything
stored about Giulia across every conversation; add run_id to
narrow to one session. Another customer's memories can never leak into her
results. The scope is enforced server-side, not by prompt discipline.
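Since the search call returns ranked snippets and your own model does the reasoning, the hits still need to be handed to that model in some form. The sketch below is one possible approach: hits_to_context and its min_score cut-off are hypothetical choices made for illustration, and the input is a plain list shaped like the "results" array above.

```python
from typing import Any


def hits_to_context(results: list[dict[str, Any]], min_score: float = 0.5) -> str:
    """Format search hits as numbered snippets, best match first.

    Hypothetical helper: min_score is an arbitrary cut-off, not an SDK setting.
    """
    kept = [hit for hit in results if hit["score"] >= min_score]
    kept.sort(key=lambda hit: hit["score"], reverse=True)
    if not kept:
        return "No relevant history found."
    lines = ["Relevant history (most relevant first):"]
    for rank, hit in enumerate(kept, start=1):
        # Keep the memory id so the bot's answer can be traced back later.
        lines.append(f"{rank}. [{hit['id']}] {hit['snippet']}")
    return "\n".join(lines)


# Sample data trimmed from the response shown above.
sample_results = [
    {
        "id": "mem_77c0ae",
        "score": 0.81,
        "snippet": "First shipping complaint for order #88412. "
        "Courier marked the address as incomplete.",
    },
    {
        "id": "mem_2b91d4",
        "score": 0.93,
        "snippet": "Order #88412 delayed at the Bologna hub, second report. "
        "Promised an email update within 48h.",
    },
]

history_block = hits_to_context(sample_results)
print(history_block)
```

Because each snippet is a short excerpt, a handful of hits adds little to the prompt compared with replaying the full conversation.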
3. On new info: write it down
Giulia mentions she'd rather not be called on the phone. Store it:
result = korely.add(
    "Prefers email follow-ups, not phone",
    user_id="customer-giulia-4812",
    agent_id="support-bot",
)
The call returns the stored memory immediately. Fact extraction runs
asynchronously, so facts is often empty on the immediate
response and populates a few seconds later (read it back with
get_facts at the next chat open). This is the bi-temporal
part: if Giulia previously preferred phone calls, that old fact is not
deleted when the new one lands. It gets an invalid_at
timestamp, stops being served by get_facts, and survives for
audit and point-in-time queries (as_of). Each extracted fact
on the write shape lists the ids it superseded in its
invalidated array:
{
  "id": "mem_d3f7",
  "content": "Prefers email follow-ups, not phone",
  "user_id": "customer-giulia-4812",
  "agent_id": "support-bot",
  "run_id": null,
  "metadata": {},
  "created_at": "2026-06-11T10:22:00Z",
  "updated_at": "2026-06-11T10:22:00Z",
  "facts": [
    {
      "id": "fct_e1",
      "subject": "customer-giulia-4812",
      "predicate": "likes",
      "object": "email follow-ups",
      "predicate_family": "preferences",
      "valid_from": "2026-06-11T10:22:00Z",
      "invalidated": ["fct_b0"]
    }
  ]
}
4. On account change: let the contradiction engine work
You don't write supersede logic. When Giulia upgrades and the bot (or your billing webhook) writes "Giulia upgraded to the Advanced plan", the two-stage contradiction detector finds the existing "Giulia has plan Basic" fact, same predicate with a conflicting object, and supersedes it:
{
  "id": "mem_e8a1",
  "content": "Giulia upgraded to the Advanced plan",
  "user_id": "customer-giulia-4812",
  "agent_id": "support-bot",
  "metadata": {},
  "created_at": "2026-06-11T10:25:00Z",
  "updated_at": "2026-06-11T10:25:00Z",
  "facts": [
    {
      "id": "fct_f2",
      "subject": "Giulia",
      "predicate": "has_plan",
      "object": "Advanced",
      "predicate_family": "other",
      "valid_from": "2026-06-11T10:25:00Z",
      "invalidated": ["fct_a0"]
    }
  ]
}
The write path is where the intelligence runs: document and chunk embeddings, entity extraction on our own infrastructure, typed-fact extraction with contradiction checking and bi-temporal validity. About a tenth of a cent per memory, all included. Nobody edits anything by hand.
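To make the supersede behaviour from steps 3 and 4 concrete, here is a toy in-memory model of it. This is not how Korely implements contradiction detection (the real detector is two-stage and runs server-side). The sketch assumes a much simpler rule (same subject, same predicate, different object), assumes half-open validity intervals for point-in-time reads, and uses made-up dates. Fact, supersede and facts_as_of are illustrative names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Fact:
    id: str
    subject: str
    predicate: str
    object: str
    valid_from: datetime
    invalid_at: Optional[datetime] = None  # None means the fact is live


def supersede(store: list[Fact], new: Fact) -> list[str]:
    """Add `new`, closing any live fact it conflicts with; return closed ids."""
    invalidated = []
    for fact in store:
        conflicts = (
            fact.invalid_at is None
            and fact.subject == new.subject
            and fact.predicate == new.predicate
            and fact.object != new.object
        )
        if conflicts:
            fact.invalid_at = new.valid_from  # closed, never deleted
            invalidated.append(fact.id)
    store.append(new)
    return invalidated


def facts_as_of(store: list[Fact], when: datetime) -> list[Fact]:
    """Facts valid at `when`, assuming valid_from <= when < invalid_at."""
    return [
        fact
        for fact in store
        if fact.valid_from <= when
        and (fact.invalid_at is None or when < fact.invalid_at)
    ]


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


# Illustrative dates only.
store = [Fact("fct_a0", "giulia", "has_plan", "Basic", utc(2026, 3, 1))]
closed = supersede(
    store, Fact("fct_f2", "giulia", "has_plan", "Advanced", utc(2026, 6, 11))
)

print(closed)
print([f.object for f in facts_as_of(store, utc(2026, 4, 1))])
print([f.object for f in facts_as_of(store, utc(2026, 7, 1))])
```

The old fact stays in the store with an invalid_at timestamp, which is why a query for April still answers "Basic" while a query for July answers "Advanced".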
Scoping: one agent, unlimited customers
The scoping model is three free-form strings, and it maps one-to-one onto the scoping you already use elsewhere (see the migration guide):
| Parameter | What it identifies | Example |
|---|---|---|
| user_id | Your end user. Free-form string, you choose it. | "customer-giulia-4812" |
| agent_id | Your application's namespace. | "support-bot" |
| run_id | One session or conversation. | "chat-2026-06-11-a" |
The part that matters for your bill: a support bot serving 10,000
customers is one agent_id with 10,000
user_id values, and end users are unlimited on every tier,
including the free one. What's metered is volume: memories written and
searches per month (Hobby: 1k/25k, Developer €19: 5k/250k, Team €79:
25k/1M). Reads are retrieval, not generation, which is why the search
quotas are an order of magnitude more generous than the write quotas. No
overage billing: we email you at 80%, and past a +10% soft cap you get a
clean 429, never a surprise invoice.
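The quota rules above reduce to simple arithmetic, sketched here for capacity planning. The function and state names are hypothetical, and whether each threshold is inclusive is an assumption: the sketch treats exactly 80% as the warning point and anything strictly above 110% of the limit as blocked.

```python
# Monthly limits quoted above, as (memories written, searches).
PLANS = {
    "Hobby": (1_000, 25_000),
    "Developer": (5_000, 250_000),
    "Team": (25_000, 1_000_000),
}


def quota_state(used: int, limit: int) -> str:
    """Classify usage against a monthly limit.

    Integer arithmetic avoids float rounding at the boundaries.
    Boundary inclusivity is an assumption, not documented behaviour.
    """
    if used * 10 > limit * 11:  # past the +10% soft cap: expect a 429
        return "blocked"
    if used * 10 >= limit * 8:  # 80% of the limit: warning email
        return "warned"
    return "ok"


write_limit, search_limit = PLANS["Hobby"]
for used in (799, 800, 1_100, 1_101):
    print(used, quota_state(used, write_limit))
```

On the Hobby tier this puts the warning at 800 memories written and the hard stop just past 1,100.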
Why not just stuff the history into the prompt?
Prompt-stuffing is the default pattern, and it makes chatbots expensive and forgetful at the same time. Three concrete reasons:
- Token cost. Re-sending a 20-turn history is thousands of tokens, every turn, forever, and it grows with every message. A facts block from get_facts is a few hundred tokens, fetched once at chat open, and it grows with what's worth remembering, not with conversation length.
- Staleness. A raw history contains both "I'm on Basic" (March) and "I upgraded to Advanced" (May), and you're trusting the LLM to pick the right one every single time. Bi-temporal facts resolve this before the LLM sees the context: get_facts returns only the facts that are valid now.
- Forgetting. When Giulia says "forget my data", deleting her from conversation logs scattered across your own database is a project. With Korely it's one call, DELETE /v1/users/customer-giulia-4812/memories, and every memory and fact scoped to her user_id is forgotten.
The trust angle
This deployment takes two shapes, and the data-ownership story differs:
- B2C2C (your end users are also Korely users, think personal assistants writing into someone's own memory store): the end user sees what the bot remembers. The Memory Panel lists every fact, editable and forgettable by the user themselves. Correcting and erasing what the bot knows are product features the user controls, not a support ticket.
- Pure B2B2C (the Giulia scenario, she's your customer, not ours): you own the data, and the deletion surface is API-side. Your privacy policy, your deletion endpoint wiring, our one-call bulk delete underneath. Data is EU-hosted on our own infrastructure either way.
One honest caveat: the write path
(korely.add) runs the extraction pipeline asynchronously, so
newly extracted facts land shortly after the write returns, not within
the same request. Write fire-and-forget during the chat and rely on
get_facts at the next chat open. The read path is
the deterministic, fast one.
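A fire-and-forget write can be as simple as handing the call to a background thread so the chat turn never waits on it. The sketch below uses only the standard library; remember_in_background is a hypothetical helper, and the write function is injected so the example runs with a stand-in instead of a live client. In your bot you would pass korely.add.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable

# Small shared pool; the size is an arbitrary choice for this sketch.
_writer = ThreadPoolExecutor(max_workers=2, thread_name_prefix="memory-writer")


def _log_failure(future: Future) -> None:
    """Surface write errors instead of losing them silently."""
    if future.cancelled():
        return
    error = future.exception()
    if error is not None:
        print(f"memory write failed: {error!r}")


def remember_in_background(
    add_fn: Callable[..., Any], content: str, **scope: str
) -> Future:
    """Queue a memory write without blocking the chat turn."""
    future = _writer.submit(add_fn, content, **scope)
    future.add_done_callback(_log_failure)
    return future


# Stand-in for korely.add so the sketch runs without credentials.
stored: list[tuple[str, dict[str, str]]] = []


def fake_add(content: str, **scope: str) -> dict[str, str]:
    stored.append((content, scope))
    return {"id": "mem_example"}


pending = remember_in_background(
    fake_add,
    "Prefers email follow-ups, not phone",
    user_id="customer-giulia-4812",
    agent_id="support-bot",
)
# Waiting here is only for the demo; a real chat turn would not block.
print(pending.result(timeout=5))
```

The returned Future is there if you want to wait or inspect errors, but the chat handler can simply ignore it and read the extracted facts at the next chat open.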
Where to go next
Temporal facts explains supersede and point-in-time queries under the hood; memory model covers the full scoping model; the REST API reference is the published contract the snippets above come from.