The Context Layer for Data Agents: A Deep Dive into ktx
ktx is a self-improving context layer that teaches data agents to query your warehouse with confidence. It goes beyond traditional semantic layers by automatically ingesting and organizing company knowledge, mapping the data stack, and building a reusable semantic surface that agents can rely on. The result is faster, more accurate answers from agents like Claude Code, Codex, Cursor, or OpenCode, with no extra usage charges from ktx itself. This post walks through what ktx is, how it works, who it’s for, and how to get started.
Why ktx is needed in modern data environments
General-purpose agents often stumble when working with data warehouses. They tend to re-scan your warehouse for every question, invent their own metric logic, and produce results that clash with established definitions. Traditional semantic layers attempt to fix this, but they require ongoing manual upkeep and rarely absorb the broader knowledge scattered across your organization. ktx solves both problems in a unified flow:
- Learn from company knowledge: ktx ingests wiki content, curates it, eliminates duplicates, and flags contradictions for human review. This helps agents align with business definitions beyond raw schemas.
- Map the data stack: it samples tables, captures metadata and usage patterns, identifies joinable columns, and annotates sources. Agents then write queries with better context and fewer ad-hoc decisions.
- Build a semantic layer: by combining raw tables with high-level metrics via a join graph, ktx resolves common data access traps (like chasms and fan traps). Agents fetch metrics declaratively, rather than reconstructing canonical SQL for every prompt.
- Serve agents at execution: ktx exposes a CLI and an MCP (Model Context Protocol) server with full-text and semantic search across both wiki content and semantic-layer entities.
This integrated approach dramatically reduces drift between what business teams expect and what agents return, while preserving the safety of controlled, reusable definitions.
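To make the "eliminates duplicates and flags contradictions" step concrete, here is a minimal sketch of the idea. It is not ktx's implementation: the function names, the `(term, definition, source)` tuple shape, and the plain string normalization are all assumptions chosen for illustration, and a real system would likely compare definitions semantically rather than by exact text.

```python
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(text.lower().split())


def review_definitions(entries):
    """Deduplicate (term, definition, source) entries and flag conflicts.

    Hypothetical helper, not part of ktx. Returns (unique, conflicts):
      unique    -> {term: definition} for terms with one distinct definition
      conflicts -> {term: sorted [(definition, source), ...]} for human review
    """
    by_term = defaultdict(dict)
    for term, definition, source in entries:
        # Keep the first source seen for each distinct normalized definition.
        by_term[normalize(term)].setdefault(normalize(definition), source)

    unique, conflicts = {}, {}
    for term, definitions in by_term.items():
        if len(definitions) == 1:
            unique[term] = next(iter(definitions))
        else:
            conflicts[term] = sorted(definitions.items())
    return unique, conflicts


entries = [
    ("Active user", "Logged in within the last 30 days", "notion"),
    ("active user", "logged in within the last 30 days", "wiki"),  # duplicate
    ("Revenue", "Sum of paid invoices", "dbt docs"),
    ("Revenue", "Sum of paid invoices minus refunds", "wiki"),     # conflict
]
unique, conflicts = review_definitions(entries)
print(unique)
print(conflicts)
```

The two "active user" entries collapse into one definition, while the two "revenue" entries disagree and land in the review queue instead of being silently merged.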
How ktx works: a lifecycle from ingestion to action
ktx is not a static layer. It continuously learns, maps, and serves the data context that agents rely on. The lifecycle can be summarized in a few key phases:
- Ingest and unify knowledge
  - ktx ingests various sources of business knowledge, from dbt docs and Looker definitions to Notion pages and team wikis.
  - It organizes content, removes duplicates, and highlights contradictions for human review.
  - The result is a coherent knowledge graph that agents can reference when constructing queries.
- Map the data stack
  - The layer samples tables and captures metadata, usage patterns, and column relationships.
  - It detects joinable columns and annotates sources so agents can write more accurate SQL with fewer prompts.
  - This mapping reduces the likelihood that an agent will “make up” a metric or misinterpret a table relationship.
- Build the semantic layer
  - Raw tables are integrated with high-level metrics through a join graph.
  - The design resolves common trap scenarios (e.g., chasm and fan traps) to provide a stable, declarative surface.
  - Agents can fetch metrics from this semantic layer rather than re-deriving canonical SQL for every request.
- Serve at execution time
  - The CLI and MCP tools give agents a single, searchable surface that spans both wiki content and the semantic layer.
  - Queries, whether for a quick check or a deep metric, are grounded in approved definitions.
  - The filesystem-based, local execution model keeps data processing private, secure, and offline-ready when needed.
While many data tools require either ad hoc querying or separate semantic layers, ktx fuses ingest, mapping, and semantic modeling into a unified experience. The result is an environment where agents can operate with a higher degree of trust and consistency.
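The "detects joinable columns" step in the mapping phase can be pictured with a small value-overlap heuristic over sampled data. This is a sketch under stated assumptions, not how ktx actually does it: the `containment` score, the 0.9 threshold, the "parent sample has no duplicates" rule, and the table and column names are all hypothetical.

```python
def containment(child_values, parent_values):
    """Fraction of distinct child values that also appear in the parent."""
    child, parent = set(child_values), set(parent_values)
    if not child:
        return 0.0
    return len(child & parent) / len(child)


def joinable_pairs(samples, threshold=0.9):
    """Return (child, parent, score) for column pairs that look joinable.

    `samples` maps "table.column" to a list of sampled values.
    Hypothetical heuristic for illustration only.
    """
    names = sorted(samples)
    pairs = []
    for child in names:
        for parent in names:
            # Skip self-pairs and columns that belong to the same table.
            if child == parent or child.split(".")[0] == parent.split(".")[0]:
                continue
            # Treat the parent as key-like only if its sample has no duplicates.
            if len(set(samples[parent])) != len(samples[parent]):
                continue
            score = containment(samples[child], samples[parent])
            if score >= threshold:
                pairs.append((child, parent, score))
    return pairs


samples = {
    "customers.id": [1, 2, 3, 4, 5],
    "orders.customer_id": [1, 1, 2, 5, 5, 3],
    "orders.amount": [10, 250, 99, 40, 75, 60],
}
pairs = joinable_pairs(samples)
print(pairs)
```

Only `orders.customer_id` to `customers.id` passes: every sampled foreign-key value exists in the key-like parent column, while the amount column shares no values with either.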
How ktx compares to other approaches
To understand the value proposition, it helps to contrast ktx with two common models: general-purpose agents and traditional semantic layers.
General-purpose agents
Warehouse context is not built automatically; agents may re-scan your schema or wander through it on every question.
Joinable columns and trap resolution depend on manual configuration or ad hoc prompts.
Metric definitions, when present, may be inconsistently applied if not codified centrally.
Wiki and team knowledge are often outside the immediate data-scape and can be overlooked.
Contradictions across sources may go unnoticed unless someone actively audits them.
Execution-time tooling is often separate from the agent prompt workflow, leading to disjoint experiences.
Integration is typically read-only or partial, making it harder to enforce a single source of truth.
Traditional semantic layers
They provide a layer of metrics and predefined joins but often require ongoing maintenance to stay aligned with business terms.
Manual effort is needed to keep definitions up to date and to map them to warehouse schemas.
The integration with scattered knowledge bases is limited, leaving a gap between business intent and technical implementation.
ktx
Builds warehouse context automatically, reducing manual setup and drift.
Detects joinable columns and resolves fan/chasm traps as part of the mapping stage.
Offers approved, reusable metric definitions that are easy to share and reuse.
Absorbs wiki, Notion, and team knowledge, consolidating it into a single searchable surface.
Flags contradictions across sources so teams can review and harmonize definitions.
Ships a CLI plus MCP for agent execution, providing a cohesive workflow for querying the warehouse.
Designed to be read-only by default, reinforcing a single source of truth and preventing accidental alterations to the warehouse.
In short, ktx marries the strengths of automatic knowledge ingestion with a robust semantic surface, giving agents a reliable, declarative foundation for data work while minimizing manual upkeep.
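For readers unfamiliar with fan traps, the following sketch shows the fan-out behind them using Python's built-in `sqlite3` module. The `orders` and `order_items` tables and their values are made up for illustration, and the fix shown (aggregating each table at its own grain before combining) is one common remedy, not necessarily the one ktx's join graph applies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, shipping_fee REAL);
    CREATE TABLE order_items (order_id INTEGER, price REAL);
    INSERT INTO orders VALUES (1, 5.0), (2, 5.0);
    INSERT INTO order_items VALUES (1, 20.0), (1, 30.0), (1, 10.0), (2, 40.0);
""")

# Naive: the join repeats each order row once per item, so the
# order-level shipping_fee of order 1 is counted three times.
naive_fees = conn.execute("""
    SELECT SUM(o.shipping_fee)
    FROM orders o
    JOIN order_items i ON i.order_id = o.order_id
""").fetchone()[0]

# Safer: aggregate each table at its own grain, then combine the results.
safe_fees, item_total = conn.execute("""
    SELECT (SELECT SUM(shipping_fee) FROM orders),
           (SELECT SUM(price) FROM order_items)
""").fetchone()

print(naive_fees, safe_fees, item_total)
conn.close()
```

The naive query reports 20.0 in shipping fees for two orders that each paid 5.0. A semantic layer that knows the grain of each measure can generate the second form so agents do not have to rediscover the problem on every prompt.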
Who is ktx for?
Use ktx if you want agents to query your warehouse with confidence and consistency, backed by business knowledge. Consider the following scenarios:
- You work with agents like Claude Code, Codex, Cursor, or OpenCode and want them to rely on approved metric definitions rather than inferring metrics on the fly.
- Your business knowledge is dispersed across dbt, Looker, Metabase, Notion, and internal wikis, and you want a unified surface for querying.
- You need agents to reuse canonical SQL instead of rewriting it with every prompt, preventing drift and rework.
Skip ktx if:
- You don’t operate a SQL data warehouse or don’t plan to expose a readable data surface on top of one.
- You only need one-off, ad-hoc queries without maintaining a broader semantic surface.
- You work in environments where a local runtime is not feasible, since ktx is built around local, private operation.
ktx supports a broad range of databases, including PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, and SQLite. It also integrates with popular data modeling and BI tooling like dbt, MetricFlow, LookML, Looker, Metabase, and Notion, enabling flexible workflows.
Quick Start: getting up and running
A few commands are enough to get started. The setup creates or resumes a local ktx project, configures providers and connections, builds context, and installs agent integration.
- Global installation and initial setup:
- npm install -g @kaelio/ktx
- ktx setup
- ktx status
After setup, a sample status might look like this:
- Project path
- Project ready: yes
- LLM ready: yes (e.g., Claude Sonnet)
- Embeddings ready: yes
- Databases configured: yes (warehouse)
- Context sources configured: yes (dbt_main)
- ktx context built: yes
- Agent integration ready: yes (codex:project)
If you already use an agent in your project, you can install the ktx skill from your agent project directory:
- Run npx skills add Kaelio/ktx --skill ktx
- Then use the ktx skill to install and configure ktx for your project
Important: If ktx status indicates that you should start the MCP server, run ktx mcp start --project-dir
Typical first commands to explore the setup and basic functionality include:
- ktx setup — Create, resume, or update a ktx project
- ktx status — Check project readiness
- ktx ingest — Build context for configured connections
- ktx sl "revenue" — Search semantic sources
- ktx wiki "refund policy" — Search wiki pages
- ktx mcp start — Start the MCP server for agent clients
For a complete command reference, consult the CLI Reference in the docs.
Code blocks (for illustration):

```bash
npm install -g @kaelio/ktx
ktx setup
ktx status
```

```bash
ktx ingest
ktx sl "revenue"
ktx wiki "refund policy"
ktx mcp start
```
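If you want to drive these commands from a script rather than a shell, a thin wrapper is usually enough. The sketch below uses only subcommands listed above plus the `--project-dir` flag mentioned for `ktx mcp start`. The helper names are hypothetical, and it is an assumption that the flag takes the directory as its value and is accepted after the subcommand for every command.

```python
import shutil
import subprocess


def ktx_argv(*args, project_dir=None):
    """Build the argument list for a ktx invocation (hypothetical helper)."""
    argv = ["ktx", *args]
    if project_dir is not None:
        # Assumption: --project-dir is followed by the directory path and
        # may be placed after the subcommand and its arguments.
        argv += ["--project-dir", str(project_dir)]
    return argv


def run_ktx(*args, project_dir=None):
    """Run ktx and return its stdout, or None when the CLI is not installed."""
    if shutil.which("ktx") is None:
        return None
    result = subprocess.run(
        ktx_argv(*args, project_dir=project_dir),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


status_cmd = ktx_argv("status", project_dir="my-project")
search_cmd = ktx_argv("sl", "revenue", project_dir="my-project")
print(status_cmd)
print(search_cmd)
```

Passing arguments as a list avoids shell quoting issues for search terms such as "refund policy". Check the CLI Reference for the flags each subcommand actually accepts before relying on this shape.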
Project layout and structure
A typical ktx project looks like this:
```text
my-project/
├── ktx.yaml             # Project configuration
├── semantic-layer//     # YAML semantic sources
├── wiki/global/         # Shared business context
├── wiki/user//          # User-scoped notes
├── raw-sources//        # Ingest artifacts and reports
└── .ktx/                # Local state and secrets, git-ignored
```
Notes:
- Commit ktx.yaml, semantic-layer/, and wiki/.
- Keep .ktx/ local and ignored.
- Project resolution defaults to KTX_PROJECT_DIR, then the nearest ktx.yaml, then the current directory.
- When scripting, pass --project-dir to specify the exact project directory.
This structure keeps a clean separation between canonical warehouse context, business knowledge, and ephemeral state, while still enabling seamless agent workflows.
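The project resolution order in the notes above (environment variable, then the nearest ktx.yaml, then the current directory) can be sketched as follows. This is an illustration of the lookup order only, not ktx's code: the function name is hypothetical, the variable is spelled `KTX_PROJECT_DIR` here as an assumption, and "nearest" is interpreted as walking up through parent directories.

```python
import os
import tempfile
from pathlib import Path

ENV_VAR = "KTX_PROJECT_DIR"  # assumed spelling of the environment variable


def resolve_project_dir(start=None, env=None):
    """Return the project directory using the documented fallback order."""
    start = Path(start or Path.cwd()).resolve()
    env = os.environ if env is None else env

    # 1. An explicit environment override wins.
    override = env.get(ENV_VAR)
    if override:
        return Path(override)

    # 2. Otherwise walk upward looking for the nearest ktx.yaml.
    for candidate in (start, *start.parents):
        if (candidate / "ktx.yaml").is_file():
            return candidate

    # 3. Fall back to the starting (current) directory.
    return start


# Demonstrate with a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "ktx.yaml").write_text("# placeholder config\n")
    nested = root / "models" / "staging"
    nested.mkdir(parents=True)

    walk_result = resolve_project_dir(nested, env={})
    env_result = resolve_project_dir(nested, env={ENV_VAR: str(root / "models")})

print(walk_result == root, env_result == root / "models")
```

Because implicit resolution depends on where a command happens to run, scripts are better off passing the directory explicitly, as the notes recommend.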
Frequently asked questions
- Does ktx send my schema or query results to a hosted service?
- No. ktx runs locally. The only data leaving your machine is what you explicitly send to the chosen LLM provider.
- Which LLM backends are supported?
- Anthropic API, Google Vertex AI, AI Gateway, and a local Claude Code session via the Claude Agent SDK.
- How is ktx different from a dbt or MetricFlow semantic layer?
- ktx ingests and blends those layers with raw table introspection and wiki content. Agents get a single searchable surface, with contradictions flagged for human review.
- Does ktx require a running server?
- There's no hosted service. The local MCP daemon runs on demand via ktx mcp start when an agent client needs it.
- Is my warehouse safe?
- Yes. All connections are read-only, and ktx never writes to the warehouse.
- How do I learn more?
- Quickstart, The Context Layer, Building Context, CLI Reference, Agent Quickstart, Community and Support pages are all available in the docs.
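As a general illustration of what a read-only connection guarantees, the sketch below opens a SQLite file in read-only mode with Python's built-in `sqlite3` module. How ktx enforces read-only access for each supported warehouse is not described here, so treat this purely as an example of the concept. The file name and table are made up.

```python
import sqlite3
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "warehouse.db"

    # Seed a stand-in "warehouse" with a normal read-write connection.
    rw = sqlite3.connect(str(path))
    rw.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    rw.execute("INSERT INTO orders VALUES (1, 42.0)")
    rw.commit()
    rw.close()

    # Reopen the same file read-only through an SQLite URI.
    ro = sqlite3.connect(path.as_uri() + "?mode=ro", uri=True)
    row_count = ro.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    try:
        ro.execute("INSERT INTO orders VALUES (2, 1.0)")
        write_blocked = False
    except sqlite3.OperationalError:
        # SQLite rejects writes on a connection opened with mode=ro.
        write_blocked = True
    ro.close()

print(row_count, write_blocked)
```

Reads succeed while the insert is rejected by the database itself, which is a stronger guarantee than relying on an agent to avoid write statements.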
Documentation and further reading
- Quickstart
- The Context Layer
- Building Context
- CLI Reference
- Agent Quickstart
- Community and Support
If you’re exploring ktx, these docs provide a narrative that complements this overview and helps you tailor the context layer to your organization.
Community, contribution, and development
Getting involved is straightforward:
- Slack: Join the ktx community to ask questions, share builds, and chat with maintainers.
- GitHub Issues: Report bugs and request features.
- Contributing: Learn how to set up the repo, run tests, and submit PRs.
For developers who want to contribute locally, the project uses a pnpm + uv workspace:
Core components include:
- packages/cli: The TypeScript CLI and published npm package source
- packages/cli/src/context: Core context engine
- packages/cli/src/llm: LLM and embedding providers
- packages/cli/src/connectors: Database scan connectors
- python/ktx-sl: Semantic-layer query planning
- python/ktx-daemon: Portable compute service
Local development workflow:
```bash
git clone https://github.com/kaelio/ktx.git
cd ktx
pnpm install
uv sync --all-groups
pnpm run build
pnpm run check
```
Useful checks during development:
```bash
pnpm run type-check
pnpm run test
pnpm run dead-code
uv run pytest -q
```
Telemetry and privacy
ktx collects anonymous usage telemetry from interactive CLI runs to improve setup reliability and data-agent workflows. It does not record file paths, hostnames, SQL strings, schema names, error messages, or full argv content. Opt-out options are available in the Telemetry documentation.
Licensing and recognition
ktx is licensed under the Apache License, Version 2.0. See the LICENSE file for details.
The journey so far: star history
Star history is a popular indicator of community momentum. If you’re curious about adoption trends, see the star history chart for the repository.
In short, ktx offers a cohesive, automated approach to building and maintaining the context that data agents rely on. By ingesting knowledge, mapping the data stack, and constructing a unified semantic layer, it reduces the manual overhead of maintaining a governed data surface while keeping agents grounded in approved definitions. The result is faster, more accurate answers with less friction. If you’re evaluating how to get agents to query your warehouse correctly and consistently, ktx offers an architecture that combines business knowledge with technical context while keeping your data under your control.
Repository: https://github.com/Kaelio/ktx-ai-data-agents-mcp-context-skills