dev.gxuri.in

Fri Sep 12 2025

Kolophon - No-Code RAG Application Builder

Kolophon is a no-code platform for building Retrieval-Augmented Generation (RAG) applications such as AI chatbots, voice assistants, and analytics dashboards using your own data.

What is Kolophon?



A no-code platform that lets anyone build RAG-powered AI applications on top of their own data.


Building an AI chatbot that actually knows your business used to mean weeks of backend work: setting up vector databases, writing retrieval pipelines, wiring LLMs to context, and building an interface on top of all of it. Most people who needed that capability simply couldn't access it.

Kolophon changes that. Upload your data, configure a prompt, and you have a working AI application. No pipelines to write. No infrastructure to manage. The complexity lives inside the platform so it doesn't have to live in your workflow.


The Problem

RAG (Retrieval-Augmented Generation) is one of the most useful patterns in applied AI. Instead of relying only on a model's general training data, you retrieve the relevant passages from your own documents at query time and hand them to the model, so its answers are grounded in what you actually know. The problem is that building a RAG system from scratch is genuinely hard. You need a vector database, an embedding pipeline, a retrieval layer, an LLM integration, and something for users to interact with.

Teams that can afford an ML engineer get all of this. Everyone else gets a generic chatbot that hallucinates and knows nothing about their specific context.

Kolophon was built to close that gap.


What It Does

Kolophon is a full-stack platform where anyone can create, configure, and deploy AI applications powered by their own data. The experience is structured around projects: each project is a bot, and each bot is backed by a dataset the user controls.

Upload your data, get an AI that knows it. Users upload documents or datasets directly through the dashboard. Kolophon automatically generates vector embeddings and stores them in Pinecone. From that point on, every query the bot receives retrieves the most semantically relevant chunks of that data before passing anything to the LLM. The answers are grounded, not guessed.
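To make "retrieves the most semantically relevant chunks" concrete, here is a minimal in-memory sketch of similarity search. It is an illustration under assumptions, not Kolophon's code: in the platform the vectors live in Pinecone and come from an embedding model, whereas here the names `StoredChunk`, `cosineSimilarity`, and `topK` are hypothetical and the embeddings are plain number arrays.

```typescript
// Hypothetical shape of a stored chunk: the text plus its embedding vector.
interface StoredChunk {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks most similar to the query embedding, best first.
function topK(query: number[], chunks: StoredChunk[], k: number): StoredChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}
```

A vector database does the same ranking at scale with approximate nearest-neighbor indexes instead of a full scan; the sketch only shows what "most relevant" means numerically.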

Choose your interface. Not every AI application should look like a chat window. Kolophon supports three distinct interfaces: a conversational chat agent, a voice assistant with real-time recording and speech synthesis powered by ElevenLabs, and embeddable widgets that drop into any external website via an iframe-friendly route. One dataset, multiple surfaces.
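As a rough idea of what embedding looks like from the host site's side, the sketch below builds the iframe snippet a user might paste into their page. Everything specific here is an assumption for illustration: the `/embed/<botId>` path, the `buildEmbedSnippet` name, and the default dimensions are hypothetical and not taken from Kolophon.

```typescript
// Hypothetical options for generating an embed snippet.
interface EmbedOptions {
  baseUrl: string; // origin where the platform is hosted (assumed trusted input)
  botId: string;
  width?: number;
  height?: number;
}

// Build an <iframe> tag pointing at an assumed /embed/<botId> route.
function buildEmbedSnippet({ baseUrl, botId, width = 400, height = 600 }: EmbedOptions): string {
  // Strip trailing slashes so the path is not doubled, and encode the id for use in a URL.
  const src = `${baseUrl.replace(/\/+$/, "")}/embed/${encodeURIComponent(botId)}`;
  return `<iframe src="${src}" width="${width}" height="${height}" style="border:0" title="Chat widget" loading="lazy"></iframe>`;
}
```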

Understand how it's being used. The analytics dashboard tracks every interaction across daily, weekly, and monthly intervals and visualizes them with Recharts. You can see which questions are being asked, how engagement changes over time, and where users are dropping off. Building the bot is only half the work. Understanding whether it's actually helping is the other half.
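The daily, weekly, and monthly views come down to grouping timestamps into buckets. The following is a small sketch of that aggregation, assuming interactions are stored with a timestamp and bucketed in UTC; the `Interaction` shape and the function names are hypothetical, and the output is simply an array of objects of the kind a charting library such as Recharts accepts as data.

```typescript
type Interval = "daily" | "weekly" | "monthly";

// Hypothetical interaction record; only the timestamp matters here.
interface Interaction {
  botId: string;
  timestamp: Date;
}

// Map a timestamp to the label of the bucket it falls in (all in UTC).
function bucketKey(timestamp: Date, interval: Interval): string {
  const isoDay = (d: Date): string => d.toISOString().slice(0, 10); // YYYY-MM-DD
  if (interval === "daily") return isoDay(timestamp);
  if (interval === "monthly") return isoDay(timestamp).slice(0, 7); // YYYY-MM
  // Weekly: label the bucket by the Monday that starts the week.
  const daysSinceMonday = (timestamp.getUTCDay() + 6) % 7;
  const monday = new Date(
    Date.UTC(
      timestamp.getUTCFullYear(),
      timestamp.getUTCMonth(),
      timestamp.getUTCDate() - daysSinceMonday,
    ),
  );
  return isoDay(monday);
}

// Count interactions per bucket and return them in chronological order.
function countByInterval(
  interactions: Interaction[],
  interval: Interval,
): { bucket: string; count: number }[] {
  const counts = new Map<string, number>();
  for (const { timestamp } of interactions) {
    const key = bucketKey(timestamp, interval);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([bucket, count]) => ({ bucket, count }));
}
```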


How It's Built

The stack is Next.js 15 with the App Router and React 19 throughout. Authentication is handled by NextAuth, with Google and GitHub as sign-in providers. The database is PostgreSQL accessed through Drizzle ORM, which provides a type-safe query layer without the overhead of a heavier ORM.

The RAG pipeline is built on LangChain. When a user sends a query, the relevant document chunks are retrieved from Pinecone using semantic search, assembled into a context window, and passed to either Groq or Google GenAI for inference depending on the project configuration. The whole chain runs server-side inside Next.js API routes, so the retrieval and inference calls are made server to server and API keys never reach the client.
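The "assemble a context window, then hand it to the configured provider" step can be sketched without any SDK. This is an assumption-laden illustration rather than the LangChain chain itself: `ProjectConfig`, `assembleContext`, and `buildLlmRequest` are hypothetical names, the budget is measured in characters rather than tokens for simplicity, and the actual call to Groq or Google GenAI is left out.

```typescript
// Providers named in the post; how each one is actually called is out of scope here.
type LlmProvider = "groq" | "google-genai";

// Hypothetical per-project settings.
interface ProjectConfig {
  provider: LlmProvider;
  systemPrompt: string;
  maxContextChars: number; // rough budget for retrieved text
}

interface LlmRequest {
  provider: LlmProvider;
  system: string;
  user: string;
}

const CHUNK_SEPARATOR = "\n---\n";

// Pack chunks in ranked order and stop at the first one that would exceed the budget.
function assembleContext(rankedChunks: string[], maxChars: number): string {
  const picked: string[] = [];
  let used = 0;
  for (const chunk of rankedChunks) {
    const cost = chunk.length + (picked.length > 0 ? CHUNK_SEPARATOR.length : 0);
    if (used + cost > maxChars) break;
    picked.push(chunk);
    used += cost;
  }
  return picked.join(CHUNK_SEPARATOR);
}

// Build the request that would be sent to the project's configured provider.
function buildLlmRequest(
  question: string,
  rankedChunks: string[],
  config: ProjectConfig,
): LlmRequest {
  const context = assembleContext(rankedChunks, config.maxContextChars);
  return {
    provider: config.provider,
    system: config.systemPrompt,
    user: `Context:\n${context}\n\nQuestion: ${question}`,
  };
}
```

Keeping request construction as a pure function makes it easy to test the prompt separately from the network call, whichever provider ends up serving it.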

Voice is handled by ElevenLabs, which converts LLM responses into natural-sounding speech. The voice widget captures audio from the browser, transcribes it, runs it through the same RAG pipeline as the chat interface, and plays the response back. The same intelligence, a different modality.
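The voice flow is essentially three asynchronous steps chained together. The sketch below shows that shape with the services injected as plain functions, so nothing here depends on a real SDK: `VoiceDeps`, `handleVoiceTurn`, and the three callbacks are hypothetical names, and how transcription and synthesis are actually invoked is not shown.

```typescript
// Hypothetical dependencies; each would wrap a real service in practice.
interface VoiceDeps {
  transcribe: (audio: Uint8Array) => Promise<string>; // speech to text
  answer: (question: string) => Promise<string>; // the same RAG pipeline the chat uses
  synthesize: (text: string) => Promise<Uint8Array>; // text to speech
}

interface VoiceTurn {
  transcript: string;
  reply: string;
  audio: Uint8Array;
}

// One voice turn: recorded audio in, spoken answer out.
async function handleVoiceTurn(audio: Uint8Array, deps: VoiceDeps): Promise<VoiceTurn> {
  const transcript = (await deps.transcribe(audio)).trim();
  if (transcript.length === 0) {
    throw new Error("No speech detected in the recording");
  }
  const reply = await deps.answer(transcript);
  const speech = await deps.synthesize(reply);
  return { transcript, reply, audio: speech };
}
```

Because the steps run strictly in sequence, each one adds to the time the user waits before hearing a reply.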

The UI is built with Tailwind CSS and shadcn/ui components, with Lenis handling scroll behavior across the dashboard. The architecture is modular throughout: the embed routes, the analytics layer, the bot creation flow, and the query endpoint are all cleanly separated concerns.


What I Learned

The hardest part of building Kolophon was making the RAG pipeline feel fast. Vector search and LLM inference are both network-bound operations, and running them in sequence means their latencies add up: the user waits for retrieval and then for generation. Getting the retrieval step tight (right chunk sizes, right embedding model, right similarity threshold) made a bigger difference to perceived quality than any amount of prompt engineering.
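Two of those retrieval knobs, chunk size and similarity threshold, are easy to show in code. This is a simplified sketch under assumptions: it splits by characters, whereas real pipelines usually split by tokens or sentence boundaries, and the names `chunkText`, `ScoredMatch`, and `keepRelevant` are hypothetical.

```typescript
// Split text into fixed-size chunks where each chunk repeats the last
// `overlap` characters of the previous one, so sentences cut at a boundary
// still appear whole in at least one chunk.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be between 0 and chunkSize - 1");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// A retrieved chunk together with its similarity score.
interface ScoredMatch {
  text: string;
  score: number;
}

// Drop matches below the threshold so weakly related text never reaches the LLM.
function keepRelevant(matches: ScoredMatch[], minScore: number): ScoredMatch[] {
  return matches.filter((match) => match.score >= minScore);
}
```

Smaller chunks and a stricter threshold both shrink the context sent to the model, which tends to reduce inference time as well as noise.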

The second insight was about interface design for AI. The temptation is to expose every configuration option to the user because the underlying system has a lot of knobs. But the users who most need a tool like this are the ones least equipped to reason about chunk overlap and temperature settings. Hiding that complexity behind sensible defaults, and only surfacing controls that meaningfully change the output, was a product decision as much as a technical one.
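One way to express "sensible defaults, few visible controls" in code is to separate the settings a user can touch from the ones the platform decides. The sketch below is illustrative only: the field names and every default value are hypothetical and are not Kolophon's actual configuration.

```typescript
// Controls surfaced in the UI (hypothetical).
interface VisibleSettings {
  systemPrompt: string;
  answerStyle: "concise" | "detailed";
}

// Knobs kept behind the scenes (hypothetical fields and values).
interface HiddenSettings {
  chunkSize: number;
  chunkOverlap: number;
  temperature: number;
  topK: number;
}

type ResolvedSettings = VisibleSettings & HiddenSettings;

const VISIBLE_DEFAULTS: VisibleSettings = {
  systemPrompt: "Answer using only the provided context.",
  answerStyle: "concise",
};

const HIDDEN_DEFAULTS: HiddenSettings = {
  chunkSize: 800,
  chunkOverlap: 100,
  temperature: 0.2,
  topK: 4,
};

// Merge user input over the visible defaults. The hidden defaults are spread
// last so stray keys in untyped input cannot override them.
function resolveSettings(input: Partial<VisibleSettings> = {}): ResolvedSettings {
  return { ...VISIBLE_DEFAULTS, ...input, ...HIDDEN_DEFAULTS };
}
```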


Built as part of an ongoing exploration into making AI infrastructure accessible without sacrificing capability.