NVG8

Case study

Answers in Slack, grounded in your own docs

A B2B SaaS company wanted its team to get instant answers inside Slack: product questions, enablement questions, customer questions, answered from the company’s own documentation rather than someone’s memory. The knowledge existed, spread across internal docs, PDFs, Markdown files, and Salesforce Knowledge Articles. We built a production multi-tenant retrieval platform that answers in the channel, grounded in the source.

Josh Weckesser · February 2025 · 7 min read

The knowledge existed. Finding it did not.

Every team has this problem in some form. The answer to a customer question is written down somewhere, in a doc, a PDF, a Knowledge Article, but finding it means searching three systems or interrupting the one person who knows. So people guess, or they wait, or they ask in Slack and hope someone answers.

This was not a content problem. The company had good documentation. It was a retrieval problem: the right passage existed, but nothing put it in front of the person asking, at the moment they asked, in the tool they already had open.

The added requirement was trust. An assistant that confidently makes things up is worse than no assistant, because people stop believing the good answers too. Whatever we built had to cite its sources and stay grounded in them.

The build, step by step

The platform is a small number of parts doing one job each: ingest the documents, retrieve the right passages, answer in Slack with citations, and keep the index current. The work was in making each part production-grade and multi-tenant, not in any single clever trick.

Step 01

Ingest everything into one index

We used LlamaIndex to ingest the company’s documents (plain text, PDF, Markdown, and DOCX) and store them as embeddings in a Qdrant vector database. Mixed formats in, one queryable index out. Getting the chunking right at this stage mattered more than any later tuning: passages that are too large bury the answer, and passages that are too small lose the context that makes the answer correct.
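As a rough illustration of that trade-off, here is a minimal sketch in plain Python. It is not the LlamaIndex splitter the platform used. The function name chunk_text, the 800-character budget, and the one-paragraph overlap are assumptions chosen for the example, not the production settings.

```python
def chunk_text(text: str, max_chars: int = 800, overlap: int = 1) -> list[str]:
    """Pack whole paragraphs into chunks of roughly max_chars.

    The last `overlap` paragraphs of each chunk are repeated at the start of
    the next one, so a passage keeps some of the context that precedes it.
    A single paragraph longer than max_chars is kept whole rather than cut
    mid-thought.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []

    for para in paragraphs:
        candidate = "\n\n".join(current + [para])
        if current and len(candidate) > max_chars:
            # The chunk is full: emit it and carry the overlap forward.
            chunks.append("\n\n".join(current))
            current = current[-overlap:] if overlap > 0 else []
        current.append(para)

    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Splitting on paragraph boundaries instead of a fixed character count is one way to keep each passage a complete thought; the overlap is what preserves the surrounding context.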

Step 02

Answer in Slack, grounded in the source

When someone asks a question in Slack, the service retrieves the most relevant passages from the vector database and passes them to the model to compose an answer that cites where it came from. The answer arrives in the channel the question was asked in. Because the response is built from retrieved passages rather than the model’s memory, it stays grounded in the company’s own documentation.
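The retrieve-then-answer flow can be sketched as follows. This is an in-memory stand-in under stated assumptions: in the platform the similarity search is done by the vector database, and the Passage type, the toy two-dimensional vectors, and the prompt wording are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Passage:
    source: str          # where the passage came from, used for the citation
    text: str
    vector: list[float]  # embedding of the passage (toy values in this sketch)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], passages: list[Passage], k: int = 2) -> list[Passage]:
    """Return the k passages most similar to the query vector."""
    ranked = sorted(passages, key=lambda p: cosine(query_vec, p.vector), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[Passage]) -> str:
    """Compose a prompt that restricts the model to numbered, citable sources."""
    sources = "\n".join(
        f"[{i}] ({p.source}) {p.text}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer using only the sources below and cite them as [n]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
```

Numbering the sources in the prompt gives the model something concrete to cite, and the explicit instruction to decline when the sources are silent is what keeps the answer tied to the retrieved passages.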

Step 03

Build it multi-tenant from the first line

We designed the platform so a single codebase serves many clients. Each client gets an isolated ORG_PREFIX that scopes its secrets, its vector collections, and its configuration, so no client can ever see another’s data. Making that decision early means deploying to the next client is a matter of configuration, not a fork.
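One way to make that scoping hard to bypass is to derive every resource name from the prefix in a single place. The sketch below is a hypothetical illustration: the TenantScope class, the naming patterns, and the validation rule are assumptions, not the platform's actual code.

```python
import re
from dataclasses import dataclass

# Assumption for this sketch: prefixes are lowercase alphanumerics with no
# underscore, so "acme" + "_eu_docs" can never collide with "acme_eu" + "_docs".
_PREFIX_RE = re.compile(r"[a-z][a-z0-9]{1,30}")


@dataclass(frozen=True)
class TenantScope:
    org_prefix: str

    def __post_init__(self) -> None:
        if not _PREFIX_RE.fullmatch(self.org_prefix):
            raise ValueError(f"invalid ORG_PREFIX: {self.org_prefix!r}")

    def collection(self, name: str) -> str:
        """Name of a vector collection owned by this tenant."""
        return f"{self.org_prefix}_{name}"

    def secret(self, name: str) -> str:
        """Name of a secret owned by this tenant."""
        return f"{self.org_prefix.upper()}_{name.upper()}"
```

For example, TenantScope("acme").collection("docs") yields "acme_docs". If request handlers only ever receive a TenantScope, not raw collection names, a query cannot reach another tenant's collection by accident.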

Step 04

Sync Salesforce Knowledge

For a Salesforce-ecosystem company, the Knowledge base is the differentiator. We designed and built a sync with Salesforce Knowledge Articles: OAuth 2.0 authentication, the Bulk API for the initial load, incremental change detection over SOQL so only what changed gets re-pulled, rate limiting to stay inside Salesforce limits, and automatic re-indexing so the bot answers from the current article, not last month’s.
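The incremental change detection step can be sketched as a watermark query. This is a minimal sketch under assumptions: the object name Knowledge__kav, the selected fields, and the PublishStatus filter are typical of Lightning Knowledge but should be checked against the org's schema, and the helper names are hypothetical. Authentication, the Bulk API load, and rate limiting are left out.

```python
from datetime import datetime, timezone

# Timestamp format assumed for REST query results,
# e.g. "2025-02-01T10:15:30.000+0000".
SF_TIMESTAMP = "%Y-%m-%dT%H:%M:%S.%f%z"


def build_incremental_soql(since: datetime, object_name: str = "Knowledge__kav") -> str:
    """Build a SOQL query for articles modified after the `since` watermark."""
    if since.tzinfo is None:
        raise ValueError("watermark must be timezone-aware")
    # SOQL datetime literals are unquoted and written in UTC.
    stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f"SELECT Id, Title, LastModifiedDate FROM {object_name} "
        f"WHERE PublishStatus = 'Online' AND LastModifiedDate > {stamp} "
        "ORDER BY LastModifiedDate ASC"
    )


def next_watermark(records: list[dict], previous: datetime) -> datetime:
    """Return the newest LastModifiedDate seen, or the old watermark if none."""
    stamps = [datetime.strptime(r["LastModifiedDate"], SF_TIMESTAMP) for r in records]
    return max(stamps + [previous])
```

Each sync run queries from the stored watermark, re-indexes the returned articles, then saves next_watermark for the following run. A strict > comparison can skip an article modified in the same second as the watermark; using >= and de-duplicating by Id is a common alternative.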

Step 05

Ship it to production

The FastAPI application is containerized and deployed on Google Cloud Run, with continuous delivery from a cloudbuild.yaml pipeline. It runs in production for a paying client, not as a demo.

What changed

Staff stopped switching tools to find answers. A product, enablement, or customer question asked in Slack comes back answered and cited, from the company’s own documentation, in the channel where it was asked. The subject-matter experts who used to field those questions got their time back.

The architecture changed what the platform is worth beyond this one client. Because it is multi-tenant, the same system deploys to another company as a configuration change rather than a rebuild. And because it syncs Salesforce Knowledge, which most general-purpose Slack bots do not, it fits the independent software vendors (ISVs) and systems integrators (SIs) that already maintain a Knowledge base.

What made it work

Grounded and cited, or it does not ship

The platform answers only from retrieved passages and shows where each answer came from. That single constraint is what makes people trust it, and trust is the whole point. An assistant that occasionally invents a confident wrong answer trains its own users to ignore it.

Multi-tenant is a day-one decision, not a later refactor

Isolating every client behind an ORG_PREFIX from the first line of code is far cheaper than retrofitting tenancy onto a single-tenant system later. It is the difference between onboarding a new client in an afternoon and rewriting the data layer.

Chunking beats model choice

How the documents are split into passages affected answer quality more than which model composed the answer. Passages sized to hold a complete thought, along with the surrounding context that retrieval needs, did more for accuracy than any prompt change.

What we would do differently

Build the evaluation set first

We tuned retrieval by reading answers and judging them. The better approach is to write a fixed set of real questions with known-good source passages on day one, so every change to chunking or retrieval can be measured instead of eyeballed.
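A fixed evaluation set of that kind can be scored with a few lines of code. The sketch below computes a hit rate: the share of questions for which at least one known-good passage appears in the top k results. The EvalCase shape, the passage ids, and the retrieve callable are hypothetical stand-ins for whatever the real retriever exposes.

```python
from typing import Callable

# One case: a real question and the ids of passages known to answer it.
EvalCase = tuple[str, set[str]]


def hit_rate_at_k(
    cases: list[EvalCase],
    retrieve: Callable[[str, int], list[str]],
    k: int = 5,
) -> float:
    """Fraction of questions with a known-good passage in the top k results."""
    if not cases:
        raise ValueError("evaluation set is empty")
    hits = 0
    for question, expected_ids in cases:
        retrieved_ids = retrieve(question, k)[:k]
        if expected_ids & set(retrieved_ids):
            hits += 1
    return hits / len(cases)
```

Running this before and after a chunking or retrieval change turns "the answers look better" into a number that can be compared across runs.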

Scope the Salesforce Knowledge sync up front

The Knowledge sync turned out to be the most valuable and most differentiated part of the platform. We treated it as an extension. Scoping it as a first-class requirement from the start would have shaped the ingestion design earlier and saved rework.

The takeaway

Most teams do not have a knowledge problem. They have a retrieval problem. The answer is written down, but it is not where the question gets asked, so people guess or interrupt an expert. A grounded, source-cited assistant in the tool the team already lives in turns documentation that exists into answers people actually get.

If your team is searching three systems or waiting on a subject-matter expert to answer routine questions, the Slack assistants and internal-knowledge workflows we build at NVG8 came from work like this. The Workflow Design Hour is the fastest way to scope whether the pattern fits your stack.