# Data Processing (Knowledge Ops)

Lyntaris ensures maximum data privacy by processing your documents locally, isolating PDFs and intellectual property from third-party APIs. The Data Processing panel controls this local "Knowledge Ops" pipeline.

## The RAG Pipeline

Before the physical Kiosk's AI can answer questions about your corporate data, the files must be transformed into searchable vector space.

  1. Upload: Navigate to the Data Processing dashboard and select a file (PDF, TXT, Word).
  2. Semantic Chunking: The document is not split arbitrarily. Lyntaris uses LangChain text splitters to divide the content semantically (favoring paragraph breaks or Markdown headers), preserving context across chunks.
  3. Local Vectorization (multilingual-e5-large): The chunks are handed to a locally hosted embedding model served via FastAPI. E5 converts each text chunk into a dense, high-dimensional floating-point vector (an embedding).
  4. Instant Injection: The moment vectorization finishes, these vectors are pushed to the local Weaviate vector database instance running alongside Flowise.
  5. Sub-Millisecond Retrieval: When a user asks a Kiosk a question, Flowise embeds the spoken prompt and computes the cosine similarity between it and every chunk in Weaviate. The most relevant chunks are injected into the LLM's context window. Because Weaviate is local, the physical Kiosk can answer questions about the new document immediately, with no reboot required.
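The retrieval step can be sketched in a few lines of plain Python. This is an illustrative toy, not the Flowise/Weaviate implementation: `cosine_similarity` and `retrieve_top_k` are hypothetical names, and a real deployment uses Weaviate's indexed vector search rather than a brute-force scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve_top_k(query_vec, chunk_vecs, k=2):
    """Rank stored chunk vectors against the query vector, highest first."""
    scored = sorted(
        enumerate(chunk_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [idx for idx, _ in scored[:k]]
```

The indices returned by `retrieve_top_k` would map back to the original text chunks, which are then pasted into the LLM's context window.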

## Specialized Transcriptions (Soniox/Mistral)

Not all corporate data is text.

  • Transcripts Tab: If your deployment relies on extracting information from large video assets or recorded board meetings, upload them here. Flowise offloads the raw audio to the Soniox engine, which transcribes it into text chunks that can then be vectorized.
  • OCR Tab: For complex visual documents (like P&ID blueprints, receipts, or charts), standard text extractors fail. Lyntaris utilizes specialized Vision pipelines (like Mistral OCR) to map the visual structure of a PDF into a readable markdown format before vectorizing it.
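The two tabs above amount to dispatching uploads by media type before chunking. A minimal, hypothetical sketch (the function and the pipeline labels `transcription`, `ocr`, and `chunking` are illustrative, not the Lyntaris API):

```python
import mimetypes

def route_upload(filename: str) -> str:
    """Pick which preprocessing pipeline should handle an uploaded file."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        return "unsupported"
    if mime.startswith(("audio/", "video/")):
        return "transcription"  # Transcripts tab: Soniox speech-to-text
    if mime == "application/pdf":
        return "ocr"            # OCR tab: vision pipeline to Markdown
    if mime.startswith("text/"):
        return "chunking"       # plain text goes straight to the splitter
    return "unsupported"
```

Note that text-native PDFs would also pass through a standard extractor in practice; the OCR route is for visually complex documents where plain extraction fails.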

## Synthetic Q&A Generation

Often, corporate documentation is dense and jargon-heavy. Lyntaris provides an LLM summarization pipeline to pre-compute simple answers.

  • You can trigger Synthetic Generation on an uploaded document.
  • The system uses an LLM to read the jargon and automatically generate simple "User Questions" (e.g., turning a 4-page technical chassis manual into the hypothetical query: "How do I install the chassis?").
  • The system vectorizes both the raw text and the synthetic questions. This drastically increases the Cosine Similarity match rate when a layperson asks the Kiosk a simple question.
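Why synthetic questions raise the match rate can be shown with a toy similarity measure. The sketch below uses word overlap as a crude stand-in for embedding cosine similarity; the `word_overlap` helper and the sample strings are invented for illustration:

```python
import re

def word_overlap(a: str, b: str) -> float:
    """Toy stand-in for embedding similarity: Jaccard overlap of word sets."""
    wa = set(re.findall(r"[a-z0-9]+", a.lower()))
    wb = set(re.findall(r"[a-z0-9]+", b.lower()))
    return len(wa & wb) / len(wa | wb)

# A dense chunk from a hypothetical chassis manual, and its synthetic question.
raw_chunk = "Affix the chassis subassembly via the M4 torx fasteners per spec 7.2."
synthetic_q = "How do I install the chassis?"

# A layperson's spoken query lands far closer to the synthetic question
# than to the jargon-heavy source text.
user_query = "How do I install this chassis?"
```

Because both the raw text and the synthetic questions are vectorized, the layperson's query retrieves the chunk through the synthetic question even when it shares almost no vocabulary with the original manual.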

## Computer Vision Deduplication (InsightFace)

This tab exposes controls for the biometric pipeline (face_dedupe.py), running via a FastAPI backend alongside your Unity installation.

  • When the physical Kiosk camera captures a user, it generates a biometric hash.
  • You can define the Match Threshold (e.g., 0.65 cosine similarity) here.
  • This single slider sets the exact mathematical tolerance the local Kiosk uses to distinguish "Returning VIP users" from "strangers", instantly changing the privacy stringency of your physical deployment without touching Unity code.
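The threshold check itself is simple. A minimal sketch, assuming face embeddings are compared by cosine similarity against a stored gallery (`classify_visitor` is a hypothetical name for illustration, not the `face_dedupe.py` API):

```python
import math

MATCH_THRESHOLD = 0.65  # the slider value from the Deduplication tab

def cosine(a, b):
    """Cosine similarity between two face embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def classify_visitor(new_embedding, known_embeddings, threshold=MATCH_THRESHOLD):
    """Return 'returning' if any stored embedding clears the threshold."""
    for known in known_embeddings:
        if cosine(new_embedding, known) >= threshold:
            return "returning"
    return "stranger"
```

Raising the threshold makes matches stricter (fewer false "Returning VIP" hits); lowering it makes recognition more permissive.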
