
Where AI Outputs Come From

Virza uses AI in several features: chat, summaries, extraction, and search ranking. This page explains how each AI output is generated and how far you can rely on the results.

Chat responses

When you ask a question in Virza’s chat:

  1. Evidence retrieval: Virza searches your workspace documents (or a specific collection/document, depending on your selected scope) for passages relevant to your question
  2. Source evaluation: Retrieved passages are scored for relevance using a neural cross-encoder model
  3. Response generation: A large language model generates an answer, grounding it in the retrieved evidence when relevant passages were found
  4. Citation attachment: Every claim derived from your documents includes an inline citation linking to the specific source passage
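
The four steps above can be sketched in a few lines of Python. This is only an illustrative outline, not Virza's implementation: the `Passage` type, the function names, the `min_score` threshold, and the word-overlap scoring (a stand-in for the neural cross-encoder) are all assumptions made for the example, and the language model call is replaced by simply returning the evidence.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    doc_id: str  # hypothetical identifier for the source document
    text: str


def _words(text):
    return set(text.lower().split())


def retrieve(question, passages, scope=None):
    """Step 1: keep in-scope passages that share at least one word with the question."""
    q = _words(question)
    return [
        p for p in passages
        if (scope is None or p.doc_id in scope) and q & _words(p.text)
    ]


def relevance(question, passage):
    """Step 2: toy stand-in for a cross-encoder: fraction of question words found."""
    q = _words(question)
    return len(q & _words(passage.text)) / len(q) if q else 0.0


def answer(question, passages, scope=None, min_score=0.3, top_k=2):
    scored = [(relevance(question, p), p) for p in retrieve(question, passages, scope)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    evidence = [p for s, p in scored[:top_k] if s >= min_score]
    # Step 3 would call a language model with the evidence; here the evidence
    # text itself stands in for the generated claims.
    # Step 4: every document-derived claim carries a citation to its source.
    claims = [{"text": p.text, "citation": p.doc_id} for p in evidence]
    return {"claims": claims, "grounded": bool(evidence)}


passages = [
    Passage("doc-a", "Transformers use attention layers"),
    Passage("doc-b", "Bananas are yellow"),
]
result = answer("how do transformers use attention", passages)
```

In this toy run only the passage from `doc-a` clears the threshold, so the single claim cites `doc-a`. A question with no matching passages returns `grounded: False`, which corresponds to the fallback cases described next.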

The AI model may also draw on its training knowledge when:

  • No relevant documents were found in your workspace
  • Your question is clearly general (e.g., “What is quantum computing?”)
  • Your question requires context beyond what’s in your documents

The knowledge source badge always tells you which sources were used.
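
As a rough illustration of how such a badge could be derived, the sketch below picks a label from two facts about a response: how many document passages were cited, and whether training knowledge was used. The `SourceBadge` names and the `choose_badge` function are hypothetical; the actual badge labels in Virza may differ.

```python
from enum import Enum


class SourceBadge(Enum):
    # Hypothetical labels, for illustration only.
    DOCUMENTS = "documents"
    GENERAL = "general knowledge"
    MIXED = "documents + general knowledge"


def choose_badge(cited_passages: int, used_training_knowledge: bool) -> SourceBadge:
    """Map what the answer drew on to a single badge."""
    if cited_passages > 0 and used_training_knowledge:
        return SourceBadge.MIXED
    if cited_passages > 0:
        return SourceBadge.DOCUMENTS
    # With no cited passages, the answer can only come from the model itself.
    return SourceBadge.GENERAL
```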

Document summaries

Summaries are generated by passing the document’s extracted text to a language model with a structured prompt. The model produces a summary based solely on the document’s content. No external knowledge is added.

  • Free and Starter plans: standard AI model
  • Pro and higher: advanced model (Claude) for higher accuracy and depth
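
A minimal sketch of how a summary request might be assembled, assuming a simple plan-to-tier lookup. The prompt wording, the field names, and the treatment of Enterprise as "Pro and higher" are assumptions for this example, not Virza's actual prompt or API.

```python
# Plan names come from this page; the tier labels are illustrative.
MODEL_TIER = {
    "free": "standard",
    "starter": "standard",
    "pro": "advanced",
    "enterprise": "advanced",  # assumed to count as "Pro and higher"
}


def model_tier(plan: str) -> str:
    try:
        return MODEL_TIER[plan.lower()]
    except KeyError:
        raise ValueError(f"unknown plan: {plan!r}") from None


def build_summary_request(plan: str, document_text: str) -> dict:
    """Bundle the extracted text with instructions that forbid outside knowledge."""
    return {
        "model_tier": model_tier(plan),
        "instructions": (
            "Summarize the document below. "
            "Use only its content; do not add outside knowledge."
        ),
        "document": document_text,
    }


request = build_summary_request("Starter", "Extracted text of the paper.")
```

The key point is that the only content passed to the model is the document's own extracted text, which is what keeps the summary tied to the document.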

Table and figure descriptions

AI-generated descriptions of tables and figures use a vision model that analyzes the cropped image of each artifact. The description is generated from what the model can see in the image. It does not combine information from other parts of the document.
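
To make the "cropped image" step concrete, here is a small sketch of cropping a region from a page, with the page modeled as a 2D list of pixel values. The `(left, top, right, bottom)` bounding-box convention and the `crop` function are assumptions for this example; the vision model call itself is not shown.

```python
def crop(page, bbox):
    """Return the pixels inside bbox = (left, top, right, bottom).

    Right and bottom are exclusive, matching Python slice semantics.
    """
    left, top, right, bottom = bbox
    return [row[left:right] for row in page[top:bottom]]


# A 3-row by 4-column "page" whose pixel values encode their position.
page = [[r * 10 + c for c in range(4)] for r in range(3)]
figure = crop(page, (1, 0, 3, 2))
```

Only `figure` would be handed to the vision model, which is why the description cannot draw on text elsewhere on the page.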

Search ranking

Virza’s search uses AI in two ways:

  • Embedding generation: your query and document content are converted to vector representations that capture semantic meaning, enabling concept-level matching beyond keywords
  • Neural reranking: a cross-encoder model reads your query alongside each candidate result to produce a fine-grained relevance score

These AI components improve result quality but do not generate text. They score and rank existing content.
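
The two stages can be sketched as a shortlist step followed by a rerank step. Everything here is a toy stand-in chosen so the example runs with the standard library only: `embed` uses word counts rather than a learned embedding model, and `rerank_score` counts matching word pairs rather than running a cross-encoder. The function names and the `shortlist` parameter are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real model would capture meaning, not just words."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rerank_score(query, doc):
    """Toy stand-in for a cross-encoder: reads query and doc together and
    counts query word pairs that appear in the same order in the doc."""
    q, d = query.lower().split(), doc.lower().split()
    doc_pairs = set(zip(d, d[1:]))
    return sum(1 for pair in zip(q, q[1:]) if pair in doc_pairs)


def search(query, docs, shortlist=3):
    q_vec = embed(query)
    # Stage 1: cheap vector similarity picks a shortlist of candidates.
    candidates = sorted(docs, key=lambda d: cosine(q_vec, embed(d)), reverse=True)
    candidates = candidates[:shortlist]
    # Stage 2: the finer-grained scorer reorders only the shortlist.
    return sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)


docs = [
    "learning machine repair manual",
    "machine learning for protein folding",
    "cooking with garlic",
]
results = search("machine learning", docs, shortlist=2)
```

In this example the vector stage ranks the repair manual first because it is shorter, and the rerank stage promotes the protein-folding document because it contains the query words in order. Neither stage writes any new text; both only score existing content.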

What is inferred vs. extracted

| Output | Source | Reliability |
|---|---|---|
| Document title, authors, DOI | Extracted from document + verified via CrossRef/arXiv | High, cross-referenced with external databases |
| Section boundaries | Detected from document structure | High for standard academic layouts |
| Table data (headers, rows) | Extracted from document using Docling TableFormer | High (97.9% accuracy on standard tables) |
| Figure crops | Extracted from document pages using bounding box detection | High |
| Citation records | Parsed from bibliography section via GROBID + CrossRef | High for well-formatted references |
| Executive summary | Inferred by AI model from document text | Medium, captures key points but may miss nuance |
| AI chat answers | Inferred by AI model, grounded in retrieved evidence | Varies, check the confidence meter and source badges |
| Artifact descriptions | Inferred by vision AI from cropped image | Medium, best for clear charts and diagrams |
| Claims (Enterprise) | Inferred by AI model from results sections | Medium, always verify against the original text |

Rule of thumb: anything labeled “extracted” comes directly from the document and is highly reliable. Anything labeled “inferred” was generated by an AI model and should be verified for important work.
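
If you process Virza outputs in your own scripts, the rule of thumb can be encoded as a simple lookup. The key names and the `needs_verification` helper below are hypothetical, and the mapping covers only the outputs this page explicitly describes as extracted or inferred.

```python
# Provenance of each output type, as described on this page.
PROVENANCE = {
    "document_title": "extracted",
    "table_data": "extracted",
    "figure_crops": "extracted",
    "executive_summary": "inferred",
    "chat_answer": "inferred",
    "artifact_description": "inferred",
    "claims": "inferred",
}


def needs_verification(output_type: str) -> bool:
    """Inferred outputs were generated by a model and should be checked."""
    return PROVENANCE[output_type] == "inferred"


to_check = sorted(name for name in PROVENANCE if needs_verification(name))
```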

Privacy of AI processing

  • Your documents are processed on secure infrastructure, never on shared public AI endpoints
  • Document content is never used to train AI models
  • AI conversation history is stored within your workspace and accessible only to workspace members with appropriate roles
  • Workspace admins can audit AI usage from Settings

See Privacy for the full data handling policy.
