Extraction Tables

Extraction Tables let you build structured grids where AI reads each of your documents and fills in the cells. Define the columns you care about — sample size, methodology, primary outcome, confidence intervals — and Virza extracts that data from every paper in your table automatically.

Extraction Tables require a Pro plan. They are also gated behind the research_extraction_tables feature flag. If you don’t see the Extraction Tables option, check your plan in Billing & Plans.


What extraction tables are for

Extraction tables solve the problem of reading dozens of papers and manually copy-pasting data points into a spreadsheet. Common use cases:

Use case | Example columns
Systematic reviews | Population, Intervention, Comparator, Outcome, Sample Size, Effect Size, p-value
Methodology comparison | Study design, Data collection method, Analysis technique, Limitations
Cohort or RCT summary | Trial phase, Randomisation method, Blinding, Drop-out rate, Primary endpoint
Technology evaluation | Programming language, Framework, Benchmark dataset, Reported accuracy, Hardware
Qualitative research | Research approach, Theoretical framework, Participant count, Data analysis method

Each cell is extracted by AI from the paper’s full text — not from the abstract alone. The model reads the methods, results, and discussion sections to find the most accurate answer.


Requirements

Before running extraction on a document, it must meet both of these conditions:

  1. Fully processed — the document must have a Ready status. Documents that are still scanning or parsing do not have extracted text yet, so the AI has nothing to read.
  2. Has readable text — scanned images with no OCR, password-protected PDFs, and corrupt uploads cannot be extracted. Virza runs OCR automatically, but very low-quality scans may produce poor results.

If a document is still processing when you trigger extraction, those cells will be marked Not Found. You can remove and re-add the document after it finishes processing, then re-run extraction to fill those cells.


Creating a table

Open Extraction Tables

From the sidebar, navigate to Research → Extraction Tables. If this is your first table, you will see an empty state with a New Table button.

Give your table a name

Enter a descriptive title — for example, “RCT meta-analysis 2024” or “NLP benchmarking comparison”. Optionally add a description explaining the research question this table is answering.

Choose a template pack (optional)

If your documents belong to a recognised study type, select a pre-built template pack. Templates give you a head start with columns that are already worded correctly for that study type:

Template pack | Best for
RCT (Randomised Controlled Trial) | Clinical trials, drug studies, intervention research
Cohort study | Observational epidemiology, longitudinal studies
Qualitative research | Interviews, ethnography, grounded theory, thematic analysis
Systematic review | Evidence synthesis, meta-analyses, PRISMA-style reviews
Technology / benchmark | ML papers, software comparisons, performance evaluations

You can use a template pack as a starting point, then add, edit, or remove columns afterwards.

Add documents

Click Add Documents to pick papers from your library. You can:

  • Select documents one by one from the document picker
  • Filter by collection to quickly add a whole set of related papers
  • Add up to 100 documents per table

Documents that are not yet fully processed are shown with a warning icon. You can still add them, but those rows will not extract until the documents finish processing.

Create your table

Click Create Table. Virza creates the table in Draft status with pending cells for every document × column combination.


Defining columns

Columns are the backbone of your table. Each column has:

  • Name — a short label shown in the table header (e.g., “Sample size”)
  • Prompt — the instruction the AI follows when reading each paper (e.g., “What is the total number of participants in the study? Return only the number.”)
  • Data type — the expected format of the extracted value

Column data types

Type | Use when | Example
Text | The value is a sentence, phrase, or description | Methodology description, study design
Number | The value is a numeric figure | Sample size, p-value, mean age
Boolean | The answer is yes/no or true/false | “Was the study double-blinded?”, “Was ethics approval obtained?”
List | The value is multiple items | Outcome measures, co-authors, interventions tested

Choosing the right data type helps the AI format its answer correctly and makes the table easier to read at a glance.
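
As a rough mental model, a column can be pictured as a small record holding the three parts described above. The sketch below is illustrative only: the class and field names (`ColumnDefinition`, `DataType`, `data_type`) are assumptions made for this example, not Virza's actual schema or API.

```python
from dataclasses import dataclass
from enum import Enum


class DataType(Enum):
    # The four column data types listed in the table above.
    TEXT = "text"
    NUMBER = "number"
    BOOLEAN = "boolean"
    LIST = "list"


@dataclass(frozen=True)
class ColumnDefinition:
    # Hypothetical structure: names are illustrative, not Virza's schema.
    name: str            # short label shown in the table header
    prompt: str          # instruction the AI follows for each paper
    data_type: DataType  # expected format of the extracted value


sample_size = ColumnDefinition(
    name="Sample size",
    prompt=(
        "What is the total number of participants in the study? "
        "Return only the number."
    ),
    data_type=DataType.NUMBER,
)
print(sample_size.name, sample_size.data_type.value)
```

Keeping the name, prompt, and data type together makes it easier to review a column as a unit before adding it to a table.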

Writing effective prompts

The quality of your extraction depends almost entirely on your prompts. A vague prompt produces vague results.

Rules for good prompts:

  1. Be specific about what you want — instead of “Sample size”, write “What is the total number of participants enrolled in the study? Include dropouts. Return only the integer.”
  2. Specify the format — “Return only the number”, “Return yes or no”, “Return a comma-separated list”
  3. Tell the AI where to look — “As reported in the Methods section”, “As stated in Table 1 or the Results section”
  4. Handle missing data — “If not reported, return N/A”
  5. Avoid ambiguity — if a paper could have multiple answers (e.g., multiple arms with different sample sizes), specify which you want: “Total enrolled across all arms combined”
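
One way to apply these rules consistently is to assemble each prompt from the same parts. The helper below is a minimal sketch for drafting prompts offline before pasting them into a column; `build_prompt` and its parameters are hypothetical names chosen for this example and are not part of Virza.

```python
from typing import Optional


def build_prompt(
    question: str,
    output_format: str,
    location: Optional[str] = None,
    disambiguation: Optional[str] = None,
    missing_value: str = "N/A",
) -> str:
    """Join the parts of a prompt in a fixed order (hypothetical helper)."""
    parts = [question.strip()]                 # rule 1: be specific
    if disambiguation:
        parts.append(disambiguation.strip())   # rule 5: avoid ambiguity
    if location:
        parts.append(location.strip())         # rule 3: where to look
    parts.append(output_format.strip())        # rule 2: specify the format
    parts.append(f"If not reported, return {missing_value}.")  # rule 4
    return " ".join(parts)


prompt = build_prompt(
    question="What is the total number of participants enrolled in the study?",
    output_format="Return only the integer.",
    location="Use the Methods section.",
    disambiguation="Count all arms combined.",
)
print(prompt)
```

The fixed order is a design choice for this sketch only; what matters is that every prompt states the question, the format, and the fallback for missing data.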

Prompt examples by column type:

Column name | Effective prompt
Sample size | What is the total number of participants enrolled in the study? Return only the number. If not reported, return N/A.
Study design | What type of study design is used? (e.g., randomised controlled trial, cohort study, case-control, cross-sectional). Return a single phrase.
Primary outcome | What is the primary outcome measure as stated by the authors? Return the exact outcome name as written in the paper.
Double-blinded | Was the study double-blinded? Answer yes, no, or unclear.
Key limitations | List the main limitations the authors acknowledge. Return each as a short phrase, comma-separated.
Follow-up duration | What is the follow-up duration? Include the time unit (weeks, months, years). Return the exact value as written.
Effect size | What is the reported effect size or main statistical result? Include the metric type (OR, HR, RR, Cohen’s d, etc.) and confidence interval if reported.
Country | In which country or countries was the study conducted? Return a comma-separated list.

Batching: Virza sends all columns for one document in a single AI request. This means 20 columns on 50 documents = 50 AI calls, not 1,000. This is fast and cost-efficient.
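
The arithmetic behind this can be checked with a short sketch. The function name `ai_call_count` is hypothetical and only models the counting rule stated above (one request per document when batched, one per cell otherwise).

```python
def ai_call_count(documents: int, columns: int, batched: bool = True) -> int:
    """Count AI requests for a table (illustrative model, not Virza code)."""
    if documents < 0 or columns < 0:
        raise ValueError("counts must be non-negative")
    if batched:
        # All columns for one document travel in a single request.
        return documents
    # Without batching, every document x column cell is its own request.
    return documents * columns


print(ai_call_count(50, 20))                  # batched
print(ai_call_count(50, 20, batched=False))   # one request per cell
```

With batching, adding columns does not increase the number of requests; only adding documents does.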

Column limits

Each table supports up to 20 columns. If you need more, consider splitting your extraction into multiple focused tables (e.g., one table for study design, another for outcomes).


Using pre-built template columns

Template packs give you pre-written column definitions with proven prompts. To use them:

  1. Open your table and click Add Column
  2. Switch to the Templates tab
  3. Select a template pack and choose the columns you want to add
  4. Click Add Selected Columns

You can modify a template column’s name or prompt after adding it. The template key is retained for reference but does not affect extraction.

RCT template columns

Column | What it extracts
Sample Size | Total enrolled participants
Randomisation Method | How participants were randomised (block, stratified, etc.)
Blinding | Single, double, or open-label
Primary Outcome | Stated primary endpoint
Follow-up Duration | Duration with unit
Drop-out Rate | Percentage or count of drop-outs
Intervention | What the treatment group received
Control / Comparator | What the control group received
Statistical Method | Primary analysis approach
Effect Measure | OR, RR, HR, mean difference, etc. with CI
p-value | Reported significance level for primary outcome
Ethics Approval | Whether ethics approval is stated

Cohort template columns

Column | What it extracts
Study Design | Prospective or retrospective cohort
Sample Size | Total cohort size
Exposure | Exposure or risk factor studied
Outcome | Primary outcome measured
Confounders Adjusted | Variables adjusted for in analysis
Follow-up Duration | Follow-up period with unit
Loss to Follow-up | Percentage lost
Association Measure | RR, HR, OR with confidence interval

Qualitative template columns

Column | What it extracts
Research Approach | Phenomenology, grounded theory, ethnography, etc.
Participant Count | Number of participants
Sampling Strategy | Purposive, snowball, theoretical, etc.
Data Collection | Interviews, focus groups, observations, documents
Analysis Method | Thematic, content, discourse analysis, etc.
Theoretical Framework | Underlying theory or paradigm
Saturation Reached | Whether data saturation is reported
Key Themes | Main themes or categories identified

Adding and removing documents

Adding more documents after creation

Click Add Documents from the table view. Any new documents are added with Pending cells for all existing columns. Run extraction again to fill the new rows.

Removing documents

Click the row’s context menu and select Remove Document. This removes the document from the table but does not delete it from your library. Its cells are soft-deleted (retained internally for 30 days), and you can re-add the document to the table later.


Running extraction

Review pending cells

After adding documents and columns, the table shows cells in Pending status (shown as a grey dash). Pending cells are waiting to be extracted.

Click Extract

Click the Extract button (or Re-extract if the table has been run before). Virza enqueues the extraction job and the table status changes to Extracting.

Wait for results

Extraction processes documents in batches of 10. For a 50-document table with 10 columns, expect around 60–120 seconds total. A progress indicator in the table header shows how many cells have been filled.
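
To see how many batches a table needs, the grouping can be sketched as below. This only illustrates the batch size of 10 stated above; the `batches` helper and the `doc-N` identifiers are made up for the example and say nothing about how Virza schedules work internally.

```python
from typing import Iterator, List, Sequence

BATCH_SIZE = 10  # documents processed per batch, as stated above


def batches(doc_ids: Sequence[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` document ids."""
    for start in range(0, len(doc_ids), size):
        yield list(doc_ids[start:start + size])


doc_ids = [f"doc-{i}" for i in range(1, 51)]  # hypothetical 50-document table
rounds = list(batches(doc_ids))
print(len(rounds), [len(group) for group in rounds])
```

A 50-document table therefore runs as five batches, and a table whose size is not a multiple of 10 ends with one smaller batch.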

Review the results

Once complete, the table status changes to Ready and cells show their extracted values. Each cell is in one of four states:

Cell status | Meaning
Done | AI found and extracted a value
Not Found | AI could not find the requested information in the document
Failed | An error occurred during extraction (try re-extracting)
Pending | Not yet processed (trigger extraction to fill)

Table stuck in “Extracting”? If the table status is still “Extracting” after several minutes with no progress, the extraction may have failed silently. Refresh the page — if the table is still stuck, click Re-extract to re-queue only the remaining pending and failed cells.


Re-extracting cells

Extraction is not destructive. You can re-run extraction at any time:

  • Re-extract all — click the Extract button again; Virza re-processes only cells that are still Pending or Failed, leaving already Done cells untouched.
  • Re-extract a single cell — click the cell, then click Re-extract this cell from the cell detail panel.
  • Re-extract after editing a prompt — if you update a column’s prompt, cells already marked Done are reset to Pending for that column only, then re-extraction fills them with the new prompt.
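
The re-extraction rules can be modelled as a small state sketch. Everything here is illustrative: `CellStatus`, `cells_to_queue`, and `reset_column` are hypothetical names, and the sketch assumes that a prompt edit resets every cell in the column, as described under Editing column prompts.

```python
from enum import Enum
from typing import Dict, List, Tuple


class CellStatus(Enum):
    PENDING = "pending"
    DONE = "done"
    NOT_FOUND = "not_found"
    FAILED = "failed"


Cell = Tuple[str, str]  # (document id, column name)


def cells_to_queue(cells: Dict[Cell, CellStatus]) -> List[Cell]:
    """Cells a re-run would process: only Pending or Failed ones."""
    retry = {CellStatus.PENDING, CellStatus.FAILED}
    return [key for key, status in cells.items() if status in retry]


def reset_column(cells: Dict[Cell, CellStatus], column: str) -> Dict[Cell, CellStatus]:
    """Model a prompt edit: cells in that column go back to Pending (assumed)."""
    return {
        key: (CellStatus.PENDING if key[1] == column else status)
        for key, status in cells.items()
    }


cells = {
    ("doc-1", "Sample size"): CellStatus.DONE,
    ("doc-1", "Country"): CellStatus.FAILED,
    ("doc-2", "Sample size"): CellStatus.NOT_FOUND,
    ("doc-2", "Country"): CellStatus.DONE,
}
print(cells_to_queue(cells))
print(cells_to_queue(reset_column(cells, "Sample size")))
```

In this model, a plain re-run touches only the failed cell, while editing the Sample size prompt queues that whole column and leaves the Done cell in the other column alone.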

Understanding “Not Found” results

A Not Found result means the AI read the full document and could not identify the information you asked for. Common causes:

Cause | Solution
The document genuinely doesn’t report this data | Normal; indicates a gap in the literature
The prompt is too narrow or uses jargon not present in the paper | Broaden or rephrase the prompt
The document is a very short abstract or metadata-only entry | The document was not fully processed; check its status
The relevant section uses different terminology | Add synonyms to your prompt: “What is the sample size, participant count, or cohort size?”
The information is in a table or figure but not in prose | For highly structured data, specify “including data reported in tables or figures”

Not Found cells count as a valid result and can be sorted and filtered like any other.


Editing the table

Renaming the table

Click the table title to edit it inline. Press Enter or click away to save.

Editing column prompts

Click the column header, then Edit Column. Update the prompt and save. All cells for that column are reset to Pending and will be re-extracted on the next extraction run.

Reordering columns

Drag column headers left or right to reorder them. Column positions are saved automatically.

Deleting columns

Click the column header → Delete Column. The column’s cells are soft-deleted and retained for 30 days, but you cannot undo the deletion from the UI within the same session.


Exporting to CSV

Click Export → Download CSV from the table toolbar. The CSV file contains:

  • One row per document
  • One column per extraction column, plus document metadata columns (title, authors, year, DOI)
  • Cell values as plain text (lists are joined with semicolons)
  • Empty cells for Not Found or pending values

The filename is extraction-table-{table-id}.csv. You can open it directly in Excel, Google Sheets, or import it into R/Python for statistical analysis.
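
For analysis in Python, the export can be read with the standard `csv` module. The sketch below is a minimal example under stated assumptions: the sample data is invented, the exact header names (`title`, `authors`, `year`, `doi`) are guesses at how the metadata columns are labelled, and `load_rows` is a hypothetical helper. It turns empty cells into `None` and splits semicolon-joined list cells.

```python
import csv
import io

# Invented sample standing in for a downloaded export; header names are assumed.
SAMPLE_CSV = """title,authors,year,doi,Sample size,Country
Example trial A,Doe J,2021,10.0000/example.a,120,Kenya; Uganda
Example trial B,Poe E,2023,10.0000/example.b,,
"""


def load_rows(handle, list_columns=()):
    """Read export rows; empty cells become None, list cells become lists."""
    rows = []
    for raw in csv.DictReader(handle):
        row = {}
        for key, value in raw.items():
            value = (value or "").strip()
            if value == "":
                row[key] = None  # Not Found or pending cells are exported empty
            elif key in list_columns:
                row[key] = [item.strip() for item in value.split(";")]
            else:
                row[key] = value
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV), list_columns={"Country"})
print(rows[0]["Country"], rows[1]["Sample size"])
```

To read a real export, replace `io.StringIO(SAMPLE_CSV)` with `open(path, newline="", encoding="utf-8")`. Because Not Found and pending cells are both exported as empty, the CSV alone cannot tell them apart.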


Table limits

Limit | Value
Documents per table | 100
Columns per table | 20
Tables per workspace | Unlimited
Extraction timeout per table | 10 minutes
Cell retention after deletion | 30 days
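
When planning a large review, it can help to check a planned table against these limits before building it. The following sketch is a planning aid only; `limit_problems` and the constant names are hypothetical, and the values are copied from the table above.

```python
from typing import List

MAX_DOCUMENTS_PER_TABLE = 100  # from the limits table above
MAX_COLUMNS_PER_TABLE = 20     # from the limits table above


def limit_problems(document_count: int, column_count: int) -> List[str]:
    """Return a message for each documented limit the plan would exceed."""
    problems = []
    if document_count > MAX_DOCUMENTS_PER_TABLE:
        problems.append(
            f"{document_count} documents exceeds the limit of {MAX_DOCUMENTS_PER_TABLE}"
        )
    if column_count > MAX_COLUMNS_PER_TABLE:
        problems.append(
            f"{column_count} columns exceeds the limit of {MAX_COLUMNS_PER_TABLE}"
        )
    return problems


print(limit_problems(100, 20))
print(limit_problems(140, 26))
```

An empty list means the plan fits in one table; otherwise, splitting into several focused tables is the documented workaround.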

Tips and best practices

Start narrow, then expand — begin with 5–10 high-priority columns on a small subset of papers (10–15). Review the quality before running on your full corpus.

Use collections to organise documents — create a collection for the papers relevant to your review before opening Extraction Tables. You can filter the document picker by collection to add them all at once.

Test prompts on a single paper first — before running on 80 documents, test your column prompt on one paper you know well. Verify the cell captures what you expect, then scale up.

Number columns are strict — if a paper reports a range (“100–150 participants”), the AI may return the range as text. Make your prompt explicit: “If a range is given, return the midpoint as an integer.”
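
If range values still slip through, they can be normalised after export. The sketch below is one possible clean-up step, not something Virza does: `to_integer` is a hypothetical helper intended for non-negative integer counts such as sample size, and it assumes that rounding the midpoint down is acceptable.

```python
import re
from typing import Optional

# Two integers joined by a hyphen, an en dash (\u2013), or the word "to".
_RANGE = re.compile(r"(\d+)\s*(?:-|\u2013|to)\s*(\d+)")
_NUMBER = re.compile(r"\d+")


def to_integer(cell: str) -> Optional[int]:
    """Return an integer for a count cell, using the midpoint for ranges."""
    text = cell.replace(",", "")  # drop thousands separators
    match = _RANGE.search(text)
    if match:
        low, high = int(match.group(1)), int(match.group(2))
        return (low + high) // 2  # midpoint, rounded down (assumption)
    match = _NUMBER.search(text)
    return int(match.group()) if match else None


print(to_integer("100\u2013150 participants"))
print(to_integer("1,200"))
print(to_integer("N/A"))
```

This is unsuitable for decimals such as p-values, since only whole numbers are matched; for those columns, fixing the prompt is the safer route.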

Not all documents process equally — preprints, conference papers, and theses may use different section structures. If you get many Not Found cells for a specific paper, check its document status and consider whether it was fully parsed.

Re-extract is non-destructive — if you tweak a prompt and run extraction again, only the affected column’s cells are reset. All other cells remain intact.


Troubleshooting

The table is stuck in “Extracting” status

The extraction worker may have crashed or the job may have timed out. Refresh the page and click Re-extract. If it persists, check that your documents are all in Ready status.

All cells show “Not Found” for a specific document

The document likely failed to parse correctly. Go to your library, find the document, and check its status. If it shows a processing error, try re-uploading the original file.

Cells show “Failed” for many documents

This usually indicates an issue with the AI service (transient error). Click Re-extract to retry. Failed cells are re-queued automatically on re-extraction.

The Extract button is disabled or missing

Check that you have Editor role or higher in the workspace. Viewers can read and export tables but cannot trigger extraction or add columns. See Roles & Permissions for details.

The Extraction Tables option is not visible in the sidebar

This feature requires a Pro plan and the research_extraction_tables feature flag to be enabled for your workspace. Contact your workspace owner or check Billing & Plans.


Privacy and data handling

  • Document text is sent to Virza’s AI gateway (Virza Cortex) for extraction. No document content is used to train models.
  • Extraction jobs are workspace-scoped — cells from one workspace are never visible to another.
  • Extracted cell values are stored encrypted in Virza’s database.
  • Deleting a table soft-deletes all cells. Hard deletion occurs automatically after 30 days.

For full privacy details, see Data Isolation.
