Nebula AI Research
RASA Content Audit
How to Score Existing Content for AI Retrieval
Implementation Guide · Nebula Personalization Tech Solutions Pvt. Ltd.
Research basis: Verma & Agarwal (2026), DOI: 10.5281/zenodo.20325460
Most content libraries were built for keyword-based search. A RASA content audit is the process of systematically evaluating that existing content against the five RASA dimensions to determine which pieces are already GEO-ready, which require targeted revision, and which should be restructured or retired.
This guide covers the full audit workflow: how to identify the right audit scope, how to chunk content correctly before scoring, how to use RASA-Analyst to generate dimension scores, and how to triage results into a prioritised remediation queue.
What a RASA Content Audit Measures
A RASA content audit scores content at the chunk level — not the page level. This is the fundamental difference from a traditional SEO content audit, which evaluates pages by keyword coverage, word count, or backlink profile.
Each content chunk is scored across five dimensions:
- Retrieval Probability (RP), weight 0.25: Does this chunk contain the precise entity and terminology signals that vector retrieval systems need to surface it for the right query?
- Semantic Chunk Coherence (SCC), weight 0.20: Is this chunk a clean, self-contained unit of meaning that can be retrieved and understood without surrounding context?
- Entity Clarity Score (ECS), weight 0.20: Are the named entities in this chunk precise, consistent, and unambiguously identified?
- Synthesis Compatibility Index (SCI), weight 0.20: Can this chunk be safely incorporated into an AI-generated answer without introducing errors, contradictions, or ambiguity?
- Citation & Grounding Potential (CGP), weight 0.15: Does this chunk contain named sources, statistics, or institutional attribution that AI systems can cite?
The composite RASA score is: (RP × 0.25) + (SCC × 0.20) + (ECS × 0.20) + (SCI × 0.20) + (CGP × 0.15)
PUBLISH verdict: ≥ 8.0. REVISE: 6.0–7.9. REJECT: < 6.0, or SCI < 6.0 regardless of composite.
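The weighted sum and verdict thresholds above can be sketched in a few lines of Python. This is a minimal illustration, not part of RASA-Analyst: the function names are hypothetical, and treating any composite from 6.0 up to (but not including) 8.0 as REVISE is an assumption about how values between 7.9 and 8.0 are handled.

```python
# Dimension weights as stated in this guide.
WEIGHTS = {"RP": 0.25, "SCC": 0.20, "ECS": 0.20, "SCI": 0.20, "CGP": 0.15}


def composite_score(scores: dict[str, float]) -> float:
    """Weighted sum of the five dimension scores, rounded to two decimals."""
    return round(sum(scores[dim] * weight for dim, weight in WEIGHTS.items()), 2)


def verdict(scores: dict[str, float]) -> str:
    """Apply the PUBLISH / REVISE / REJECT thresholds, including the SCI floor."""
    composite = composite_score(scores)
    # An SCI below 6.0 forces REJECT regardless of the composite.
    if scores["SCI"] < 6.0 or composite < 6.0:
        return "REJECT"
    if composite >= 8.0:
        return "PUBLISH"
    return "REVISE"  # assumption: covers 6.0 <= composite < 8.0


example = {"RP": 9, "SCC": 8, "ECS": 8, "SCI": 8, "CGP": 7}
print(composite_score(example), verdict(example))
```

Rounding before comparing against the thresholds avoids floating-point noise pushing a borderline composite to the wrong side of 8.0.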
Step 1 — Define Your Audit Scope
Auditing every piece of content at once is inefficient. Prioritise the audit scope by starting with content that meets one or more of these criteria:
- High-intent pages: pages that already receive organic traffic and could compound authority if optimised for AI retrieval
- Pillar content: cornerstone articles, framework explainers, and methodology pages that are referenced internally across the site
- Conversion-adjacent content: pages in the consideration and decision stages of the funnel, where AI citation could meaningfully influence purchase decisions
- Recently updated content: pages revised in the last 12 months are likely already indexed and actively serving queries
- Competitor-contested topics: topics where AI-generated answers currently cite competitor content instead of yours
For each page in scope, list all the discrete topic sections — not just the page as a whole. A 2,000-word article typically contains 4–8 auditable chunks.
Step 2 — Chunk the Content Correctly
The quality of a RASA audit depends entirely on how content is chunked before scoring. Incorrect chunking produces misleading scores.
Chunking rules
- One topic per chunk. A chunk should be about one clearly defined subject. If a passage shifts topic midway, even mid-paragraph, split it at the transition point.
- Target 150–400 words per chunk. Below 150 words, chunks typically lack enough context for reliable scoring. Above 400 words, they usually contain multiple topics that should be scored separately.
- Use headings as natural chunk boundaries. Each H2 or H3 section is usually a candidate chunk boundary. Don't merge sections that have different topical focuses just to reach a word count.
- Introductions and conclusions are separate chunks. Don't combine a page introduction with the first body section; intros typically have different retrieval signals from developed body content.
- Tables, lists, and definitions are chunks. A comparison table, a numbered process list, or a definition block each constitutes a distinct retrievable unit and should be scored independently.
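As a rough illustration of the heading and word-count rules above, the sketch below splits a page into candidate chunks at H2/H3 boundaries and flags chunks outside the 150–400 word target. It assumes the page is available as Markdown with `##` and `###` headings; the function names and the default "Introduction" label are hypothetical. It does not detect mid-paragraph topic shifts or separate out tables and lists, which still need a manual pass.

```python
import re

MIN_WORDS, MAX_WORDS = 150, 400
# Matches Markdown H2 and H3 headings only (assumed input format).
HEADING = re.compile(r"^(#{2,3})\s+(.*)$")


def split_by_headings(markdown_text: str) -> list[tuple[str, str]]:
    """Return (heading, body) pairs, one per H2/H3 section.

    Text before the first heading is kept as its own introduction chunk.
    """
    chunks = []
    heading, lines = "Introduction", []
    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            if any(ln.strip() for ln in lines):
                chunks.append((heading, "\n".join(lines).strip()))
            heading, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    if any(ln.strip() for ln in lines):
        chunks.append((heading, "\n".join(lines).strip()))
    return chunks


def size_flag(text: str) -> str:
    """Flag chunks that fall outside the 150-400 word target."""
    words = len(text.split())
    if words < MIN_WORDS:
        return "too short"
    if words > MAX_WORDS:
        return "too long"
    return "ok"
```

A chunk flagged "too long" is a prompt to look for a second topic inside it, not a reason to cut at an arbitrary word boundary.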
What to record per chunk
For each chunk submitted to RASA-Analyst, record the following fields: Page URL (the page the chunk lives on), Chunk ID (a reference such as page-slug_chunk-01), Chunk heading (the H2 or H3 that introduces the section, or a brief descriptor), Word count (the approximate number of words in the chunk), and Chunk text (the full text to be submitted to RASA-Analyst).
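One way to keep these fields consistent is a small record type. The sketch below is illustrative only: `ChunkRecord` and `make_record` are hypothetical names, and the two-digit zero-padded index simply follows the page-slug_chunk-01 pattern given above.

```python
from dataclasses import dataclass


@dataclass
class ChunkRecord:
    page_url: str    # page the chunk lives on
    chunk_id: str    # e.g. page-slug_chunk-01
    heading: str     # introducing H2/H3, or a brief descriptor
    word_count: int  # approximate number of words
    text: str        # full text submitted to RASA-Analyst


def make_record(page_url: str, page_slug: str, index: int,
                heading: str, text: str) -> ChunkRecord:
    """Build a record, deriving the chunk ID and word count from the inputs."""
    return ChunkRecord(
        page_url=page_url,
        chunk_id=f"{page_slug}_chunk-{index:02d}",
        heading=heading,
        word_count=len(text.split()),
        text=text,
    )
```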
Step 3 — Score Each Chunk with RASA-Analyst
RASA-Analyst is the official evaluation engine for the RASA framework, available at ollama.com/nebulatech/rasa-analyst. It runs locally, requires no API key, and scores content across all five dimensions in a single pass.
Running RASA-Analyst
ollama run nebulatech/rasa-analyst
When prompted, paste the chunk text. RASA-Analyst returns:
- A score for each of the five dimensions (1–10)
- A composite RASA score with a PUBLISH / REVISE / REJECT verdict
- Quoted phrases from the input that produced weak signals, with a brief explanation for each
- One concrete improvement recommendation per dimension scoring below 8.0
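For audits with many chunks, pasting text interactively becomes slow. The sketch below sends a chunk to a locally running Ollama server through its HTTP `/api/generate` endpoint instead. It assumes the server is listening on Ollama's default port 11434 and that the model has already been pulled. The exact layout of the text RASA-Analyst returns is not specified in this guide, so the function returns the raw response text rather than attempting to parse scores; `build_request` and `score_chunk` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint
MODEL = "nebulatech/rasa-analyst"


def build_request(chunk_text: str) -> urllib.request.Request:
    """Prepare a non-streaming generate request for one chunk."""
    payload = {"model": MODEL, "prompt": chunk_text, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score_chunk(chunk_text: str, timeout: float = 300.0) -> str:
    """Send the chunk to the local model and return its raw text output."""
    with urllib.request.urlopen(build_request(chunk_text), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With "stream": False, the generated text is in the "response" field.
    return body["response"]
```

Calling `score_chunk(record_text)` for each chunk and saving the returned text next to the chunk ID keeps the full explanation available when the scores are transcribed.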
Recording scores
For each chunk, record the five dimension scores and the composite alongside the chunk ID. A spreadsheet with columns for Chunk ID, RP, SCC, ECS, SCI, CGP, Composite, and Verdict is sufficient for most audits. See the RASA-Analyst Guide for prompt templates and batch-scoring workflows.
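If a spreadsheet application is not convenient, the same columns can be written to a CSV file. The following is a minimal sketch using Python's standard `csv` module; the function name and the sample row are illustrative, not real audit data.

```python
import csv
import io

COLUMNS = ["Chunk ID", "RP", "SCC", "ECS", "SCI", "CGP", "Composite", "Verdict"]


def write_scores(rows: list[dict], stream) -> None:
    """Write one row per scored chunk, with the header used in this guide."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Illustrative row only; values are made up for the example.
sample_rows = [{
    "Chunk ID": "rasa-audit_chunk-01",
    "RP": 9, "SCC": 8, "ECS": 8, "SCI": 8, "CGP": 7,
    "Composite": 8.1, "Verdict": "PUBLISH",
}]

buffer = io.StringIO()
write_scores(sample_rows, buffer)
print(buffer.getvalue())
```

To write to disk, pass a file opened with `open("audit.csv", "w", newline="")` in place of the `StringIO` buffer.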
Step 4 — Triage Results by Dimension
Once all chunks are scored, triage the results using the following priority matrix. The SCI condition is listed first because an SCI score below 6.0 overrides the composite verdict: it is always the highest-priority fix regardless of other scores.
| Condition | Priority | Action |
|---|---|---|
| SCI < 6.0 (any chunk) | Critical | Fix immediately: this chunk actively risks degrading AI synthesis quality if retrieved. Remove contradictions, add factual precision, eliminate hedging language. |
| Composite < 6.0 (REJECT) | High | Restructure or replace: the chunk is not retrieval-ready. Consider whether the topic is worth developing or should be consolidated into another chunk. |
| RP < 7.0 (weak retrieval signals) | High | Replace generic language with named entities and precise terminology. RP is the highest-weighted dimension, so a weak RP score caps the composite ceiling. |
| Composite 6.0–7.9 (REVISE) | Medium | Targeted revision: identify the lowest-scoring dimension and apply RASA-Analyst's specific recommendation for that chunk. |
| ECS < 7.0 (entity ambiguity) | Medium | Add full entity names on first mention, disambiguate pronouns, ensure consistent naming across the chunk. |
| CGP < 7.0 (low citation potential) | Low–Medium | Add named sources, statistics with origin, DOIs, or institutional attribution. CGP compounds over time: citations beget citations. |
| Composite ≥ 8.0 (PUBLISH) | No action | Chunk is retrieval-ready. Log as complete and move to next chunk. |
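The matrix can be applied mechanically once scores are recorded. The sketch below returns every matching row for a chunk, in table order, so the first entry is the most urgent. Two points are assumptions rather than rules stated in this guide: a chunk can match several rows at once, and the REVISE band is treated as 6.0 up to (but not including) 8.0.

```python
# (condition label, priority, test) in the same order as the triage matrix.
TRIAGE_RULES = [
    ("SCI < 6.0", "Critical", lambda s: s["SCI"] < 6.0),
    ("Composite < 6.0 (REJECT)", "High", lambda s: s["Composite"] < 6.0),
    ("RP < 7.0", "High", lambda s: s["RP"] < 7.0),
    ("Composite 6.0-7.9 (REVISE)", "Medium", lambda s: 6.0 <= s["Composite"] < 8.0),
    ("ECS < 7.0", "Medium", lambda s: s["ECS"] < 7.0),
    ("CGP < 7.0", "Low-Medium", lambda s: s["CGP"] < 7.0),
]


def triage(scores: dict[str, float]) -> list[tuple[str, str]]:
    """Return (priority, condition) for every matrix row the chunk matches."""
    matches = [(priority, condition)
               for condition, priority, test in TRIAGE_RULES
               if test(scores)]
    # No row matched: the composite must be >= 8.0 with no weak dimension.
    return matches or [("No action", "Composite >= 8.0 (PUBLISH)")]


chunk = {"RP": 6.5, "SCC": 8, "ECS": 8, "SCI": 8, "CGP": 6, "Composite": 7.3}
for priority, condition in triage(chunk):
    print(priority, "-", condition)
```

Returning all matches rather than only the first keeps secondary fixes (for example a low CGP on a REVISE chunk) visible when the chunk is opened for editing.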
Step 5 — Prioritise the Remediation Queue
After triage, sequence remediation work using the following logic:
- Fix all SCI < 6.0 chunks first, regardless of which page they are on. These are the only chunks that can actively harm AI synthesis quality: they are not merely invisible but potentially hazardous to your brand's representation in AI-generated answers.
- Next, address REJECT-verdict chunks on high-traffic or pillar pages. The highest-value pages with the most retrieval failures need the most immediate attention.
- Then work through REVISE-verdict chunks by highest composite improvement potential. A chunk scoring 7.8 with a single weak RP dimension is often one precise entity replacement away from PUBLISH, which makes it a quick win.
- Batch similar remediation types. If twelve chunks across a content library all have low CGP, address them together: the fix pattern (add named sources, DOIs, institutional attribution) is identical and can be applied rapidly.
- Re-score every revised chunk before marking complete. Run RASA-Analyst again on the revised text to confirm the score moved in the intended direction and no new issues were introduced.
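The sequencing logic above can be expressed as a sort key. This is a sketch under stated assumptions: `priority_page` is a hypothetical flag marking high-traffic or pillar pages, "highest improvement potential" is approximated as the REVISE chunks closest to 8.0, and REJECT chunks on non-priority pages are placed last because the guide does not say where they belong.

```python
def queue_key(chunk: dict) -> tuple[int, float]:
    """Lower tuples sort first; the tier order follows the steps above."""
    if chunk["SCI"] < 6.0:
        return (0, 0.0)  # SCI failures always come first
    if chunk["Composite"] < 6.0:
        # REJECT on priority pages next; other REJECTs last (assumption).
        return (1, 0.0) if chunk["priority_page"] else (3, 0.0)
    # REVISE: closest to the 8.0 PUBLISH threshold first (assumption).
    return (2, -chunk["Composite"])


def remediation_queue(chunks: list[dict]) -> list[dict]:
    """Drop chunks that need no action, then order the rest for remediation."""
    pending = [c for c in chunks if c["SCI"] < 6.0 or c["Composite"] < 8.0]
    return sorted(pending, key=queue_key)


scored = [
    {"id": "a", "SCI": 8, "Composite": 7.8, "priority_page": False},
    {"id": "b", "SCI": 5, "Composite": 8.2, "priority_page": False},
    {"id": "c", "SCI": 7, "Composite": 5.5, "priority_page": True},
    {"id": "d", "SCI": 8, "Composite": 6.4, "priority_page": True},
    {"id": "e", "SCI": 9, "Composite": 8.5, "priority_page": True},
    {"id": "f", "SCI": 7, "Composite": 5.0, "priority_page": False},
]
print([c["id"] for c in remediation_queue(scored)])
```

Batching by remediation type can then be done within each tier, for example by grouping the REVISE chunks on their lowest-scoring dimension.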
What Changes After a RASA Content Audit
The immediate output of a RASA content audit is a scored inventory of every content chunk in scope, with a prioritised remediation queue. But the medium-term effect is more significant: teams that have conducted at least one full RASA audit report a structural shift in how they approach new content creation. The five dimensions — RP, SCC, ECS, SCI, CGP — become a pre-publishing checklist rather than a retrospective fix.
For content teams running ongoing GEO programmes, a quarterly RASA audit of the top 20–30 content pages provides a reliable signal on AI retrieval performance trends. Tracking composite scores across audit cycles reveals whether content is improving or drifting as topics evolve and new competitor content enters the retrieval pool.
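Tracking across cycles needs little more than the composite per chunk ID from each audit. The sketch below compares two cycles and labels each chunk as improving, drifting, or stable; the function name, the labels, and the sample values are illustrative assumptions.

```python
def score_trends(previous: dict[str, float],
                 current: dict[str, float]) -> dict[str, tuple[float, str]]:
    """Compare composites for chunks present in both audit cycles."""
    trends = {}
    for chunk_id in sorted(previous.keys() & current.keys()):
        delta = round(current[chunk_id] - previous[chunk_id], 2)
        if delta > 0:
            label = "improving"
        elif delta < 0:
            label = "drifting"
        else:
            label = "stable"
        trends[chunk_id] = (delta, label)
    return trends


# Illustrative composites keyed by chunk ID for two quarterly audits.
q1 = {"pricing_chunk-01": 7.2, "pricing_chunk-02": 8.4}
q2 = {"pricing_chunk-01": 8.1, "pricing_chunk-02": 7.9, "pricing_chunk-03": 6.0}
print(score_trends(q1, q2))
```

Chunks that appear in only one cycle are skipped here, since there is no earlier or later score to compare against.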
A Practical Transition Checklist
For content teams, SEO agencies, digital marketing agencies, and practitioners moving from an SEO-only to a GEO-aware content strategy:
- Audit existing content at the chunk level, not the page level: identify passages, not just pages
- Replace broad keyword-optimised language with precise named entities: frameworks, tools, organisations, methodologies, people
- Ensure each passage is self-contained: a reader (or AI system) should not need surrounding context to understand it
- Add verifiable attribution to every factual claim: named sources, statistics with origin, DOIs where applicable
- Implement TechArticle / ScholarlyArticle JSON-LD schema to support both SERP rich results and AI entity disambiguation
- Remove synthesis-incompatible language: hedging qualifiers, contradictory claims, and passive-voice ambiguity that AI systems cannot safely quote
- Score revised content with RASA-Analyst before publishing; target RP ≥ 8.0, SCI ≥ 6.0 (hard floor)
- Continue page-level SEO practice: technical accessibility, canonical tags, Core Web Vitals, and E-E-A-T signals remain relevant
Related Resources
- RASA-Analyst Guide: full usage documentation, prompt templates, and batch scoring workflows
- GEO Content Structure Guide: how to write new content that scores PUBLISH from the first draft
- GEO vs SEO: why AI retrieval requires different content signals than keyword ranking
- RASA Framework Overview: full framework reference
- RASA Research Paper on Zenodo (DOI: 10.5281/zenodo.20325460)
Framework Reference: Verma, A. & Agarwal, S. (2026). Retrieval-Aware Semantic Architectures (RASA) for AI-Native Search. Nebula Personalization Tech Solutions Pvt. Ltd. DOI: 10.5281/zenodo.20325460
