Nebula AI Research

Implementation Guide · Nebula Personalization Tech Solutions Pvt. Ltd.
Research basis: Verma & Agarwal (2026), DOI: 10.5281/zenodo.20325460

Most content libraries were built for keyword-based search. A RASA content audit is the process of systematically evaluating that existing content against the five RASA dimensions to determine which pieces are already GEO-ready, which require targeted revision, and which should be restructured or retired.

This guide covers the full audit workflow: how to identify the right audit scope, how to chunk content correctly before scoring, how to use RASA-Analyst to generate dimension scores, and how to triage results into a prioritised remediation queue.

What a RASA Content Audit Measures

A RASA content audit scores content at the chunk level — not the page level. This is the fundamental difference from a traditional SEO content audit, which evaluates pages by keyword coverage, word count, or backlink profile.

Each content chunk is scored across five dimensions:

Retrieval Probability (RP) — weight 0.25: Does this chunk contain the precise entity and terminology signals that vector retrieval systems need to surface it for the right query?
Semantic Chunk Coherence (SCC) — weight 0.20: Is this chunk a clean, self-contained unit of meaning that can be retrieved and understood without surrounding context?
Entity Clarity Score (ECS) — weight 0.20: Are the named entities in this chunk precise, consistent, and unambiguously identified?
Synthesis Compatibility Index (SCI) — weight 0.20: Can this chunk be safely incorporated into an AI-generated answer without introducing errors, contradictions, or ambiguity?
Citation & Grounding Potential (CGP) — weight 0.15: Does this chunk contain named sources, statistics, or institutional attribution that AI systems can cite?

The composite RASA score is: (RP × 0.25) + (SCC × 0.20) + (ECS × 0.20) + (SCI × 0.20) + (CGP × 0.15)

PUBLISH verdict: ≥ 8.0. REVISE: 6.0–7.9. REJECT: < 6.0, or SCI < 6.0 regardless of composite.
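The formula and thresholds above can be sketched in a few lines of Python. This is a minimal illustration, not the RASA-Analyst implementation: the function names are hypothetical, dimension scores are assumed to sit on a 0–10 scale, and composites that fall between 7.9 and 8.0 are assumed to count as REVISE.

```python
# Dimension weights as stated in the guide; they sum to 1.0.
WEIGHTS = {"RP": 0.25, "SCC": 0.20, "ECS": 0.20, "SCI": 0.20, "CGP": 0.15}


def composite_score(scores: dict[str, float]) -> float:
    """Weighted sum of the five dimension scores."""
    return sum(scores[dim] * weight for dim, weight in WEIGHTS.items())


def verdict(scores: dict[str, float]) -> str:
    """Apply the PUBLISH / REVISE / REJECT thresholds, including the SCI override."""
    composite = composite_score(scores)
    if scores["SCI"] < 6.0 or composite < 6.0:
        return "REJECT"
    if composite >= 8.0:
        return "PUBLISH"
    return "REVISE"  # assumption: anything from 6.0 up to (not including) 8.0


# Illustrative scores: a strong chunk that still fails on the SCI floor.
example = {"RP": 9.0, "SCC": 8.0, "ECS": 8.0, "SCI": 5.5, "CGP": 9.0}
print(round(composite_score(example), 2), verdict(example))  # 7.9 REJECT
```

The example shows why the SCI override matters: a composite of 7.9 would otherwise be a REVISE, but the SCI of 5.5 forces a REJECT.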

Step 1 — Define Your Audit Scope

Auditing every piece of content at once is inefficient. Prioritise the audit scope by starting with content that meets one or more of these criteria:

High-intent pages: pages that already receive organic traffic and could compound authority if optimised for AI retrieval
Pillar content: cornerstone articles, framework explainers, and methodology pages that are referenced internally across the site
Conversion-adjacent content: pages in the consideration and decision stages of the funnel, where AI citation could meaningfully influence purchase decisions
Recently updated content: pages revised in the last 12 months are likely already indexed and actively serving queries
Competitor-contested topics: topics where AI-generated answers currently cite competitor content instead of yours

For each page in scope, list all the discrete topic sections — not just the page as a whole. A 2,000-word article typically contains 4–8 auditable chunks.

Field	What To Capture
Page URL	The page the chunk lives on
Chunk ID	A reference (e.g. page-slug_chunk-01)
Chunk heading	The H2/H3 that introduces this section, or a brief descriptor
Word count	Approximate number of words in the chunk
Chunk text	The full text to be submitted to RASA-Analyst
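One way to keep the inventory consistent is to hold each row as a small record. The sketch below is an assumption about how a team might do this: the ChunkRecord class and make_record helper are hypothetical names, the example URL is a placeholder, and the chunk ID is derived from the last path segment of the URL to match the page-slug_chunk-01 pattern.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class ChunkRecord:
    page_url: str
    chunk_id: str
    heading: str
    word_count: int
    text: str


def make_record(page_url: str, index: int, heading: str, text: str) -> ChunkRecord:
    """Build one inventory row; the slug is the last segment of the URL path."""
    path = urlparse(page_url).path.strip("/")
    slug = path.split("/")[-1] or "home"  # fall back for a bare domain
    return ChunkRecord(
        page_url=page_url,
        chunk_id=f"{slug}_chunk-{index:02d}",
        heading=heading,
        word_count=len(text.split()),  # whitespace-separated word count
        text=text,
    )


record = make_record(
    "https://example.com/research/rasa-content-audit",  # placeholder URL
    1,
    "What a RASA Content Audit Measures",
    "A RASA content audit scores content at the chunk level.",
)
print(record.chunk_id, record.word_count)  # rasa-content-audit_chunk-01 10
```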

Step 2 — Chunk the Content Correctly

The quality of a RASA audit depends entirely on how content is chunked before scoring. Incorrect chunking produces misleading scores.

Chunking rules

One topic per chunk. A chunk should be about one clearly defined subject. If a passage shifts topic mid-way — even mid-paragraph — split it at the transition point.
Target 150–400 words per chunk. Below 150 words, chunks typically lack enough context for reliable scoring. Above 400 words, they usually contain multiple topics that should be scored separately.
Use headings as natural chunk boundaries. Each H2 or H3 section is usually a candidate chunk boundary. Don't merge sections that have different topical focuses just to reach a word count.
Introductions and conclusions are separate chunks. Don't combine a page introduction with the first body section — intros typically have different retrieval signals than developed body content.
Tables, lists, and definitions are chunks. A comparison table, a numbered process list, or a definition block each constitutes a distinct retrievable unit and should be scored independently.
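A first pass over the chunking rules above can be automated. The sketch below assumes the source content is available as Markdown with `##` and `###` headings (an assumption; adapt the pattern for HTML or a CMS export), and the function name is hypothetical. It covers only the heading-boundary, separate-introduction, and word-count rules. Mid-paragraph topic shifts and standalone tables or lists still need an editor's judgement.

```python
import re

MIN_WORDS, MAX_WORDS = 150, 400  # target range from the chunking rules

HEADING = re.compile(r"^#{2,3}\s+(.*)$")  # matches H2 and H3 lines only


def split_by_headings(markdown: str) -> list[dict]:
    """Split text at H2/H3 lines; text before the first heading is the intro chunk."""
    sections = [{"heading": "Introduction", "lines": []}]
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            sections.append({"heading": match.group(1).strip(), "lines": []})
        else:
            sections[-1]["lines"].append(line)

    chunks = []
    for section in sections:
        text = "\n".join(section["lines"]).strip()
        if not text:
            continue  # skip headings that have no body text
        words = len(text.split())
        if words < MIN_WORDS:
            flag = "short"  # may lack context for reliable scoring
        elif words > MAX_WORDS:
            flag = "long"   # probably covers more than one topic
        else:
            flag = "ok"
        chunks.append(
            {"heading": section["heading"], "text": text, "words": words, "flag": flag}
        )
    return chunks
```

Chunks flagged "short" or "long" are candidates for manual review rather than automatic merging or splitting, since the rules above warn against merging sections just to reach a word count.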

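What to record per chunk

For each chunk, record the fields listed in the Step 1 table: page URL, chunk ID, chunk heading, word count, and chunk text.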

Why the Shift Happened

The transition from keyword retrieval to vector retrieval was not a product decision made by any single company. It emerged from the convergence of three developments: the maturation of transformer-based language models capable of understanding semantic meaning rather than surface-level word matching; the deployment of retrieval-augmented generation (RAG) as an infrastructure pattern for grounding LLM outputs in factual content; and the public release of generative search interfaces — ChatGPT Search, Perplexity, Google AI Overviews, Microsoft Copilot — that deliver synthesised answers as the primary user experience.

In a keyword retrieval system, the question "what is the RASA framework?" is answered by finding pages that contain those words. In a vector retrieval system, the same question is answered by finding passages whose semantic embedding is closest to the semantic embedding of the question — regardless of whether the exact words appear. This distinction determines everything about how content must be structured to be found.
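The difference can be shown with a toy example. The three-dimensional vectors below are hand-written stand-ins, not output from any embedding model; a real system would obtain vectors with hundreds of dimensions from a model. The passages and function names are likewise invented for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def keyword_overlap(query: str, passage: str) -> int:
    """Number of distinct words shared by the query and the passage."""
    return len(set(query.lower().split()) & set(passage.lower().split()))


query = "what is the rasa framework"
passages = {
    "A": "a scoring method for how well content chunks are retrieved by ai search",
    "B": "what is the best framework for the front end",
}

# Toy vectors (assumption): A is placed near the query's meaning, B is not.
query_vec = [0.9, 0.1, 0.2]
passage_vecs = {"A": [0.8, 0.2, 0.3], "B": [0.1, 0.9, 0.2]}

best_by_keywords = max(passages, key=lambda k: keyword_overlap(query, passages[k]))
best_by_vectors = max(passage_vecs, key=lambda k: cosine(query_vec, passage_vecs[k]))
print(best_by_keywords, best_by_vectors)  # B A
```

Passage B wins on shared words despite being off-topic, while passage A wins on vector similarity despite sharing no words with the query.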

Content that was optimised exclusively for keyword ranking often fails in vector retrieval — not because it is low quality, but because keyword-optimised writing patterns (broad topic coverage, keyword repetition, thin introductory sections) produce weak embedding signals and low chunk coherence scores. The content becomes semantically diffuse: it resembles millions of other documents instead of being precisely retrievable for a specific query.

What SEO and GEO Share

The shift to GEO does not mean discarding SEO practice. Several SEO foundations remain structurally important in a generative retrieval environment:

Technical accessibility. Content that search engines cannot crawl or index is unlikely to reach AI training datasets or RAG pipelines either. Canonical tags, robots directives, page speed, and clean URL structures remain necessary preconditions.
E-E-A-T signals. Google's Experience, Expertise, Authoritativeness, and Trustworthiness framework overlaps significantly with GEO's authority model. Named authorship, institutional affiliation, and cited sources serve both ranking and retrieval functions.
Structured data markup. JSON-LD schema that defines entities, relationships, and attribution helps both SERP rich snippets and AI entity disambiguation. The schema patterns used across RASA dimension pages (TechArticle → isPartOf ScholarlyArticle → about DefinedTerm) serve both purposes simultaneously.
Content depth and specificity. Thin, generic content performs poorly in both paradigms. Both SEO and GEO reward content that is specific, well-sourced, and developed beyond surface-level coverage.
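The structured data pattern named in the list above (TechArticle, isPartOf ScholarlyArticle, about DefinedTerm) could be generated as follows. This is a sketch under assumptions: it attaches both isPartOf and about to the TechArticle, which is one reading of the pattern, and the headline and description values are placeholders rather than the markup actually used on the RASA pages. The paper title and DOI are taken from the framework reference.

```python
import json

# Property names (isPartOf, about, sameAs) are standard schema.org properties.
# The nesting and the placeholder values are illustrative assumptions.
article = {
    "@context": "https://schema.org",
    "@type": "TechArticle",
    "headline": "Retrieval Probability (RP)",
    "isPartOf": {
        "@type": "ScholarlyArticle",
        "name": "Retrieval-Aware Semantic Architectures (RASA) for AI-Native Search",
        "sameAs": "https://doi.org/10.5281/zenodo.20325460",
    },
    "about": {
        "@type": "DefinedTerm",
        "name": "Retrieval Probability",
        "description": "Placeholder: one-sentence definition of the term.",
    },
}

json_ld = json.dumps(article, indent=2)
# Wrap the JSON-LD in the script tag that goes in the page's HTML.
print(f'<script type="application/ld+json">\n{json_ld}\n</script>')
```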

The practitioner moving from SEO to GEO does not start from zero. They extend their existing practice into a new retrieval dimension — adding chunk-level semantic structure, entity precision, and synthesis compatibility to their existing page-level optimisation habits.

How RASA Operationalises GEO

The Retrieval-Aware Semantic Architecture (RASA) framework, developed by Amit Verma and Sarita Agarwal at Nebula Personalization Tech Solutions Pvt. Ltd. and published under DOI 10.5281/zenodo.20325460, provides the first structured scoring methodology for GEO readiness at the content-chunk level.

RASA decomposes GEO readiness into five measurable dimensions:

Retrieval Probability (RP) — weight 0.25
Measures the density and precision of retrieval signals: named entities, technical terminology, topical anchors, and the absence of generic filler language. RP answers: will this chunk surface for the right query?
Semantic Chunk Coherence (SCC) — weight 0.20
Measures whether a passage is a clean, self-contained unit of meaning that can be retrieved and understood without surrounding context. SCC answers: does this chunk make sense as a standalone retrieval result?
Entity Clarity Score (ECS) — weight 0.20
Measures the precision, consistency, and disambiguation of named entities within a chunk. ECS answers: do AI systems know unambiguously who and what this content is about?
Synthesis Compatibility Index (SCI) — weight 0.20
Measures how well a chunk integrates with others in a RAG synthesis pipeline — factual precision, logical structure, absence of contradiction signals. SCI is the only dimension with a hard override: SCI < 6.0 triggers a REJECT verdict regardless of composite score. SCI answers: can this chunk be safely synthesised into an AI-generated answer?
Citation & Grounding Potential (CGP) — weight 0.15
Measures how citable and attributable the chunk is — presence of named sources, DOIs, statistics, and institutional attribution. CGP answers: will AI systems cite this content when they use it?

The RASA composite score is calculated as:

(RP × 0.25) + (SCC × 0.20) + (ECS × 0.20) + (SCI × 0.20) + (CGP × 0.15)

A composite score of 8.0 or above earns a PUBLISH verdict. Scores between 6.0 and 7.9 receive a REVISE verdict. Scores below 6.0, or any chunk with SCI below 6.0, receive a REJECT verdict.

RASA provides what no traditional SEO tool offers: a content-chunk-level score that predicts AI retrieval performance, not SERP ranking. It is the measurement layer that GEO has lacked since the term entered practitioner vocabulary.

Evaluating GEO Readiness with RASA-Analyst

RASA-Analyst is the official evaluation engine for the RASA framework, available as a locally run model at ollama.com/nebulatech/rasa-analyst. It scores content chunks across all five RASA dimensions, returns a composite score with a verdict, identifies specific failure modes by quoting exact phrases from the input, and provides a concrete remediation recommendation for any dimension scoring below 8.0.

To run a GEO readiness evaluation:

ollama run nebulatech/rasa-analyst

Paste your content chunk when prompted. RASA-Analyst will return a full five-dimension score report with improvement guidance.
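For audits with many chunks, pasting each one by hand is slow. Ollama also exposes a local HTTP API, so a script could submit chunks instead. The sketch below assumes Ollama is running on its default port (11434) and that the model has already been pulled. The helper names build_request and score_chunk are hypothetical, and the report is returned as raw text because its exact format is not specified in this guide.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint
MODEL = "nebulatech/rasa-analyst"


def build_request(chunk_text: str) -> urllib.request.Request:
    """Prepare a non-streaming generate request carrying one chunk as the prompt."""
    payload = {"model": MODEL, "prompt": chunk_text, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score_chunk(chunk_text: str, timeout: float = 300.0) -> str:
    """Send the chunk to the local model and return its raw text report."""
    with urllib.request.urlopen(build_request(chunk_text), timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Calling score_chunk(chunk_text) for each row of the audit inventory and saving the returned text alongside the chunk ID keeps the reports traceable to their source passages.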

Full usage documentation: /research/rasa-analyst-guide

A Practical Transition Checklist

For content teams, SEO agencies, digital marketing agencies, and practitioners moving from SEO-only to GEO-aware content strategy:

Audit existing content at the chunk level, not the page level — identify passages, not just pages
Replace broad keyword-optimised language with precise named entities: frameworks, tools, organisations, methodologies, people
Ensure each passage is self-contained: a reader (or AI system) should not need surrounding context to understand it
Add verifiable attribution to every factual claim: named sources, statistics with origin, DOIs where applicable
Implement TechArticle / ScholarlyArticle JSON-LD schema to support both SERP rich results and AI entity disambiguation
Remove synthesis-incompatible language: hedging qualifiers, contradictory claims, and passive-voice ambiguity that AI systems cannot safely quote
Score revised content with RASA-Analyst before publishing: target a composite score of 8.0 or above, with SCI at or above 6.0 (hard floor)
Continue page-level SEO practice: technical accessibility, canonical tags, Core Web Vitals, and E-E-A-T signals remain relevant
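Once chunks have been scored, the results can be sorted into a remediation queue. The sketch below assumes each result has already been recorded with its verdict, composite, and dimension scores. The ordering policy (REVISE before REJECT, highest composite first) is one possible choice that favours quick wins, not a rule from the framework, and the sample data is invented for illustration.

```python
VERDICT_ORDER = {"REVISE": 0, "REJECT": 1}  # assumed policy: quick wins first


def remediation_queue(results: list[dict]) -> list[dict]:
    """Return non-PUBLISH chunks in work order, each with its weak dimensions."""
    queue = []
    for result in results:
        if result["verdict"] == "PUBLISH":
            continue  # already GEO-ready, nothing to remediate
        # Dimensions below 8.0 are the ones that need remediation guidance.
        weak = sorted(dim for dim, score in result["scores"].items() if score < 8.0)
        queue.append({**result, "weak_dimensions": weak})
    return sorted(queue, key=lambda r: (VERDICT_ORDER[r["verdict"]], -r["composite"]))


# Invented sample results for illustration only.
results = [
    {"chunk_id": "pricing_chunk-02", "verdict": "REJECT", "composite": 7.9,
     "scores": {"RP": 9.0, "SCC": 8.0, "ECS": 8.0, "SCI": 5.5, "CGP": 9.0}},
    {"chunk_id": "pricing_chunk-01", "verdict": "PUBLISH", "composite": 8.6,
     "scores": {"RP": 8.0, "SCC": 9.0, "ECS": 9.0, "SCI": 9.0, "CGP": 8.0}},
    {"chunk_id": "guide_chunk-03", "verdict": "REVISE", "composite": 7.0,
     "scores": {"RP": 6.0, "SCC": 8.0, "ECS": 7.0, "SCI": 8.0, "CGP": 6.0}},
]

for item in remediation_queue(results):
    print(item["chunk_id"], item["verdict"], item["weak_dimensions"])
```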

Related Research

RASA Framework Overview — Full framework introduction, composite score formula, and five dimensions
Retrieval Probability (RP) — How AI retrieval signals are measured
Semantic Chunk Coherence (SCC) — Structuring content as clean retrieval units
Entity Clarity Score (ECS) — Named entity precision and disambiguation
Synthesis Compatibility Index (SCI) — Content safety and synthesis fitness
Citation & Grounding Potential (CGP) — Making content citable by AI systems
RASA Research Paper on Zenodo — Full academic paper, DOI: 10.5281/zenodo.20325460

Framework Reference: Verma, A. & Agarwal, S. (2026). Retrieval-Aware Semantic Architectures (RASA) for AI-Native Search. Nebula Personalization Tech Solutions Pvt. Ltd. DOI: 10.5281/zenodo.20325460

RASA Content Audit
How to Score Existing Content for AI Retrieval