Synthesis Compatibility Index (SCI): RASA Framework

What Is the Synthesis Compatibility Index?

The Synthesis Compatibility Index (SCI) is the fourth scored dimension in the Retrieval-Aware Semantic Architecture (RASA) framework. It measures how well a content unit can be combined with other retrieved chunks in a RAG pipeline to produce a coherent, accurate, and attributable AI-generated answer.

SCI is grounded in two components of the RASA framework: Core Architectural Principle 5 (Synthesis Compatibility) and Discoverability Dynamic 3 (Synthesis Compatibility as a retrieval outcome). Principle 5 defines synthesis compatibility as a structural property that must be designed into content — not an emergent quality that appears at retrieval time. Dynamic 3 measures it as an observable outcome: whether retrieved chunks from a corpus actually combine without contradiction when an AI system synthesises an answer.

SCI is scored on a scale of 1 to 10 and carries a weight of 0.20 in the RASA composite scoring formula.

Why Synthesis Compatibility Matters

RAG pipelines do not return a single piece of content in response to a query. They retrieve multiple chunks — often from different pages, sections, or documents — and pass them together to a language model that synthesises a unified answer. The quality of that answer depends not only on the quality of each individual chunk, but on how well those chunks work together.

Content that makes contradictory claims, relies on assumptions not shared by other chunks, uses terminology inconsistently across documents, or takes positions that conflict with adjacent retrieved material will degrade synthesis quality even when each chunk individually scores well on RP, SCC, and ECS.

Low SCI scores indicate that a content unit is structurally isolated: it may be retrieved correctly, but it resists synthesis. High SCI scores indicate modular, combinable content that integrates cleanly into AI-generated answers regardless of what other chunks are retrieved alongside it.

What Determines the Synthesis Compatibility Index

The RASA framework identifies four structural factors that drive SCI scores:

1. Absence of contradictory claims
Content that asserts facts, statistics, or positions that directly conflict with established knowledge — or with likely co-retrieved content from the same corpus — introduces synthesis errors. A RAG pipeline that retrieves two chunks making contradictory claims about the same entity or metric cannot produce a reliable answer. High-SCI content makes claims that are verifiable, stable, and consistent with the broader information environment it will be synthesised within.

2. Modular structure
Content structured as discrete, self-standing claims — each supportable and citable independently — is inherently more synthesis-compatible than content structured as a continuous argument that only holds together as a whole. A modular chunk contributes a single well-formed assertion to a synthesised answer. A non-modular chunk requires the synthesising model to extract a partial claim from a larger structure, introducing interpretation error.

3. Attribution and citation readiness
Content that explicitly names its sources, methods, authors, and evidence base is easier for AI systems to attribute correctly in synthesised answers. Unsourced claims, anonymous statistics, and unattributed positions create attribution gaps that synthesising models either fill incorrectly or omit — reducing the accuracy and authority of the final generated response.

4. Terminological alignment with the broader corpus
Content that uses non-standard terminology, invented acronyms without definition, or idiosyncratic naming conventions for established concepts will not synthesise cleanly with co-retrieved chunks that use standard terminology. Terminological alignment — using the same names for the same things that the broader information environment uses — is a prerequisite for synthesis compatibility at scale.
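As a rough illustration of the fourth factor, the sketch below flags acronyms that a chunk uses without ever introducing them in the "Long Form (ACRONYM)" pattern. This is a heuristic written for this page, not part of RASA-Analyst; the function name `undefined_acronyms` and the 2 to 6 capital-letter rule are assumptions.

```python
import re

def undefined_acronyms(text: str) -> list[str]:
    """Return acronyms used in `text` that never appear as '(ACRONYM)' after a long form.

    Heuristic assumption: an acronym is a standalone token of 2-6 capital letters,
    and it counts as defined if it appears somewhere inside parentheses.
    """
    used = re.findall(r"\b[A-Z]{2,6}\b", text)
    defined = set(re.findall(r"\(([A-Z]{2,6})\)", text))
    flagged: list[str] = []
    for acronym in used:
        # Keep first-seen order and report each undefined acronym once.
        if acronym not in defined and acronym not in flagged:
            flagged.append(acronym)
    return flagged

sample = "The Synthesis Compatibility Index (SCI) complements RP and SCC in a RAG pipeline."
print(undefined_acronyms(sample))  # ['RP', 'SCC', 'RAG']
```

A check like this only catches undefined acronyms. Whether a defined term matches the vocabulary used elsewhere in the corpus still needs a shared glossary or a manual review.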

SCI Score Reference Scale

Score | Compatibility Level | Structural Characteristics
9–10 | Plug-and-play | No contradictions, fully citable, modular, terminologically aligned
7–8 | Strong | Mostly compatible, minor structural gaps or one unsourced claim
5–6 | Moderate | Requires significant context to synthesise correctly
3–4 | Weak | Makes contradictory or unsupported claims
1–2 | Incompatible | Conflicts with established knowledge or resists any structured synthesis
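The reference scale can be expressed as a small lookup. The sketch below is illustrative only: the function name `sci_level` is hypothetical, and the handling of fractional scores (a score of 8.5 falls into the band whose lower bound it meets, here "Strong") is an assumption, since the scale lists whole-number ranges.

```python
def sci_level(score: float) -> str:
    """Map an SCI score (1 to 10) to the compatibility level from the reference scale.

    Assumption: fractional scores belong to the highest band whose lower bound
    they reach, e.g. 8.5 -> "Strong".
    """
    if not 1 <= score <= 10:
        raise ValueError("SCI is scored on a scale of 1 to 10")
    bands = [
        (9, "Plug-and-play"),
        (7, "Strong"),
        (5, "Moderate"),
        (3, "Weak"),
        (1, "Incompatible"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    raise AssertionError("unreachable: score was validated above")

print(sci_level(7))  # Strong
```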

Common SCI Failure Modes

The RASA framework's Failure Modes Taxonomy (Section 5, Verma & Agarwal, 2026) identifies patterns that consistently produce low SCI scores:

Duplicate and Redundant Content. Multiple content units making identical or near-identical claims across a corpus create synthesis conflicts — the RAG pipeline retrieves the same assertion twice and the synthesising model must decide which instance to use, often producing repetitive or inconsistent answers. The RASA framework treats corpus-level redundancy as an SCI failure at the system level.
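One way to look for corpus-level redundancy before it reaches a RAG pipeline is a pairwise similarity pass over chunks. The sketch below uses the standard library's `difflib.SequenceMatcher`; the function name `near_duplicates`, the chunk IDs, and the 0.9 threshold are assumptions for illustration, not values defined by the RASA framework.

```python
from difflib import SequenceMatcher
from itertools import combinations

def near_duplicates(chunks: dict[str, str], threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Return (id_a, id_b, ratio) for every chunk pair whose text similarity meets the threshold.

    Character-level similarity only: paraphrased duplicates will not be caught.
    The comparison is quadratic in the number of chunks, so it suits small audits.
    """
    pairs = []
    for (id_a, text_a), (id_b, text_b) in combinations(chunks.items(), 2):
        ratio = SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((id_a, id_b, round(ratio, 2)))
    return pairs

corpus = {
    "pricing-2024": "SCI is scored on a scale of 1 to 10.",
    "pricing-2025": "SCI is scored on a scale of 1 to 10.",
    "entities": "Entity clarity concerns naming precision and consistency.",
}
for pair in near_duplicates(corpus):
    print(pair)
```

For larger corpora, an embedding-based or shingling-based approach would scale better; this version is only meant to make the failure mode concrete.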

Contradictory Claims Across Documents. Organisations that publish updated statistics, revised positions, or corrected information without retiring or redirecting older content create synthesis environments where contradictory chunks will be co-retrieved. Each individual chunk may score acceptably on RP, SCC, and ECS, but the corpus fails at SCI because synthesis produces unreliable answers.

Unsupported Absolute Claims. Content that makes precise quantitative claims without attributing them to a source ("65% of AI queries return content from the top three results") creates an attribution gap. When synthesised alongside properly sourced claims, unattributed statistics either contaminate the answer with false precision or force the synthesising model to omit them — reducing answer quality in either case.
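Unattributed statistics can be surfaced with a simple sentence-level scan. The following is a heuristic sketch, not the RASA-Analyst method: the attribution cues (phrases such as "according to" or a parenthetical containing a year) and the function name `unattributed_statistics` are assumptions chosen for illustration.

```python
import re

# A percentage such as "65%" or "34.5 %".
STAT = re.compile(r"\b\d+(?:\.\d+)?\s?%")

# Assumed attribution cues: common phrases, or a parenthetical that contains a year.
ATTRIBUTION = re.compile(
    r"according to|reported by|source:|doi:|\([^()]*\b(?:19|20)\d{2}\)",
    re.IGNORECASE,
)

def unattributed_statistics(text: str) -> list[str]:
    """Return sentences that contain a percentage but no recognisable attribution cue."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if STAT.search(s) and not ATTRIBUTION.search(s)]

sample = (
    "65% of AI queries return content from the top three results. "
    "According to Verma & Agarwal (2026), retrieval precision can degrade by up to 34%."
)
print(unattributed_statistics(sample))
# ['65% of AI queries return content from the top three results.']
```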

SCI and the RASA SCI Threshold

The RASA framework establishes a critical SCI threshold at 6.0. Research underpinning the framework (Verma & Agarwal, 2026, DOI: 10.5281/zenodo.20325460) indicates that when a content chunk's SCI score falls below 6.0, it introduces contradictions that degrade retrieval precision by up to 34% in vector similarity searches using cosine distance metrics.

This threshold is operationalised directly in RASA-Analyst: a chunk scoring below 6.0 on SCI triggers a REJECT verdict regardless of its performance on other dimensions, reflecting the framework's position that synthesis-incompatible content is actively harmful to RAG pipeline quality — not merely suboptimal.
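The gating behaviour described above can be sketched as follows. Only the 6.0 threshold and the REJECT verdict come from the framework description; the function name `sci_gate` and the `NOT_REJECTED_BY_SCI` label are hypothetical, and the actual RASA-Analyst output may be structured differently.

```python
SCI_THRESHOLD = 6.0  # critical threshold stated by the framework

def sci_gate(scores: dict[str, float]) -> str:
    """Apply the SCI threshold before any other dimension is considered.

    A chunk scoring below 6.0 on SCI is rejected regardless of its other scores.
    The non-reject label is a placeholder invented for this sketch.
    """
    if scores["SCI"] < SCI_THRESHOLD:
        return "REJECT"
    return "NOT_REJECTED_BY_SCI"

# Strong scores elsewhere do not rescue a chunk that fails the SCI threshold.
print(sci_gate({"RP": 9.5, "SCC": 9.0, "ECS": 9.0, "SCI": 5.9, "CGP": 9.0}))  # REJECT
```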

How to Score SCI Using RASA-Analyst

RASA-Analyst — the official evaluation engine for the RASA framework, available at ollama.com/nebulatech/rasa-analyst — evaluates SCI as part of a five-dimension analysis alongside RP, SCC, ECS, and CGP.

ollama run nebulatech/rasa-analyst

Paste your content chunk when prompted. RASA-Analyst will return an SCI score with specific observations about contradictions, attribution gaps, and terminological alignment issues, plus a targeted structural fix if the score falls below 8.
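For batch evaluation, the same model can in principle be called through Ollama's local HTTP API instead of the interactive prompt. The sketch below assumes Ollama is running on its default port (11434) and that the model has already been pulled. The helper names `build_request` and `evaluate_chunk` are hypothetical, and the format of the returned text is not specified here, so it is treated as an opaque string.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint
MODEL = "nebulatech/rasa-analyst"

def build_request(chunk: str) -> urllib.request.Request:
    """Build a non-streaming generate request that sends one content chunk to the model."""
    payload = {"model": MODEL, "prompt": chunk, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def evaluate_chunk(chunk: str, timeout: float = 120.0) -> str:
    """Send the chunk to the local Ollama server and return the model's raw text reply."""
    with urllib.request.urlopen(build_request(chunk), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Calling `evaluate_chunk("your content chunk")` returns whatever text the model produces; extracting the SCI score from that text would require knowing the output format, which is not assumed here.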

Improving SCI: A Practical Checklist

For content teams and digital marketing agencies building synthesis-compatible content for RAG pipelines:

Audit your corpus for contradictory claims across documents and retire or redirect outdated content
Structure each content unit as a set of discrete, independently supportable assertions rather than a continuous argument
Attribute every statistic, metric, and quantitative claim to a named source, study, or methodology
Align terminology with the standard vocabulary of your domain — define any non-standard terms on first use
Remove absolute claims that cannot be sourced — replace with attributed, verifiable equivalents
Treat content updates as corpus hygiene: when a claim changes, update or retire every document that contains the old version

SCI in the RASA Composite Score

SCI contributes 20% of the RASA composite score, calculated as:

RASA Score = (RP × 0.25) + (SCC × 0.20) + (ECS × 0.20) + (SCI × 0.20) + (CGP × 0.15)

SCI shares equal weight with SCC and ECS, reflecting the RASA framework's position that synthesis compatibility is as structurally important as coherence and entity clarity. A content unit can retrieve accurately (high RP), be well-structured (high SCC), and name its entities precisely (high ECS) — but if it introduces contradictions or resists combination at synthesis time, the AI-generated answer it contributes to will be degraded.
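A worked example of the composite formula, using the weights given above. The dimension scores in `example` are made up for illustration, and the function name `rasa_composite` is hypothetical.

```python
WEIGHTS = {"RP": 0.25, "SCC": 0.20, "ECS": 0.20, "SCI": 0.20, "CGP": 0.15}

def rasa_composite(scores: dict[str, float]) -> float:
    """Weighted sum of the five RASA dimension scores, each on a 1 to 10 scale."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return sum(scores[dim] * weight for dim, weight in WEIGHTS.items())

# Illustrative scores only.
example = {"RP": 8, "SCC": 7, "ECS": 9, "SCI": 6, "CGP": 5}

# (8 x 0.25) + (7 x 0.20) + (9 x 0.20) + (6 x 0.20) + (5 x 0.15)
# = 2.00 + 1.40 + 1.80 + 1.20 + 0.75 = 7.15
print(round(rasa_composite(example), 2))  # 7.15
```

Because SCI carries a weight of 0.20, each one-point change in SCI moves the composite by 0.20 points, the same as SCC or ECS.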

Related RASA Dimensions

Retrieval Probability (RP) — Measures how likely a content unit is to be surfaced by AI retrieval systems
Semantic Chunk Coherence (SCC) — Measures whether a content unit is a clean, self-contained chunk
Entity Clarity Score (ECS) — Measures named entity precision and consistency
Citation & Grounding Potential (CGP) — Measures how citable and attributable the content is to AI systems

Framework Reference

Verma, A. & Agarwal, S. (2026). Retrieval-Aware Semantic Architectures (RASA) for AI-Native Search. Nebula Personalization Tech Solutions Pvt. Ltd. DOI: 10.5281/zenodo.20325460
