CLINICAL INDEX

A transparent, deterministic verification methodology — fully disclosed.

Clinical Index is the verification framework every Stellar formula passes through. This page documents what it does, the 11-stage pipeline behind it, the published frameworks it applies, the scoring math, how Stellar Health Labs relates to it, and the FTC §255.5 disclosure that makes the relationship transparent.

WHAT CLINICAL INDEX DOES

FUNCTION

Dose-vs-research verification. COA audit. Methodology transparency.

Clinical Index reviews supplement formulas against a fixed protocol: does the dose match published research, was the batch tested at an independent accredited lab, and is the methodology documented well enough to be audited?

The framework was built for customers who read the documentation: people who read ingredient panels, look up PMIDs, and want to see the work behind a claim before they buy.

Every Stellar formula passes through this protocol. Each one has a Clinical Index score (or a published 'pending' status while data is in audit) and a per-batch COA on file.

The full methodology — the 11-stage pipeline, the published framework anchors, and the deterministic scoring math — is documented in detail below, and at clinicalindex.com.

THE TWO FUNCTIONS

Audit, then publish.

01 · AUDIT

Dose-vs-research verification + COA review.

Each ingredient's dose is checked against the cited published research. Production batches are reviewed against the independent lab's certificate of analysis (heavy metals, microbials, ingredient identity, active standardization).

02 · PUBLISH

Methodology and scores are public.

Scoring criteria, audit process, and per-product results are published at clinicalindex.com. Customers can review the work: the methodology is open rather than walled off.

THE CORE INSIGHT

HOW THE SCORING ACTUALLY WORKS

Does AI grade the evidence? No — AI extracts, deterministic math grades.

Large language models are used in Clinical Index as clinical extractors, not clinical reasoners. An AI model reads a label or a study and fills in a structured data form — study design, sample size, reported effect size, dose, risk-of-bias signals. It does the tedious transcription a research assistant would do.

Every numeric score is then computed by deterministic Python over those extracted inputs. The same inputs always produce the same score. No language model decides a grade, a tier, or a badge — fixed framework math does.

This mirrors how a real clinical-evidence team works: research assistants abstract data onto standardized forms; biostatisticians compute the scores. Clinical Index applies the same division of labor — AI does the tedious extraction, the framework does the grading.
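The division of labor described above can be sketched as a typed form plus a pure function. This is a minimal illustration, not Clinical Index's actual code: the class name, the field names, and the `missing_fields` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StudyExtraction:
    """Hypothetical structured form an extractor fills in for one study."""
    design: Optional[str]               # e.g. "RCT" or "cohort"; None if not reported
    sample_size: Optional[int]
    effect_size: Optional[float]        # reported effect size, if any
    dose_mg: Optional[float]
    high_risk_of_bias: Optional[bool]   # a risk-of-bias signal, if assessable


def missing_fields(form: StudyExtraction) -> list[str]:
    """Deterministic check: list the fields the extractor left blank."""
    return [name for name, value in vars(form).items() if value is None]


# Illustrative values only, not real study data.
form = StudyExtraction("RCT", 120, None, 300.0, None)
print(missing_fields(form))
```

The extractor's only job is to fill the form. Everything downstream is an ordinary function of its fields, so running it twice on the same form gives the same answer.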

THE 11-STAGE PIPELINE

AN EMBEDDED METHODOLOGY COMMITTEE

What does a formula pass through? An 11-stage evidence pipeline.

Each Stellar formula moves through eleven core stages (plus sub-stages, most of them deterministic), structured to mirror the roles on a real clinical-evidence team. Stages marked deterministic are fixed Python; stages marked extraction are AI filling structured forms that the math then consumes. Hybrid stages combine the two, and the one generation stage writes the narrative dossier text.

1 · Intake (extraction). A vision model parses the label image into ingredients and claims — the research assistant transcribing the product spec.
1.5 · Ingredient classifier (deterministic). Each ingredient is assigned a role (primary or supporting) and a functional class — the librarian indexing the formula.
2 · Evidence research (extraction). For every ingredient, PubMed literature is retrieved in parallel and per-study data is extracted — design, risk-of-bias signals, sample size, effect size, outcomes — the systematic-review extraction team.
3 · Verification (deterministic). Extracted data is sanity-checked for shape and completeness — data quality control.
4 · Formulation analysis (extraction). Each ingredient's dose adequacy, bioavailability, and form are analyzed against curated therapeutic ranges — the pharmacology consultant.
4.5 · Dusting detector (deterministic). A fixed check flags sub-clinical, under-dosed actives — the quality-control flag.
5 · Claims validation (extraction). Each claim is scanned for FTC compliance and prohibited-claim language — the regulatory-affairs reviewer.
6 · Safety assessment (extraction). Upper limits, LOAELs, and contraindications are scanned — the toxicology consultant.
6.5 · Interaction layer (deterministic). Pairwise ingredient interactions are looked up from a curated interaction table — the drug-interaction pharmacist.
7 · COA verification (extraction). The certificate of analysis is validated for label accuracy — the lab-results auditor.
7.1 / 7.6 · QHC + net-impression (hybrid). Each claim is mapped to a qualified-health-claim tier and the product's overall net impression is assessed — the FDA-compliance reviewer.
8a · Per-ingredient scoring (deterministic). Each ingredient's strength-of-clinical-support score and GRADE certainty are recomputed — the GRADE committee chair.
8b · Composite scoring (deterministic). The product composite is computed as a weighted average — the biostatistician.
9 · Verifier (extraction cross-check). An independent pass cross-checks the outputs — the independent statistical reviewer.
9.5 · Verification confidence (deterministic). Confidence across dimensions is aggregated into a single score — the methodology-board sign-off.
10 · Synthesis (generation). The narrative dossier text is written — the clinical writer.
10.5 · QA review (hybrid). A final quality and safety gate — the managing editor.
10.6 · Badge assignment (deterministic). The four-tier badge is applied by the fixed gate — peer-review approval.

The result is that no single AI step is trusted with a grade. Extraction feeds the math; the math decides.
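The stage list above can be written down as data, which makes the "extraction feeds the math" rule checkable. This is a sketch under assumptions: the tuple layout, the `GRADING_STAGES` selection, and the helper name are illustrative, and stage 9 (described as an extraction cross-check) is labeled plain "extraction" here.

```python
from collections import Counter

# (stage id, name, kind), transcribed from the stage list above.
STAGES = [
    ("1", "Intake", "extraction"),
    ("1.5", "Ingredient classifier", "deterministic"),
    ("2", "Evidence research", "extraction"),
    ("3", "Verification", "deterministic"),
    ("4", "Formulation analysis", "extraction"),
    ("4.5", "Dusting detector", "deterministic"),
    ("5", "Claims validation", "extraction"),
    ("6", "Safety assessment", "extraction"),
    ("6.5", "Interaction layer", "deterministic"),
    ("7", "COA verification", "extraction"),
    ("7.1/7.6", "QHC + net-impression", "hybrid"),
    ("8a", "Per-ingredient scoring", "deterministic"),
    ("8b", "Composite scoring", "deterministic"),
    ("9", "Verifier", "extraction"),
    ("9.5", "Verification confidence", "deterministic"),
    ("10", "Synthesis", "generation"),
    ("10.5", "QA review", "hybrid"),
    ("10.6", "Badge assignment", "deterministic"),
]

# Assumption: the stages that emit a numeric score or the badge.
GRADING_STAGES = {"8a", "8b", "9.5", "10.6"}


def grading_is_deterministic(stages=STAGES, grading=GRADING_STAGES) -> bool:
    """True when every score-producing stage is marked deterministic."""
    kinds = {stage_id: kind for stage_id, _, kind in stages}
    return all(kinds[stage_id] == "deterministic" for stage_id in grading)


kind_counts = Counter(kind for _, _, kind in STAGES)
print(grading_is_deterministic(), dict(kind_counts))
```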

UNIT OF ANALYSIS

NO UNIT DRIFT

What exactly gets scored — the product, the ingredient, the study, or the claim?

Clinical Index is explicit about the unit being scored at every step, so nothing drifts. The flow is product → ingredient (and claim text) → study → product.

The composite rolls up to the product.
GRADE certainty is keyed per ingredient–outcome pair.
The qualified-health-claim tier is keyed per claim text.

Because each metric is tied to a defined unit, an ingredient's evidence grade is never silently applied to the whole product, and a single claim's tier is never averaged away. The unit transitions are load-bearing, not incidental.
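One way to picture the unit discipline is a container in which each metric is stored under the key of its own unit. The class and field names below are hypothetical, and the ingredient, outcome, and claim strings are placeholders rather than real results.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ProductReport:
    """Hypothetical report: every metric is keyed by the unit it describes."""
    product: str
    composite: Optional[float] = None                                   # per product
    grade: Dict[Tuple[str, str], str] = field(default_factory=dict)     # per (ingredient, outcome)
    qhc_tier: Dict[str, str] = field(default_factory=dict)              # per claim text


report = ProductReport(product="example_product")
report.grade[("ingredient_a", "outcome_x")] = "low"
report.grade[("ingredient_a", "outcome_y")] = "moderate"
report.qhc_tier["example claim text"] = "C"
```

Because GRADE certainty is keyed by the ingredient and outcome together, the same ingredient can hold different grades for different outcomes, and neither grade can be read off as a product-level value by accident.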

PUBLISHED FRAMEWORK ANCHORS

ESTABLISHED METHODS, APPLIED

Which established frameworks does the methodology apply?

Clinical Index does not invent its own grading scales. It applies published, peer-reviewed methodologies — the same ones used in systematic reviews and regulatory science. Applying a public framework is not endorsement by its authors: these organizations do not certify, accredit, endorse, or affiliate with Clinical Index or Stellar Health Labs.

GRADE. Five downgrade domains (risk of bias, inconsistency, indirectness, imprecision, publication bias) plus three upgrade domains (large effect, dose-response, plausible residual confounding), with an insufficient-data final-step adjustment. Returns an ordinal certainty: high, moderate, low, or very low.
RoB 2 (Sterne 2019). The Cochrane risk-of-bias tool for randomized trials — five domains, each with signaling questions.
ROBINS-I (Sterne 2016). The Cochrane risk-of-bias tool for non-randomized studies — seven domains.
FTC Health Products Compliance Guidance (Dec 2022). Four study-match filters: dose, form, population, outcome.
FDA Qualified Health Claims framework (Federal Register, 2003-07-11). Claim-evidence tier mapping: A, B, C, D, or insufficient.
Health Canada NNHPD. Natural-health-product evidence classification: Level I and Level II.
IOM, EFSA, and NIH ODS dose references. DRI, AI, NRV, UL, and LOAEL anchors for dose adequacy and safety.
Cohen 1988 / Jaeschke 1989. Effect-size benchmarks (Cohen) and the minimal-clinically-important-difference concept (Jaeschke), used as cutoffs for assessing effect sizes.

Each framework is implemented as published. Where a method returns an ordinal grade, Clinical Index preserves that grade rather than collapsing it into a single number.
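To show what preserving an ordinal grade looks like, here is a minimal sketch of a GRADE-style certainty calculation. It assumes the standard GRADE starting points (randomized evidence starts at high, non-randomized at low) and integer domain ratings of 1 or 2 levels. It does not model the insufficient-data adjustment, and the function name and argument shapes are illustrative.

```python
LEVELS = ["very low", "low", "moderate", "high"]
DOWNGRADE_DOMAINS = {"risk of bias", "inconsistency", "indirectness",
                     "imprecision", "publication bias"}
UPGRADE_DOMAINS = {"large effect", "dose-response",
                   "plausible residual confounding"}


def grade_certainty(randomized: bool, downgrades: dict, upgrades: dict) -> str:
    """Return an ordinal certainty label; never a collapsed number."""
    unknown = (set(downgrades) - DOWNGRADE_DOMAINS) | (set(upgrades) - UPGRADE_DOMAINS)
    if unknown:
        raise ValueError(f"unknown GRADE domain(s): {sorted(unknown)}")
    start = LEVELS.index("high" if randomized else "low")
    level = start - sum(downgrades.values()) + sum(upgrades.values())
    # Clamp to the four-level scale.
    return LEVELS[max(0, min(level, len(LEVELS) - 1))]


print(grade_certainty(True, {"risk of bias": 1, "imprecision": 1}, {}))
```

The return value is one of four labels, so downstream code has to handle it as a category rather than do arithmetic on it.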

THE SCORING MATH

THREE DETERMINISTIC LAYERS, ONE GATE

How is the score actually calculated?

Every score is a deterministic calculation — three layers, then a badge gate. No model output is ever averaged into a number.

Layer 1 — Per-ingredient (Strength of Clinical Support). SCS = 0.40 × evidence + 0.40 × dose + 0.20 × bioavailability. The evidence input is not a separate opinion — it is a transparent transform of the GRADE result: the certainty band's midpoint (very low, low, moderate, high), scaled by effect size and attenuated by risk of bias. The evidence score only ever restates the GRADE math. If every input is missing, the score returns None rather than zero — a three-state honesty rule that keeps 'no data' distinct from 'bad data'.
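The Layer 1 formula can be sketched directly. Two assumptions are made here that the text does not state: inputs are on a 0 to 100 scale, and when only some inputs are missing the remaining weights are renormalized (the text confirms only the all-missing case). The function name is illustrative.

```python
from typing import Optional

SCS_WEIGHTS = {"evidence": 0.40, "dose": 0.40, "bioavailability": 0.20}


def strength_of_clinical_support(
    evidence: Optional[float],
    dose: Optional[float],
    bioavailability: Optional[float],
) -> Optional[float]:
    """SCS = 0.40 * evidence + 0.40 * dose + 0.20 * bioavailability."""
    inputs = {"evidence": evidence, "dose": dose, "bioavailability": bioavailability}
    present = {name: value for name, value in inputs.items() if value is not None}
    if not present:
        return None  # "no data" stays distinct from a genuine score of 0
    # Assumption: renormalize over the inputs that are present.
    total_weight = sum(SCS_WEIGHTS[name] for name in present)
    return sum(SCS_WEIGHTS[name] * value for name, value in present.items()) / total_weight


print(strength_of_clinical_support(80, 60, 50))
print(strength_of_clinical_support(None, None, None))
```

Returning `None` instead of `0.0` forces every caller to decide explicitly what to do with a missing score.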

Layer 2 — Per-product (Composite). A weighted average of evidence, dose, bioavailability, safety, and — when a COA exists — label accuracy. Without a COA the weights are 35 / 30 / 20 / 15; with a COA they become 30 / 25 / 10 / 20 / 15 (adding the label-accuracy term). Missing dimensions are renormalized, not defaulted — an absent dimension never silently counts as a zero.
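A minimal sketch of the Layer 2 renormalization follows. It assumes the weights are listed in the same order as the dimensions (evidence, dose, bioavailability, safety, label accuracy), that scores are on a 0 to 100 scale, and that an all-missing product returns `None`. None of these are confirmed by the text, and the names are illustrative.

```python
from typing import Optional

# Assumption: weights map to dimensions in the order the text lists them.
WEIGHTS_NO_COA = {"evidence": 35, "dose": 30, "bioavailability": 20, "safety": 15}
WEIGHTS_WITH_COA = {"evidence": 30, "dose": 25, "bioavailability": 10,
                    "safety": 20, "label_accuracy": 15}


def composite(scores: dict) -> Optional[float]:
    """Weighted average over the dimensions that actually have a score."""
    has_coa = scores.get("label_accuracy") is not None
    weights = WEIGHTS_WITH_COA if has_coa else WEIGHTS_NO_COA
    present = {name: scores[name] for name in weights if scores.get(name) is not None}
    if not present:
        return None
    total_weight = sum(weights[name] for name in present)
    return sum(weights[name] * value for name, value in present.items()) / total_weight


full = {"evidence": 80, "dose": 70, "bioavailability": 60, "safety": 90}
no_safety = {"evidence": 80, "dose": 70, "bioavailability": 60, "safety": None}
print(composite(full), composite(no_safety))
```

With the safety score absent, the result is 6100 / 85 (about 71.8). Treating the gap as a zero would instead give 61.0, which is the silent penalty that renormalization avoids.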

Layer 3 — Per-claim (Qualified Health Claim tier). A deterministic decision tree over study match-flags, NNHPD evidence-level counts, and GRADE certainty assigns each claim a tier: A, B, C, D, or insufficient.

The badge gate. A formula earns the 'formula reviewed' badge only when all of these hold: composite ≥ 65, verification confidence ≥ 60, every per-claim QHC tier is A, B, or C, and there are no critical safety flags. Miss any one and the badge is withheld.
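The gate is a plain conjunction of the four conditions, which a short function makes explicit. The function and parameter names are hypothetical. Treating a missing composite or confidence as a failed condition is an assumption, as is the behavior for a formula with no claims (an empty tier list passes the tier check).

```python
from typing import Iterable, Optional

PASSING_TIERS = {"A", "B", "C"}


def earns_reviewed_badge(
    composite: Optional[float],
    verification_confidence: Optional[float],
    qhc_tiers: Iterable[str],
    critical_safety_flags: Iterable[str],
) -> bool:
    """All four conditions must hold; any single miss withholds the badge."""
    if composite is None or verification_confidence is None:
        return False  # assumption: an unmeasured score cannot pass the gate
    return (
        composite >= 65
        and verification_confidence >= 60
        and all(tier in PASSING_TIERS for tier in qhc_tiers)
        and not list(critical_safety_flags)
    )


print(earns_reviewed_badge(72.0, 64.0, ["A", "C"], []))
print(earns_reviewed_badge(90.0, 55.0, ["A"], []))
```

The second call shows a strong composite failing on verification confidence alone.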

WHAT MAKES IT A METHODOLOGY

Five disciplines separate this from a generic AI summarizer.

Each one addresses a specific way evidence synthesis goes wrong. Four are below; the fifth — verification confidence — is important enough to stand on its own.

01 · RULE #2

No language model does arithmetic.

Every number — each score, each tier, the badge itself — is computed by deterministic Python. Language models only produce structured extractions; they never output a grade. Each score traces back to specific extracted study data and a documented framework branch.

02 · THREE-STATE

'Not assessable' is its own answer.

Primitives return matched, not-matched, or not-assessable — never a forced true/false. A missing abstract or an absent dose is recorded as 'no data', kept distinct from a studied-but-null result. The system never guesses to fill a blank.
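A three-state primitive can be sketched with an enum in place of a boolean. The dose comparison and its 10% tolerance are hypothetical, chosen only to give the enum something to return. They are not Clinical Index's matching rule.

```python
from enum import Enum
from typing import Optional


class Match(Enum):
    MATCHED = "matched"
    NOT_MATCHED = "not-matched"
    NOT_ASSESSABLE = "not-assessable"


def dose_matches(
    study_dose_mg: Optional[float],
    label_dose_mg: Optional[float],
    tolerance: float = 0.10,  # hypothetical relative tolerance
) -> Match:
    """Compare a label dose to a study dose without guessing at blanks."""
    if study_dose_mg is None or label_dose_mg is None:
        return Match.NOT_ASSESSABLE  # missing data is its own answer
    within = abs(label_dose_mg - study_dose_mg) <= tolerance * study_dose_mg
    return Match.MATCHED if within else Match.NOT_MATCHED


print(dose_matches(300.0, 300.0), dose_matches(300.0, 100.0), dose_matches(None, 300.0))
```

A caller that writes `if dose_matches(...)` gets no useful shortcut, because every enum member is truthy. The comparison has to name the state it wants.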

03 · SAFETY NET

Thin evidence cannot score high.

When at least half the studies in an evidence body have no readable abstract, GRADE certainty is automatically downgraded a level — preventing an artificially high rating on a body whose methods can't actually be read.
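The safety net reduces to a count and a one-level step down. In this sketch an abstract counts as unreadable when it is `None` or an empty string, and an empty evidence body is left unchanged. Both choices are assumptions, as is the function name.

```python
from typing import List, Optional

LEVELS = ["very low", "low", "moderate", "high"]


def apply_abstract_safety_net(certainty: str, abstracts: List[Optional[str]]) -> str:
    """Downgrade one level when at least half the abstracts are unreadable."""
    if not abstracts:
        return certainty  # assumption: nothing to count, so nothing to downgrade
    unreadable = sum(1 for abstract in abstracts if not abstract)
    if unreadable * 2 >= len(abstracts):
        # Step down one level, stopping at the bottom of the scale.
        return LEVELS[max(0, LEVELS.index(certainty) - 1)]
    return certainty


print(apply_abstract_safety_net("moderate", ["text", None]))
print(apply_abstract_safety_net("moderate", ["text", "text", None]))
```

Comparing `unreadable * 2` with the total keeps the "at least half" test in integer arithmetic, so a body of exactly half unreadable abstracts is downgraded.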

04 · PROVENANCE

Curated data outranks extracted.

Therapeutic ranges follow a tier hierarchy: board-curated ranges, each backed by multiple citations, take precedence over auto-extracted values, and AI synthesis only fills genuine gaps. Where every number came from is preserved at every level.
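The tier hierarchy amounts to a first-match lookup that also records which tier supplied the value. The tier keys, the `(low, high)` tuple shape, and the function name are hypothetical, and the numbers are placeholders rather than real therapeutic ranges.

```python
from typing import Dict, Optional, Tuple

# Highest-precedence source first.
PRECEDENCE = ["board_curated", "auto_extracted", "ai_synthesis"]

Range = Tuple[float, float]


def resolve_range(candidates: Dict[str, Optional[Range]]) -> Optional[dict]:
    """Return the highest-precedence range available, tagged with its source."""
    for source in PRECEDENCE:
        value = candidates.get(source)
        if value is not None:
            return {"range_mg": value, "source": source}
    return None  # no tier has a value; the gap stays visible


# Placeholder numbers for illustration only.
print(resolve_range({"auto_extracted": (100.0, 400.0), "board_curated": (200.0, 400.0)}))
print(resolve_range({"ai_synthesis": (50.0, 150.0)}))
```

Carrying the `source` tag alongside the value is what lets a later reader see whether a number was curated, extracted, or synthesized.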

THE FIFTH DISCIPLINE

VERIFICATION CONFIDENCE

Is the evidence strong — or are we just confident in the conclusion? Those are different questions.

Most scoring systems answer one question: how strong is the evidence? Clinical Index answers a second, separate one — how confident are we in this measurement?

Verification confidence is independent of evidence quality. It reflects whether the input data was complete, whether every pipeline stage actually ran, and whether any value quietly fell back to a default. A formula can carry strong evidence but low verification confidence because the underlying data was patchy — or modest evidence measured with high confidence.

The badge gate deliberately requires both: a composite of at least 65 and verification confidence of at least 60. Strong evidence alone is not enough — the measurement itself has to be trustworthy before a formula is marked reviewed.

THE RELATIONSHIP TO STELLAR HEALTH LABS

FTC §255.5 DISCLOSURE

Founder-shared. Fully disclosed. Independent where it counts.

Stellar Health Labs and Clinical Index were founded by the same individual. We disclose that directly. Clinical Index is not an arm's-length human auditor — it is a transparent, deterministic computational methodology: AI extracts structured data, fixed framework math computes every score.

What is genuinely independent: the ISO/IEC 17025 accredited laboratory that issues the certificates of analysis is a third party, separate from both entities — neither Stellar nor Clinical Index can influence a lab result. And the scoring math itself is fixed and reproducible: the same inputs always produce the same score, so marketing cannot move a number.

This page exists to fully disclose the relationship, per FTC 16 CFR §255.5 endorsements-and-testimonials guidance. The disclosure is intentional, the architecture is designed for it, and the roadmap below documents the path toward deeper structural separation.

INDEPENDENCE ROADMAP

The 12–18 month structural milestones.

The end-state is a Clinical Index governed at arm's length from the Stellar founder operation, with no founder-shared overlap. The path there is staged and published.

01 · Now → 6 months

Disclosure architecture in place.

Per-page disclosure language, the §255.5 footer disclosure, and this dedicated relationship page. Full methodology published on-page and at clinicalindex.com.

02 · 6 → 12 months

Operating separation deepens.

Independent editorial governance for Clinical Index methodology updates. Public audit logs. Additional brands brought into the Clinical Index audit cycle.

03 · 12 → 18 months

Genuine structural independence.

Target state: Clinical Index governance separated from founder operation, with documented continuity. Updates dated and disclosed as they happen.

READ THE FULL METHODOLOGY

The methodology lives here — and at clinicalindex.com.

The pipeline, the framework anchors, the scoring math, and per-product results are documented above and at the Clinical Index site.