INDEX MODEL v0.1 — FEBRUARY 2026
Scoring Methodology
This document defines the three proprietary indices used in the AI Tools Landscape Report:
Agent Maturity Index (AMI), Autonomy Risk Index (ARI), and
Ecosystem Power Index (EPI). All dimensions, weights, and grading criteria are published here
for transparency and reproducibility.
Guiding Principles
- Transparency over authority. Every score is decomposable into its dimension scores. Every dimension score links to evidence.
- Confidence labeling is mandatory. No score is presented without a CONFIRMED, INFERRED, or SPECULATIVE tag.
- Scores can change. Indices are versioned. When new evidence arrives, scores update. The changelog records every change.
- No pay-for-score. Sponsorship does not affect index scores. Frameworks are scored identically regardless of commercial relationship.
- Methodology evolves. Dimensions and weights may change between versions. All changes are documented with rationale.
Confidence Labels
Every data point and score carries one of three confidence labels:
✓ CONFIRMED
Based on primary sources: official documentation, published reports, security audits,
GitHub repository data, or direct source code inspection. Verifiable by third parties.
◐ INFERRED
Reasonable conclusion drawn from confirmed data + domain expertise. Example: inferring enterprise
penetration from known partnerships, pricing tiers, and job postings. Directionally reliable but not directly verified.
◌ SPECULATIVE
Forward-looking projection or estimate based on trends. Example: projected enterprise adoption
rates for Q3 2026+. Used sparingly and always labeled. May be wrong.
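To make the labeling requirement concrete, here is a minimal sketch of how a labeled data point can be represented in code. The three enum values mirror the labels above; the DataPoint fields and the example values are illustrative, not part of the published data model.

```python
from dataclasses import dataclass
from enum import Enum

class Confidence(Enum):
    """The three confidence labels defined above."""
    CONFIRMED = "confirmed"      # primary sources; verifiable by third parties
    INFERRED = "inferred"        # confirmed data plus domain expertise
    SPECULATIVE = "speculative"  # forward-looking projection; may be wrong

@dataclass
class DataPoint:
    """A single piece of evidence behind a dimension score."""
    claim: str              # e.g. "ships with container sandboxing by default"
    source: str             # URL or citation backing the claim
    confidence: Confidence  # mandatory -- no unlabeled data points

evidence = DataPoint(
    claim="Enterprise penetration inferred from job postings and pricing tiers",
    source="https://example.com/evidence",  # placeholder URL
    confidence=Confidence.INFERRED,
)
```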
Index 1: Agent Maturity Index (AMI)
Measures how production-ready an agent framework or system is. A high AMI score means the system
can be deployed in production with reasonable confidence in reliability, security, and operational control.
Dimensions & Weights
| Dimension | Weight | What It Measures |
| --- | --- | --- |
| Reliability | 20% | Uptime, error handling, recovery from failures, consistent output quality |
| Observability | 15% | Logging, tracing, debugging tools, execution visibility |
| Tooling Ecosystem | 15% | Available integrations, plugins, MCP servers, third-party tools |
| Security Posture | 20% | Vulnerability count, CVEs, audit results, secure defaults |
| Execution Control | 15% | Permission models, sandboxing, human-in-the-loop, approval gates |
| Deployment Maturity | 15% | CI/CD support, containerization, scaling options, enterprise readiness |
Scoring Formula
AMI = Σ (dimension_score × dimension_weight)
Where:
dimension_score = 0–100 (assessed per framework per dimension)
dimension_weight = percentage weight (all weights sum to 100%)
Example (OpenClaw):
AMI = (55×0.20) + (60×0.15) + (75×0.15) + (18×0.20) + (45×0.15) + (60×0.15)
= 11.0 + 9.0 + 11.25 + 3.6 + 6.75 + 9.0
= 50.6 → published as 52 after a qualitative adjustment for evidence strength
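The same calculation, expressed as a short sketch. The weights are the ones used in the OpenClaw example (20/15/15/20/15/15); the function and variable names are ours for illustration.

```python
# AMI dimension weights as used in the OpenClaw example above (sum to 1.0).
AMI_WEIGHTS = {
    "reliability": 0.20,
    "observability": 0.15,
    "tooling_ecosystem": 0.15,
    "security_posture": 0.20,
    "execution_control": 0.15,
    "deployment_maturity": 0.15,
}

def weighted_index(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Raw index value: sum of dimension_score x dimension_weight."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 100%"
    return sum(scores[dim] * weight for dim, weight in weights.items())

# Dimension scores for OpenClaw from the worked example.
openclaw_scores = {
    "reliability": 55,
    "observability": 60,
    "tooling_ecosystem": 75,
    "security_posture": 18,
    "execution_control": 45,
    "deployment_maturity": 60,
}

print(round(weighted_index(openclaw_scores, AMI_WEIGHTS), 2))  # 50.6, published as 52
```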
Letter Grades
Index 2: Autonomy Risk Index (ARI)
Measures risk exposure when running a system autonomously. Unlike AMI (where higher is better),
ARI is an inverse score — lower is safer. A high ARI means the system poses significant risk
when running without continuous human oversight.
Dimensions & Weights
| Dimension | Weight | What It Measures (Higher = More Risk) |
| --- | --- | --- |
| Permission Model Strength | | Weak/missing permission boundaries = high score. Granular enforcement = low score. |
| Sandboxing / Isolation | | No isolation = high score. Container/VM isolation with network segmentation = low score. |
| Default Network Exposure | | Open ports, public endpoints = high score. No listening services = low score. |
| Secret Handling | | Plaintext keys = high score. Encrypted vault with rotation = low score. |
| Human-in-the-Loop Controls | | No approval gates = high score. Mandatory review for destructive actions = low score. |
| Audit Logging | | No logs = high score. Tamper-proof audit trail with SIEM export = low score. |
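ARI presumably aggregates the same way as AMI, as a weighted sum of dimension scores, with the inversion that a higher dimension score means more risk. The sketch below assumes that form; the equal weights are placeholders only, not the published values.

```python
# Placeholder weights: equal split across the six ARI dimensions.
# The published weights are not reproduced here.
ARI_WEIGHTS = {
    "permission_model_strength": 1 / 6,
    "sandboxing_isolation": 1 / 6,
    "default_network_exposure": 1 / 6,
    "secret_handling": 1 / 6,
    "human_in_the_loop_controls": 1 / 6,
    "audit_logging": 1 / 6,
}

def ari(risk_scores: dict[str, float]) -> float:
    """Weighted sum of per-dimension risk scores (0-100). Lower is safer."""
    return sum(risk_scores[dim] * weight for dim, weight in ARI_WEIGHTS.items())
```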
Risk Labels
Index 3: Ecosystem Power Index (EPI)
Measures distribution strength, community gravity, and ecosystem reach. A high EPI indicates
the framework has strong adoption, vendor integration, and community momentum — making it harder to displace
and easier to hire for.
Dimensions & Weights
| Dimension | Weight | What It Measures |
| --- | --- | --- |
| Adoption Signals | | GitHub stars, npm downloads, Docker pulls, community size, Stack Overflow activity |
| Vendor Integration Breadth | | Number of platforms, IDEs, and services with native support or official integration |
| Enterprise Penetration | | Known enterprise deployments, SOC 2 compliance, support contracts, case studies |
| Standard Alignment | | MCP support, OpenAPI compliance, tool protocol adherence, interoperability |
| Release Velocity | | Commit frequency, release cadence, maintainer activity, issue response time |
Momentum Tags
- Rising: EPI score increasing >10 points quarter-over-quarter. Example: OpenClaw (new entrant, explosive growth).
- Stable: EPI score change within ±10 points. Example: LangChain (established, consistent community).
- Declining: EPI score decreasing >10 points quarter-over-quarter. No current frameworks in this category.
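The momentum tags reduce to a threshold check on the quarter-over-quarter EPI delta. A minimal sketch of that rule (the function name is ours):

```python
def momentum_tag(previous_epi: float, current_epi: float) -> str:
    """Classify quarter-over-quarter EPI movement using the thresholds above."""
    delta = current_epi - previous_epi
    if delta > 10:
        return "Rising"
    if delta < -10:
        return "Declining"
    return "Stable"  # change within +/-10 points

# Example: a new entrant jumping from 20 to 35 in one quarter is tagged Rising.
print(momentum_tag(20, 35))  # Rising
```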
How Scores Change
Indices are living scores. They update when:
- New evidence emerges — A security audit, a new release, a partnership announcement
- Corrections are submitted — Framework maintainers or community members can dispute scores with evidence
- Methodology updates — Dimension weights may shift between versions as the landscape evolves
- Time passes — Enterprise penetration and adoption signals change quarterly
All changes are logged in the version history below. Previous scores are preserved for comparison.
Score Dispute Process
Framework maintainers can dispute scores by providing counter-evidence. The process:
- Open a GitHub issue on the report repository with the score-dispute tag
- Cite specific dimension(s) and provide evidence supporting a different score
- We review within 7 days and publish a response with rationale
- If accepted, scores update in the next edition with changelog entry
Known Limitations
- Subjectivity in weighting. The choice of 20% for Security vs 15% for Observability is a judgment call. Different use cases may warrant different weights.
- Inferred scores are estimates. Where direct evidence isn't available (e.g., enterprise penetration for private companies), we use proxies like job postings, pricing tiers, and partnership announcements.
- Snapshot in time. Scores reflect the state as of the edition date. Fast-moving projects may have changed significantly since publication.
- Conflict of interest. Clawdia is developed by the same team that produces this report. We address this by: (a) scoring Clawdia using the same methodology as all other frameworks, (b) being transparent about low scores (EPI: 8, AMI: 44), and (c) publishing this methodology for independent verification.
- No formal audit. These indices are not produced by a standards body. They are analytical scores by an industry research publication.
Version History
v0.1
February 16, 2026 — Initial Release
3 indices (AMI, ARI, EPI) covering 9 frameworks. 38 cited sources.
Dimension weights established. Confidence labeling system introduced.
Grading and risk label thresholds defined.
Data sources: 38 cited sources · Raw data: frameworks.json · Report: Agents 2026 Edition