INDEX MODEL v0.1 — FEBRUARY 2026

Scoring Methodology

This document defines the three proprietary indices used in the AI Tools Landscape Report: Agent Maturity Index (AMI), Autonomy Risk Index (ARI), and Ecosystem Power Index (EPI). All dimensions, weights, and grading criteria are published here for transparency and reproducibility.

Guiding Principles

  1. Transparency over authority. Every score is decomposable into its dimension scores. Every dimension score links to evidence.
  2. Confidence labeling is mandatory. No score is presented without a CONFIRMED, INFERRED, or SPECULATIVE tag.
  3. Scores can change. Indices are versioned. When new evidence arrives, scores update. The changelog records every change.
  4. No pay-for-score. Sponsorship does not affect index scores. Frameworks are scored identically regardless of commercial relationship.
  5. Methodology evolves. Dimensions and weights may change between versions. All changes are documented with rationale.

Confidence Labels

Every data point and score carries one of three confidence labels:

✓ CONFIRMED
Based on primary sources: official documentation, published reports, security audits, GitHub repository data, or direct source code inspection. Verifiable by third parties.
◐ INFERRED
Reasonable conclusion drawn from confirmed data + domain expertise. Example: inferring enterprise penetration from known partnerships, pricing tiers, and job postings. Directionally reliable but not directly verified.
◌ SPECULATIVE
Forward-looking projection or estimate based on trends. Example: projected enterprise adoption rates for Q3 2026+. Used sparingly and always labeled. May be wrong.
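To make the labeling concrete, here is a minimal sketch of how a scored data point could carry its confidence label in code. The class and field names (Confidence, DataPoint, evidence_url) are illustrative, not part of the published data format:

    from dataclasses import dataclass
    from enum import Enum

    class Confidence(Enum):
        CONFIRMED = "confirmed"      # primary sources; third-party verifiable
        INFERRED = "inferred"        # confirmed data + domain expertise
        SPECULATIVE = "speculative"  # forward-looking projection; may be wrong

    @dataclass
    class DataPoint:
        claim: str
        value: float
        confidence: Confidence
        evidence_url: str | None = None  # CONFIRMED points should link to evidence

    # Hypothetical example: a CONFIRMED adoption signal backed by a repo link.
    stars = DataPoint(
        claim="GitHub stars",
        value=42_000,  # made-up figure for illustration
        confidence=Confidence.CONFIRMED,
        evidence_url="https://github.com/example/example",
    )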

Index 1: Agent Maturity Index (AMI)

Scale: 0–100

Measures how production-ready an agent framework or system is. A high AMI score means the system can be deployed in production with reasonable confidence in reliability, security, and operational control.

Dimensions & Weights

Dimension             Weight   What It Measures
Reliability           20%      Uptime, error handling, recovery from failures, consistent output quality
Observability         15%      Logging, tracing, debugging tools, execution visibility
Tooling Ecosystem     15%      Available integrations, plugins, MCP servers, third-party tools
Security Posture      20%      Vulnerability count, CVEs, audit results, secure defaults
Execution Control     15%      Permission models, sandboxing, human-in-the-loop, approval gates
Deployment Maturity   15%      CI/CD support, containerization, scaling options, enterprise readiness

Scoring Formula

AMI = Σ (dimension_score × dimension_weight)

Where:
  dimension_score  = 0–100, assessed per framework per dimension
  dimension_weight = the dimension's weight expressed as a fraction (all weights sum to 100%)

Example (OpenClaw):
  AMI = (55 × 0.20) + (60 × 0.15) + (75 × 0.15) + (18 × 0.20) + (45 × 0.15) + (60 × 0.15)
      = 11.0 + 9.0 + 11.25 + 3.6 + 6.75 + 9.0
      = 50.6 → published as 52 after a qualitative adjustment for evidence strength
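The formula is a plain weighted sum, so it is easy to reproduce. A minimal sketch in Python, using the published AMI weights and the illustrative OpenClaw dimension scores from the example above (the helper name weighted_index is mine):

    # AMI weights as published in the table above; they must sum to 100%.
    AMI_WEIGHTS = {
        "reliability": 0.20,
        "observability": 0.15,
        "tooling_ecosystem": 0.15,
        "security_posture": 0.20,
        "execution_control": 0.15,
        "deployment_maturity": 0.15,
    }

    def weighted_index(scores: dict[str, float], weights: dict[str, float]) -> float:
        """Compute a 0-100 index as the weighted sum of per-dimension scores."""
        assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 100%"
        return sum(scores[dim] * w for dim, w in weights.items())

    # Illustrative OpenClaw dimension scores from the worked example.
    openclaw = {
        "reliability": 55,
        "observability": 60,
        "tooling_ecosystem": 75,
        "security_posture": 18,
        "execution_control": 45,
        "deployment_maturity": 60,
    }

    print(round(weighted_index(openclaw, AMI_WEIGHTS), 1))  # 50.6, before adjustment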

Letter Grades

Grade   Score Range
A       80–100
B       60–79
C       40–59
D       0–39
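The grade bands are a simple threshold lookup; a sketch (the function name is mine):

    def letter_grade(ami: float) -> str:
        """Map a 0-100 AMI score to its letter grade per the bands above."""
        if ami >= 80:
            return "A"
        if ami >= 60:
            return "B"
        if ami >= 40:
            return "C"
        return "D"

    assert letter_grade(52) == "C"  # the OpenClaw example lands in the C band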

Index 2: Autonomy Risk Index (ARI)

Scale: 0–100 (higher = more risk)

Measures risk exposure when running a system autonomously. Unlike AMI (where higher is better), ARI is an inverse score — lower is safer. A high ARI means the system poses significant risk when running without continuous human oversight.

Dimensions & Weights

Dimension                    Weight   What It Measures (Higher = More Risk)
Permission Model Strength    20%      Weak/missing permission boundaries = high score. Granular enforcement = low score.
Sandboxing / Isolation       18%      No isolation = high score. Container/VM isolation with network segmentation = low score.
Default Network Exposure     18%      Open ports, public endpoints = high score. No listening services = low score.
Secret Handling              15%      Plaintext keys = high score. Encrypted vault with rotation = low score.
Human-in-the-Loop Controls   15%      No approval gates = high score. Mandatory review for destructive actions = low score.
Audit Logging                14%      No logs = high score. Tamper-proof audit trail with SIEM export = low score.
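ARI appears to use the same weighted-sum form as AMI, with only the weight table and the interpretation (higher = more risk) differing. A sketch with hypothetical per-dimension risk scores for an unnamed framework:

    # ARI weights as published above; the sum check guards against drift.
    ARI_WEIGHTS = {
        "permission_model_strength": 0.20,
        "sandboxing_isolation": 0.18,
        "default_network_exposure": 0.18,
        "secret_handling": 0.15,
        "human_in_the_loop_controls": 0.15,
        "audit_logging": 0.14,
    }
    assert abs(sum(ARI_WEIGHTS.values()) - 1.0) < 1e-9  # weights sum to 100%

    # Hypothetical framework: weak permissions, no sandbox, decent logging.
    risk_scores = {
        "permission_model_strength": 70,
        "sandboxing_isolation": 90,
        "default_network_exposure": 40,
        "secret_handling": 60,
        "human_in_the_loop_controls": 60,
        "audit_logging": 50,
    }
    ari = sum(risk_scores[d] * w for d, w in ARI_WEIGHTS.items())
    print(round(ari, 1))  # 62.4 → "High" under the risk labels below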

Risk Labels

Label      Score Range
Low        0–25
Medium     26–50
High       51–75
Critical   76–100
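And the corresponding band lookup (again, the function name is mine):

    def risk_label(ari: float) -> str:
        """Map a 0-100 ARI score to its risk band per the table above."""
        if ari <= 25:
            return "Low"
        if ari <= 50:
            return "Medium"
        if ari <= 75:
            return "High"
        return "Critical"

    assert risk_label(62.4) == "High"  # the hypothetical framework above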

Index 3: Ecosystem Power Index (EPI)

Scale: 0–100

Measures distribution strength, community gravity, and ecosystem reach. A high EPI indicates the framework has strong adoption, vendor integration, and community momentum — making it harder to displace and easier to hire for.

Dimensions & Weights

Dimension                    Weight   What It Measures
Adoption Signals             25%      GitHub stars, npm downloads, Docker pulls, community size, Stack Overflow activity
Vendor Integration Breadth   20%      Number of platforms, IDEs, and services with native support or official integration
Enterprise Penetration       20%      Known enterprise deployments, SOC 2 compliance, support contracts, case studies
Standard Alignment           15%      MCP support, OpenAPI compliance, tool protocol adherence, interoperability
Release Velocity             20%      Commit frequency, release cadence, maintainer activity, issue response time
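EPI follows the same pattern; only the weight table changes. For completeness, a sketch of the published EPI weights with the same sum check:

    EPI_WEIGHTS = {
        "adoption_signals": 0.25,
        "vendor_integration_breadth": 0.20,
        "enterprise_penetration": 0.20,
        "standard_alignment": 0.15,
        "release_velocity": 0.20,
    }
    assert abs(sum(EPI_WEIGHTS.values()) - 1.0) < 1e-9  # weights sum to 100%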

Momentum Tags

How Scores Change

Indices are living scores. They update when:

  1. New primary evidence arrives (releases, security audits, CVEs, adoption data)
  2. A score dispute is accepted (see Score Dispute Process below)
  3. The methodology itself changes between versions

All changes are logged in the version history below. Previous scores are preserved for comparison.

Score Dispute Process

Framework maintainers can dispute scores by providing counter-evidence. The process:

  1. Open a GitHub issue on the report repository with the tag score-dispute
  2. Cite specific dimension(s) and provide evidence supporting a different score
  3. We review within 7 days and publish a response with rationale
  4. If accepted, scores update in the next edition with changelog entry

Known Limitations

Version History

v0.1 February 16, 2026 — Initial Release
3 indices (AMI, ARI, EPI) covering 9 frameworks. 38 cited sources. Dimension weights established. Confidence labeling system introduced. Grading and risk label thresholds defined.

Data sources: 38 cited sources · Raw data: frameworks.json · Report: Agents 2026 Edition