INDEX MODEL v0.1 — FEBRUARY 2026
Scoring Methodology
This document defines the three proprietary indices used in the AI Tools Landscape Report:
Agent Maturity Index (AMI), Autonomy Risk Index (ARI), and
Ecosystem Power Index (EPI). All dimensions, weights, and grading criteria are published here
for transparency and reproducibility.
Guiding Principles
- Transparency over authority. Every score is decomposable into its dimension scores. Every dimension score links to evidence.
- Confidence labeling is mandatory. No score is presented without a CONFIRMED, INFERRED, or SPECULATIVE tag.
- Scores can change. Indices are versioned. When new evidence arrives, scores update. The changelog records every change.
- No pay-for-score. Sponsorship does not affect index scores. Frameworks are scored identically regardless of commercial relationship.
- Methodology evolves. Dimensions and weights may change between versions. All changes are documented with rationale.
Confidence Labels
Every data point and score carries one of three confidence labels:
✓ CONFIRMED
Based on primary sources: official documentation, published reports, security audits,
GitHub repository data, or direct source code inspection. Verifiable by third parties.
◐ INFERRED
Reasonable conclusion drawn from confirmed data + domain expertise. Example: inferring enterprise
penetration from known partnerships, pricing tiers, and job postings. Directionally reliable but not directly verified.
◌ SPECULATIVE
Forward-looking projection or estimate based on trends. Example: projected enterprise adoption
rates for Q3 2026+. Used sparingly and always labeled. May be wrong.
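To make the labeling requirement concrete, here is a minimal sketch of how a labeled data point can be represented in code. The three enum values mirror the labels above; the DataPoint fields and the example values are illustrative, not part of the published data model.

```python
from dataclasses import dataclass
from enum import Enum

class Confidence(Enum):
    """The three confidence labels defined above."""
    CONFIRMED = "confirmed"      # primary sources; verifiable by third parties
    INFERRED = "inferred"        # confirmed data plus domain expertise
    SPECULATIVE = "speculative"  # forward-looking projection; may be wrong

@dataclass
class DataPoint:
    """A single piece of evidence behind a dimension score."""
    claim: str              # e.g. "ships with container sandboxing by default"
    source: str             # URL or citation backing the claim
    confidence: Confidence  # mandatory -- no unlabeled data points

evidence = DataPoint(
    claim="Enterprise penetration inferred from job postings and pricing tiers",
    source="https://example.com/evidence",  # placeholder URL
    confidence=Confidence.INFERRED,
)
```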
Index 1: Agent Maturity Index (AMI)
Measures how production-ready an agent framework or system is. A high AMI score means the system
can be deployed in production with reasonable confidence in reliability, security, and operational control.
Dimensions & Weights
| Dimension | Weight | What It Measures |
| --- | --- | --- |
| Reliability | 20% | Uptime, error handling, recovery from failures, consistent output quality |
| Observability | 15% | Logging, tracing, debugging tools, execution visibility |
| Tooling Ecosystem | 15% | Available integrations, plugins, MCP servers, third-party tools |
| Security Posture | 20% | Vulnerability count, CVEs, audit results, secure defaults |
| Execution Control | 15% | Permission models, sandboxing, human-in-the-loop, approval gates |
| Deployment Maturity | 15% | CI/CD support, containerization, scaling options, enterprise readiness |
Scoring Formula
AMI = Σ (dimension_score × dimension_weight)
Where:
dimension_score = 0–100 (assessed per framework per dimension)
dimension_weight = percentage weight (all weights sum to 100%)
Example (OpenClaw):
AMI = (55×0.20) + (60×0.15) + (75×0.15) + (18×0.20) + (45×0.15) + (60×0.15)
= 11.0 + 9.0 + 11.25 + 3.6 + 6.75 + 9.0
= 50.6 → published as 52 after a qualitative adjustment for evidence strength
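The same calculation, expressed as a short sketch. The weights are the ones used in the OpenClaw example (20/15/15/20/15/15); the function and variable names are ours for illustration.

```python
# AMI dimension weights as used in the OpenClaw example above (sum to 1.0).
AMI_WEIGHTS = {
    "reliability": 0.20,
    "observability": 0.15,
    "tooling_ecosystem": 0.15,
    "security_posture": 0.20,
    "execution_control": 0.15,
    "deployment_maturity": 0.15,
}

def weighted_index(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Raw index value: sum of dimension_score x dimension_weight."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 100%"
    return sum(scores[dim] * weight for dim, weight in weights.items())

# Dimension scores for OpenClaw from the worked example.
openclaw_scores = {
    "reliability": 55,
    "observability": 60,
    "tooling_ecosystem": 75,
    "security_posture": 18,
    "execution_control": 45,
    "deployment_maturity": 60,
}

print(round(weighted_index(openclaw_scores, AMI_WEIGHTS), 2))  # 50.6, published as 52
```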
Letter Grades
Index 2: Autonomy Risk Index (ARI)
Measures risk exposure when running a system autonomously. Unlike AMI (where higher is better),
ARI is an inverse score — lower is safer. A high ARI means the system poses significant risk
when running without continuous human oversight.
Dimensions & Weights
| Dimension | Weight | What It Measures (Higher = More Risk) |
| --- | --- | --- |
| Permission Model Strength | | Weak/missing permission boundaries = high score. Granular enforcement = low score. |
| Sandboxing / Isolation | | No isolation = high score. Container/VM isolation with network segmentation = low score. |
| Default Network Exposure | | Open ports, public endpoints = high score. No listening services = low score. |
| Secret Handling | | Plaintext keys = high score. Encrypted vault with rotation = low score. |
| Human-in-the-Loop Controls | | No approval gates = high score. Mandatory review for destructive actions = low score. |
| Audit Logging | | No logs = high score. Tamper-proof audit trail with SIEM export = low score. |
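ARI presumably aggregates the same way as AMI, as a weighted sum of dimension scores, with the inversion that a higher dimension score means more risk. The sketch below assumes that form; the equal weights are placeholders only, not the published values.

```python
# Placeholder weights: equal split across the six ARI dimensions.
# The published weights are not reproduced here.
ARI_WEIGHTS = {
    "permission_model_strength": 1 / 6,
    "sandboxing_isolation": 1 / 6,
    "default_network_exposure": 1 / 6,
    "secret_handling": 1 / 6,
    "human_in_the_loop_controls": 1 / 6,
    "audit_logging": 1 / 6,
}

def ari(risk_scores: dict[str, float]) -> float:
    """Weighted sum of per-dimension risk scores (0-100). Lower is safer."""
    return sum(risk_scores[dim] * weight for dim, weight in ARI_WEIGHTS.items())
```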
Risk Labels
Index 3: Ecosystem Power Index (EPI)
Measures distribution strength, community gravity, and ecosystem reach. A high EPI indicates
the framework has strong adoption, vendor integration, and community momentum — making it harder to displace
and easier to hire for.
Dimensions & Weights
| Dimension | Weight | What It Measures |
| --- | --- | --- |
| Adoption Signals | | GitHub stars, npm downloads, Docker pulls, community size, Stack Overflow activity |
| Vendor Integration Breadth | | Number of platforms, IDEs, and services with native support or official integration |
| Enterprise Penetration | | Known enterprise deployments, SOC 2 compliance, support contracts, case studies |
| Standard Alignment | | MCP support, OpenAPI compliance, tool protocol adherence, interoperability |
| Release Velocity | | Commit frequency, release cadence, maintainer activity, issue response time |
Momentum Tags
- Rising: EPI score increasing >10 points quarter-over-quarter. Example: OpenClaw (new entrant, explosive growth).
- Stable: EPI score change within ±10 points. Example: LangChain (established, consistent community).
- Declining: EPI score decreasing >10 points quarter-over-quarter. No current frameworks in this category.
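The momentum tags reduce to a threshold check on the quarter-over-quarter EPI delta. A minimal sketch of that rule (the function name is ours):

```python
def momentum_tag(previous_epi: float, current_epi: float) -> str:
    """Classify quarter-over-quarter EPI movement using the thresholds above."""
    delta = current_epi - previous_epi
    if delta > 10:
        return "Rising"
    if delta < -10:
        return "Declining"
    return "Stable"  # change within +/-10 points

# Example: a new entrant jumping from 20 to 35 in one quarter is tagged Rising.
print(momentum_tag(20, 35))  # Rising
```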
How Scores Change
Indices are living scores. They update when:
- New evidence emerges — A security audit, a new release, a partnership announcement
- Corrections are submitted — Framework maintainers or community members can dispute scores with evidence
- Methodology updates — Dimension weights may shift between versions as the landscape evolves
- Time passes — Enterprise penetration and adoption signals change quarterly
All changes are logged in the version history below. Previous scores are preserved for comparison.
Score Dispute Process
Framework maintainers can dispute scores by providing counter-evidence. The process:
- Open a GitHub issue on the report repository with the score-dispute tag
- Cite specific dimension(s) and provide evidence supporting a different score
- We review within 7 days and publish a response with rationale
- If accepted, scores update in the next edition with changelog entry
Known Limitations
- Subjectivity in weighting. The choice of 20% for Security vs 15% for Observability is a judgment call. Different use cases may warrant different weights.
- Inferred scores are estimates. Where direct evidence isn't available (e.g., enterprise penetration for private companies), we use proxies like job postings, pricing tiers, and partnership announcements.
- Snapshot in time. Scores reflect the state as of the edition date. Fast-moving projects may have changed significantly since publication.
- Conflict of interest. Clawdia is developed by the same team that produces this report. We address this by: (a) scoring Clawdia using the same methodology as all other frameworks, (b) being transparent about low scores (EPI: 8, AMI: 44), and (c) publishing this methodology for independent verification.
- No formal audit. These indices are not produced by a standards body. They are analytical scores by an industry research publication.
Version History
v0.1
February 16, 2026 — Initial Release
3 indices (AMI, ARI, EPI) covering 9 frameworks. 38 cited sources.
Dimension weights established. Confidence labeling system introduced.
Grading and risk label thresholds defined.
Data sources: 38 cited sources · Raw data: frameworks.json · Report: Agents 2026 Edition