Overview
What OpenDecisions is, why it exists, who it's for, and what we're building
This page explains what we're building and why, so that anyone (partners, funders, builders, or evaluators) can understand the intent and use case behind OpenDecisions and this prototype.
About this prototype
This documentation and the OpenDecisions API you're reading about are part of a prototype. The goal is to demonstrate a working schema, pipeline, and read-only API for decision-relevant education data, so that the approach can be evaluated, extended, or adopted. The intent and design principles below apply to the full OpenDecisions vision, not only to this prototype.
The problem
Millions of people make life-changing education decisions every year with bad information.
- Fragmented data. Cost, admissions, and outcomes live in different places (IPEDS, College Scorecard, institutional PDFs, marketing sites) with no common format, stable IDs, or way to compare.
- No provenance. Numbers are quoted without sources, timestamps, or method. You can't check where "graduation rate" or "net price" came from or how old it is.
- False precision. Tools and rankings imply certainty that doesn't exist. "Admission chance" or "fit" scores are often opaque, un-auditable, and can mislead.
- Inequity. Students without well-resourced counselors bear the biggest cost: more wasted applications, more debt, more mismatch, and less access to transparent, comparable facts.
Result: Bad fit, avoidable debt, dropped dreams, and years of "I wish I'd known."
Why it matters
- Financial: College is one of the largest investments people make. Wrong choices mean debt without commensurate outcomes, or missed chances at affordable, high-value options.
- Time and opportunity: Application cycles are expensive in time and stress. Without comparable, trustworthy signals, people over-apply, under-apply, or target poorly.
- Equity: Transparent, auditable data is a public good. When it's missing, the gap widens between those who have access to good advice and those who don't.
OpenDecisions exists so what is known can be separated from what is uncertain, and so everyone can see the sources, the gaps, and the confidence behind every number.
The idea (in plain language)
OpenDecisions does three things:
- Schematizes decision-relevant facts (cost, competitiveness, outcomes, policy) into a single, open model, with stable IDs and clear provenance for every number.
- Computes "decision labels" (like nutrition labels): comparable scores for competitiveness, affordability, completion risk, and outcome value, each with uncertainty, explanation, and citations. Data quality is always shown so users know how much to trust what they see.
- Publishes a public dataset (bulk files + API) that anyone can query, build on, or redistribute, with reproducible releases and a strict "no source, no claim" rule.
We are not predicting admission decisions from secret rubrics. We are publishing auditable facts + transparent indexes + honest uncertainty so people can decide for themselves.
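As a minimal sketch of the first point and the "no source, no claim" rule, a fact record can refuse to exist without provenance. The class names, field names, and ID format below are assumptions made for illustration; they are not the published OpenDecisions schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    source: str         # name of the upstream source, e.g. "IPEDS"
    retrieved_on: date  # when the value was collected
    method: str         # how it was collected (hypothetical free-text field)
    license: str        # license under which the source is published


@dataclass(frozen=True)
class Fact:
    institution_id: str  # stable ID; the format used below is made up
    metric: str          # hypothetical metric name, e.g. "net_price"
    value: float
    provenance: Optional[Provenance] = None

    def __post_init__(self) -> None:
        # "No source, no claim": reject any number that lacks a named source.
        if self.provenance is None or not self.provenance.source.strip():
            raise ValueError(
                f"no source, no claim: {self.metric} for {self.institution_id}"
            )


# Illustrative values only; none of these come from a real release.
fact = Fact(
    institution_id="inst-0001",
    metric="net_price",
    value=15000.0,
    provenance=Provenance("IPEDS", date(2024, 1, 15), "bulk download", "example-license"),
)
```

Enforcing the rule at construction time means an unsourced number cannot reach a release. A real pipeline might instead log and quarantine such records.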
Use cases (who uses it, and how)
| Who | What they do with OpenDecisions |
|---|---|
| Students (domestic and international) | Compare cost, competitiveness, and outcomes across schools in one place; see where numbers come from and what's missing; get affordability and fit signals with clear bands and caveats instead of fake precision. |
| Counselors and advisors | Answer "Can my student afford this?" and "How selective is it?" with cited, up-to-date data; show clients the drivers and gaps behind each label; reduce time spent chasing inconsistent PDFs and spreadsheets. |
| Nonprofits and access orgs | Power low-cost tools and campaigns with a single, reproducible data layer; serve first-gen and under-resourced students with the same transparency that well-resourced families often get privately. |
| Researchers and policymakers | Study affordability, outcomes, and access with a consistent schema and provenance; audit and extend the formulas; avoid rebuilding one-off datasets. |
| Edtech and product builders | Integrate via API or bulk dumps instead of scraping and normalizing many sources; build search, comparison, and recommendation on a trusted, documented base. |
Impact (what changes when OpenDecisions exists)
- Better decisions: People choose with comparable, cited facts and visible uncertainty, not marketing copy or opaque scores.
- Less waste: Fewer mis-targeted applications, less debt from choices made without clear cost and outcome signals.
- More equity: A public, open layer gives under-resourced students and the orgs that serve them access to the same structured, auditable information.
- Ecosystem leverage: OpenDecisions is infrastructure. Like OpenAlex for research metadata, it's a foundation others build on (counseling tools, comparison sites, policy dashboards, research) without reinventing schema, provenance, or formulas.
- Trust by design: Provenance, data-quality signals, and conservative bands for sensitive outputs (e.g. admission chance) reduce harm from false precision and black-box models.
What we're building (year 1)
- Public dataset: Bulk dumps (e.g. JSONL, Parquet) and a read-only API for US higher ed: institutions, programs, cost, outcomes, and decision labels.
- Decision labels that answer:
  - How competitive is this place? (selectivity context)
  - Can I afford it given my budget? (affordability)
  - What's the downside (e.g. completion risk) and upside (e.g. outcomes)?
  - How reliable is this label? (data quality and missingness, always shown)
  - Given my profile: academic fit and admission chance as bands (e.g. Reach / Target / Likely) with clear uncertainty, not falsely precise "odds."
- Governance pack: Data sheets, source inventory, formula docs, and release notes with every publish, so the system is auditable and improvable.
Later: UK and EU country packs; deeper program-level data where open sources exist.
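To illustrate what reporting a band instead of precise "odds" could look like, the sketch below maps an estimated interval to Reach / Target / Likely using its lower bound, so that a wide interval never earns a more optimistic band. The function name, the thresholds, and the lower-bound rule are all assumptions for illustration, not the OpenDecisions formula.

```python
from typing import NamedTuple


class BandedLabel(NamedTuple):
    band: str    # "Reach", "Target", or "Likely"
    low: float   # lower bound of the estimated interval
    high: float  # upper bound of the estimated interval


def to_band(low: float, high: float,
            target_at: float = 0.3, likely_at: float = 0.6) -> BandedLabel:
    """Assign a band conservatively, from the lower bound of the interval.

    The thresholds are hypothetical defaults chosen for this example.
    """
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("expected 0 <= low <= high <= 1")
    if low >= likely_at:
        band = "Likely"
    elif low >= target_at:
        band = "Target"
    else:
        band = "Reach"
    # Keep the interval alongside the band so uncertainty stays visible.
    return BandedLabel(band, low, high)
```

Under these assumed thresholds, `to_band(0.35, 0.70)` yields the band `"Target"` even though the upper bound crosses into the "Likely" range, because the band follows the lower bound.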
Why OpenDecisions (design principles)
- Open and reproducible. Schema, formulas, and release process are public. Any number can be traced to sources and pipeline version. No proprietary black box.
- Provenance is non-negotiable. Every claim links to source, time, collection method, and license. "No source, no claim."
- Uncertainty is first-class. We show confidence, missingness, and caveats. We use bands and intervals for high-stakes, profile-conditioned outputs, not fake point estimates.
- Safety by default. We suppress small-cell data, avoid protected-attribute harm, and never promise individualized guarantees we can't back with governed, labeled data.
- One schema, many countries. We start in the US and expand via "country packs" (UK, EU, etc.) that map local data into the same core model. No schema forks; global comparability over time.
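The small-cell suppression mentioned under "Safety by default" can be sketched as a simple gate on cohort size. The threshold of 10 and the function name are assumptions for illustration; the actual cutoff would be a governance decision documented with each release.

```python
from typing import Optional

MIN_CELL_SIZE = 10  # hypothetical threshold, not a documented OpenDecisions value


def suppress_small_cell(value: Optional[float], cohort_size: int,
                        min_cell_size: int = MIN_CELL_SIZE) -> Optional[float]:
    """Return the value only if the underlying cohort is large enough to publish.

    A suppressed value is returned as None so downstream code can show it
    as missing instead of exposing a statistic about a handful of people.
    """
    if cohort_size < 0:
        raise ValueError("cohort_size must be non-negative")
    if cohort_size < min_cell_size:
        return None
    return value
```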
What we don't do (guardrails)
- We do not predict admission decisions using confidential review rubrics or unsanctioned data.
- We do not publish individualized or subgroup-specific odds that violate privacy or risk discrimination.
- We do not make causal claims ("doing X increases your odds by Y") without explicit design and stated assumptions.
- We do not hide uncertainty or data quality. If data is weak or missing, we say so and reduce confidence. We never dress it up as precise.
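The last guardrail can be sketched as a data-quality report that travels with every label: it lists which inputs are missing and lowers a coarse confidence level as missingness grows. The function name, the input names, and the 25% cutoff are assumptions for illustration only.

```python
from typing import Dict, List, Mapping, Optional, Union

Report = Dict[str, Union[float, str, List[str]]]


def data_quality(inputs: Mapping[str, Optional[float]]) -> Report:
    """Summarize missingness for the inputs behind one label.

    Confidence cutoffs are hypothetical; a label with no inputs at all is
    treated as fully missing.
    """
    missing = sorted(name for name, value in inputs.items() if value is None)
    missingness = len(missing) / len(inputs) if inputs else 1.0
    if missingness == 0.0:
        confidence = "high"
    elif missingness <= 0.25:
        confidence = "medium"
    else:
        confidence = "low"
    return {"missing": missing, "missingness": missingness, "confidence": confidence}


# Illustrative inputs; the metric names and value are made up.
report = data_quality({"net_price": 15000.0, "completion_rate": None})
```

Here one of two inputs is missing, so the report names `completion_rate` as missing and, under the assumed cutoffs, sets confidence to `"low"`.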
Summary
OpenDecisions = nutrition labels for education.
We turn messy, scattered college data into a public, open, citable dataset with transparent, uncertainty-aware decision labels, so students and the people who advise them can make better choices. We prioritize provenance, safety, and equity; we avoid false precision and black boxes. The result is reusable infrastructure that counselors, edtech, researchers, and policymakers can build on.
One schema. Many country packs. Evidence over guesswork.
Next steps
- Introduction: Documentation home and quick links
- Guides: Get started with the API and schema
- Schema reference: Types, vocabulary, and field reference
- API reference: Endpoints and usage