Evaluation Theatre
Evaluation criteria are designed to look rigorous but are structured so that any compliant vendor scores similarly on capability. The decision is effectively made on price, relationship, or incumbent advantage, but the process creates the appearance of merit-based selection.
Recognition signals
- Evaluation criteria are generic — "demonstrate experience in health IT" — rather than scenario-specific. Any vendor with a pulse and a portfolio can meet them. The criteria test compliance, not capability.
- All shortlisted vendors score within 5% on non-price criteria. The evaluation produces the illusion of differentiation, but the scores are so compressed that price becomes the deciding factor by default (the sketch after this list walks through the arithmetic).
- Price weighting exceeds 50% in a high-quality-risk engagement. For complex implementations where the wrong vendor creates years of technical debt, the evaluation is structured as though the primary risk is overpaying.
- No scenario-based evaluation or practical demonstration. Vendors are assessed on what they claim, not what they can show. Written responses are polished by bid teams; practical demonstrations expose actual capability.
- Reference checks are tick-box confirmations, not structured investigations. The questions are generic ("were you satisfied?"), the referees are hand-picked, and the evaluation panel lacks the technical depth to probe capability claims.
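To make the compression effect concrete, here is a minimal sketch of the arithmetic. The vendor names, quality scores, bid prices, the 50/50 weighting, and the lowest-bid pro-rata price normalisation are all assumptions for illustration, not figures from any real evaluation:

```python
# Hypothetical illustration: compressed quality scores let price decide.
# All vendors, scores, prices, and weightings below are invented.

vendors = {
    # name: (quality score out of 100, bid price in GBP)
    "Vendor A": (84, 1_900_000),
    "Vendor B": (80, 1_600_000),
    "Vendor C": (82, 1_750_000),
}

QUALITY_WEIGHT = 0.5  # price weighting at 50%: the pattern flagged above
PRICE_WEIGHT = 0.5

lowest_price = min(price for _, price in vendors.values())

for name, (quality, price) in vendors.items():
    # One common normalisation, assumed here: lowest bid scores 100,
    # other bids score pro rata against it.
    price_score = 100 * lowest_price / price
    total = QUALITY_WEIGHT * quality + PRICE_WEIGHT * price_score
    print(f"{name}: quality={quality}, price_score={price_score:.1f}, total={total:.1f}")

# The compression check: relative spread of non-price scores.
qualities = [quality for quality, _ in vendors.values()]
spread = (max(qualities) - min(qualities)) / max(qualities)
print(f"Non-price spread: {spread:.1%} (worry if under 5%)")
```

In this sketch the cheapest bid takes the highest total despite the lowest quality score: once the non-price spread falls under 5%, the weighting all but guarantees that outcome.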
Structural cause
Evaluation criteria are often drafted by procurement (who understand process but not the domain) or by the programme manager (who knows what they need but not how to test for it). Generic criteria are safer — less probity risk, faster to produce — and the criteria that would actually differentiate require domain expertise to write and assess.
The mechanism is structural, not intentional. Writing scenario-based evaluation criteria requires someone who understands both the domain and the evaluation process. That combination is rare. Procurement teams default to generic criteria because they're defensible. PMs default to capability statements because they're familiar. Neither produces criteria that differentiate on the dimensions that matter for delivery.
The result is invisible until after contract award. The evaluation process ran, scores were assigned, a winner was selected — everything looks rigorous. But the criteria never tested the things that predict delivery success: how the vendor handles scope ambiguity, how their team responds under pressure, whether their architecture decisions are sound. The evaluation tested whether the vendor could write a compliant bid response.
Risk mapping
| Risk | Description |
|---|---|
| P4 | Evaluation rigour gap — criteria that don't differentiate on delivery-critical dimensions |
Self-assessment
When to worry
- All shortlisted vendors scored within 5% of each other on non-price criteria (the helper sketch below turns these thresholds into a quick check)
- Evaluation criteria are generic capability statements, not scenario-specific assessments
- Price weighting exceeds 50% for a complex engagement
- No practical demonstration or scenario walkthrough in the evaluation process
When you're OK
- Evaluation scores show more than 10% differentiation between vendors on non-price criteria
- The evaluation panel includes domain expertise relevant to the engagement
- Reference check questions are tailored to engagement-specific risks
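If it helps to operationalise the self-assessment, the helper below encodes the two thresholds: a spread under 5% is the worry case, over 10% is the OK case. The function names and the treatment of the grey zone between the bands are assumptions of this sketch:

```python
def score_spread(non_price_scores: list[float]) -> float:
    """Relative spread of non-price scores across a shortlist."""
    return (max(non_price_scores) - min(non_price_scores)) / max(non_price_scores)

def assess_differentiation(non_price_scores: list[float]) -> str:
    """Apply the worry/OK bands from the self-assessment above."""
    spread = score_spread(non_price_scores)
    if spread < 0.05:
        return f"worry: {spread:.1%} spread, criteria did not differentiate"
    if spread > 0.10:
        return f"OK: {spread:.1%} spread, scores show real differentiation"
    return f"grey zone: {spread:.1%} spread, review the criteria"

print(assess_differentiation([84, 80, 82]))  # worry: 4.8% spread
print(assess_differentiation([88, 72, 79]))  # OK: 18.2% spread
```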
Related reading
- Specification Gap — generic requirements produce generic evaluation criteria
- Threshold Blindness — procurement thresholds shape scope just as evaluation criteria shape selection; both are structural distortions
If all your vendors score the same on quality criteria, your criteria didn't differentiate — you just selected on price with extra steps.
A procurement readiness review tests whether your evaluation criteria can actually distinguish between vendors on the dimensions that predict delivery success. 10fifteen — programme governance assessments.