Quality — Scorecard QC Checklists
Gate 1: Binary — all items must pass before building starts. No scoring. No partial credit. A single failure stops the build.
Gate 2: Weighted — 100 points total. Pass threshold: 90/100. Run after every build and after every revision.
Both gates must pass before any scorecard is deployed to an interview team.
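The two-gate logic can be sketched in a few lines. This is an illustrative sketch, not part of the checklist itself; the function names are made up, and the 90-point threshold comes from Gate 2 above.

```python
# Gate 1 is all-or-nothing; Gate 2 is a weighted score against a threshold.
def gate1_pass(items: list[bool]) -> bool:
    return all(items)  # a single failure stops the build

def gate2_pass(score: int) -> bool:
    return score >= 90  # 100 points total, 90 to pass

def deployable(gate1_items: list[bool], gate2_score: int) -> bool:
    # Both gates must pass before deployment to an interview team.
    return gate1_pass(gate1_items) and gate2_pass(gate2_score)
```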
Gate 1 — Pre-Build (Gap Protocol)
Run this before opening the build skill. Any failure stops the build. Produce a gap report and resolve before proceeding.
Routing Check — Run This First
Is an extraction interview available for this practitioner's scorecard methodology?
- Yes — interview transcript or detailed session notes exist → proceed through Gate 1 with the transcript as primary source
- No — only a prior scorecard template or no source material exists → stop here. Do not run Gate 1 yet. Go to 06-consultant-methodology.md and conduct the extraction interview first.
If the advisor confirms that a template-only build is intentional (e.g., producing a working draft for practitioner review), document that decision, label the output clearly as a working draft, and proceed with the understanding that significant gaps will exist and the scorecard is not deployment-ready.
Upstream Inputs
- [ ] Position profile or job description is available for this role
- [ ] Must-have and nice-to-have requirements have been confirmed
- [ ] Organization's mission, vision, and values are documented or captured
- [ ] Interview team composition is known (at minimum, roles; ideally, names)
Extraction Coverage
- [ ] Extraction interview transcript or session notes are available
- [ ] Practitioner's scoring methodology has been captured (scale, definitions, justification requirement)
- [ ] Focus area determination approach has been documented
- [ ] Behavior-based question approach has been confirmed (framework, bank vs. custom, number per area)
- [ ] Interviewer preparation methodology has been captured
Focus Area Design
- [ ] Focus areas have been identified for this role
- [ ] Each focus area has a description of what "good" looks like
- [ ] Focus areas trace back to must-have requirements or confirmed competency domains
- [ ] No orphaned focus areas (every focus area maps to a role requirement)
- [ ] No orphaned requirements (every critical must-have is covered by at least one focus area)
- [ ] Focus areas have been assigned to specific interviewers (or roles, if names not yet confirmed)
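The two orphan checks above amount to a bidirectional trace between focus areas and must-haves. A minimal sketch, assuming both are kept as simple collections (all role and requirement names here are illustrative):

```python
# Must-have requirements from the position profile.
must_haves = {"donor relationship management", "budget ownership", "team leadership"}

# Each focus area maps to the must-have requirements it evaluates.
focus_area_map = {
    "Fundraising Strategy": {"donor relationship management"},
    "People Leadership": {"team leadership"},
    "Strategic Thinking": set(),  # traces to nothing -> orphaned focus area
}

# A requirement is orphaned if no focus area covers it;
# a focus area is orphaned if it traces to no requirement.
covered = set().union(*focus_area_map.values())
orphaned_requirements = must_haves - covered
orphaned_focus_areas = [fa for fa, reqs in focus_area_map.items() if not reqs]
```

Both lists must be empty before Gate 1 passes.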
Scoring
- [ ] Scoring scale is defined with descriptions for each level
- [ ] Overall recommendation scale is defined (Strong Yes / Yes / No / Strong No or equivalent)
- [ ] Written justification is required for every score
- [ ] Fact-based justification requirement is explicit (not opinions, not impressions)
Gap Report Status
- [ ] All gaps identified are listed in the gap report
- [ ] Every gap is either RESOLVED or has a documented resolution path with advisor sign-off
- [ ] No gap is marked RESOLVED by inference or assumption
Gate 2 — Post-Build (100 points, 90+ to pass)
Run the sections below in order after every build and after every revision.
Design Integrity (35 points)
| # | Check | Points |
|---|---|---|
| 1 | Every focus area traces to a must-have requirement or confirmed competency domain | 6 |
| 2 | Every critical must-have from the position profile is covered by at least one focus area | 6 |
| 3 | No focus area is duplicated across interviewers unless intentionally overlapped | 4 |
| 4 | Each focus area description defines what "good" looks like in observable terms | 5 |
| 5 | Each focus area contains 3-5 behavior-based questions (not hypothetical, not leading) | 5 |
| 6 | Questions within each focus area are distinct — no overlapping evaluation targets | 4 |
| 7 | Scoring scale levels are defined and distinguishable from each other | 5 |
Design integrity failures are blocking. A scorecard that evaluates the wrong things, misses critical requirements, or uses undefined scoring is not deployable regardless of total score.
Legal Defensibility (20 points)
| # | Check | Points |
|---|---|---|
| 8 | Same questions will be asked of every candidate for the same focus area | 5 |
| 9 | Scoring criteria are consistent across candidates — no per-candidate adjustments | 4 |
| 10 | Written justification is required (not optional) for every score | 4 |
| 11 | Justification prompt specifies fact-based evidence, not impressions or feelings | 4 |
| 12 | No questions that could elicit protected-class information (age, family status, religion, disability, national origin) | 3 |
Defensibility failures are blocking. A scorecard that enables inconsistent evaluation or includes legally problematic questions must be fixed before deployment.
Content Accuracy (15 points)
| # | Check | Points |
|---|---|---|
| 13 | Role title matches position profile exactly | 2 |
| 14 | Organization name matches reference data exactly | 2 |
| 15 | Interviewer names (if included) match reference data exactly | 3 |
| 16 | Must-have requirements referenced in focus areas match position profile language | 4 |
| 17 | No content drawn from the golden example as a source | 2 |
| 18 | No content from a prior client's scorecard carried into this build | 2 |
Usability (15 points)
| # | Check | Points |
|---|---|---|
| 19 | Scorecard can be completed within the interview time allocation (not too many questions per focus area) | 3 |
| 20 | Instructions are clear — interviewer knows what to do with each section without external explanation | 3 |
| 21 | Justification fields have enough space for meaningful responses (not a single-line text box) | 2 |
| 22 | Focus area descriptions are specific enough that two interviewers would evaluate the same things | 3 |
| 23 | Recommendation section is clearly separated from section-level scoring | 2 |
| 24 | Submission instructions are included (where to send, deadline, who receives it) | 2 |
Presentation Section (10 points — skip if no presentation in process)
| # | Check | Points |
|---|---|---|
| 25 | Presentation evaluation criteria are defined (not just "how they presented") | 3 |
| 26 | Time management is an explicit criterion (did they stay within limits?) | 2 |
| 27 | Q&A handling is an explicit criterion | 2 |
| 28 | Criteria are consistent for every candidate presenting | 3 |
If there is no presentation in the process, redistribute these 10 points: add 3 to Design Integrity, 3 to Legal Defensibility, 2 to Usability, and 2 to Content Accuracy.
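The redistribution rule can be expressed as a small weight table. A minimal sketch using the category names and point values from this checklist; the function name is illustrative:

```python
# Base category weights from the scoring summary (sum to 100).
BASE_WEIGHTS = {
    "Design Integrity": 35,
    "Legal Defensibility": 20,
    "Content Accuracy": 15,
    "Usability": 15,
    "Presentation Section": 10,
    "Debrief Readiness": 5,
}

def weights(has_presentation: bool) -> dict:
    w = dict(BASE_WEIGHTS)
    if not has_presentation:
        # Redistribute the presentation's 10 points as 3/3/2/2.
        w.pop("Presentation Section")
        w["Design Integrity"] += 3
        w["Legal Defensibility"] += 3
        w["Usability"] += 2
        w["Content Accuracy"] += 2
    return w
```

Either way, the weights still sum to 100, so the 90-point threshold is unchanged.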
Debrief Readiness (5 points)
| # | Check | Points |
|---|---|---|
| 29 | Scorecard structure allows facilitator to aggregate scores across interviewers | 2 |
| 30 | Recommendation scale is consistent across all interviewers' scorecards | 2 |
| 31 | Submission deadline and no-cross-visibility rule are documented in the scorecard or its instructions | 1 |
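The aggregation that check 29 requires can be sketched as follows. The scorecard shape and interviewer data are hypothetical; the point is that per-interviewer scores roll up into one view per focus area:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical submitted scorecards, one per interviewer.
scorecards = [
    {"interviewer": "A", "scores": {"Fundraising Strategy": 4}, "recommendation": "Yes"},
    {"interviewer": "B", "scores": {"Fundraising Strategy": 2}, "recommendation": "No"},
]

# Group scores by focus area across interviewers.
by_area = defaultdict(list)
for card in scorecards:
    for area, score in card["scores"].items():
        by_area[area].append(score)

# Average plus spread; a large spread flags the area for debrief discussion.
summary = {
    area: {"avg": mean(vals), "spread": max(vals) - min(vals)}
    for area, vals in by_area.items()
}
```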
Scoring Summary
| Category | Points |
|---|---|
| Design Integrity | 35 |
| Legal Defensibility | 20 |
| Content Accuracy | 15 |
| Usability | 15 |
| Presentation Section | 10 |
| Debrief Readiness | 5 |
| Total | 100 |
Pass threshold: 90/100
Blocking failures (must fix regardless of score):
- Any design integrity failure (focus areas don't trace to requirements, critical must-haves uncovered)
- Any defensibility failure (inconsistent evaluation criteria, legally problematic questions)
- Any invented content (focus areas from the golden example, content from another client's scorecard)
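The verdict logic combines the 90-point threshold with the blocking-failure override. A minimal sketch; the flag strings are illustrative and would come from the Design Integrity, Defensibility, and invented-content checks:

```python
def gate2_verdict(total_score: int, blocking_flags: list[str]) -> str:
    # Blocking failures override the score: must fix regardless of total.
    if blocking_flags:
        return "FAIL (blocking): " + "; ".join(blocking_flags)
    return "PASS" if total_score >= 90 else "FAIL (below 90)"
```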
After any revision: Return to Gate 2 and run all checks again. A fix may introduce a new issue.
Common Failure Modes
| Failure | What It Looks Like | Root Cause | Fix |
|---|---|---|---|
| Generic focus areas | Scorecard evaluates "Leadership" and "Communication" without role-specific definition | Focus areas pulled from a generic competency model, not from the position profile and extraction | Rebuild focus areas from must-haves and extraction; generic competency names are starting points, not finished focus areas |
| Orphaned requirements | Position profile lists "donor relationship management" as a must-have; no focus area evaluates it | Focus area design didn't systematically trace back to must-haves | Cross-reference every must-have against focus area list; add coverage for any gap |
| Hypothetical questions | "What would you do if you inherited a team with morale issues?" | Questions not structured as behavior-based; interviewer gets hypothetical answers, not evidence | Rewrite: "Tell me about a time you inherited a team with morale challenges. What did you find, what did you do, and what changed?" |
| Undefined scoring scale | Scale says "1-5" but never defines what each number means | Scoring methodology not captured during extraction; scale imported from a template without calibration | Define each level in behavioral terms; confirm with practitioner |
| Duplicated focus areas | Two interviewers both evaluating "strategic thinking" with different questions | Focus area assignments not mapped during alignment; interviewers chose their own areas | Map all focus areas to interviewers before deployment; identify intentional overlaps vs. accidental duplication |
| Opinion-based justification | Interviewer writes "Seemed like a great culture fit" with no supporting evidence | Justification prompt doesn't explicitly require fact-based evidence | Add explicit prompt: "Reference specific candidate statements, behaviors, or demonstrated competencies" |
| Prior scorecard recycled | Scorecard uses focus areas from a different role at a different organization | Template-only build without extraction; prior scorecard treated as content source | Rebuild from the current role's position profile and extraction; never treat a prior engagement's scorecard as a content source |
| Protected-class questions | "Do you have children?" appears in a work-life balance focus area | Question bank not reviewed for legal compliance | Remove immediately; review all questions for protected-class exposure; add legal review to QC |