Case Studies
From data readiness to evaluation design: a structured approach to AI reliability in medical imaging.
Project: NIH ChestX-ray14 Data Due Diligence Evaluation
Dataset Scale: 112,120 images / 30,805 patients
Objective: Applying the VeraDP Data Due Diligence Protocol to identify the dataset’s strengths, limitations, and risks
Identified Risks:
Strategic Outcome: The evaluation provides a framework for judgment under uncertainty, allowing R&D teams to account for data limitations before locking in technical and validation strategies.


Project: Evaluation Design of an AI System Trained on the NIH ChestX-ray14 Dataset
From data findings to evaluation strategy
What the label distribution tells us
What the patient distribution tells us
This asymmetry makes patient-level data splitting mandatory to prevent data leakage between train, validation, and test sets.
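As a minimal sketch of what patient-level splitting means in practice, the example below assigns whole patients, rather than individual images, to each set. The function name `patient_level_split`, the `(image_id, patient_id)` record shape, the 70/10/20 fractions, and the toy data are illustrative assumptions, not part of the VeraDP protocol or of any NIH tooling.

```python
import random
from collections import defaultdict


def patient_level_split(records, fractions=(0.7, 0.1, 0.2), seed=0):
    """Assign whole patients to train/val/test so no patient spans two sets.

    `records` is an iterable of (image_id, patient_id) pairs. The names and
    the 70/10/20 fractions are illustrative assumptions; the test set
    receives whatever patients remain after train and val are filled.
    """
    by_patient = defaultdict(list)
    for image_id, patient_id in records:
        by_patient[patient_id].append(image_id)

    # Sort before shuffling so the result depends only on the seed.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    n_train = int(len(patients) * fractions[0])
    n_val = int(len(patients) * fractions[1])
    patient_groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],
    }
    # Expand each patient group back into its list of images.
    image_splits = {
        name: [img for pid in pids for img in by_patient[pid]]
        for name, pids in patient_groups.items()
    }
    return image_splits, patient_groups


# Toy input: patient p contributes p + 1 images, mimicking uneven counts.
toy_records = [(f"img_{p}_{i}", p) for p in range(10) for i in range(p + 1)]
image_splits, patient_groups = patient_level_split(toy_records)
```

Because the split is made over patients, the image-level proportions can drift from the requested fractions when a few patients contribute many images. That drift is the price of guaranteeing that no patient appears in more than one set.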
Proposed evaluation strategy
Without data due diligence, there is no reliable evaluation design. The data determines the strategy before any model development begins.
From data evidence to technical decisions
The findings from the Data Due Diligence and Evaluation Design studies raise a practical question that any R&D team must address before model development: what do we do with the minority classes?
Observations
- 9 out of 14 pathology classes represent less than 5% of the dataset each.
- At this level of representation, training a classification model on these classes is unlikely to yield reliable performance and may introduce false confidence if global metrics are used without scrutiny.
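The kind of check behind these observations can be sketched as follows, assuming each image is represented by a list of its labels (an empty list meaning no finding). The function names, the 5% default threshold parameter, and the toy labels `common` and `rare` are hypothetical and do not reproduce the actual ChestX-ray14 counts.

```python
from collections import Counter


def class_prevalence(image_labels):
    """Return the fraction of images that carry each label (multi-label aware)."""
    counts = Counter()
    for labels in image_labels:
        counts.update(set(labels))  # count each label at most once per image
    total = len(image_labels)
    return {label: n / total for label, n in counts.items()}


def minority_classes(prevalence, threshold=0.05):
    """List the labels whose share of images falls below `threshold`."""
    return sorted(label for label, share in prevalence.items() if share < threshold)


# Toy data only: 100 images with made-up labels, not real dataset counts.
toy_labels = (
    [["common"]] * 60
    + [["rare"]] * 3
    + [["common", "rare"]] * 1
    + [[]] * 36
)
prevalence = class_prevalence(toy_labels)
flagged = minority_classes(prevalence)
```

The false-confidence risk follows directly from the arithmetic: for any class present in fewer than 5% of images, a model that never predicts that class is still correct on more than 95% of images for that label, so a high global accuracy says little about whether the class was learned.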
A considered option: OOD-by-design
Rather than forcing the model to learn from insufficient data, one strategic option is to:
- deliberately exclude these extreme minority classes from training
- reserve them exclusively for out-of-distribution (OOD) testing.
This transforms a dataset limitation into a reliability instrument:
- The model is trained on classes with sufficient representation.
- Minority class images become a dedicated OOD test set.
- The model’s behavior on these unseen classes provides measurable evidence of its generalization limits.
This approach yields a system with known and documented boundaries, which supports a stronger clinical and regulatory position.
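One way to picture the OOD-by-design partition is the sketch below. It assumes a mapping from image identifiers to label lists and applies one possible routing rule: any image carrying at least one excluded class goes to the OOD pool, even if it also carries a majority class, so that training data stays free of the excluded classes. The function name, the routing rule, and the toy data are assumptions for illustration, not a prescribed VeraDP procedure.

```python
def partition_ood_by_design(image_labels, excluded_classes):
    """Route any image carrying an excluded class to the OOD test pool.

    `image_labels` maps image_id -> list of labels. Images with mixed
    majority and excluded labels are sent to the OOD pool (an assumed rule).
    """
    excluded = set(excluded_classes)
    in_distribution, ood_pool = [], []
    for image_id, labels in image_labels.items():
        if excluded & set(labels):
            ood_pool.append(image_id)
        else:
            in_distribution.append(image_id)
    return in_distribution, ood_pool


# Toy data with made-up labels; "rare" stands in for an excluded minority class.
toy = {
    "img_a": ["common"],
    "img_b": ["rare"],
    "img_c": ["common", "rare"],
    "img_d": [],
}
train_pool, ood_pool = partition_ood_by_design(toy, excluded_classes=["rare"])
```

The in-distribution pool would then still need a patient-level split before training, and how mixed-label images are handled is a design decision that should be documented alongside the evaluation results.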
Different options. Your call.
The data due diligence evaluation clarifies that ignoring minority class representation is not a neutral choice.
The right strategy depends on clinical objectives, regulatory pathway, and acceptable risk thresholds. VeraDP provides clarity and options. Decisions belong to the development team.
