Case Studies
From data due diligence to evaluation design: a structured approach to AI reliability in medical imaging.
Project: NIH ChestX-ray14 Data Due Diligence Evaluation
Dataset Scale: 112,120 images / 30,805 patients
Objective: Applying the VeraDP Data Due Diligence Protocol to identify the dataset’s strengths, limitations, and risks
Identified Risks:
Strategic Outcome: The evaluation provides a framework for judgment under uncertainty, allowing R&D teams to account for data limitations before locking in technical and validation strategies.


Project: Evaluation Design of an AI System Trained on the NIH ChestX-ray14 Dataset
From data findings to evaluation strategy
What the label distribution tells us
What the patient distribution tells us
This asymmetry makes patient-level data splitting mandatory to prevent data leakage between train, validation, and test sets.
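A patient-level split can be sketched in a few lines. The example below is a minimal illustration, not the VeraDP protocol itself: the function name `patient_level_split`, the 70/10/20 fractions, the seed, and the toy image and patient IDs are all assumptions made for the example. It assigns whole patients to a split, so every image from a given patient ends up in exactly one of train, validation, or test.

```python
import random
from collections import defaultdict


def patient_level_split(records, fractions=(0.7, 0.1, 0.2), seed=0):
    """Assign whole patients to train/val/test so no patient spans two splits.

    records: iterable of (image_id, patient_id) pairs.
    fractions: share of *patients* (not images) per split; an assumed default.
    """
    images_by_patient = defaultdict(list)
    for image_id, patient_id in records:
        images_by_patient[patient_id].append(image_id)

    patients = sorted(images_by_patient)  # fixed order so the seed is reproducible
    random.Random(seed).shuffle(patients)

    n_train = round(fractions[0] * len(patients))
    n_val = round(fractions[1] * len(patients))
    assignment = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],  # remainder goes to test
    }
    return {
        name: {p: images_by_patient[p] for p in ids}
        for name, ids in assignment.items()
    }


# Toy records with invented IDs: one patient contributes many images.
records = [(f"img_{i:03d}.png", "p0") for i in range(20)]
records += [(f"img_{100 + i}.png", f"p{i}") for i in range(1, 10)]

splits = patient_level_split(records)
for name, split in splits.items():
    n_images = sum(len(images) for images in split.values())
    print(name, "patients:", len(split), "images:", n_images)
```

Because the fractions apply to patients, the share of images in each split can drift from the target when a few patients contribute many images. Libraries such as scikit-learn offer group-aware splitters (for example `GroupShuffleSplit`) that serve the same purpose.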
Proposed evaluation strategy
Without data due diligence, there is no reliable evaluation design. The data determines the strategy before any model development begins.
From data evidence to technical decisions
The findings from the Data Due Diligence and Evaluation Design studies raise a practical question that any R&D team must address before model development: what do we do with the minority classes?
Observations
- Nine of the 14 pathology classes each represent less than 5% of the dataset.
- At this level of representation, training a classification model on these classes is unlikely to yield reliable performance and may introduce false confidence if global metrics are used without scrutiny.
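The two observations above can be checked with a short prevalence calculation. The sketch below uses invented toy rows, not the real label counts. It assumes multi-label entries are stored as `|`-separated strings, and the helper names, the `"No Finding"` exclusion, and the 5% threshold parameter are choices made for the example. It also shows why aggregate metrics can mislead: a classifier that never predicts a rare class is still correct on every image that lacks it.

```python
from collections import Counter


def label_prevalence(finding_labels, separator="|"):
    """Return {label: fraction of images carrying that label}."""
    counts = Counter()
    for entry in finding_labels:
        counts.update(set(entry.split(separator)))  # count each label once per image
    total = len(finding_labels)
    return {label: count / total for label, count in counts.items()}


def minority_classes(prevalence, threshold=0.05, ignore=("No Finding",)):
    """Labels whose prevalence is below the threshold (assumed 5% cut-off)."""
    return sorted(
        label for label, fraction in prevalence.items()
        if fraction < threshold and label not in ignore
    )


# Toy label column with invented counts (100 images), NOT the real distribution.
toy_labels = (
    ["No Finding"] * 60
    + ["Infiltration"] * 25
    + ["Effusion|Infiltration"] * 12
    + ["Hernia"] * 2
    + ["Effusion|Pneumonia"] * 1
)

prevalence = label_prevalence(toy_labels)
rare = minority_classes(prevalence)

# Per-class accuracy of a model that always answers "absent" for a rare class:
# it is right on every image that does not carry the label.
always_negative_accuracy = {label: 1 - prevalence[label] for label in rare}

print(rare)
print(always_negative_accuracy)
```

In this toy data the always-negative model scores about 98% accuracy on the rarest class while detecting none of its cases, which is the kind of false confidence the observation warns about.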
A considered option: OOD-by-design
Rather than forcing the model to learn from insufficient data, one strategic option is to:
- deliberately exclude these extreme minority classes from training, and
- reserve them exclusively for out-of-distribution (OOD) testing.
This transforms a dataset limitation into a reliability instrument:
- The model is trained on classes with sufficient representation.
- Minority class images become a dedicated OOD test set.
- The model’s behavior on these unseen classes provides measurable evidence of its generalization limits.
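One way to make this option concrete is sketched below. It rests on several assumptions that are not part of the original study: an image is held out as OOD if it carries any minority label (the dataset is multi-label, so other rules are possible), the model exposes some per-image confidence score, and the separation between in-distribution and OOD scores is summarized with an AUROC. The function names, image IDs, and scores are hypothetical.

```python
def split_ood_by_design(rows, minority, separator="|"):
    """Partition (image_id, labels) rows into a training pool and an OOD test set.

    Assumed rule: any image carrying at least one minority label is held out.
    """
    minority = set(minority)
    train_pool, ood_test = [], []
    for image_id, labels in rows:
        has_minority = bool(minority & set(labels.split(separator)))
        (ood_test if has_minority else train_pool).append(image_id)
    return train_pool, ood_test


def auroc(in_dist_scores, ood_scores):
    """Probability that a random in-distribution image gets a higher confidence
    score than a random OOD image (ties count as one half)."""
    wins = 0.0
    for s_in in in_dist_scores:
        for s_ood in ood_scores:
            if s_in > s_ood:
                wins += 1.0
            elif s_in == s_ood:
                wins += 0.5
    return wins / (len(in_dist_scores) * len(ood_scores))


# Toy rows with invented IDs and labels.
rows = [
    ("a.png", "Infiltration"),
    ("b.png", "Effusion|Hernia"),
    ("c.png", "No Finding"),
    ("d.png", "Pneumonia"),
]
train_pool, ood_test = split_ood_by_design(rows, minority={"Hernia", "Pneumonia"})

# Hypothetical confidence scores from a trained model: the first list comes from
# held-out in-distribution test images, the second from the OOD test set.
in_dist_scores = [0.9, 0.8, 0.7]
ood_scores = [0.85, 0.4]
separation = auroc(in_dist_scores, ood_scores)

print(train_pool, ood_test)
print(round(separation, 3))
```

An AUROC near 1.0 would suggest the model's confidence drops on the unseen classes, while a value near 0.5 would indicate it cannot tell them apart from familiar cases. Either result is documented evidence of where the system's boundaries lie.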
This approach yields a system whose boundaries are known and documented, which supports a stronger clinical and regulatory position.
This data due diligence makes one thing clear: ignoring minority class representation is not a neutral choice. The right strategy depends on clinical objectives, regulatory pathway, and acceptable risk thresholds.
VeraDP provides the clarity and the options. Technical decisions can now be made on solid ground.
