Chapter 15

Annex A Controls: Data for AI Systems (A.7)

Detailed guidance on implementing the five Annex A controls for AI data management (A.7): data acquisition, quality, provenance, preparation, and management.

20 min read

Chapter Overview

This chapter covers the Data for AI Systems domain (A.7), which ensures organizations properly manage the data that powers their AI systems. Data quality and governance are critical for AI system performance and trustworthiness. This domain contains 5 controls.

A.7 Data for AI Systems

Data is the foundation of AI systems. Poor data leads to poor AI outcomes, regardless of model sophistication.

Why Data Controls Matter

"Garbage In, Garbage Out" - AI systems are only as good as their data. Data issues cause:
• Biased outcomes
• Poor model performance
• Unreliable predictions
• Compliance violations
• Reputational damage

A.7.2 Data Acquisition

| Attribute | Details |
|---|---|
| Control | Requirements for data acquisition shall be defined, documented, and implemented. |
| Purpose | Ensure data is acquired appropriately and legally |
| Related Clause | 8.1 (Operational planning and control) |

Implementation Guidance

  • Define data requirements before acquisition
  • Identify and evaluate potential data sources
  • Assess the legal basis for data collection (e.g., consent, contract, legal obligation, legitimate interest)
  • Document data acquisition procedures
  • Establish data contracts and agreements
  • Implement data intake validation
  • Maintain records of data sources
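
The intake validation and source record-keeping steps above can be partly automated. The following is a minimal sketch, not a prescribed implementation: the `DataSourceRecord` fields and the `validate_intake` function are hypothetical names chosen for illustration, and the set of accepted legal bases is an assumption that each organization would replace with its own.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record of one data source; field names are illustrative only.
@dataclass
class DataSourceRecord:
    name: str
    owner: str
    legal_basis: str              # e.g. "consent" or "contract"
    license_terms: str            # usage rights and restrictions
    contains_personal_data: bool  # recorded so a privacy review can be triggered
    acquired_on: date

# Assumed list of accepted legal bases; adjust to your own legal guidance.
ALLOWED_LEGAL_BASES = {"consent", "contract", "legal obligation", "legitimate interest"}

def validate_intake(record: DataSourceRecord) -> list[str]:
    """Return a list of problems; an empty list means the record passes intake."""
    problems = []
    if not record.name.strip():
        problems.append("source name is missing")
    if not record.owner.strip():
        problems.append("source owner is missing")
    if record.legal_basis not in ALLOWED_LEGAL_BASES:
        problems.append(f"unrecognized legal basis: {record.legal_basis!r}")
    if not record.license_terms.strip():
        problems.append("license or usage terms are not documented")
    if record.acquired_on > date.today():
        problems.append("acquisition date is in the future")
    return problems

# Example: a source that is missing its license terms is flagged at intake.
record = DataSourceRecord(
    name="pos_transactions",
    owner="Retail Operations",
    legal_basis="contract",
    license_terms="",
    contains_personal_data=True,
    acquired_on=date(2024, 1, 15),
)
print(validate_intake(record))
```

Keeping the returned problem list alongside the source record gives an auditor direct evidence that intake checks were run and what they found.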

Data Acquisition Considerations

| Aspect | Considerations |
|---|---|
| Legal Basis | Consent, contract, legal obligation, legitimate interest |
| Source Reliability | Trustworthiness, accuracy, completeness of source |
| Rights & Licenses | Usage rights, restrictions, attribution requirements |
| Privacy | Personal data, anonymization, data protection compliance |
| Ethics | Ethical sourcing, fair compensation, informed participation |
| Representativeness | Does data represent target population/use case? |

Audit Questions - A.7.2

• How do you define data requirements for AI systems?
• What is your data acquisition process?
• How do you assess data sources?
• What legal basis do you have for data collection?
• Show me data acquisition documentation
• How do you handle third-party data?

A.7.3 Data Quality

| Attribute | Details |
|---|---|
| Control | Quality of data used in AI systems shall be managed throughout the AI system life cycle. |
| Purpose | Ensure data quality supports AI system performance |
| Related Clause | 8.1 (Operational planning and control) |

Implementation Guidance

  • Define data quality dimensions and metrics
  • Establish quality thresholds and acceptance criteria
  • Implement data quality checks and validation
  • Monitor data quality continuously
  • Address quality issues promptly
  • Document quality assessments and remediation

Data Quality Dimensions

| Dimension | Description | Metrics Examples |
|---|---|---|
| Accuracy | Data correctly represents reality | Error rate, validation match rate |
| Completeness | All required data is present | Missing value percentage, field fill rate |
| Consistency | Data is uniform across sources | Conflict rate, format conformance |
| Timeliness | Data is sufficiently current | Data age, refresh frequency |
| Validity | Data conforms to defined rules | Validation pass rate, range compliance |
| Uniqueness | No unintended duplicates | Duplicate rate, deduplication metrics |
| Representativeness | Data reflects target population | Distribution analysis, coverage metrics |

Data Quality Framework

For each AI system, establish:
• Quality dimensions relevant to the use case
• Metrics for each dimension
• Acceptance thresholds
• Measurement frequency
• Quality monitoring procedures
• Issue escalation process
• Remediation procedures
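
As an illustration of how metrics and acceptance thresholds fit together, the sketch below computes three of the quality dimensions (completeness, uniqueness, validity) on a small in-memory dataset and compares them with thresholds. The rows, field names, metric labels, and threshold values are all assumptions made for the example; appropriate values depend on the AI system and its use case.

```python
# Hypothetical rows; None marks a missing value.
rows = [
    {"id": 1, "amount": 25.0, "country": "DE"},
    {"id": 2, "amount": None, "country": "FR"},
    {"id": 2, "amount": 40.0, "country": "FR"},
    {"id": 3, "amount": -5.0, "country": "DE"},
]

def completeness(rows, field):
    """Share of rows where the field is present (not None)."""
    return sum(r[field] is not None for r in rows) / len(rows)

def uniqueness(rows, key):
    """Ratio of distinct key values to rows (1.0 means no duplicates)."""
    return len({r[key] for r in rows}) / len(rows)

def validity(rows, field, rule):
    """Share of non-missing values that satisfy the rule."""
    values = [r[field] for r in rows if r[field] is not None]
    return sum(rule(v) for v in values) / len(values) if values else 1.0

metrics = {
    "completeness(amount)": completeness(rows, "amount"),
    "uniqueness(id)": uniqueness(rows, "id"),
    "validity(amount >= 0)": validity(rows, "amount", lambda v: v >= 0),
}

# Illustrative acceptance thresholds; real values depend on the use case.
thresholds = {
    "completeness(amount)": 0.95,
    "uniqueness(id)": 1.0,
    "validity(amount >= 0)": 0.99,
}

# Metrics below their threshold would feed the issue escalation process.
failures = {name: value for name, value in metrics.items() if value < thresholds[name]}

for name, value in metrics.items():
    status = "FAIL" if name in failures else "ok"
    print(f"{name}: {value:.2f} (threshold {thresholds[name]:.2f}) {status}")
```

Running a check like this on a schedule, and storing the output, would produce the kind of quality report an auditor is likely to ask for.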

Audit Questions - A.7.3

• How do you define data quality for AI systems?
• What quality metrics do you use?
• Show me data quality reports
• How do you handle data quality issues?
• How do you monitor quality over time?
• What are your quality thresholds?

A.7.4 Data Provenance

| Attribute | Details |
|---|---|
| Control | Provenance of data used for AI systems shall be documented. |
| Purpose | Enable traceability and understanding of data origins |
| Related Clause | 7.5 (Documented information) |

Implementation Guidance

  • Track data lineage from source to use
  • Document data origins and transformations
  • Maintain audit trails for data changes
  • Record data processing history
  • Enable traceability for compliance and debugging
  • Implement data versioning

Provenance Information

| Element | What to Document |
|---|---|
| Origin | Original source, collection method, collection date |
| Ownership | Data owner, custodian, steward |
| Transformations | Processing steps, algorithms applied, parameters |
| Quality | Quality assessments, issues found, remediation |
| Versions | Version history, change records |
| Usage | Which AI systems use this data, for what purpose |
| Access | Who accessed, when, for what purpose |

Data Lineage Example

Customer Transaction Data Lineage:
1. Source: POS systems, collected daily
2. Ingestion: ETL pipeline, raw data lake storage
3. Cleaning: Deduplication, validation, null handling
4. Transformation: Feature engineering, aggregation
5. Training: Used in fraud detection model v2.3
6. Version: Dataset v4, created 2024-01-15
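
A lineage like the one above can also be kept in machine-readable form. The sketch below is one possible shape, assuming a simple in-memory record: the class names, the dataset name `customer_transactions`, and the use of a SHA-256 fingerprint to detect later edits are illustrative choices, not requirements of the control.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field

@dataclass
class LineageStep:
    stage: str        # e.g. "Cleaning"
    description: str  # what was done at this stage

@dataclass
class DatasetLineage:
    dataset: str
    version: str
    created: str                               # ISO date string
    steps: list = field(default_factory=list)  # ordered LineageStep entries

    def add_step(self, stage, description):
        self.steps.append(LineageStep(stage, description))

    def fingerprint(self):
        """SHA-256 over the serialized record, so later edits are detectable."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

# The dataset name is hypothetical; the steps mirror the example lineage.
lineage = DatasetLineage("customer_transactions", "v4", "2024-01-15")
lineage.add_step("Source", "POS systems, collected daily")
lineage.add_step("Ingestion", "ETL pipeline, raw data lake storage")
lineage.add_step("Cleaning", "Deduplication, validation, null handling")
lineage.add_step("Transformation", "Feature engineering, aggregation")
lineage.add_step("Training", "Used in fraud detection model v2.3")

for number, step in enumerate(lineage.steps, start=1):
    print(f"{number}. {step.stage}: {step.description}")
print("fingerprint:", lineage.fingerprint())
```

Storing the fingerprint with each dataset version makes it possible to show that the lineage record has not changed since the model was trained.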

Audit Questions - A.7.4

• How do you track data provenance?
• Show me data lineage for [AI system]
• How do you document data transformations?
• Can you trace data back to its source?
• How do you version datasets?

A.7.5 Data Preparation

| Attribute | Details |
|---|---|
| Control | Processes for preparing data for use in AI systems shall be documented, including preprocessing, labeling, and cleaning. |
| Purpose | Ensure consistent and appropriate data preparation |
| Related Clause | 8.1 (Operational planning and control) |

Implementation Guidance

  • Document data preparation procedures
  • Standardize preprocessing pipelines
  • Establish labeling guidelines and quality standards
  • Implement data cleaning procedures
  • Validate prepared data before use
  • Version control preparation scripts and configurations
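
Validating prepared data before use can include automated checks on the train/test split. The sketch below shows two such checks, assuming each example is an `(example_id, label)` pair: that no example appears in both sets (a leakage check) and that label shares are similar in both sets (a stratification check). The function name `check_split` and the 10% tolerance are assumptions for illustration.

```python
from collections import Counter

def check_split(train, test, max_share_gap=0.10):
    """Validate a train/test split given lists of (example_id, label) pairs.

    Returns a list of problems; an empty list means both checks passed.
    """
    if not train or not test:
        return ["train and test sets must both be non-empty"]

    problems = []

    # Leakage check: the same example must not appear in both sets.
    overlap = {i for i, _ in train} & {i for i, _ in test}
    if overlap:
        problems.append(f"{len(overlap)} example(s) appear in both train and test")

    # Stratification check: each label's share should be similar in both sets.
    train_counts = Counter(label for _, label in train)
    test_counts = Counter(label for _, label in test)
    for label in sorted(set(train_counts) | set(test_counts), key=str):
        train_share = train_counts[label] / len(train)
        test_share = test_counts[label] / len(test)
        if abs(train_share - test_share) > max_share_gap:
            problems.append(
                f"label {label!r}: train share {train_share:.2f} vs test share {test_share:.2f}"
            )
    return problems

# Example: example 1 was placed in both sets, so the split is rejected.
train = [(1, "fraud"), (2, "ok"), (3, "ok"), (4, "ok")]
test = [(1, "fraud"), (6, "ok"), (7, "ok"), (8, "ok")]
print(check_split(train, test))
```

A check like this belongs in the versioned preparation pipeline, so that every prepared dataset carries a record of having passed it.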

Data Preparation Activities

| Activity | Description | Considerations |
|---|---|---|
| Cleaning | Handle missing values, outliers, errors | Document decisions, avoid introducing bias |
| Transformation | Normalize, encode, aggregate | Reversibility, information preservation |
| Feature Engineering | Create features for model input | Domain knowledge, leakage prevention |
| Labeling | Annotate data for supervised learning | Labeler training, inter-rater reliability |
| Splitting | Create train/validation/test sets | Stratification, temporal considerations |
| Balancing | Address class imbalance | Technique selection, impact assessment |

Labeling Quality Controls

For labeled data, implement:
• Labeling guidelines and instructions
• Labeler training and qualification
• Inter-rater reliability measurement
• Quality sampling and review
• Disagreement resolution process
• Label audit trails
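
Inter-rater reliability can be measured in several ways. One common choice for two labelers is Cohen's kappa, which corrects observed agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a plain-Python version for illustration; the label values are made up, and other statistics may suit setups with more than two labelers.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two labelers who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both labelers must label the same non-empty set of items")
    n = len(labels_a)

    # p_o: share of items on which the two labelers agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # p_e: agreement expected by chance, from each labeler's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both labelers used one identical category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical labels from two labelers on the same six items.
labeler_a = ["fraud", "ok", "ok", "ok", "fraud", "ok"]
labeler_b = ["fraud", "ok", "fraud", "ok", "fraud", "ok"]
print(round(cohens_kappa(labeler_a, labeler_b), 3))
```

A kappa of 1.0 means perfect agreement and 0 means agreement no better than chance. The acceptance level for a labeling project is a judgment call that should be written into the labeling guidelines.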

Audit Questions - A.7.5

• What is your data preparation process?
• Show me preparation documentation
• How do you handle missing data?
• What are your labeling procedures?
• How do you ensure labeling quality?
• How do you validate prepared data?

A.7.6 Data Management

| Attribute | Details |
|---|---|
| Control | Data shall be managed according to defined requirements throughout the data life cycle and the AI system life cycle. |
| Purpose | Ensure comprehensive data governance |
| Related Clause | 8.1 (Operational planning and control) |

Implementation Guidance

  • Establish data governance framework
  • Define data ownership and stewardship
  • Implement access controls
  • Manage data retention and disposal
  • Ensure data security and privacy
  • Monitor data usage and compliance

Data Management Areas

| Area | Activities |
|---|---|
| Governance | Policies, standards, ownership, accountability |
| Security | Encryption, access control, threat protection |
| Privacy | Consent management, anonymization, rights fulfillment |
| Storage | Storage architecture, backup, disaster recovery |
| Retention | Retention schedules, archival, disposal |
| Compliance | Regulatory requirements, audit support |

Data Lifecycle Stages

Manage data through:
1. Creation/Collection: Acquisition, generation
2. Storage: Secure, accessible storage
3. Use: Processing, analysis, AI training
4. Sharing: Internal and external sharing
5. Archival: Long-term preservation
6. Disposal: Secure deletion when no longer needed
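
The later lifecycle stages (archival and disposal) are often driven by a retention schedule. The sketch below shows one way such a rule could be expressed. The `DatasetRecord` fields are hypothetical, and the rule itself (archive after half the retention period, dispose after the full period) is an assumption made for the example, not a recommendation.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class DatasetRecord:
    name: str
    last_used: date
    retention_days: int  # retention period from the (hypothetical) retention schedule

def lifecycle_action(record, today):
    """Return "retain", "archive", or "dispose" for a dataset record."""
    age = today - record.last_used
    if age > timedelta(days=record.retention_days):
        return "dispose"
    # Assumed rule: move to archival once half the retention period has passed.
    if age > timedelta(days=record.retention_days // 2):
        return "archive"
    return "retain"

# Hypothetical inventory evaluated on a fixed date.
today = date(2024, 6, 30)
inventory = [
    DatasetRecord("transactions_v4", date(2024, 6, 1), 365),
    DatasetRecord("transactions_v3", date(2023, 10, 1), 365),
    DatasetRecord("transactions_v1", date(2022, 1, 1), 365),
]
for record in inventory:
    print(record.name, "->", lifecycle_action(record, today))
```

The output of a periodic run like this can serve as evidence that retention and disposal decisions follow the documented schedule.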

Audit Questions - A.7.6

• What is your data governance framework?
• Who owns data used in AI systems?
• How do you control access to AI data?
• What are your data retention policies?
• How do you ensure data security?
• How do you handle data disposal?

Control Implementation Summary

| Control | Key Evidence | Common Gaps |
|---|---|---|
| A.7.2 Data Acquisition | Acquisition procedures, source assessments, contracts | No documented acquisition process |
| A.7.3 Data Quality | Quality metrics, monitoring reports, thresholds | Quality not monitored continuously |
| A.7.4 Data Provenance | Lineage documentation, version history | Cannot trace data to source |
| A.7.5 Data Preparation | Preparation procedures, labeling guidelines | Undocumented preparation steps |
| A.7.6 Data Management | Governance framework, policies, access controls | No formal data governance |

Key Takeaways - A.7

1. Data acquisition must have documented requirements and legal basis
2. Data quality requires defined dimensions, metrics, and continuous monitoring
3. Provenance enables traceability from source through transformations to use
4. Data preparation (including labeling) must be documented and quality-controlled
5. Data management spans the entire data and AI system lifecycle
6. Poor data governance is a common source of AI failures
