Annex A Controls: Data for AI Systems (A.7)
Detailed guidance on implementing the Annex A controls for AI data management (A.7): five controls covering data acquisition, quality, provenance, preparation, and management.
Chapter Overview
This chapter covers the Data for AI Systems domain (A.7), which addresses how organizations manage the data that powers their AI systems. Data quality and governance directly affect AI system performance and trustworthiness. This domain contains five controls.
A.7 Data for AI Systems
Data is the foundation of AI systems. Poor data leads to poor AI outcomes, regardless of model sophistication.
"Garbage in, garbage out": AI systems are only as good as their data. Data issues can cause:
• Biased outcomes
• Poor model performance
• Unreliable predictions
• Compliance violations
• Reputational damage
A.7.2 Data Acquisition
| Attribute | Details |
|---|---|
| Control | Requirements for data acquisition shall be defined, documented, and implemented. |
| Purpose | Ensure data is acquired appropriately and legally |
| Related Clause | 8.1 (Operational planning and control) |
Implementation Guidance
- Define data requirements before acquisition
- Identify and evaluate potential data sources
- Assess the legal basis for data collection (e.g., consent, legitimate interest)
- Document data acquisition procedures
- Establish data contracts and agreements
- Implement data intake validation
- Maintain records of data sources
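Intake validation can start as a simple check of each incoming record against an expected schema before the record enters storage. The sketch below is illustrative only: it assumes records arrive as Python dicts, and `EXPECTED_SCHEMA`, `validate_record`, and `intake` are hypothetical names. A production pipeline would more likely use a dedicated schema-validation library.

```python
from typing import Any

# Hypothetical schema for an incoming transaction feed: field name -> expected type.
EXPECTED_SCHEMA: dict[str, type] = {
    "transaction_id": str,
    "amount": float,
    "timestamp": str,
}


def validate_record(record: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return the problems found in one record (an empty list means valid)."""
    problems = []
    for field, expected_type in schema.items():
        if record.get(field) is None:
            problems.append(f"missing field: {field}")
        # isinstance is strict here: an int amount is rejected as "not float".
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    return problems


def intake(records: list[dict[str, Any]], schema: dict[str, type]):
    """Split a batch into accepted records and rejected (record, problems) pairs."""
    accepted, rejected = [], []
    for record in records:
        problems = validate_record(record, schema)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)
    return accepted, rejected
```

Keeping the rejected records together with their reasons, rather than silently dropping them, gives auditors the intake evidence that this control asks for.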
Data Acquisition Considerations
| Aspect | Considerations |
|---|---|
| Legal Basis | Consent, contract, legal obligation, legitimate interest |
| Source Reliability | Trustworthiness, accuracy, completeness of source |
| Rights & Licenses | Usage rights, restrictions, attribution requirements |
| Privacy | Personal data, anonymization, data protection compliance |
| Ethics | Ethical sourcing, fair compensation, informed participation |
| Representativeness | Does data represent target population/use case? |
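One way to make these considerations auditable is to keep a register entry for each source and to block use of that source until every aspect has been documented. The following is a minimal sketch. The field names, the `DataSourceRecord` class, and the set of accepted legal bases (copied from the table above) are assumptions for illustration. The standard does not prescribe this format.

```python
from dataclasses import dataclass
from typing import Optional

# Legal bases listed in the table above; adjust to your jurisdiction.
ALLOWED_LEGAL_BASES = {"consent", "contract", "legal obligation", "legitimate interest"}


@dataclass
class DataSourceRecord:
    """Hypothetical register entry for one data source."""
    name: str
    legal_basis: str = ""
    license_terms: str = ""
    contains_personal_data: Optional[bool] = None  # None = privacy not yet assessed
    reliability_notes: str = ""
    representativeness_notes: str = ""
    ethics_notes: str = ""

    def open_issues(self) -> list[str]:
        """List the acquisition considerations that are still undocumented."""
        issues = []
        if self.legal_basis not in ALLOWED_LEGAL_BASES:
            issues.append("legal basis missing or not recognized")
        if not self.license_terms:
            issues.append("usage rights/license not documented")
        if self.contains_personal_data is None:
            issues.append("privacy assessment not done")
        if not self.reliability_notes:
            issues.append("source reliability not assessed")
        if not self.representativeness_notes:
            issues.append("representativeness not assessed")
        if not self.ethics_notes:
            issues.append("ethical sourcing not assessed")
        return issues
```

A source would be cleared for use only when `open_issues()` returns an empty list.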
Typical audit questions:
• How do you define data requirements for AI systems?
• What is your data acquisition process?
• How do you assess data sources?
• What legal basis do you have for data collection?
• Show me data acquisition documentation
• How do you handle third-party data?
A.7.3 Data Quality
| Attribute | Details |
|---|---|
| Control | Quality of data used in AI systems shall be managed throughout the AI system life cycle. |
| Purpose | Ensure data quality supports AI system performance |
| Related Clause | 8.1 (Operational planning and control) |
Implementation Guidance
- Define data quality dimensions and metrics
- Establish quality thresholds and acceptance criteria
- Implement data quality checks and validation
- Monitor data quality continuously
- Address quality issues promptly
- Document quality assessments and remediation
Data Quality Dimensions
| Dimension | Description | Example Metrics |
|---|---|---|
| Accuracy | Data correctly represents reality | Error rate, validation match rate |
| Completeness | All required data is present | Missing value percentage, field fill rate |
| Consistency | Data is uniform across sources | Conflict rate, format conformance |
| Timeliness | Data is sufficiently current | Data age, refresh frequency |
| Validity | Data conforms to defined rules | Validation pass rate, range compliance |
| Uniqueness | No unintended duplicates | Duplicate rate, deduplication metrics |
| Representativeness | Data reflects target population | Distribution analysis, coverage metrics |
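Several of these metrics reduce to simple ratios. The sketch below computes completeness (field fill rate), uniqueness (duplicate rate on a key), and validity (range compliance) over a list of dict records. The function names are illustrative, and the field names and valid ranges are assumptions chosen by the caller.

```python
from typing import Any


def completeness(records: list[dict[str, Any]], fields: list[str]) -> float:
    """Fraction of required (record, field) cells that are present and not None."""
    total = len(records) * len(fields)
    if total == 0:
        return 1.0
    filled = sum(1 for r in records for f in fields if r.get(f) is not None)
    return filled / total


def duplicate_rate(records: list[dict[str, Any]], key: str) -> float:
    """Fraction of records whose key value already appeared earlier in the list."""
    if not records:
        return 0.0
    seen, duplicates = set(), 0
    for r in records:
        value = r.get(key)
        if value in seen:
            duplicates += 1
        else:
            seen.add(value)
    return duplicates / len(records)


def range_compliance(records: list[dict[str, Any]], field: str,
                     low: float, high: float) -> float:
    """Fraction of non-missing values of `field` that fall within [low, high]."""
    values = [r[field] for r in records if r.get(field) is not None]
    if not values:
        return 1.0
    return sum(low <= v <= high for v in values) / len(values)
```

Accuracy, timeliness, and representativeness usually require a reference dataset, timestamps, or a target distribution. They are therefore not shown here.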
For each AI system, establish:
• Quality dimensions relevant to the use case
• Metrics for each dimension
• Acceptance thresholds
• Measurement frequency
• Quality monitoring procedures
• Issue escalation process
• Remediation procedures
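Thresholds and escalation rules can be written as data rather than prose, so that the same check runs on every measurement cycle. This is a minimal sketch under several assumptions. Metrics are computed beforehand and supplied as a dict. Every metric is expressed so that higher is better (for example, uniqueness as 1 minus the duplicate rate). The two-tier warn/escalate scheme and the `Threshold` class are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Acceptance criteria for one quality metric (hypothetical two-tier scheme)."""
    metric: str
    minimum: float  # below this, data is rejected and the issue escalated
    warning: float  # below this (but >= minimum), the data owner is notified


def evaluate(measured: dict[str, float],
             thresholds: list[Threshold]) -> dict[str, list[str]]:
    """Classify each metric as ok, warn, or escalate; unmeasured metrics escalate."""
    result: dict[str, list[str]] = {"ok": [], "warn": [], "escalate": []}
    for t in thresholds:
        value = measured.get(t.metric)
        if value is None or value < t.minimum:
            result["escalate"].append(t.metric)
        elif value < t.warning:
            result["warn"].append(t.metric)
        else:
            result["ok"].append(t.metric)
    return result
```

Escalating a metric that was not measured makes a gap in monitoring visible instead of letting it pass silently.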
Typical audit questions:
• How do you define data quality for AI systems?
• What quality metrics do you use?
• Show me data quality reports
• How do you handle data quality issues?
• How do you monitor quality over time?
• What are your quality thresholds?
A.7.4 Data Provenance
| Attribute | Details |
|---|---|
| Control | Provenance of data used for AI systems shall be documented. |
| Purpose | Enable traceability and understanding of data origins |
| Related Clause | 7.5 (Documented information) |
Implementation Guidance
- Track data lineage from source to use
- Document data origins and transformations
- Maintain audit trails for data changes
- Record data processing history
- Enable traceability for compliance and debugging
- Implement data versioning
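A lightweight approach to dataset versioning is to derive the version identifier from the content itself, so that any change to the data produces a new identifier. The sketch below hashes a canonical JSON serialization of the records with SHA-256. It assumes the records are JSON-serializable dicts and that record order matters. The function name `dataset_fingerprint` is illustrative.

```python
import hashlib
import json
from typing import Any


def dataset_fingerprint(records: list[dict[str, Any]]) -> str:
    """Return a SHA-256 hex digest identifying this exact dataset content.

    Keys are sorted so that dict key order does not change the hash;
    record order still does.
    """
    digest = hashlib.sha256()
    for record in records:
        line = json.dumps(record, sort_keys=True, separators=(",", ":"))
        digest.update(line.encode("utf-8"))
        # json.dumps escapes newlines inside strings, so "\n" is an unambiguous separator.
        digest.update(b"\n")
    return digest.hexdigest()
```

Storing this fingerprint next to the human-readable version label (for example "Dataset v4") lets you show that the data used in training matches the version that was documented.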
Provenance Information
| Element | What to Document |
|---|---|
| Origin | Original source, collection method, collection date |
| Ownership | Data owner, custodian, steward |
| Transformations | Processing steps, algorithms applied, parameters |
| Quality | Quality assessments, issues found, remediation |
| Versions | Version history, change records |
| Usage | Which AI systems use this data, for what purpose |
| Access | Who accessed, when, for what purpose |
Example: customer transaction data lineage
1. Source: POS systems, collected daily
2. Ingestion: ETL pipeline, raw data lake storage
3. Cleaning: Deduplication, validation, null handling
4. Transformation: Feature engineering, aggregation
5. Training: Used in fraud detection model v2.3
6. Version: Dataset v4, created 2024-01-15
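Lineage like the example above is a chain of steps, and each step consumes an upstream artifact. If every step records a pointer to its input, tracing data back to its source becomes a mechanical lookup. This is an in-memory sketch with hypothetical artifact names modeled on the example. It assumes each artifact has exactly one parent. Real lineage that includes joins needs a graph rather than a chain.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LineageStep:
    """One processing step: which artifact it produced, from what, and how."""
    output: str
    parent: Optional[str]  # None marks an original source
    activity: str


def trace_to_source(steps: list[LineageStep], artifact: str) -> list[LineageStep]:
    """Walk upstream from `artifact` and return the steps in source-to-artifact order."""
    by_output = {s.output: s for s in steps}
    chain, visited = [], set()
    current: Optional[str] = artifact
    while current is not None:
        if current in visited:
            raise ValueError(f"lineage cycle at {current!r}")
        visited.add(current)
        step = by_output.get(current)
        if step is None:
            raise KeyError(f"no lineage recorded for {current!r}")
        chain.append(step)
        current = step.parent
    return list(reversed(chain))
```

A `KeyError` here is itself a useful audit finding, because it marks the exact point where traceability breaks.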
Typical audit questions:
• How do you track data provenance?
• Show me data lineage for [AI system]
• How do you document data transformations?
• Can you trace data back to its source?
• How do you version datasets?
A.7.5 Data Preparation
| Attribute | Details |
|---|---|
| Control | Processes for preparing data for use in AI systems shall be documented, including preprocessing, labeling, and cleaning. |
| Purpose | Ensure consistent and appropriate data preparation |
| Related Clause | 8.1 (Operational planning and control) |
Implementation Guidance
- Document data preparation procedures
- Standardize preprocessing pipelines
- Establish labeling guidelines and quality standards
- Implement data cleaning procedures
- Validate prepared data before use
- Version control preparation scripts and configurations
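Preparation is easier to document when the pipeline records what each step did as it runs. The sketch below applies two illustrative cleaning steps: deduplication on a key, and median imputation of a numeric field. Both are driven by an explicit config dict that can be kept under version control alongside the code. The function name, step names, and config keys are hypothetical. Median imputation is only one possible missing-data strategy, and it can itself introduce bias.

```python
import statistics
from typing import Any


def prepare(records: list[dict[str, Any]], config: dict[str, str]):
    """Apply config-driven cleaning steps and return (prepared_records, log)."""
    log = []

    # Step 1: drop records whose key was already seen (keep the first occurrence).
    key = config["dedup_key"]
    seen, deduped = set(), []
    for r in records:
        if r[key] not in seen:
            seen.add(r[key])
            deduped.append(r)
    log.append({"step": "deduplicate", "key": key,
                "removed": len(records) - len(deduped)})

    # Step 2: fill missing values of a numeric field with the median of observed values.
    field = config["impute_field"]
    observed = [r[field] for r in deduped if r.get(field) is not None]
    median = statistics.median(observed) if observed else None
    prepared, filled = [], 0
    for r in deduped:
        r = dict(r)  # copy so the caller's input records are not mutated
        if r.get(field) is None and median is not None:
            r[field] = median
            filled += 1
        prepared.append(r)
    log.append({"step": "impute_median", "field": field,
                "value": median, "filled": filled})
    return prepared, log
```

Storing the returned log with the dataset version records which preparation decisions were made and how many records each decision affected.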
Data Preparation Activities
| Activity | Description | Considerations |
|---|---|---|
| Cleaning | Handle missing values, outliers, errors | Document decisions, avoid introducing bias |
| Transformation | Normalize, encode, aggregate | Reversibility, information preservation |
| Feature Engineering | Create features for model input | Domain knowledge, leakage prevention |
| Labeling | Annotate data for supervised learning | Labeler training, inter-rater reliability |
| Splitting | Create train/validation/test sets | Stratification, temporal considerations |
| Balancing | Address class imbalance | Technique selection, impact assessment |
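Stratification means that each split keeps roughly the class proportions of the full dataset. The sketch below uses only the standard library. It assumes each record stores its class under a label key and uses a fixed seed, so the split is reproducible and its parameters can be recorded as provenance. The function name is illustrative. scikit-learn offers the same idea through `train_test_split(..., stratify=...)`. For time-ordered data, a split by time is usually more appropriate, and that case is not covered here.

```python
import random
from collections import defaultdict
from typing import Any


def stratified_split(records: list[dict[str, Any]], label_key: str = "label",
                     split_fractions=(0.7, 0.15, 0.15), seed: int = 42):
    """Split records into train/validation/test sets, stratified by label."""
    by_label = defaultdict(list)
    for r in records:
        by_label[r[label_key]].append(r)

    rng = random.Random(seed)  # fixed seed -> reproducible, auditable split
    train, val, test = [], [], []
    for label in sorted(by_label):  # sorted for a deterministic iteration order
        group = by_label[label][:]
        rng.shuffle(group)
        n_train = round(len(group) * split_fractions[0])
        n_val = round(len(group) * split_fractions[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # remainder goes to test
    return train, val, test
```

Splitting within each label group is what keeps a rare class, such as fraud cases, from being missing from the validation or test set by chance.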
For labeled data, implement:
• Labeling guidelines and instructions
• Labeler training and qualification
• Inter-rater reliability measurement
• Quality sampling and review
• Disagreement resolution process
• Label audit trails
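For two labelers, inter-rater reliability is commonly measured with Cohen's kappa. Kappa corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement derived from each labeler's label frequencies. The sketch below is minimal. With more than two labelers, Fleiss' kappa or Krippendorff's alpha are common alternatives.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two labelers who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both labelers chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each labeler's marginal frequency, summed over labels.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both labelers used one identical label throughout. Kappa is undefined;
        # returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

Take `["y", "y", "n", "n"]` and `["y", "n", "n", "n"]` as an example. Observed agreement is 0.75, but chance agreement is already 0.5, so kappa is 0.5.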
Typical audit questions:
• What is your data preparation process?
• Show me preparation documentation
• How do you handle missing data?
• What are your labeling procedures?
• How do you ensure labeling quality?
• How do you validate prepared data?
A.7.6 Data Management
| Attribute | Details |
|---|---|
| Control | Data shall be managed according to defined requirements throughout the data life cycle and the AI system life cycle. |
| Purpose | Ensure comprehensive data governance |
| Related Clause | 8.1 (Operational planning and control) |
Implementation Guidance
- Establish data governance framework
- Define data ownership and stewardship
- Implement access controls
- Manage data retention and disposal
- Ensure data security and privacy
- Monitor data usage and compliance
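Access control and usage monitoring can share a single choke point. Every read request is checked against the permissions of a role, and the decision is logged with who, when, and for what purpose, which are the same elements listed under provenance "Access". The following is a minimal sketch. The roles, dataset names, in-memory log, and the rule that a stated purpose is required are all hypothetical. A production system would rely on the data platform's own identity management and audit logging.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical role -> datasets that role may read.
PERMISSIONS: dict[str, set[str]] = {
    "data_scientist": {"fraud_features_v4"},
    "data_steward": {"fraud_features_v4", "pos_raw"},
}


@dataclass(frozen=True)
class AccessEvent:
    """One access decision, kept for usage monitoring and audits."""
    user: str
    role: str
    dataset: str
    purpose: str
    granted: bool
    at: datetime


def request_access(user: str, role: str, dataset: str, purpose: str,
                   audit_log: list[AccessEvent]) -> bool:
    """Grant only if the role may read the dataset and a purpose is stated; log either way."""
    granted = bool(purpose) and dataset in PERMISSIONS.get(role, set())
    audit_log.append(AccessEvent(user, role, dataset, purpose, granted,
                                 datetime.now(timezone.utc)))
    return granted
```

Denied requests are logged as well, so that repeated attempts to reach restricted data show up in usage monitoring.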
Data Management Areas
| Area | Activities |
|---|---|
| Governance | Policies, standards, ownership, accountability |
| Security | Encryption, access control, threat protection |
| Privacy | Consent management, anonymization, rights fulfillment |
| Storage | Storage architecture, backup, disaster recovery |
| Retention | Retention schedules, archival, disposal |
| Compliance | Regulatory requirements, audit support |
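A retention schedule becomes enforceable once each dataset records its category and creation date, because disposal candidates can then be listed automatically. In the sketch below, the categories and periods are placeholders and not recommended values. Actual periods come from legal, regulatory, and contractual requirements. The function name is illustrative.

```python
from datetime import date, timedelta

# Hypothetical retention periods per data category, in days.
RETENTION_DAYS = {
    "raw_personal": 365,
    "training_snapshot": 3 * 365,
}


def due_for_disposal(datasets: list[dict], today: date) -> list[str]:
    """Return names of datasets whose retention period has expired as of `today`.

    Datasets with a category missing from RETENTION_DAYS are skipped; they need
    a retention decision before they can be scheduled for disposal.
    """
    due = []
    for ds in datasets:
        period = RETENTION_DAYS.get(ds["category"])
        if period is None:
            continue
        if today >= ds["created"] + timedelta(days=period):
            due.append(ds["name"])
    return due
```

The returned list is a candidate list for secure deletion. The deletion itself, along with evidence that it took place, is a separate step.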
Manage data through each stage of its life cycle:
1. Creation/Collection: Acquisition, generation
2. Storage: Secure, accessible storage
3. Use: Processing, analysis, AI training
4. Sharing: Internal and external sharing
5. Archival: Long-term preservation
6. Disposal: Secure deletion when no longer needed
Typical audit questions:
• What is your data governance framework?
• Who owns data used in AI systems?
• How do you control access to AI data?
• What are your data retention policies?
• How do you ensure data security?
• How do you handle data disposal?
Control Implementation Summary
| Control | Key Evidence | Common Gaps |
|---|---|---|
| A.7.2 Data Acquisition | Acquisition procedures, source assessments, contracts | No documented acquisition process |
| A.7.3 Data Quality | Quality metrics, monitoring reports, thresholds | Quality not monitored continuously |
| A.7.4 Data Provenance | Lineage documentation, version history | Cannot trace data to source |
| A.7.5 Data Preparation | Preparation procedures, labeling guidelines | Undocumented preparation steps |
| A.7.6 Data Management | Governance framework, policies, access controls | No formal data governance |
Key Takeaways
1. Data acquisition must have documented requirements and a legal basis
2. Data quality requires defined dimensions, metrics, and continuous monitoring
3. Provenance enables traceability from source through transformations to use
4. Data preparation (including labeling) must be documented and quality-controlled
5. Data management spans the entire data life cycle and AI system life cycle
6. Poor data governance is a common source of AI failures