Ensuring the security of an AI system’s development
The security of AI systems is an issue too often overlooked by their designers. Yet guaranteeing data protection, both during the development of the system and in anticipation of its deployment, is an obligation. This how-to sheet details the risks involved and the measures recommended by the CNIL.
The security of the processing of personal data is an obligation laid down in Article 32 of the GDPR, which requires it to be ensured taking into account “the state of the art, the costs of implementation and the nature, scope, context and purposes of processing as well as the risk of varying likelihood and severity for the rights and freedoms of natural persons”. The security of processing is therefore an obligation to be met through risk-appropriate measures.
In practice, for the development of an AI system, a “traditional” security analysis, covering in particular environmental security (infrastructure, clearances, backups, physical security) and the security of software development and its maintenance (including good development practices and the management of vulnerabilities and updates), should be combined with a risk analysis specific to AI systems and large training datasets.
This sheet details:
- the methodological approach to manage the security of the development of an AI system;
- the main security objectives to be pursued when developing an AI system;
- the risk factors to be taken into account, some of which are AI-specific;
- the measures recommended to bring the residual risk down to an acceptable level.
The methodological approach to be adopted
The security obligation laid down in the GDPR is based on the implementation of measures that are adequate in view of the risks to the rights and freedoms of data subjects. To this end, a risk analysis should be carried out in order to select the relevant security measures.
In the case of the development of an AI system, this analysis may need to cover specificities concerning the sources of risk (e.g. the provider of a re-used database), the tools likely to be targeted (such as software libraries) or the impacts on people (such as discrimination).
Because many approaches to developing AI systems are very recent, it is recommended to pay particular attention to the use of components (such as datasets, libraries and infrastructures) that have not undergone a thorough security assessment. This vigilance must be heightened for uses that involve significant risks to the rights and freedoms of individuals, such as the use of sensitive data or the pursuit of high-risk purposes. The lists in Annexes I and III of the European AI Act provide useful guidance for determining whether the uses concerned are high-risk.
To this end, it is recommended that a Data Protection Impact Assessment (DPIA) be carried out. This analysis may be mandatory in some cases, as described in the sheet “Carrying out a data protection impact assessment when necessary”. The DPIA is a tool that makes it possible to identify risks and appropriate measures in a cross-cutting manner, and thus to take stock of the measures already in place and those still to be implemented.
Security objectives related to AI development
The security objectives to be pursued when developing an AI system concern:
- the confidentiality of the data: whether the data is access-restricted or publicly accessible (its annotations may also contain personal data), insufficient security of the dataset can lead to a loss of confidentiality. Breaches may also occur when handling the model trained on the personal data, independently of the data itself: risks of memorisation, reconstruction or membership inference exist and may lead to the disclosure of training data. Such a loss of confidentiality can have serious consequences for the data subjects, in particular for their reputation, or enable malicious practices such as phishing. Taking these confidentiality risks into account is critical for AI, where data is usually processed in large quantities and may include personal, highly personal or sensitive data about individuals (a minimal illustration of membership inference is sketched after this list).
- the performance and integrity of the system: although the risks related to poor system performance only materialise in the deployment phase, most of the corresponding measures need to be taken during development. A loss of data integrity, unreliable data, tools or protocols, or flaws in development, testing or organisational procedures can have serious consequences for users (the organisations operating the AI system in the deployment phase) and therefore for end users (the people to whom the outputs produced by the system are applied). Moreover, systems incorporating continuous learning features require increased vigilance regarding the maintenance of performance over time and the monitoring of their impact on people.
- the security of the information system as a whole: the functionality of AI systems relies largely on the models used, but the most likely risks today concern the other components of the system (such as backups, interfaces and communications). It can thus be easier for an attacker to exploit a software vulnerability to access training data than to carry out a membership inference attack. The corresponding risks and measures relating to information systems are described in the CNIL’s guide on the security of personal data.
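To make the memorisation and membership inference risk mentioned above more concrete, the sketch below runs a very simple loss-threshold membership inference test against a toy classifier: samples on which the model’s loss is unusually low are guessed to belong to the training set. The toy data, the random-forest model and the threshold rule are placeholders chosen for the illustration, not a method prescribed by this sheet.

```python
# Minimal loss-threshold membership inference test (illustrative sketch).
# Assumptions: the toy data, the random-forest model and the median threshold
# are placeholders for the example, not a prescribed attack or defence.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)

# "Members" were used for training, "non-members" were not; both stand in
# for records about individuals.
X_members = rng.normal(size=(150, 10))
y_members = (X_members[:, 0] + rng.normal(scale=0.5, size=150) > 0).astype(int)
X_non_members = rng.normal(size=(150, 10))
y_non_members = (X_non_members[:, 0] + rng.normal(scale=0.5, size=150) > 0).astype(int)

model = RandomForestClassifier(random_state=0).fit(X_members, y_members)

def per_sample_loss(X, y):
    """Cross-entropy loss of each individual sample under the model."""
    probs = model.predict_proba(X)
    return np.array([log_loss([label], [p], labels=model.classes_)
                     for label, p in zip(y, probs)])

loss_members = per_sample_loss(X_members, y_members)
loss_non_members = per_sample_loss(X_non_members, y_non_members)

# Attacker's rule: a sample whose loss falls below the threshold is guessed
# to belong to the training set. A large gap between the two rates means the
# model leaks information about who was in its training data.
threshold = np.median(np.concatenate([loss_members, loss_non_members]))
print("guessed as members, among true members:",
      (loss_members < threshold).mean())
print("guessed as members, among non-members: ",
      (loss_non_members < threshold).mean())
```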
Risk factors for the security of an AI system
Some factors are to be considered when assessing the level of risk for the AI system in question. These factors relate to:
- The nature of the data: security measures should be adapted to the seriousness of the consequences that a security breach could have for people;
Example: a person’s postal code is personal data that must be protected, but its disclosure will have less serious consequences than the disclosure of their bank details.
- Control over the data, models and tools used: AI is a field where the use of open resources, or of decentralised protocols such as federated learning, is common, but the reliability of sources is not always verified. Using such unverified tools increases the likelihood of a security breach;
Example: AI models from collaborative platforms can contain corrupted files or backdoors that introduce a security flaw into the system or cause unexpected behaviour. These risks were highlighted in a study of the security of the YOLOv7 model by Trail of Bits.
- System access modalities (such as restricted access, software as a service or SaaS, open source distribution of the model, web exposure) and the content of system outputs (such as system operation indicators or confidence scores): these access modalities increase the attack surface and can make attacks easier to carry out, while information on how the system works facilitates attacks such as membership or attribute inference. Such modalities and information therefore increase the likelihood of a vulnerability being exploited;
Example: a language model trained on personal data could have memorised training data and allow information about people to be inferred. Publishing it as open source increases the opportunities for such attacks. This risk should be anticipated.
- The intended context of use of the AI system: the severity of a vulnerability depends on the criticality of the system within the overall processing, both in terms of its availability and of the role given to its outputs, as well as on the technical maturity of its users. When the AI system interacts with other components of an information system, the quality of its outputs can be critical to security. A robust integration of the AI system into the information system and proper control by its users can therefore reduce the likelihood of a security breach;
Example: a medical diagnosis support system will require more measures to ensure its robustness and its proper handling by caregivers than an augmented reality system used for virtual clothes fitting.
Security measures to consider for the development of an AI system
The following measures may be considered to ensure the security of the AI system. They concern the training data, the development and operation of the system, as well as transversal measures.
As mentioned above, the measures listed here are indicative and do not all have to be put in place for every AI system development: it is the responsibility of the controller to determine the measures needed to limit the risks it has identified, taking into account its context.
Measures on training data
The quality and reliability of an AI system rely mainly on the data used for its training. The measures recommended for this data and its security are therefore paramount. The following measures are particularly recommended:
- Verifying the reliability of training data sources and their annotations: this verification can take place at the stage of collection (or acquisition if the collection is carried out by a third party), and should be continued throughout the life cycle of the AI system when a change can be expected (such as when the data is collected continuously, or where there is a risk to the integrity of the data);
- Checking the quality of the training data and their annotations: implementing strict collection conditions, or verifying that the collection was carried out under such conditions, and then monitoring the quality of the data during collection and throughout its life cycle, reduces the risk of data quality loss (in particular where data drift could occur, such as in the case of continuous learning, or where the collection relies on sensors whose performance could deteriorate over time);
- Verifying the integrity of training data and their annotations throughout their life cycle: this verification should be carried out on a regular basis, with a view to detecting the most common flaws such as attempts to poison data. These checks may be implemented by means of iterative learning control techniques, in particular through active learning, as described in the LINC article “Sécurité des systèmes d’IA : les gestes qui sauvent”;
- Logging and managing versions of datasets (versioning): tracking and documenting changes to datasets makes it possible to detect an intrusion or modification attempt, and thus to prevent malicious disclosure or poisoning of the data. It is also recommended to keep a record of the dataset version on which each model has been trained (a minimal sketch of hash-based integrity and version tracking is given after this list);
- Using fictitious or synthetic data: when the use of real data is not necessary, such as for security testing, integration or certain audits, resorting to fictitious data limits the risks for individuals. The techniques, advantages and limitations of data synthesis are described in the two-part LINC article “Synthetic data”;
- Encrypting backups and communications: using state-of-the-art cryptographic protocols helps to limit the severity of the consequences of an intrusion, especially when web-exposed access points are provided, as in the case of software as a service (SaaS) or federated learning (a minimal sketch of encrypting a backup at rest is given after this list). Recent scientific advances in cryptography can provide strong safeguards for data protection. As indicated in the sheet “Taking data protection into account in the system design choices”, depending on the use case it may be relevant, for example, to explore the possibilities offered by secure multi-party computation or homomorphic encryption. These techniques make it possible to train an AI model on data that remains encrypted throughout the learning process. However, they remain limited in that they cannot be applied to all types of models and they induce an additional computational cost. In addition, some of them, such as homomorphic encryption for training neural networks, are still at the research stage. As technical developments are frequent in this area, it is advisable to keep an active watch on this subject;
- Controlling access to data: where the data is not openly disseminated, restricting access to it and providing authentication procedures, with differentiated levels of authorisation according to the type of access (in particular user or administrator), prevents certain malicious intrusions and thereby reduces the risk of a loss of integrity, availability or confidentiality;
- Anonymising or pseudonymising data: deleting certain data (e.g. by obfuscation), adding random noise (e.g. through differential privacy) and generalisation (by aggregating data or decreasing its precision) are all categories of measures to be considered according to the types of data processed and the context in which the model is used. Such processes may in particular make it possible to visualise or export anonymised extracts from the dataset, or to anonymise query results (a minimal sketch of keyed pseudonymisation is given after this list);
- Partitioning sensitive datasets: where the training data contains a significant proportion of special categories of data (e.g. a health database), it is recommended to provide for logical storage partitioning, for example via encryption distinct from that of the system in which it is hosted, as well as dedicated access modalities (project-specific accounts and access rights);
- Preventing loss of control over data by organisational measures: monitoring and tracking exports and data sharing helps to keep track of the data journey, in particular by following the recipients of the data. Data can also be tracked after publication or sharing through methods such as digital watermarking, which allows the origin of the data to be verified a posteriori and thus the source of a loss of confidentiality to be identified. Similar measures, such as watermarking or signatures (including hashing), may also be applied to trained models in order to ensure their traceability and integrity.
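As an illustration of the integrity and versioning measures above, the sketch below records a SHA-256 fingerprint of every file of a frozen dataset version in a small manifest and re-checks it before training. The file paths and the manifest format are hypothetical choices for the example, not a prescribed implementation.

```python
# Hash-based dataset integrity and version tracking (illustrative sketch).
# Assumptions: the dataset is stored as files; "data/v1" and the JSON
# manifest are hypothetical names chosen for the example.
import hashlib
import json
from pathlib import Path

def sha256_of_file(path: Path) -> str:
    """Return the SHA-256 fingerprint of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(dataset_dir: Path, manifest_path: Path) -> None:
    """Record the fingerprint of every file of the frozen dataset version."""
    manifest = {str(p.relative_to(dataset_dir)): sha256_of_file(p)
                for p in sorted(dataset_dir.rglob("*")) if p.is_file()}
    manifest_path.write_text(json.dumps(manifest, indent=2))

def verify_manifest(dataset_dir: Path, manifest_path: Path) -> list[str]:
    """Return the files whose content no longer matches the manifest."""
    manifest = json.loads(manifest_path.read_text())
    return [name for name, fingerprint in manifest.items()
            if sha256_of_file(dataset_dir / name) != fingerprint]

# Example use: build the manifest when the dataset version is frozen for
# training, then verify it before each training run and log the result.
# build_manifest(Path("data/v1"), Path("data/v1.manifest.json"))
# altered = verify_manifest(Path("data/v1"), Path("data/v1.manifest.json"))
```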
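For the encryption of backups, the following sketch shows symmetric authenticated encryption of a dataset export at rest, assuming the third-party `cryptography` package is available. Key management (dedicated secret store, rotation, hardware protection) is deliberately out of scope here.

```python
# Encrypting a dataset backup at rest (illustrative sketch).
# Assumption: the third-party "cryptography" package is installed; in
# practice the key would live in a secrets manager, never next to the data.
from cryptography.fernet import Fernet

def encrypt_backup(plaintext: bytes, key: bytes) -> bytes:
    """Encrypt a serialised dataset backup with authenticated encryption."""
    return Fernet(key).encrypt(plaintext)

def decrypt_backup(token: bytes, key: bytes) -> bytes:
    """Decrypt a backup; raises an error if the ciphertext was tampered with."""
    return Fernet(key).decrypt(token)

key = Fernet.generate_key()                 # to be stored in a secret store
backup = b"csv export of the training set"  # placeholder content
token = encrypt_backup(backup, key)
assert decrypt_backup(token, key) == backup
```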
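Finally, as a companion to the anonymisation and pseudonymisation measure, here is a minimal sketch of keyed pseudonymisation of direct identifiers before records enter a training set. The field names and the handling of the secret are assumptions for the example, and pseudonymisation alone does not amount to anonymisation: the data remains personal data.

```python
# Keyed pseudonymisation of direct identifiers (illustrative sketch).
# Assumptions: records are dictionaries and "name" / "email" are the direct
# identifiers; the secret key must be stored separately from the dataset.
# Pseudonymisation alone is not anonymisation: the data stays personal data.
import hmac
import hashlib

SECRET_KEY = b"replace-with-a-key-from-a-secret-store"   # placeholder
DIRECT_IDENTIFIERS = ("name", "email")

def pseudonym(value: str) -> str:
    """Deterministic pseudonym: same input gives the same pseudonym,
    not reversible without the secret key."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"),
                    hashlib.sha256).hexdigest()[:16]

def pseudonymise(record: dict) -> dict:
    """Replace direct identifiers by pseudonyms before adding the record
    to the training set."""
    return {k: pseudonym(v) if k in DIRECT_IDENTIFIERS else v
            for k, v in record.items()}

print(pseudonymise({"name": "Alice Martin", "email": "alice@example.org",
                    "postal_code": "75011"}))
```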
Measures relating to the development of the system
From the initial design thinking to the integration of the model into a system, several security measures must be taken into account:
- Taking data protection into account in the design choices of the system: the methodology to be followed for this preliminary phase is described in the dedicated sheet. Data minimisation, which is a mandatory principle and an outcome of this phase, will also benefit security by limiting the consequences for individuals;
- Complying with good security practices in the field: using verified libraries, tools (for programming, such as development environments; for version management, such as versioning tools; or for data access, such as APIs), pre-trained models and configuration files will limit the risk of attack or failure. Special attention should be paid to the presence of backdoors in these files; checking their reliability and applying updates is recommended. Finally, several good development practices are recommended, such as:
- the use of secure formats for saving configuration files and models, such as safetensors, rather than formats that facilitate code injection attacks, such as YAML or pickle files (a minimal sketch using safetensors is given after this list);
- prohibiting the use of overly permissive functions that could execute undetected malicious code;
- cleaning and reviewing the code;
- paying attention to alerts raised by tools;
- the compilation of the code (rather than the use of tracing or scripting) in order to prevent alteration of the system’s behaviour once deployed;
- Using a controlled, reproducible and easily deployable development environment: many tools such as containers or virtual machines make it easier to control the development environment by automating its configuration. In addition, a secure test environment, often referred to as a sandbox, should be used in case of doubt;
- Implementing a continuous development and integration procedure: this now-essential procedure, based on a dedicated environment and on comprehensive and robust unit tests, and whose modification is restricted by authentication (in particular for changes to the production code), will limit the risk of a vulnerability being inserted into the system;
- Building comprehensive documentation: this documentation, intended for the developers and users of the system, may include information regarding:
- the design of the system, including the data and models used and the analyses which led to their selection and validation, and the results of those analyses;
- the operation of the system throughout its life cycle, the performance obtained, the analysis of bias, and the conditions and limitations of use, such as cases where performance may be insufficient;
- the hardware required to use the system, the expected latency, or the maximum capacity for systems accessible as SaaS;
- the protection measures implemented, including the management of access, secrets or encryption measures.
- Conducting security audits: these analyses can be conducted internally or by third parties, based on recognised references (see the useful resources section below), and may include carrying out the most common attack attempts, as in red teaming.
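To illustrate the recommendation on secure serialisation formats above, the sketch below saves and reloads model weights with the safetensors format rather than pickle. It assumes the third-party `safetensors` and `torch` packages are installed, and the model is a toy placeholder.

```python
# Saving and loading model weights with safetensors instead of pickle
# (illustrative sketch). Assumptions: the third-party "safetensors" and
# "torch" packages are installed; the model below is a toy placeholder.
import torch
from safetensors.torch import save_file, load_file

model = torch.nn.Linear(10, 2)                  # placeholder model

# safetensors stores raw tensors only: unlike pickle, loading the file
# cannot trigger arbitrary code execution.
save_file(model.state_dict(), "model.safetensors")

state_dict = load_file("model.safetensors")     # plain dict of tensors
model.load_state_dict(state_dict)
```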
The CNIL has also published a GDPR guide for developers identifying a set of good practices that apply more generally.
Measures relating to the operation of the system
Although the risks relating to the operation of the system concern the deployment phase (the security of which will be addressed in subsequent publications), some of them are best taken into account from the development phase. These measures include:
- Informing the user of the limitations of the system observed in the laboratory and of the intended contexts of use: it is recommended to explain the guaranteed or recommended conditions of use, as well as the conditions that limit performance or that are excluded (whether in terms of uses or operating conditions);
- Providing information enabling the user to interpret the results: this information should in particular help identify a possible error (such as a confidence score, or information on the logic of the system, such as a saliency map for detection tasks, bearing in mind the risk that an attacker could exploit this information). A dedicated interface or communication channel can also be used to collect feedback from users, in order to identify and correct system defects observed during the deployment phase. This information may also be collected by providing for specific logging of the system’s outputs, in particular when the system is integrated into an information system;
- Providing for the possibility of stopping the system: this will be mandatory for uses that fall within the scope of automated decision-making (Article 22 GDPR) and that have legal or similarly significant effects on individuals;
- Checking the outputs of the AI system: the outputs of AI systems, including generative systems, can be problematic, contain personal data or be misleading. Measures such as output filters, reinforcement learning from human feedback (RLHF) or digital watermarking of generated content (whose effectiveness is demonstrated in particular on image and video content) reduce the likelihood of such outputs (a minimal sketch of an output filter is given after this list).
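As a minimal illustration of an output filter, the sketch below masks e-mail addresses and phone-like numbers in generated text before it is returned to the user. The regular expressions are deliberately simplistic assumptions and would not catch all personal data; a real filter would need much more robust detection.

```python
# Minimal output filter for generated text (illustrative sketch).
# Assumption: a simple regex pass masking e-mail addresses and phone-like
# numbers; real filters need far more robust detection of personal data.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d(?:[\s.-]?\d){8,}")

def filter_output(generated_text: str) -> str:
    """Mask obvious personal data patterns before returning the output."""
    masked = EMAIL.sub("[email removed]", generated_text)
    masked = PHONE.sub("[phone number removed]", masked)
    return masked

print(filter_output("Contact Alice at alice@example.org or +33 6 12 34 56 78."))
```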
Transversal measures
These measures concern the development of the system as a whole:
- Implementing and monitoring a security action plan: including the measures to be taken into account in an action plan will make it possible to verify their implementation and the effective reduction of risks throughout the development of the system, and until the secure deletion of the data;
- Setting up a development team with multidisciplinary skills: seeking diversified expertise and advice will help identify security vulnerabilities throughout the data lifecycle. It is recommended:
- To include employees with complementary skills (such as data analysis and engineering, user interface and experience, quality control, IT infrastructure administration, or business need) in the development process;
- To ensure that developers and data analysts are trained on good security practices in data management and model development, and on the most common vulnerabilities;
- Managing clearances, tracking accesses and analysing traces: even when the data comes from open sources, the annotations attached to it may constitute information to be protected. Restricting access to datasets and models to authorised persons according to their profile, maintaining a list of privileged users, logging accesses to datasets as well as changes, additions and deletions, and analysing these traces (in real time and automatically or, failing that, on a regular basis) reduces the likelihood and severity of an intrusion by preventing it or detecting it as early as possible (a minimal sketch of access logging is given after this list).
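As a minimal illustration of access logging, the sketch below writes structured dataset-access events to an audit log and applies a trivial automated check. The authorised-user list, the JSON log format and the alert rule are hypothetical choices made for the example.

```python
# Structured logging of dataset accesses with a trivial automated check
# (illustrative sketch). Assumptions: the authorised-user list, the JSON log
# format and the alert rule below are hypothetical choices for the example.
import json
import logging
from datetime import datetime, timezone

AUTHORISED_USERS = {"alice": "administrator", "bob": "user"}   # placeholder

logging.basicConfig(filename="dataset_access.log", level=logging.INFO,
                    format="%(message)s")

def log_access(user: str, dataset: str, action: str) -> None:
    """Append one structured access event to the audit log."""
    event = {"timestamp": datetime.now(timezone.utc).isoformat(),
             "user": user, "role": AUTHORISED_USERS.get(user, "unknown"),
             "dataset": dataset, "action": action}
    logging.info(json.dumps(event))
    # Trivial real-time check: flag any access by a non-authorised account.
    if user not in AUTHORISED_USERS:
        logging.warning(json.dumps({**event, "alert": "unauthorised access"}))

log_access("alice", "health_dataset_v1", "read")
log_access("mallory", "health_dataset_v1", "export")
```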
Useful resources
The following resources will be useful in order to secure the AI system as a whole, taking into account in particular AI-specific risks:
- The ANSSI EBIOS-RM methodological guide,
- The ANSSI’s security recommendations for generative AI systems,
- Resources from the CNIL for the realization of a DPIA,
- The CNIL’s guide to the security of personal data,
- The LINC article series on the security of AI systems,
- The report “Artificial Intelligence Cybersecurity Challenges” by the European Union Agency for Cybersecurity (ENISA),
- The AI security guidelines of the German Federal Office for Information Security (BSI),
- The OWASP AI Security and Privacy Guide,
- The “Guidelines for secure AI system development” of the National Cyber Security Centre (United Kingdom),
- Mapping attacks on AI systems, or “Adversarial Threat Landscape for Artificial Intelligence Systems” (ATLAS) by MITRE,
- The project “Privacy Library of Threats 4 Artificial Intelligence” or Plot4AI,
- The “Data-Centric System Threat Modeling” and “Adversarial Machine Learning” guides of the National Institute of Standards and Technology (USA),
- Mozilla’s Rapid Risk Assessment Guide.