Developing and training an algorithm
Implementing best practices during this crucial phase.
The AI system provider must carry out a series of checks and take precautions in order to guarantee the quality of the system. The following questions are, however, also of interest to users of the system (where they are not the provider), who may be held liable if the processing carried out does not comply with the GDPR. Users can then check the extent to which the provider has taken data protection issues into account in the system design.
Designing and developing a reliable algorithm
When designing the processing, particular attention should be paid to the choice of algorithm, tools and development infrastructure, so that the processing is reliable and robust.
The following questions may help the data controller to assess whether a balance is maintained between the complexity of the chosen solutions and the resulting loss of explainability.
What type of algorithm is used and how does it work (supervised, unsupervised, continuous or federated learning, reinforcement learning, etc.)?
Why was this algorithm chosen?
How was it implemented (source of the code, libraries used, etc.)?
Has the algorithm used been tested by independent third parties?
Has a literature review been carried out?
Has a comparison with similar systems been made?
Which criteria were used?
Are third-party tools used?
Are they considered reliable and proven?
Has a search for known flaws in the tool's documentation and among the developer community raised any points of concern?
Is any monitoring scheduled for tool updates?
Is the AI algorithm available as open source?
Have the algorithms been sufficiently tested by third parties?
If so, by whom and how?
Are the algorithms state of the art?
Is community feedback on the algorithms encouraged and taken into account?
Applying a meticulous training protocol
By establishing a protocol for training the AI system, the provider will be able to challenge the choice of methods used to validate the performance and fairness of the algorithm, while incorporating validation tests at the most critical stages. Since they provide the first opportunity to observe the algorithm's behaviour, the training and validation metrics must be chosen carefully: in a seropositivity test for a contagious disease, for example, it is more important to limit the false negative rate than the false positive rate.
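To illustrate that choice, here is a minimal sketch, assuming Python with scikit-learn and purely illustrative labels, which measures the two error rates separately so that the one matching the stakes for data subjects can be prioritised:

```python
# Minimal sketch: measuring false negative and false positive rates
# separately for a binary classifier. Labels below are purely illustrative;
# in a real protocol they would come from the validation set.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 1, 0, 0, 1]  # 1 = actually positive
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 0, 1]  # model output

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
fnr = fn / (fn + tp)  # missed positives: critical for a contagious disease
fpr = fp / (fp + tn)  # false alarms: less critical in this scenario

print(f"False negative rate: {fnr:.2f}")
print(f"False positive rate: {fpr:.2f}")
```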
Which learning strategies are used?
What is the split between training, validation and test datasets?
Are strategies such as cross-validation used? (A sketch illustrating both points is given after this list.)
Is the quality of the AI system's output considered sufficient?
Which metrics are used?
Do they allow the performance to be measured satisfactorily, with due regard to the consequences for the data subjects?
Have cases of error been investigated?
Has a correlation been sought between the errors and the value taken by any of the variables in the data (e.g. is the error rate higher for people of a particular gender; see the sketch after this list)?
Are boundary situations where system outputs are not sufficiently reliable clearly identified?
Are safety mechanisms in place to handle these situations (for example, by systematically handing control to a human operator, as in the sketch after this list)?
In a continuous learning scenario, are measures taken upstream to avoid performance degradation, model drift or attacks aimed at influencing the results of the algorithm (e.g. online chatbots becoming “racist”)?
If so, which measures? (A drift-monitoring sketch is given after this list.)
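As an illustration of the learning-strategy questions, here is a minimal sketch, assuming Python with scikit-learn and synthetic placeholder data, of a held-out test split combined with k-fold cross-validation:

```python
# Minimal sketch: train/test split plus k-fold cross-validation.
# The dataset and model are placeholders, not a recommended setup.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# Hold out a test set first; it is only used for the final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 5-fold cross-validation on the training portion gives a more stable
# performance estimate than a single validation split.
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X_train, y_train, cv=5)
print(f"Cross-validated accuracy: {scores.mean():.2f} (+/- {scores.std():.2f})")

# Final check on the untouched test set.
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.2f}")
```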
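The search for correlations between errors and the variables in the data can be operationalised as a per-group comparison of error rates. A minimal sketch, assuming pandas and purely illustrative results:

```python
# Minimal sketch: checking whether the error rate correlates with a
# variable such as gender. All values are illustrative placeholders.
import pandas as pd

# One row per evaluated case; error = 1 means the model was wrong.
df = pd.DataFrame({
    "gender": ["F", "F", "F", "M", "M", "M", "F", "M"],
    "error":  [1, 0, 1, 0, 0, 0, 1, 0],
})

# Error rate per group: a large gap is a warning sign to investigate.
print(df.groupby("gender")["error"].mean())
```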
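One common safety mechanism for boundary situations is to hand control to a human operator whenever the model's confidence is too low. A minimal sketch, assuming a scikit-learn-style classifier exposing predict_proba and an arbitrary, uncalibrated threshold:

```python
# Minimal sketch: defer to a human operator below a confidence threshold.
# THRESHOLD is an assumed value; it should be calibrated on validation data.
THRESHOLD = 0.8

def decide(model, x):
    """Return the model's decision, or defer to a human reviewer."""
    confidence = max(model.predict_proba([x])[0])  # top-class probability
    if confidence < THRESHOLD:
        return "DEFER_TO_HUMAN"  # boundary situation: route to an operator
    return model.predict([x])[0]

# Hypothetical usage: decide(trained_classifier, feature_vector)
```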
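Finally, in a continuous learning scenario, degradation and drift can be watched for by comparing post-deployment accuracy against the baseline measured at validation time. A minimal sketch, with the window size, baseline and tolerance all assumed placeholders:

```python
# Minimal sketch: rolling-window accuracy monitoring for model drift.
from collections import deque

WINDOW = 500           # number of recent labelled cases kept (placeholder)
BASELINE_ACC = 0.92    # accuracy measured at validation time (placeholder)
TOLERANCE = 0.05       # maximum acceptable drop before alerting (placeholder)

recent = deque(maxlen=WINDOW)

def record(prediction, ground_truth):
    """Log one labelled outcome and alert if accuracy drifts downward."""
    recent.append(prediction == ground_truth)
    if len(recent) == WINDOW:
        accuracy = sum(recent) / WINDOW
        if accuracy < BASELINE_ACC - TOLERANCE:
            print(f"ALERT: rolling accuracy {accuracy:.2f} is below baseline")
```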
Checking the quality of the system in a controlled environment
The tests on the algorithm and the system as a whole must be carried out under representative conditions allowing for a comprehensive validation of the processing. The needs of the users and their expertise should be taken into account as much as possible in this phase.
Has the processing been validated through experimentation?
Has the user of the AI system (business team, the provider's customers, etc.) been included in the experimentation process?
Has their opinion been taken into account in the design of the tool to best fit their needs and to correct any flaws they may have identified?
Is the context in which the AI system exists (number of variables to be considered, difficulty in assessing the representativeness of the data, etc.) particularly complex?
Has the algorithm in question previously been used in a similar context?
In which environment was the experiment carried out (controlled or uncontrolled, closed or open, on simulated or real use cases)?
Do the conditions sufficiently represent the actual conditions that will be encountered during deployment?
Have appropriate precautions been taken, such as systematic control by a human operator which could then be reduced in the production phase?
Are the metrics used to validate the experiment appropriate and sufficient?
Has an exhaustive and objective assessment of the experiment been carried out?