The legal basis of legitimate interests: Focus sheet on open source models
In view of their potential benefits, open source practices should be taken into account when assessing the legitimate interests of an AI system provider. However, safeguards must be adopted to limit the harm these practices may cause to individuals.
Open source in AI
In the absence of a commonly accepted definition of open source models, the CNIL observes that, in this field, the term covers a variety of practices. While the publication of the model parameters is a minimum condition for a model to be described as open source, other practices can also be beneficial in many cases. These practices can be categorized as follows:
- Transparency in model development, including the publication of:
  - the documentation of the procedure followed to develop the model (including the data collection phase for training), possibly in the form of a scientific publication;
  - the code used to train the model;
  - the training data.
- Transparency of the model obtained, including the publication of:
  - the model documentation, detailing for example its architecture, performance, and limitations, possibly in the form of a descriptive sheet (often called a model card);
  - the model weights.
- Access to the model (illustrated in the sketch after this list), including the publication of:
  - a library allowing its use;
  - an API for its use;
  - the code needed to use the model;
  - the model under a licence allowing its use, modification, or redistribution.
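To make the "access to the model" category more concrete, the sketch below shows what such access can look like in practice when the weights, a model card and a permissive licence are published on a public repository. It assumes the Hugging Face tooling (huggingface_hub and transformers) and a hypothetical repository identifier, example-org/example-model; it is an illustration only, not a prescribed method, and actual repositories and tooling will vary.

```python
# Illustrative sketch: consulting a published model card and running a model
# locally from its published weights, rather than through the provider's API.
# The repository identifier "example-org/example-model" is hypothetical.
from huggingface_hub import ModelCard
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "example-org/example-model"  # hypothetical repository with open weights

# Transparency of the model obtained: read the published model card
# (architecture, performance, limitations, licence).
card = ModelCard.load(repo_id)
print(card.data.license)  # e.g. "apache-2.0", a licence allowing reuse

# Access to the model: download the published weights and run the model
# locally, without relying on the provider's hosted API.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Open source models can be audited by peers.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

In this scenario, publication of the weights and licence allows third parties to inspect, audit and reuse the model independently, whereas access limited to an API keeps the model itself under the provider's control.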
Some of these practices, such as the publication of the dataset used for training, may nevertheless entail risks for data subjects and therefore cannot be recommended in all cases. Open source practices may bring benefits even if not all of the elements listed above are disclosed. The publication of certain elements may, however, be necessary to achieve significant gains in transparency or peer review. In such cases, additional measures are recommended to limit the impact on individuals' rights.