Open Collaborative Data – using OSS principles to share data in SW engineering
Reliance on data in software systems engineering is increasing, e.g., to train machine learning applications. As a consequence, we foresee increasing costs for data collection and maintenance, with the risk that development budgets are consumed by commodity features, leaving few resources for differentiation and innovation. We therefore propose Open Collaborative Data (OCD), a concept analogous to Open Source Software (OSS), as a means to share commodity data. In contrast to Open Data (OD), which is provided by, e.g., governmental agencies to catalyze innovation, OCD is shared in open collaboration between commercial organizations, similar to OSS. Achieving this requires technical infrastructure (e.g., tools for version and access control), licence models, and governance models, all of which have to be tailored for data. Moreover, since data may be privacy-sensitive, anonymization and obfuscation of data are also research challenges. In this paper, we define the concept of Open Collaborative Data, demonstrate it with examples from map data and image recognition, and outline a research agenda for OCD in software engineering as a basis for more efficient evolution of software systems.