Springer Nature, a leading scientific publisher, is retracting and removing nearly 40 publications over a significant ethical concern: all of them relied on a controversial dataset. The dataset, created by retired engineer Gerald Piosenka, contains photos of children's faces, half labeled as autistic and half as non-autistic. The photos were downloaded from various websites without the consent of the children or their families, which raises serious privacy concerns. The case has sparked debate about the responsibility of researchers and the ethical boundaries of data collection.
The dataset's reliability is also in question. Dorothy Bishop, an emeritus professor of developmental neuropsychology, expressed shock that the dataset was created at all, saying, 'This is absolute bonkers.' She points out that because the children are not identified, there is no way to confirm whether the autism labels are accurate. The photos also vary in angle, lighting, and expression, which adds noise to the data and makes it difficult to draw meaningful conclusions.
Kaggle, the platform that hosted the dataset, removed it for violating its terms of service. Piosenka later re-uploaded the files to Google Drive, however, and the Springer Nature team found similar datasets posted by other Kaggle users. This raises concerns about continued misuse of the data and points to the need for stricter data sharing practices.
The problem reaches well beyond a single publisher. Springer Nature plans to retract 38 publications, including papers, conference proceedings, and book chapters, and to remove all but one of them. Experts such as Bishop have praised the publisher for acting swiftly. The search for other publications that used the dataset continues: at least 90 have been identified so far, some of them from well-known organizations such as the Institute of Electrical and Electronics Engineers (IEEE).
The underlying science is disputed as well. While some argue that facial features could be linked to autism, experts such as Bishop and Alvares emphasize that autism cannot be diagnosed from facial features alone. They stress that accurate research depends on clinical assessments and controlled samples. The debate over this dataset underscores the need for ethical review and responsible data handling in scientific research.