In further developing the Swiss-AL language data platform into an open research data (ORD) resource, the ZHAW Digital Discourse Lab is addressing what AI-based text generation systems like ChatGPT have so far neglected, and adding value for researchers in the applied sciences.
Guest post by Daniela Baumann, Institute for Applied Media Studies.
Cover photo by Ricardo Farina Mora, multimedia specialist ZHAW digital.
AI-based text generation systems like ChatGPT show what large amounts of text data can be used for. From an open science perspective, however, such systems are open to considerable criticism: legal and ethical problems are ignored, and the composition of the data used is not transparent and by no means representative. Evaluating and reproducing the models is therefore hardly possible.
Largest language collection in Switzerland
With Swiss-AL, a platform for language data for applied research, the Digital Discourse Lab of the Department of Applied Linguistics takes a different approach. With more than 4.5 million texts, the platform contains the largest corpus family in Switzerland (DE, FR, IT, RM). It includes texts from central actors in public communication in Switzerland (e.g. journalistic media from all over Switzerland, federal and cantonal authorities, professional associations, universities, NGOs), which give researchers a data basis for studying current social discourses. By documenting the data processing and making the data available, Swiss-AL contributes to the current open science transformation.