Are social scientists hired as data scientists?

Data science in the social sciences

More and more aspects of our daily life are leaving digital traces. This applies to the working world shaped by digitization as well as to private internet and media consumption, shopping behavior and interpersonal communication on digital platforms. The resulting data sets provide an increasingly precise “digital shadow” of human behavior and social interactions.

In the important ongoing debate about the opportunities and risks of applying data science methods to such data sets, the weighing of ethical and data protection requirements usually takes center stage. By contrast, the enormous potential for advancing empirical social research in the 21st century often takes a back seat.

Data science methods offer new possibilities for exploring and analyzing large data sets on human behavior. At the same time, modern computer-aided simulation and modeling methods promise insights into the mechanisms underlying collective social phenomena, especially when their results are confronted with empirical data. Combining these approaches could fundamentally improve our understanding of social phenomena.

The investigation of social science questions using data science methods is the focus of an interdisciplinary research field often referred to as "Computational Social Science". Some social scientists go so far as to equate the resulting possibilities with a methodological revolution. At the same time, however, new challenges arise, not only for the social sciences but also for computer science. Given how strongly universities and funding institutions emphasize the importance of interdisciplinarity, genuinely "inter"-disciplinary work at the interface between computer science and the social sciences, i.e. work in which social science theories are tested and further developed using computer science methods, unfortunately remains the exception. Instead, there is a growing number of "data-driven" studies that reveal correlations and patterns in large amounts of data, for example from social media, without being able to say anything about the underlying causal mechanisms.

Indeed, many of these studies give the impression that the focus is on analyzing the data rather than on answering a scientific question. They can therefore only be a first step toward a research discipline that is both theory-driven and data-driven and that deserves the name Computational Social "Science". Explaining social phenomena and demonstrating causal mechanisms requires more than data analysis skills alone.

A grounding in the philosophy of science is just as important as domain knowledge of sociological theories and methods. In addition, skills are needed in modeling collective phenomena in complex systems of interacting agents, a topic of central importance in statistical and interdisciplinary physics.

To make meaningful use of the possibilities opened up by new data sources and data science methods, both computer science and social science curricula must therefore be developed further. The aim is to enable students to answer questions such as the following: How can social science theories and hypotheses be tested using data science? How meaningful and representative are the results of studies based, for example, on publicly available social media data? How can sampling biases in such data be detected, quantified and, if necessary, corrected? What specific challenges arise, e.g. for machine learning methods or social network analysis, from erroneous, incomplete and time-resolved data sets? And what role does data science play in theory formation?

So what is Computational Social Science? Is it merely an "auxiliary science" that joins the long list of successful computational sciences and "hyphenated informatics" disciplines such as bioinformatics and business, medical, geo-, environmental, media and agricultural informatics? I am convinced that Computational Social Science holds a special place in this list. Beyond its undeniable potential for the social sciences, the convergence of social and technical systems also creates new problem areas for computer science: the feedback between technical and social aspects in IT systems leads to a level of complexity that is difficult to control with existing approaches to system design.

Important questions arise here. Some of them overlap with the Grand Challenges formulated by the Gesellschaft für Informatik (German Informatics Society), and answering them requires quantitative modeling of social aspects. What new kinds of systemic risk arise in global socio-technical systems? Which social science findings must be taken into account when designing resilient technical systems? To what extent do the mechanisms of IT systems (e.g. intelligent recommender systems or reputation mechanisms) influence social phenomena such as polarization or discrimination? Which data analysis and modeling methods can we use to quantify, predict or even influence such phenomena? And what new approaches exist for analyzing and managing human factors in collaborative software development?

Applying Computational Social Science methods to large data sets from socio-technical systems promises answers to these important questions. For this reason, Computational Social Science is of great importance not only for the social sciences, but also for computer science.