Critical data studies track at the 2016 4S/EASST Annual Conference


This is an updated schedule of track 113, Critical Data Studies, at the 2016 Annual Meeting of the Society for Social Studies of Science (4S) and the European Association for the Study of Science and Technology (EASST). Please contact Stuart Geiger if you have any questions.

Convenors:

  • Charlotte Cabasse-Mazel (University of California, Berkeley)
  • Stuart Geiger (University of California, Berkeley)
  • Laura Norén (New York University)
  • Brittany Fiore‐Gartland (University of Washington)
  • Gretchen Gano (University of California, Berkeley)
  • Massimo Mazzotti (University of California, Berkeley)

Description

This track brings together Science and Technology Studies scholars who are investigating data-driven techniques in academic research and analytic industries. Computational methods with large datasets are becoming more common across academic disciplines (including the social sciences) and in analytic industries. However, the sprawling and ambiguous boundaries of “big data” make it a difficult object of study. The papers in this track investigate the relationship between theories, instruments, methods, practices, and infrastructures in data science research. How are such practices transforming the processes of knowledge creation and validation, as well as our understanding of empiricism and the scientific method?

Many of the papers in this track are case studies that focus on one particular fieldsite where data-intensive research is taking place. Other papers explore connections between emerging theory, machinery, methods, and practices. These papers examine a wide variety of data-collection instruments, software, inscription devices, packages, algorithms, disciplines, and institutions; many focus on how a broad sociotechnical system is used to produce, analyze, share, and validate knowledge. In looking at the way these knowledge forms are objectified, classified, imagined, and contested, this track looks critically at the maturing practices of quantification and their historical, social, cultural, political, ideological, economic, scientific, and ecological impacts.

When we say “critical,” we are drawing on a long lineage from Immanuel Kant to critical theory, investigating the conditions in which thinking and reasoning take place. To take a critical approach to a field like data science is not to universally disapprove of or reject it; rather, it is to examine a broad range of social factors and impacts in and around data science. The papers in this track ask questions such as: How are new practices and approaches changing the way science is done? What does the organization of “big science” look like in an era of “big data”? What are the historical antecedents of today’s cutting-edge technologies and practices? How are institutions like hospitals, governments, schools, and cultural industries using data-driven practices to change how they operate? How are labor and management practices changing as data-intensive research becomes a standard part of major organizations? What are the conditions in which people are able to sufficiently understand and contest someone else’s data analysis? What happens when data analysts and data scientists are put in the position of holding their colleagues accountable to various metrics, discovering what music genres are ‘hot’, or evaluating the impacts of public policy proposals? And how ought we change our own concepts, theories, approaches, and methods in Science and Technology Studies given these changes we are witnessing?

Schedule (without abstracts)

T113.1: Data and scientific practice

Sat Sept 3rd, 09:00-10:30am; Room 116

Chairs: Charlotte Cabasse-Mazel and Stuart Geiger

  • Scientific Open Data: Questions of Labor and Public Benefit
    • Irene Pasquetto (UCLA)
    • Ashley E. Sands (UCLA)
  • Condensing Data into Images, Uncovering ‘the Higgs’
    • Martina Merz (Alpen‐Adria‐University Klagenfurt / Wien / Graz)
  • Data Pedagogy: Learning to Make Sense of Algorithmic Numbers
    • Samir Passi (Cornell University)
  • Big Data or Big Codata? Flows in Historical and Contemporary Data Practices
    • Michael Castelle (University of Chicago)

T113.2: Data and expertise

Sat Sept 3rd, 11:00am-12:30pm; Room 116

Chair: Nick Seaver

  • It’s the context, stupid: Reproducibility as a scientific communication problem [note: previously scheduled in 9am panel]
    • Brittany Fiore‐Gartland (University of Washington)
    • Anissa Tanweer (University of Washington)
  • Emerging Practices of Data‐Driven Accountability in Healthcare: Individual Attribution of C-Sections
    • Kathleen Pine (ASU)
  • The (in)credibility of data science methods to non‐experts
    • Daan Kolkman (University of Surrey)
  • Big data and the mythology of algorithms
    • Howard Rosenbaum (Indiana University)

T113.3: Learning, pedagogy, and practice

Sat Sept 3rd, 14:00-15:30; Room 116

Chair: TBD

  • Infrastructuring data analysis in Digital methods with digital data and tools
    • Klara Benda (IT University of Copenhagen)
  • “An afternoon hack”: Enabling data driven scientific computing in the open
    • Charlotte Mazel‐Cabasse (University of California, Berkeley)
  • Playing with educational data: the Learning Analytics Report Card (LARC)
    • Jeremy Knox (The University of Edinburgh)
  • Data science / science studies
    • Cathryn Carson (University of California, Berkeley)

T113.4: Data, theory, and looking forward

Sat Sept 3rd, 16:00-17:30; Room 116

Chairs: Stuart Geiger and Charlotte Cabasse-Mazel

  • Critical Information Practice
    • Yanni Loukissas (Georgia Tech); Matt Ratto (University of Toronto); Gabby Resch (University of Toronto)
  • Actor‐Network VS Network Analysis VS Digital Networks: Are We Talking About the Same Networks?
    • Tommaso Venturini (King’s College); Anders Kristian Munk (University of Aalborg); Mathieu Jacomy (Sciences Po)
  • The Navigators
    • Nicholas Seaver (Tufts University)
  • Broad discussion on lessons learned and next steps
    • Everyone!

Schedule with abstracts

T113.1: Data and scientific practice

Sat Sept 3rd, 09:00-10:30am; Room 116

Chairs: Charlotte Cabasse-Mazel and Stuart Geiger

  • Scientific Open Data: Questions of Labor and Public Benefit
    • Irene Pasquetto (UCLA) and Ashley E. Sands (UCLA)

      Openness of publicly funded scientific data is enforced by policy, and its benefits are normally taken for granted: increasing scientific trustworthiness, enabling replication and reproducibility, and preventing duplication of effort.

      However, when public data are made open, a series of social costs arise. In some fields, such as biomedicine, scientific data have great economic value, and new business models based on the reuse of public data are emerging. In this session we critically analyze the relationship between the potential benefits and social costs of opening scientific data, which translate into changes in the workforce and challenges for current science funding models. We conducted two case studies, one medium-scale collaboration in biomedicine (FaceBase II Consortium) and one large-scale collaboration in astronomy (Sloan Digital Sky Survey). We have conducted ethnographic participant observations and semi-structured interviews of SDSS since 2010 and FaceBase since 2015. Analyzing two domains sharpened our focus on each by enabling comparisons and contrasts. The discussion is also based on extensive document analysis.

      Our goal is to unpack open data rhetoric by highlighting its relation to the emergence of new mixed private and public funding models for science and to changes in workforce dynamics. We show (1) how open data are made open “in practice” and by whom; (2) how public data are reused in private industry; and (3) who benefits from their reuse and how. This paper contributes to the field of Critical Data Studies through its analysis of the connections between big data approaches to science, social power structures, and the policy rhetoric of open data.

  • Condensing Data into Images, Uncovering ‘the Higgs’
    • Martina Merz (Alpen‐Adria‐University Klagenfurt / Wien / Graz)

      Contemporary experimental particle physics is amongst the most data-intensive sciences and thus provides an interesting test case for critical data studies. Approximately 30 petabytes of data produced at CERN’s Large Hadron Collider (LHC) annually need to be controlled and processed in multiple ways before physicists are ready to claim novel results: data are filtered, stored, distributed, analyzed, reconstructed, synthesized, etc., involving collaborations of 3,000 scientists and heavily distributed work. Adopting a science-as-practice approach, this paper focuses on the associated challenges of data analysis using as an example the recent Higgs search at the LHC, based on a long-term qualitative study. In particle physics, data analysis relies on statistical reasoning. Physicists thus use a variety of standard and advanced statistical tools and procedures.

      I will emphasize that, and show how, the computational practice of data analysis is inextricably tied to the production and use of specific visual representations. These “statistical images” constitute “the Higgs” (or its absence) in the sense of making it “observable” and intelligible. The paper puts forward two main theses: (1) that images are constitutive of the prime analysis results due to the direct visual grasp of the data that they afford within large-scale collaborations and (2) that data analysis decisively relies on the computational and pictorial juxtaposition of “real” and “simulated” data, based on multiple models of different kinds. In data-intensive sciences such as particle physics, images thus become essential sites for evidential exploration and debate through procedures of black-boxing, synthesis, and contrasting.
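      A minimal sketch of the kind of juxtaposition described above, with entirely invented numbers: a smoothly falling simulated background is plotted against pseudo-observed counts containing a small excess near 125 GeV. Nothing here comes from the actual LHC analyses; it only illustrates the pictorial contrast of “real” and “simulated” data.

      ```python
      # Toy "statistical image": simulated background vs. pseudo-observed data.
      # All numbers are invented for illustration.
      import numpy as np
      import matplotlib.pyplot as plt

      rng = np.random.default_rng(42)

      mass = np.linspace(100, 160, 61)                 # hypothetical mass bins (GeV)
      background = 500 * np.exp(-0.03 * (mass - 100))  # smoothly falling expectation
      signal = 40 * np.exp(-0.5 * ((mass - 125) / 2.0) ** 2)  # small invented "bump"
      observed = rng.poisson(background + signal)      # pseudo-data with fluctuations

      plt.step(mass, observed, where="mid", label="pseudo-data")
      plt.plot(mass, background, "--", label="simulated background")
      plt.xlabel("invariant mass [GeV]")
      plt.ylabel("events / bin")
      plt.legend()
      plt.show()
      ```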

  • Data Pedagogy: Learning to Make Sense of Algorithmic Numbers
    • Samir Passi (Cornell University)

      This paper conceptualizes data analytics as a situated process: one that necessitates iterative decisions to adapt prior knowledge, code, contingent data, and algorithmic output to each other. Learning to master such forms of iteration, adaptation, and discretion then is an integral part of being a data analyst.

      In this paper, I focus on the pedagogy of data analytics to demonstrate how students learn to make sense of algorithmic output in relation to underlying data and algorithmic code. While data analysis is often understood as the work of mechanized tools, I focus instead on the discretionary human work required to organize and interpret the world algorithmically, explicitly drawing out the relation between human and machine understanding of numbers, especially in the ways in which this relationship is enacted through class exercises, examples, and demonstrations. In a learning environment, there is an explicit focus on demonstrating established methods, tools, and theories to students. Focusing on data analytic pedagogy, then, helps us not only to better understand foundational data analytic practices, but also to explore how and why certain forms of standardized data sensemaking processes come to be.

      To make my argument, I draw on two sets of empirics: participant-observation of (a) two semester-long senior/graduate-level data analytic courses, and (b) a series of three data analytic training workshops taught/organized at a major U.S. East Coast university. Conceptually, this paper draws on research in STS on social studies of algorithms, sociology of scientific knowledge, sociology of numbers, and professional vision.

  • Big Data or Big Codata? Flows in Historical and Contemporary Data Practices
    • Michael Castelle (University of Chicago)

      Presently existing theorizations of “big data” practices conflate observed aspects of both “volume” and “velocity” (Kitchin 2014). The practical management of these two qualities, however, has a comparably disjunct, if interwoven, computational history: on one side, the use of large (relational and non-relational) database systems, and on the other, the handling of real-time flows (the world of dataflow languages, stream and event processing, and message queues). While the commercial data practices of the late 20th century were predicated on an assumption of comparably static archival (the site-specific “mining” of data “warehouses”), much of the novelty and value of contemporary “big data” sociotechnics is in fact predicated on harnessing/processing the vast flows of events generated by the conceptually-centralized/physically-distributed datastores of Google, Facebook, LinkedIn, etc.

      These latter processes—which I refer to as “big codata”—have their origins in IBM’s mainframe updating of teletype message switching, were adapted for Wall Street trading firms in the 1980s, and have a contemporary manifestation in distributed “streaming” databases and message queues like Kafka and StormMQ, in which one differentially “subscribes” to brokered event streams for real-time visualization and analysis. Through ethnographic interviews with data science practitioners in various commercial startup and academic environments, I will contrast these technologies and techniques with those of traditional social-scientific methods—which may begin with empirically observed and transcribed “codata”, but typically subject the resultant inert “dataset” to a far less real-time sequence of material and textual transformations (Latour 1987).
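      A minimal sketch of the contrast drawn above, assuming a hypothetical archived CSV on one side and a local Kafka broker with an invented topic on the other (the consumer uses the kafka-python client):

      ```python
      # "Big data" vs. "big codata": interrogating an inert archive vs.
      # subscribing to a brokered event stream. File name, topic, and
      # broker address are all hypothetical.
      import pandas as pd
      from kafka import KafkaConsumer  # kafka-python client

      # Archival mode: analyze a static, already-collected dataset
      df = pd.read_csv("events_archive.csv")
      print(df["event_type"].value_counts())

      # Codata mode: differentially subscribe to an event stream and process
      # records as they flow, rather than after they settle into a table
      consumer = KafkaConsumer("user-events", bootstrap_servers="localhost:9092")
      for message in consumer:      # blocks, yielding events in real time
          print(message.value)      # per-event analysis would happen here
      ```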

T113.2: Data and expertise

Sat Sept 3rd, 11:00am-12:30pm; Room 116

Chair: Nick Seaver

  • It’s the context, stupid: Reproducibility as a scientific communication problem [note: previously scheduled in 9am panel]
    • Brittany Fiore‐Gartland (University of Washington) and Anissa Tanweer (University of Washington)

      Reproducibility has long been considered integral to scientific research and increasingly must be adapted to highly computational, data-intensive practices. Central to reproducibility is the sharing of data across varied settings. Many scholars note that reproducible research necessitates thorough documentation and communication of the context in which scientific data and code are generated and transformed. Yet there has been some pushback against the generic use of the term context (Nicolini, 2012); for, as Seaver puts it, “the nice thing about context is everyone has it” (2015). Dourish (2004) articulates two approaches to context: representational and interactional. The representational perspective sees context as stable, delineable information; in terms of reproducibility, this is the sort of context that can be captured and communicated with metadata, such as location, time, and size. An interactional perspective, on the other hand, views context not as static information but as a relational and dynamic property arising from activity; something that is much harder to capture and convey using metadata or any other technological fix.

      In two years of ethnographic research with scientists negotiating reproducibility in their own data-intensive work, we found “context” being marshalled in multiple ways to mean different things within scientific practice and discourses of reproducibility advocates. Finding gaps in perspectives on context across stakeholders, we reframe reproducibility as a scientific communication problem, a move that recognizes the limits of representational context for the purpose of reproducible research and underscores the importance of developing cultures and practices for conveying interactional context.
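      Dourish’s distinction can be made concrete with a small sketch (field names invented): representational context is exactly what a metadata record can hold, while interactional context resists any such encoding.

      ```python
      # Representational context: stable, delineable information that metadata
      # schemes can capture and ship alongside data and code.
      representational_context = {
          "location": "field station 12",          # hypothetical values
          "collected_at": "2016-07-14T09:30:00Z",
          "file_size_bytes": 104_857_600,
          "instrument": "CTD profiler",
      }

      # Interactional context (why the sensor was recalibrated mid-deployment,
      # which readings the team learned to distrust, the lab's tacit
      # conventions) arises from activity itself and has no natural
      # key-value representation.
      ```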

  • Emerging Practices of Data‐Driven Accountability in Healthcare: Individual Attribution of C-Sections
    • Kathleen Pine (ASU)

      This paper examines the implementation and consequences of data science in a specific domain: evaluation and regulation of healthcare delivery. Recent iterations of data-driven management expand the dimensions along which organizations are evaluated and utilize a growing array of non-financial measures to audit performance (i.e., adherence to best practices). Abstract values such as “quality” and “effectiveness” are operationalized through the design and implementation of certain performance measurements: it is not just outcomes that demonstrate the quality of service provision, but also the particular practices engaged in during service delivery.

      Recent years have seen the growth of a controversial new form of data-driven accountability in healthcare: application of performance measurements to the work of individual clinicians. Fine-grained performance measurements of individual providers were once far too resource-intensive to undertake, but expanded digital capacities have made provider-level analyses feasible. Such measurements are being deployed as part of larger efforts to move from “volume-based” to “value-based” or “pay for performance” payment models.

      Evaluating individual providers and deploying pay for performance at the individual (rather than the organizational) level is controversial. Critics argue that the measurements reflect a tiny sliver of any clinician’s “quality,” and that such algorithmic management schemes will lead professionals to focus on only a small number of measured activities. Despite these and other concerns, such measurements are on the horizon. I will discuss early ethnographic findings on implementation of provider-level cesarean section measurements, describing tensions between professional discretion and accountability, and the rising stakes of data quality in healthcare.
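      A toy sketch of what such provider-level attribution involves, with invented data: the same records yield one organizational rate or a set of individual rates computed over much smaller denominators, which is where the data-quality stakes rise.

      ```python
      # Invented delivery records; 1 = cesarean, 0 = vaginal delivery
      import pandas as pd

      deliveries = pd.DataFrame({
          "provider": ["A", "A", "A", "B", "B", "C", "C", "C", "C"],
          "cesarean": [1, 0, 0, 1, 1, 0, 0, 1, 0],
      })

      # Organizational-level measure: one number for the whole facility
      print("facility rate:", deliveries["cesarean"].mean())

      # Provider-level measure: fine-grained individual attribution,
      # computed over small denominators
      print(deliveries.groupby("provider")["cesarean"].agg(["mean", "count"]))
      ```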

  • The (in)credibility of data science methods to non‐experts
    • Daan Kolkman (University of Surrey)

      The rapid development and dissemination of data science methods, tools, and libraries allow for the creation of ever more intricate models and algorithms. Such digital objects are simultaneously the vehicle and outcome of quantification practices and may embody a particular world-view with associated norms and values. More often than not, a set of specific technical skills is required to create, use, or interpret these digital objects. As a result, the mechanics of the model or algorithm may be virtually incomprehensible to non-experts.

      This is of consequence for the process of knowledge creation, because it may introduce power asymmetries and because the successful implementation of models and algorithms in an organizational context requires that all those involved have faith in the model or algorithm. This paper contributes to the sociology of quantification by exploring the practices through which non-experts ascertain the quality and credibility of digital objects as myths or fictions. Considering digital objects as myths or fictions brings the codified nature of these objects into focus.

      This permits the illustration of the practices through which experts and non-experts develop, maintain, question or contest such myths. The paper draws on fieldwork conducted in government and analytic industry in the form of interviews, observations and documents to illustrate and contrast the practices which are available to non-experts and experts in bringing about the credibility or incredibility of such myths or fictions. It presents a detailed account of how digital objects become embedded in the organisations that use them.

  • Big data and the mythology of algorithms
    • Howard Rosenbaum (Indiana University)

      There are no big data without algorithms. Algorithms are sociotechnical constructions and reflect the social, cultural, technical, and other values embedded in their contexts of design, development, and use. The utopian “mythology” (boyd and Crawford 2011) about big data rests, in part, on the depiction of algorithms as objective and unbiased tools operating quietly in the background. As reliable technical participants in the routines of life, their impartiality provides legitimacy for the results of their work. This becomes more significant as algorithms become more deeply entangled in our online and offline lives, where we generate the data they analyze. They create “algorithmic identities,” profiles of us based on our digital traces that are “shadow bodies,” emphasizing some aspects and ignoring others (Gillespie 2012). They are powerful tools that use these identities to dynamically shape the information flows on which we depend, in response to our actions and to decisions made by their owners.

      Because this perspective tends to dominate the discourse about big data, thereby shaping public and scientific understandings of the phenomenon, it is necessary to subject it to critical review as an instance of critical data studies. This paper interrogates algorithms as human constructions and products of choices that have a range of consequences for their users and owners. Issues explored include: the epistemological implications of big data algorithms; the impacts of these algorithms on our social and organizational lives; the extent to which they encode power, and the ways in which this power is exercised; and the possibility of algorithmic accountability.

T113.3: Learning, pedagogy, and practice

Sat Sept 3rd, 14:00-15:30; Room 116

Chair: TBD

  • Infrastructuring data analysis in Digital methods with digital data and tools
    • Klara Benda (IT University of Copenhagen)

      The Digital methods approach seeks the strategic appropriation of digital resources on the web for social research. I apply grounded theory to theorize how data practices in Digital methods are entangled with the web as a socio-technical phenomenon. My account draws on public sources of Digital methods and ethnographic research of semester-long student projects based on observations, interviews, and project reports. It is inspired by Hutchins’s call for understanding how people “create their cognitive powers by creating the environments in which they exercise those powers”. The analysis draws on the lens of infrastructuring to show that making environments for creativity in Digital methods is a distributed process, which takes place on local and community levels with distinct temporalities. Digital methods is predicated on creating its local knowledge space for social analysis by pulling together digital data and tools from the web, and this quick local infrastructuring is supported by layers of slower community infrastructures, which mediate the digital resources of the web for a Digital methods style of analysis by means of translation and curation.

      Overall, the socially distributed, infrastructural style of data practice is made possible by the web as a socio-technical phenomenon predicated on openness, sharing and reuse. On the web, new digital resources are readily available to be incorporated into the local knowledge space, making way for an iterative, exploratory style of analysis, which oscillates between infrastructuring and inhabiting a local knowledge space. The web also serves as a socio-technical platform for community practices of infrastructuring.

  • “An afternoon hack”: Enabling data driven scientific computing in the open
    • Charlotte Mazel‐Cabasse (University of California, Berkeley)

      Scientific computing, or e-science, has enabled the development of large data-driven scientific initiatives. A significant part of these projects relies on the software infrastructures and tool stacks that make it possible to collect, clean, and compute very large data sets.

      Based on anthropological research among a community of open developers and/or scientists contributing to SciPy, the open source Python library used by scientists to enable the development of technologies for big data, this research focuses on the socio-technical conditions of the development of free and reproducible computational scientific tools and the system of values that supports it.

      Entering the SciPy community for the first time is entering a community of learners. Its members are convinced that for each problem there is a function (and if there is not, one should actually create one), think that everybody can (and probably should) code, and have long been living between at least two worlds (sometimes more): academia and the open software community, and, for some, different versions of the corporate world.

      Looking at the personal trajectories of these scientists turned open-software developers, this paper will investigate the way in which a relatively small group of dedicated people has been advancing a new agenda for science, defined as open and reproducible, through carefully designed data infrastructures, workflows, and pipelines.
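      The “for each problem there is a function” ethos mentioned above is easy to illustrate (a sketch of mine, not the paper’s): a numerical integral that would once have required bespoke code is a single SciPy call.

      ```python
      # One-call numerical integration with SciPy
      import numpy as np
      from scipy import integrate

      value, abserr = integrate.quad(lambda x: np.exp(-x ** 2), 0, np.inf)
      print(value)  # ~0.8862, i.e. sqrt(pi)/2
      ```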

  • Playing with educational data: the Learning Analytics Report Card (LARC)
    • Jeremy Knox (The University of Edinburgh)

      Education has become an important site for computational data analysis, and the burgeoning field of ‘learning analytics’ is gaining significant traction, motivated by the proliferation of online courses and large enrolment numbers. However, while this ‘big data’ and its analysis continue to be hyped across academic, government, and corporate research agendas, critical and interdisciplinary approaches to educational data analysis are in short supply. Driven by narrow disciplinary areas in computer science, learning analytics is not only ‘blackboxed’, in other words prone to a ‘focus only on its inputs and outputs and not on its internal complexity’ (Latour 1999, p. 304), but also abstracted and distanced from the activities of education itself. This methodological estrangement may be particularly problematic in an educational context where the fostering of critical awareness is valued.

      The first half of this paper will describe three ways in which we can understand this ‘distancing’, and how it is implicated in enactments of power within the material conditions of education: the institutional surveilling of student activity; the mythologizing of empirical objectivity; and the privileging of prediction. The second half of the paper will describe the development of a small-scale, experimental learning analytics project undertaken at the University of Edinburgh that sought to explore some of these issues. Entitled the Learning Analytics Report Card (LARC), the project investigated playful ways of offering students choice in the analytics process, and of fostering critical awareness of issues related to data analysis in education.

  • Data science / science studies
    • Cathryn Carson (University of California, Berkeley)

      Inside universities, data science is practically co-located with science studies. How can we use that proximity to shape how data science gets done? Drawing on theorizations of collaboration as a research strategy, embedded ethnography, critical technical practice, and design intervention, this paper reports on experiments in data science research and organizational/strategic design. It presents intellectual tools for working on data science (conceptual distinctions such as data science as specialty, platform, and surround; temporal narratives that capture practitioners’ conjoint sense of prospect and dread) and explores modes of using these tools in ways that get uptake and do work. Finally, it draws out possible consequences of the by now sometimes well-anchored situation of science studies/STS inside universities, including having science studies scholars in positions of institutional leverage.

T113.4: Data, theory, and looking forward

Sat Sept 3rd, 16:00-17:30; Room 116

Chairs: Stuart Geiger and Charlotte Cabasse-Mazel

  • Critical Information Practice
    • Yanni Loukissas (Georgia Tech); Matt Ratto (University of Toronto); Gabby Resch (University of Toronto)

      Big Data has been described as a death knell for the scientific method (Anderson, 2008), a catalyst for new epistemologies (Floridi, 2012), a harbinger of the death of politics (Morozov, 2014), and “a disruptor that waits for no one” (Maycotte, 2014). Contending with Big Data, as well as the platitudes that surround it, necessitates a new kind of data literacy. Current pedagogical models, exemplified by data science and data visualization, too often introduce students to data through sanitized examples, black-boxed algorithms, and standardized templates for graphical display (Tufte, 2001; Fry, 2008; Heer, 2011). Meanwhile, these models overlook the social and political implications of data in areas like healthcare, journalism, and city governance. Scholarship in critical data studies (boyd and Crawford, 2012; Dalton and Thatcher, 2014) and critical visualization (Hall, 2008; Drucker, 2011) has established the necessary foundations for an alternative to purely technical approaches to data literacy.

      In this paper, we explain a pedagogical model grounded in interpretive learning experiences: collecting data from messy sources, processing data with an eye towards what algorithms occlude, and presenting data through creative forms like narrative and sculpture. Building on earlier work by the authors in the area of ‘critical making’ (Ratto), this approach—which we call critical information practice—offers a counterpoint for students seeking reflexive and materially-engaged modes of learning about the phenomenon of Big Data.

  • Actor‐Network VS Network Analysis VS Digital Networks: Are We Talking About the Same Networks?
    • Tommaso Venturini (King’s College); Anders Kristian Munk (University of Aalborg); Mathieu Jacomy (Sciences Po)

      In the last few decades, the idea of the ‘network’ has slowly but steadily colonized broad strands of STS research. This colonization started with the advent of actor-network theory, which provided a convenient set of notions to describe the construction of socio-technical phenomena. Then came network analysis, as scholars imported into STS the techniques of investigation and visualization developed in the tradition of social network analysis and scientometrics. Finally, with the increasing ‘computerization’ of STS, scholars turned their attention to digital networks as a way of tracing collective life.

      Many researchers have more or less explicitly tried to link these three movements in one coherent set of digital methods for STS, betting on the idea that actor-network theory can be operationalized through network analysis thanks to the data provided by digital networks. Yet, to be honest, little proves the continuity among these three objects besides the homonymy of the word ‘network’. Are we sure that we are talking about the same networks?

  • The Navigators
    • Nicholas Seaver (Tufts University)

      Data scientists summon space into existence. Through gestures in the air, visualizations on screen, and loops in code, they locate data in spaces amenable to navigation. Typically, these spaces embody a Euro-American common sense: things near each other are similar to each other. This principle is evident in the work of algorithmic recommendation, for instance, where users are imagined to navigate a landscape composed of items arranged by similarity. If you like this hill, you might like the adjacent valley. Yet the topographies conceived by data scientists also pose challenges to this spatial common sense. They are constantly reconfigured by new data and the whims of their minders, subject to dramatic tectonic shifts, and they can be more than 3-dimensional. In high-dimensional spaces, data scientists encounter the “curse of dimensionality,” by which human intuitions about distance fail as dimensions accumulate. Work in critical data studies has conventionally focused on the biases that shape these spaces.

      In this paper, I propose that critical data studies should not only attend to how representative data spaces are, but also to the techniques data scientists use to navigate them. Drawing on fieldwork with the developers of algorithmic music recommender systems, I describe a set of navigational practices that negotiate with the shifting, biased topographies of data space. Recalling a classic archetype from STS and anthropology, these practices complicate the image of the data scientist as rationalizing, European map-maker, resembling more closely the situated interactions of the ideal-typical Micronesian navigator.