
Composition Forum 50, Fall 2022

Review of Amanda Licastro and Benjamin Miller’s Composition and Big Data

John J. Silvestro

Licastro, Amanda, and Benjamin Miller, editors. Composition and Big Data. U of Pittsburgh P, 2021.

Composition and Big Data, edited by Amanda Licastro and Benjamin Miller, endeavors to accelerate composition’s engagement with “big data.” Big data is the catch-all term for the entwined processes through which almost all digital actions are turned into data and for the proliferation of tools for computationally analyzing that data, tools that were once exclusively the domain of corporations but have recently become accessible to individuals (Licastro and Miller 3-4). Over the past decade, several composition researchers, such as Laura Aull, Aaron Beveridge, and Derek Mueller, have done big data work. Seeking to expand the methods and insights of previous big-data composition research so that they can be applied to all aspects of composition, Licastro and Miller frame Composition and Big Data as an effort to both normalize and advance big data work. As part of that effort, the co-editors position big data work as equivalent to assessment. Referencing Ed White’s dictum that WPAs must assess or be assessed, they write that “if [compositionists] do not use data, we may well be used by data” (Licastro and Miller 4; emphasis in original).

Compositionists should likely view big data as even more significant than assessment. Almost all contemporary writing activity becomes big data, whether a writer wants it to or not. Writing on social media platforms gets turned into data (Gelms and Edwards), and even writing created in word-processing and email software generates data that feeds into multiple algorithms (Haswell). Almost all contemporary writing is thus entangled in and managed by big data systems, and compositionists and their students must expand their writing literacies to account for big data’s effects on writing processes and products. To extend Licastro and Miller’s phrasing, almost all writers are used by big data. We therefore need frameworks, pedagogies, and methodologies for grappling with it.

The collection offers four perspectives on how compositionists can engage big data:

  • Big Data in Students’ Hands

  • Data Across Contexts

  • Data and The Discipline

  • Dealing with Data’s Complications

A few of these perspectives have already been explored. Most famously, big data methods have been used to examine and expand composition as an academic discipline, as in Derek Mueller’s Grasping Rhetoric and Composition by Its Long Tail and Miller et al.’s The Roots of an Academic Genealogy. The chapters in this collection that deal with disciplinarity extend this work nicely, revealing how big data methods can generate new perspectives and insights. For example, “Big-Time Disciplinarity” by Kate Pantelides and Derek Mueller uses big data methods to examine the dates of composition conferences. The co-authors use this work to outline fresh perspectives on “disciplinary time” and on how conferences create alternative labor demands that can be unfair and biased against graduate students, non-tenured faculty, and faculty with small children (Pantelides and Mueller 190-191).

The most compelling aspect of the collection is its argument for big data as a valid and necessary composition research method. In their introduction, the co-editors argue that big-data research should be treated as equal to other common composition research methods, such as case studies, ethnographies, and philosophical inquiry (Licastro and Miller 9). Composition should embrace big data research methods that generate possibilities for seeing writing literacies anew, as the chapters on Data and The Discipline demonstrate. The co-editors acknowledge, however, that there is not yet a critical mass of big data research studies, let alone frameworks for connecting the seemingly disparate research methods and the data sets they generate.

In “The Boutique is Open,” Cheryl E. Ball et al. articulate a framework that aims both to expand who does big data work and to connect existing and future composition-related big data work. They argue that most compositionists already generate big data regularly through typical composition work like assessment, program norming, and writing case studies (Ball et al. 199-200). While these datasets are small-scale, isolated, and difficult to connect, they could be re-situated through big data perspectives and examined in ways that generate truly significant insights into writing literacies, pedagogies, and programs. The co-authors therefore propose the frame of “boutique datasets,” which recontextualizes these small-scale datasets so that they can be connected and aligned (Ball et al. 200-201). Ball et al. argue that if most compositionists involved in research understand their work to be, in part, about generating boutique datasets, then others in the field could develop principles and databases for connecting those datasets. Databases could be built that connect and align assessment and program-norming datasets, and composition researchers could then use those databases to do distant readings of student writing from multiple universities (Ball et al. 205-207). In short, Ball et al. use big data to re-contextualize the small, localized research that many compositionists regularly generate, and they articulate how framing that localized, “boutique” data can make it critical to the discipline’s future.

Furthering the argument for more big data research in composition, the collection offers several chapters with invaluable insights into how to do big data research more ethically and soundly. In “Ethics in Big Data Composition Research,” Andrew Kulak draws on principles from the cybersecurity field to generate a framework for ethical big data research, outlining how “privacy,” “cybersecurity,” and “transparency” are essential to such work. Along the same lines, Juho Paakkonen in “Data Do Not Speak for Themselves” articulates an important ethical component of reporting and writing about big data projects. Through an examination of his own big data work, he shows that topic modeling, a common big data method, is not unbiased (Paakkonen 253-254): the number of topics a researcher selects significantly shapes what the computational methods generate. He thus argues that big-data researchers need to clearly explain and justify the number of topics they selected when they present their findings, and to be transparent that selecting a different number of topics would likely have produced different results. In other words, Paakkonen articulates an approach to reporting methods that would enable audiences to better understand how big data research was performed and how its findings were shaped by human and algorithmic elements (Paakkonen 256-257). The collection’s strength lies in the perspectives on and approaches to big data research that it provides. However, it offers little toward expanding writing pedagogies or student engagement with big data methods.
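Paakkonen’s point about topic counts can be illustrated with a small sketch. This is my own illustration, not an example from the collection: it fits scikit-learn’s latent Dirichlet allocation (a standard topic-modeling implementation) to the same invented four-sentence corpus with two different topic counts, showing that the topic-word distributions a study reports depend on the researcher’s choice of k.

```python
# Illustrative sketch (invented corpus, not data from the collection):
# fitting LDA topic models with different topic counts on the same texts
# shows that the researcher's choice of k changes what the method "finds."
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "students revise drafts with peer feedback",
    "peer review shapes student revision practices",
    "assessment rubrics guide program outcomes",
    "writing program administrators design assessment",
]

counts = CountVectorizer().fit_transform(corpus)

for k in (2, 3):
    lda = LatentDirichletAllocation(n_components=k, random_state=0)
    lda.fit(counts)
    # components_ holds one word distribution per topic; its shape, and
    # the groupings it implies, change with this human choice of k.
    print(k, lda.components_.shape)
```

The same reporting obligation Paakkonen describes would apply here: a write-up would need to state why k = 2 or k = 3 was chosen and note that other values would partition the corpus differently.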

Unfortunately, many of the chapters in the collection that address student writing present big-data research that mostly re-affirms well-established composition perspectives. For example, in Chris Holcomb and Duncan A. Buell’s chapter, “A Corpus of First-Year Composition,” the co-authors examine students’ stylistic complexity to explore how academic stylistic conventions should be taught. They studied a corpus of texts from first-year students at their university, using big-data methods to analyze the stylistic moves the students made, with a particular focus on sentence structure and the use of phrases and clauses (Holcomb and Buell 37). Through this research, the co-authors find that the students write in a “hybrid register”: the students draw on both non-academic and academic conventions in their first-year writing, negotiating between the two (49). These findings mostly echo decades-old wisdom and research in composition holding that first-year composition students combine their existing writing literacies with the conceptions of academic writing that they are taught and that they perceive (see David Bartholomae’s 1986 “Inventing the University”).

Furthermore, the collection seems unable to broaden big data work beyond its connections to “distant reading.” Distant reading is the approach to studying texts, typically literature, that uses computational methods to generate new perspectives on texts, writers, and/or histories: for example, using algorithms to scan all of a writer’s poems to note the frequency of certain words or the regularity with which certain words are used together. What little pedagogical work the collection contains outlines ways for students to use big data methods to complete digital humanities projects. In “Learning to Read Again,” Trevor Hoag and Nicole Emmelhainz outline how they taught big data tools and methods in an Introduction to Digital Humanities course. They present their curriculum, in which students critically analyze different bodies of texts (Hoag and Emmelhainz 22). Through examples of student work, the co-authors demonstrate that their curriculum, and big-data distant reading work in general, can strengthen students’ reflection and rhetorical analysis skills (Hoag and Emmelhainz 32-33). The curriculum they present, though, is from a literature-focused critical reading course, not a composition course. It thus offers an exciting way to expand introductory literature, digital humanities, and/or reading courses, but offers little for compositionists.
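The distant-reading move described above can be sketched in a few lines of standard-library Python. The three short “poems” here are invented stand-ins, not texts from the collection; the point is only the shape of the method, counting word frequencies across a whole corpus rather than closely reading any single text.

```python
# A minimal stdlib sketch of distant reading as word-frequency counting
# across a body of texts (illustrative invented data).
from collections import Counter
import re

poems = [
    "the river turns and turns again",
    "again the light falls on the river",
    "light turns to shadow on the water",
]

words = Counter()
for poem in poems:
    # Tokenize each poem into lowercase words and add to the corpus tally.
    words.update(re.findall(r"[a-z']+", poem.lower()))

# The most frequent words across the whole corpus: a "distant" view
# that no close reading of a single poem would produce.
print(words.most_common(3))
```

Real distant-reading projects of the kind Hoag and Emmelhainz assign would scale this pattern to much larger corpora and richer measures (collocations, topic models), but the underlying computational gesture is the same.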

In sum, the collection crowns big data as a composition research method. It shows that, for composition as a discipline, big data is a viable, ethical, and needed research method, and it offers new insights into how and why compositionists should use big data research methods as well as new ethical considerations and approaches for this work. However, the collection, much like most prior composition big data work, remains ambiguous as to whether big data is a method only for researchers or whether it can be expanded into something useful to students. For compositionists looking to integrate big data into their composition courses or to develop ways for students to compose with big data, the collection does not offer much. For compositionists interested in or doing big data research of their own, though, it is essential.

Works Cited

Bartholomae, David. “Inventing the University.” Journal of Basic Writing, vol. 5, no. 1, 1986, pp. 4-23.

Gelms, Bridget, and Dustin Edwards. “A Technofeminist Approach to Platform Rhetorics.” Computers and Composition Online, special issue on Technofeminism: (Re)Generations and Intersectional Futures, edited by Jacqueline Rhodes, Angela Haas, and Danielle Nicole DeVoss, 2019.

Haswell, Rich. “Automated Text-Checkers: A Chronology and Bibliography of.” Computers and Composition Online, Fall 2005.

Miller, Benjamin, et al. “The Roots of an Academic Genealogy: Composing the Writing Studies Tree.” Kairos, vol. 20, no. 2, 2016.

Mueller, Derek. “Grasping Rhetoric and Composition by Its Long Tail: What Graphs Can Tell Us About the Field’s Changing Shape.” College Composition and Communication, vol. 64, no. 1, 2012, pp. 195-223.
