
Composition Forum 47, Fall 2021
http://compositionforum.com/issue/47/

Correlating What We Know: A Mixed Methods Study of Reflection and Writing in First-Year Writing Assessment

Jeff Pruchnic, Ellen Barton, Sarah Primeau, Thomas Trimble, Nicole Varty, and Tanina Foster

Abstract: Over the past two decades, reflective writing has occupied an increasingly prominent position in composition theory, pedagogy, and assessment as researchers have described the value of reflection and reflective writing in college students’ development of higher-order writing skills, such as genre conventions (Yancey, Reflection; White). One assumption about the value of reflection has been that skill in reflective writing also has a positive connection with lower-order writing skills, such as sentence-level conventions of academic discourse. However, evidence to confirm this assumption has been limited to small qualitative studies or deferred to future longitudinal research (Downs and Wardle). In the mixed methods assessment study presented here, we first investigated this assumption empirically by measuring the relationship between evaluative skills embedded in the genre of reflective writing and lower-order writing skills that follow sentence-level conventions of academic discourse. We found a high-positive correlation between reflection and writing assessment scores. We then used qualitative methods to describe key features of higher- and lower-scored reflective essays.

§1 Introduction

In First-Year Composition (FYC) pedagogy, the skills of “good writing” have long been bifurcated into what McAndrew and Reigstad called higher-order concerns (HOCs) and lower-order concerns (LOCs), a distinction they drew in an argument against the emphasis on LOCs in current-traditional pedagogies. Researchers have since used these terms to study a wide variety of writing contexts, including writing centers (Cross and Catchings), English language-learning programs (Moussu and David), and FYC pedagogy (e.g., Krest on teacher feedback and McDonald on revisions). With some variations, HOCs focus on purpose, audience, organization, and development; LOCs focus on grammatical constructions, such as sentence structure, word choice, punctuation, and spelling (Purdue OWL). Similarly, current research on outcome-based assessment in First-Year Writing (FYW) foregrounds the rhetorical knowledge of genres in and across disciplines and the use of a flexible writing process that develops “ideas before surface-level editing” (WPA Outcomes 3). The WPA Outcomes also recommend that FYW students “develop knowledge of linguistic structures [and] learn strategies for controlling conventions to align with readers’ and writers’ perceptions of correctness or appropriateness” (2).

Over the past two decades, reflective writing has occupied an increasingly prominent position in Writing Studies and composition pedagogy, based largely on its potential to represent the metacognition underpinning the writing process, provide a medium for the self-monitoring of learning, and facilitate the successful transfer of writing skills across new situations and contexts. However, as prominent as reflection has become as both a vector of writing theory and pedagogy and a tool for the direct assessment of student writing, discerning the explicit connection between skills in reflection and skills in writing has been more challenging. While connections between reflection and writing skills have been posited, researchers have mainly traced the transfer of such knowledge and skills across contexts using qualitative methods that demonstrate “higher-order processes, strategies and metacognitive reflections” (Reiff and Bawarshi 315). Similarly, reflection has also been forwarded as a particularly salutary category for assessment of both written and multimodal texts (Fiscus; Shipka; White; Yancey, Reflection; Yancey, Reflection in the Writing Classroom). However, the comparative assessment of skills in reflection as related to lower-order sentence-level writing skills has not been examined rigorously. More specifically, empirical studies of the connection between instruction and training in reflection and skill in writing have long been deferred in hopes of future longitudinal studies (Downs and Wardle) or limited to data sets as small as a single student (Beaufort, College Writing) or a single pilot section of a course compared to control sections (Yancey et al.).

Adler-Kassner and Wardle use the term “naming what we know” to describe the value of threshold concepts like reflection—constructs and practices that drive progress in composition scholarship and pedagogy even when they have not been previously identified or defined (2-3). As a threshold concept that has become increasingly prominent in scholarship, reflection functions both as a concept important for students to understand as part of their individual praxis and as one important for researchers as they work to understand and incorporate reflection into their classroom pedagogies. We build on this work by measuring the relationships between the knowledge and skills of higher-order reflective writing and the knowledge and skills of lower-order conventions of sentence-level writing.

In this study, we analyzed a representative, randomized sample of reflective essays from a large-enrollment FYW course to determine the correlation between students’ ability to make arguments about their growth and achievement as writers and their demonstration of lower-order writing skills related to sentence-level conventions. Our assessment revealed that raters’ scores for reflective essays did indeed correlate well with scores for sentence-level writing conventions. This result, we argue, provides empirical evidence supporting claims for the value of the genre of reflective writing and the teaching of reflective activities for improving not only higher-order genre knowledge but also lower-order dimensions of students’ writing.

§2 Background

2.1 Reflection

Following Moon and Yancey (Reflection), we define reflection as a mental process engaged for a specific purpose. Since reflection is a cognitive process that also manifests as a product (i.e., reflective writing), it can occur in a variety of contexts that can be as transparent or opaque as the modality used (Yancey et al.). For example, reflection can happen silently and individually, as a student thinks back over a lesson or a project to gain insights about her own learning. Reflection can happen verbally and in a small group setting, as group members talk about struggles they are having in a particular project to gain insights on how best to problem-solve. In a very specific way, reflection can happen during writing, in response to a specific prompt designed to kick-start or continue thought processes to sort through existing knowledge or gain new perspectives (Moon; Jankens and Trimble; Yancey, Reflection; Yancey et al.). This latter mode (reflective writing) creates an artifact of the reflection, which can then be used in evaluation or assessment of various aspects of writing expertise. Because reflective writing as a genre requires students to both perform writing skills and reflect on those skills, it can tell us many things when used as an artifact for assessing student learning, especially about students’ processes of composing and their own understanding of the strengths and weaknesses of their writing skills.

Writing Studies research has also worked to define and articulate the role of reflective writing in transfer (Downs and Wardle; Yancey, Reflection; Yancey et al.). Beaufort’s Reflection: The Metacognitive Move toward Transfer of Learning brought us a step forward in understanding reflection’s role in the transfer of learning. Beaufort argues for reflective writing’s standing as “a standard pedagogical tool in Writing Studies ... an excellent means to begin to foster metacognition” (33). She further reinforces the principle of transfer of learning in which learners can apply abstract concepts, such as genre and rhetorical context, as “problem-solving tools to new situations” (36). For Beaufort, the question for transfer is a matter of “what is being reflected upon,” not just that reflection is occurring (38). To consider the “what” of transfer in assessing student reflections, Downs and Wardle’s case studies found that students demonstrated “increased self-awareness about writing,” “improved reading abilities and confidence,” and “raised awareness of research writing as conversation” (572-573). However, it has yet to be proven that such awareness alone “will transfer beyond course or student” in the performance of writing (Kutney 573). Moreover, beyond assessing awareness and metacognition, reflection’s role in improving the narrow genre of reflective writing does not by itself account for the application of abstract concepts in new writing situations. In other words, while scholars have shown that a focus on increasing students’ capacity for reflection leads to better performance in response to writing prompts that explicitly encourage articulation of reflective abilities, far less attention has been paid to whether reflective writing can be operationalized to demonstrate skills in other types of writing. Our study seeks to fill this gap by measuring the correlation between students’ performance of reflection and writing skills.

2.2 Mixed Methods Approach to Assessment

The study presented here is part of an ongoing research program aimed at developing innovative methods for the assessment of FYW reflective essays, specifically the use of a mixed methods approach to the large-scale direct assessment of FYW reflective essays (Pruchnic et al., Slouching; Barton et al., Thin-Slice Methods). Our research studies are conducted by the Composition Research Committee (CRC), a publication practicum for faculty, graduate students, and part-time faculty. In our composition program, we follow White by using the reflective essay as our main assessment artifact. We also follow the robust empirical approach for assessment in White et al.:

We need to be clear about our reason for advocating empirical techniques of program assessment: preventing over-simplified measures of writing programs—measures that fail to support instruction—is simply not possible without the use of sophisticated quantitative and qualitative processes. ...The use of descriptive and inferential statistics is more effective in administrative meetings than the rhetoric often employed. (114)

Assessment researchers have well-known options in choosing qualitative methods for analyzing FYW reflective essays, such as empirical qualitative research (Broad); textual analysis (Huckin); discourse analysis (Barton); and more. Quantitative methods for large-scale direct assessments of student writing are increasingly represented in the assessment literature (Condon and Kelly-Riley; Elliot; Kelly-Riley; Pagano et al.; and more).

For our study, we developed our quantitative assessment methods from an unlikely source—a method from Behavioral Psychology called thin-slice methods (Ambady et al.). (Thin-slice methods were popularized as “fast cognition” in Gladwell’s best-seller Blink and analyzed in Kahneman’s award-winning Thinking, Fast and Slow.) Thin-slice methods have proved efficient and reliable in scoring within a wide variety of domains, including assessment in higher education: in one of the best-known thin-slice studies, raters reliably predicted end-of-term teacher evaluation scores by scoring three 10-second video clips (Ambady and Rosenthal).

Thin-slice methods select and analyze “slices” for multiple raters to score with a common rubric. To select our slices, we followed Ambady’s recommendation: if a text has a definite beginning, middle, and end, the slices should come from these segments. We thus selected three paragraphs based on the traditional categories of essay structure for our slices: the first paragraph/introduction, a middle paragraph/body paragraph, and the final paragraph/conclusion. The first full paragraph on the middle page of the essay was excerpted for the middle/body paragraph (e.g., page three of a five-page essay or page four of a six-page essay).
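As a concrete illustration, the following minimal sketch of this selection protocol is ours, not part of the study’s materials; it assumes an essay is represented as a list of pages, each page a list of paragraph strings.

    # Minimal sketch of the thin-slice selection protocol described above.
    # The data representation is an assumption: an essay as a list of pages,
    # each page a list of paragraph strings; names are illustrative only.
    def select_slices(pages):
        """Return (introduction, middle body paragraph, conclusion)."""
        paragraphs = [p for page in pages for p in page]
        intro = paragraphs[0]                 # first paragraph/introduction
        middle_page = pages[len(pages) // 2]  # page 3 of 5, page 4 of 6
        body = middle_page[0]                 # first full paragraph on that page
        conclusion = paragraphs[-1]           # final paragraph/conclusion
        return intro, body, conclusion

For a five-page essay, len(pages) // 2 yields index 2 (page three); for a six-page essay, index 3 (page four), matching the excerpting rule above.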

We previously tested the use of FYW thin-slice methods in a comparative study of a full-essay team and a thin-slice team, finding that thin-slice methods were more reliable and more efficient (Pruchnic et al., Slouching). In reliability, the full-essay team achieved an inter-rater reliability (IRR) of 0.60 (good reliability), while the thin-slice team achieved an IRR of 0.76 (excellent reliability). In efficiency, the teams differed as well: the full-essay team required twelve hours of assessment, while the thin-slice team required only five. Finally, on the thin-slice team, five raters read and scored each reflective essay; on the full-essay team, only two raters read and scored each essay in the sample. (For a full description of our first study, including our arguments about reliability and validity, see Pruchnic et al., Slouching.)

Below we present our quantitative thin-slice methods and analysis (§3) and our qualitative methods and analysis (§4). We then discuss the connections between higher-order skills in reflective writing and lower-order writing skills in Writing Studies theory, pedagogy, and assessment (§5). We then turn to the limitations of this study and our related suggestions for future research (§6).

§3 Quantitative Methods and Results

3.1 Methods

3.1.1 Site and Institutional Review Board (IRB) Approval

This research took place at an urban research university with a diverse enrollment of over 26,000 students. Our IRB reviewed and approved our protocol (Exempt).

3.1.2 Outcomes and Reflective Essay Assignment

In Naming What We Know, Adler-Kassner and Wardle encouraged composition programs to tailor the WPA Outcomes to their local contexts. Influenced by this framework, our composition program developed four standing FYW outcomes: reading, writing, researching, and reflection (see Table 1).

Table 1. FYW Learning Outcomes

Reading: Use reading strategies in order to identify, analyze, evaluate, and respond to arguments, rhetorical elements, and genre conventions in college-level texts and other media.

Writing: Compose persuasive academic genres, including argument and analysis, using rhetorical and genre awareness.

Researching: Use a flexible research process to find, evaluate, and use information from secondary sources to support and formulate new ideas and arguments.

Reflection: Use written reflection to plan, monitor, and evaluate one’s own learning and writing.

Additionally, our program uses a common syllabus and a required assignment sequence. For this study, we chose to assess our reflection and writing outcomes, as they are based upon student writing.

The final assignment of the FYW course is an edited, evaluative reflective essay addressing the following prompt:

In this assignment, you will evaluate your growth as an English (ENG) 1020 student, using your choice of experiences and work on the projects to support your claims. In an essay of 4-5 pages, make an argument that analyzes your work in ENG 1020 in relation to the course learning outcomes listed on the syllabus for the course. Explain what you have achieved for the learning outcomes by citing specific passages from your essays and other assigned writings for the course, and by explaining how those passages demonstrate the outcomes. Also, consider describing the process you used to complete this work.

3.1.3 Sample

One affordance of our mixed methods approach to FYW assessment is a programmatic commitment to assess a statistically representative sample of randomized reflective essays. We consider this a crucially important point: our prior Phase 2 assessments were small-scale, often with raters scoring fewer than ten percent of FYW essays, which had the dubious consequence of our making pedagogical decisions without a solid understanding of the FYW student population as a whole.

In one FYW semester, composition instructors submitted 1,174 reflective essays from their sections. Our statistician determined that our representative, randomized sample size was 291 reflective essays (n=291/1174, 25%).
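The article reports the statistician’s target of 291 without the underlying procedure; one standard approach that lands near this figure is Cochran’s formula with a finite population correction at 95% confidence and a 5% margin of error. A sketch, with those assumptions labeled:

    import math

    def sample_size(N, z=1.96, p=0.5, e=0.05):
        """Cochran's sample size with finite population correction.
        N: population size; z: z-score (1.96 for 95% confidence);
        p: assumed proportion (0.5 is most conservative); e: margin of error.
        These parameters are our assumptions; the study's statistician
        may have used a different procedure."""
        n0 = (z ** 2) * p * (1 - p) / e ** 2        # infinite-population estimate
        return math.ceil(n0 / (1 + (n0 - 1) / N))   # finite population correction

    print(sample_size(1174))  # 290, close to the study's n=291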

3.1.4 Rubrics and Norming

A second affordance of our mixed methods approach to FYW assessment is our development of a contextualized norming protocol as part of thin-slice methods (Barton et al., Thin-Slice Methods). To develop rubrics for the reflection and writing outcomes, we followed Colombini and McBride’s storming and norming protocol. We stormed our way to rubrics that aligned our programmatic outcomes, our FYW reflective assignment, and our assessment rubrics. We consider this a second crucially important point in our thin-slice methods: we are assessing student writing grounded in exactly what we asked the students to write. Finally, we chose a six-point Likert scale, which has no midpoint and thus asks raters to make an explicit decision between achieving the outcome (4-6) and not achieving it (1-3) (see Table 2 and Table 3).

Table 2. Rubric for Reflection Assessment

Scale: 6 = Excellent, 5 = Good, 4 = Adequate (Sufficient); 3 = Limited, 2 = Poor, 1 = No (Insufficient).

Learning Outcome: Use written reflection to evaluate one’s own learning and writing.

Trait 1: Argument (thesis, claim, relation to course outcomes), rated from Excellent (6) to No Argument (1).

Trait 2: Evidence (examples, analysis, experiences, discussion), rated from Excellent (6) to No Evidence (1).

Decision Rule: if an essay has a mixed score, record the lower score.

Example: if an essay has excellent evidence and a good argument, then it is a 5/Good.

Example: if an essay has an adequate argument but poor evidence, then it is a 2/Poor.

Explanation: Both rubrics had a decision rule directing raters to record the lower score across the two traits: for example, in the writing assessment, a score of four for paragraph organization and a score of three for sentence-level conventions yielded a final score of three. Since the goal of assessment is to “move the needle” so that more students achieve the FYW outcomes, we asked raters to record the lower score of the two traits in order to capture the features of lower-scored essays (see §4).

Table 3. Rubric for Writing Assessment

Scale: 6 = Excellent, 5 = Good, 4 = Adequate (Sufficient); 3 = Limited, 2 = Poor, 1 = No (Insufficient).

Learning Outcome 1: Demonstrates effective paragraph organization for academic discourse. Trait: Paragraph organization (topic sentence, paragraph development), rated from Excellent (6) to No Paragraph Organization (1).

Learning Outcome 2: Uses sentence-level conventions of academic discourse. Trait: Editing (grammar, punctuation, spelling, awkward sentences), rated from Excellent (6) to No Editing (1).

Decision Rule: if an essay has a mixed score, record the lower score.

Example: If an essay has excellent editing and good paragraph organization, then it is a 5/Good.

Example: If an essay has adequate paragraph organization and poor editing, then it is a 2/Poor.


Note: Our rubric for the Writing Assessment had two components, paragraphing and sentence-level conventions of academic discourse. In this study, we analyzed conventions of academic discourse. Space prevented a full study of paragraphing, which we must leave for future research (see §6).

We normed our teams to the rubrics using a best-practices contextualized norming protocol for thin-slice methods (Barton et al., Thin-Slice Methods). Our assessment took place over two mornings. On day one, our assessment coordinator, an experienced member of the CRC, first introduced raters to reliability as an important goal of the assessment, defining reliability as a measure of consistency across raters. He also provided an overview of the thin-slice methods used in the assessment. To achieve reliability, he asked raters to focus solely on scoring with the rubric, deferring other considerations, such as voice, narrative, readiness for the next composition course, quality of writing in general, or personal heuristics, which, he respectfully acknowledged, can be challenging for experienced composition instructors. The coordinator normed using multiple techniques, including a didactic review of six anchor essays for the reflection and writing rubrics. He continually made direct connections between the language of the rubrics (e.g., argument and evidence for reflection; sentence-level conventions for writing) and the anchor essays under discussion. He also focused on both easy-to-score essays and difficult-to-score essays, with special attention to the distinctions between a final score of four (achieving the outcome) and a final score of three (not achieving the outcome). Individual raters then scored a set of four sample essays drawn from difficult-to-score essays in previous research. For each essay, raters recorded their scores on a whiteboard, and the coordinator led a guided discussion of the scores in terms of the rubric. The first norming took ninety minutes. On day two of the assessment, the coordinator provided a booster norming by reviewing the importance of scoring solely with the rubrics, asking raters to score three more difficult-to-score essays, and guiding the discussion about the scores. Overall, the coordinator and raters worked slowly and carefully with thirteen thin-sliced reflective essays, an affordance of the time that thin-slice methods allow.

3.1.5 Teams, Scoring, and Data Collection

We formed two teams of ten experienced FYW instructors (faculty, graduate teaching assistants, and part-time faculty), one for the reflection assessment and one for the writing assessment. We divided each team into two groups of five, each of which scored half of the essays in the sample (n=147, n=144, total n=291). After the norming, raters received their thin-sliced essays. The assessment coordinator asked them to score with the rubrics individually, without consultation or discussions with other raters.

For quantitative data collection, the final score for each reflective essay was the average of the five raters’ scores.
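To make the scoring arithmetic concrete, here is a minimal sketch of the two steps just described: the decision rule from Tables 2 and 3, then the averaging across raters. The function names are ours, for illustration only.

    from statistics import mean

    def rater_score(trait_a, trait_b):
        """Decision rule: record the lower of the two trait scores (1-6)."""
        return min(trait_a, trait_b)

    def final_score(rater_scores):
        """Final essay score: the average of the five raters' scores."""
        return mean(rater_scores)

    # Example from Table 2: excellent evidence (6), good argument (5) -> 5/Good.
    print(rater_score(6, 5))             # 5
    print(final_score([5, 4, 4, 5, 3]))  # 4.2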

3.2 Data Analysis

3.2.1 Research Questions

The purpose of this quantitative thin-slice investigation was to determine whether higher-order skills in reflective writing correlate with the lower-order writing skills of conventions of academic discourse. Our research questions (RQs) were the following:

  1. What was the inter-rater reliability of the reflection assessment?

  2. What was the inter-rater reliability of the writing assessment?

  3. What was the correlation between reflection scores and writing scores?

3.2.2 Reliability

Reliability and instrument validity are crucially important in thin-slice methods: raters must achieve good or excellent inter-rater reliability (IRR) in scoring, which, in turn, provides evidence supporting the validity of the rubrics (the higher the IRR, the stronger the evidence).

To answer RQs #1 and #2, we established the IRR of the reflection team scores and the writing team scores. In general, IRR establishes the degree of agreement among raters’ scores (in statistical language, the covariance of scores). Our statistician chose to use the Intra-Class Correlation Coefficient (ICC) as our measure of reliability because it is an inferential measure used for numerical data scored by multiple raters (Hallgren). An ICC measurement will be high when there is little variation between raters’ scores, and low when there is greater variation between raters’ scores.

To interpret the ICC results, we followed Cicchetti’s classifications: Excellent reliability (0.75 and above); Good reliability (0.60-0.74); Fair reliability (0.40-0.59); and Poor reliability (below 0.40). The ICC indicates whether the members of the reflection team and the writing team were using their rubrics to score the reflective essays consistently. ICC results indicate the degree to which these scores would be reproducible given the same data, rubric, and conditions (Hallgren).
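For readers who want to reproduce this step, below is a sketch of one standard ICC variant, the two-way random-effects, average-measures form (ICC(2,k) in Shrout and Fleiss’s notation), a common choice when the same raters score every essay, together with Cicchetti’s bands. The study does not specify which ICC form its statistician used, so this is an assumption.

    import numpy as np

    def icc_2k(scores):
        """Two-way random-effects, average-measures ICC (one common variant;
        an assumption, since the study does not name its exact ICC form).
        scores: array of shape (n_essays, k_raters)."""
        x = np.asarray(scores, dtype=float)
        n, k = x.shape
        grand = x.mean()
        ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()  # between essays
        ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()  # between raters
        ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
        msr = ss_rows / (n - 1)                # mean square, rows (essays)
        msc = ss_cols / (k - 1)                # mean square, columns (raters)
        mse = ss_err / ((n - 1) * (k - 1))     # mean square, error
        return (msr - mse) / (msr + (msc - mse) / n)

    def cicchetti_band(icc):
        """Cicchetti's interpretive bands for reliability."""
        if icc >= 0.75: return "excellent"
        if icc >= 0.60: return "good"
        if icc >= 0.40: return "fair"
        return "poor"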

3.2.3 Correlation

A correlation is a numeric measure to determine the strength of the relationship between two variables (here, reflection scores and writing scores). To answer RQ #3, we tested the null hypothesis for a correlation: that there is no significant relationship between the scores of the reflection outcome and the scores of the writing outcome (see Figure 1).

Figure 1. Correlation (Visualization): the spectrum of correlation from perfect positive to perfect negative.

To determine the correlation of our scores in reflection and our scores in writing, we used the Pearson Correlation Coefficient, Pearson’s r (Hinkle et al.). Our statistician chose Pearson’s r as our measure of correlation because it, too, is used for numeric data and because it is an inferential measure of similarity (the strength of a linear relationship between two sets of scores).

To interpret the results of the correlation calculation, we followed Mukaka’s classifications: Perfect positive correlation (+1.0); High positive correlation (+0.51 to +0.99); Low positive correlation (+0.01 to +0.50); No correlation (0); Low negative correlation (-0.01 to -0.50); High negative correlation (-0.51 to -0.99); Perfect negative correlation (-1.0).

While the ICC measures the degree of variance in agreement (consistency), Pearson’s r measures the extent to which two sets of scores co-vary (similarity).
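As a sketch of this final calculation, assuming the per-essay mean scores are aligned in two equal-length lists (the data below are invented placeholders, not the study’s scores):

    from scipy.stats import pearsonr

    # Invented placeholder data; in the study each list held 291 per-essay means.
    reflection = [4.2, 3.0, 5.1, 2.4, 4.8, 3.6]
    writing = [4.0, 2.8, 4.6, 2.9, 4.4, 3.1]

    r, p = pearsonr(reflection, writing)

    def mukaka_band(r):
        """Mukaka's interpretive bands for a correlation coefficient."""
        if r == 0:
            return "no correlation"
        strength = "perfect" if abs(r) == 1 else "high" if abs(r) > 0.50 else "low"
        sign = "positive" if r > 0 else "negative"
        return f"{strength} {sign} correlation"

    print(f"r = {r:.3f}, p = {p:.3g}: {mukaka_band(r)}")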

3.3 Quantitative Results

3.3.1 Reliability

We first established the IRR of the two sets of scores using the ICC. The reflection team achieved ICC results of 0.79, interpreted as excellent reliability, and the writing team achieved ICC results of 0.68, interpreted as good reliability (Cicchetti).

3.3.2 Correlation

Having established reliability and instrument validity, we then calculated the correlation between the reflection scores and the writing scores using the Pearson Correlation Coefficient measure (Pearson’s r).

Figure 2. Pearson’s r for Reflection Scores and Writing Scores.

The correlation between the reflection scores and the writing scores was significant: r = 0.641, p < .001, a high positive correlation, as shown in the linear graph in Figure 2.

In sum, these quantitative results answered RQs #1-3 positively by confirming a robust relationship between skills in reflective writing and lower-order writing skills.

§4 Qualitative Methods and Results

After confirming this positive relationship between reflection and writing scores, we turned to qualitative methods to examine our FYW reflective essays. The purpose of the qualitative analysis was to identify key textual features in reflective essays receiving higher or lower scores in reflection and writing.

4.1 Sample

To build our sample, we followed Miles and Huberman’s suggestion to examine “surprises” in our data to better understand the higher or lower scores in reflection and writing. Our interest in these essays is consistent with principles of qualitative research, which see outliers not as challenges to findings but as generative sources of description and theory building.

Our sample included five essays with higher reflection scores and four essays with lower writing scores (n=9).

4.2 Rich Features Analysis

We examined our nine essays in full detail, using rich features analysis: rich features point to the connections between features and their contexts (Barton). In this study, we used the outcomes and rubrics as initial prompts:

In a group CRC meeting, members read each essay silently, called out features they noticed by using the rubrics, and jointly identified rich features related to higher- and lower-order reflection and writing.

4.3 Qualitative Results

We found that reflection scores were tied to the presence or absence of three rich features: summary, application, and evidence. Summary statements involve a “look back,” recounting specific writing experiences or assignments (Moon; Yancey, Reflection). Application statements connect specific writing experiences to current learning or assumptions about future writing experiences. Finally, evidence statements involve citations of a student’s writing as a way of demonstrating learning.

As an example of summary, application, and evidence in action, the following excerpt is from an essay that scored high on both writing and reflection:

In the reflection, I talk about the parts of the essay that I felt I could have made stronger, and that helped me evaluate my own learning. One example of me pointing out my own flaws is in the end of the introduction. I say... “I was trying to create a general understanding that the environment affects your ideologies, but I didn’t really put it into words too well. In my mind it sounded perfect.” By pointing out these flaws about myself, I have learned next time to choose my words carefully and organize my thoughts before I start writing, which is precisely what I did not do in my argument.

This student summarizes a specific assignment, articulates what was learned, and provides evidence from the student’s own writing to support the claim made at the end of the excerpt, following sentence-level conventions of academic discourse.

We also found essays with lower-order features at the sentence level. As an example, the following excerpt contains higher-order statements of summary, application, and evidence in reflection; however, the essay included several lower-order grammatical constructions:

Throughout this semester I had to use research to get information on my writing to make sure my paper was accurate. An example of when I used this was in the rhetorical paper. In this paper I analyzed the speech Ballot or Bullet which deals with the freedom fighting movement that took place in America in the late 20th century. I had to research about different forms of communication in relation to this movement to help draw comparison the speech I was analyzing (italics added) ...Without research beforehand I would have struggled to find examples of this and also would not have been able to fully understand the examples.

This excerpt contains all three textual features associated with high reflection scores. The writer summarizes the assignment (“I analyzed...”), considers the relevance of their work on the assignment for their learning (“Without research beforehand, I would have struggled...”) and includes evidence from their own writing.

We also identified three lower-order features in the italicized sentence: the insertion of the preposition about, the dropped ‘s’ on the word comparison, and the absence of the preposition to between comparison and speech. Across the sample, we also identified several features of awkward sentences (Barton et al., The Awkward), including the anthropomorphism in “I trained my techniques”; the adjective in “a quick amount of stress”; and the lack of parallel constructions in the sentence, “I entered this course feeling confident that I would work effectively and well ahead of the time.”

Finally, we found essays in which one or more reflective features were missing (i.e., summary, application, evidence) even though raters recognized the writers’ ability to formulate a claim clearly and without errors in lower-order sentence conventions. As an example of essays in this category, one student wrote, “we were instructed to use online library databases, namely ProQuest and Academic OneFile. These two databases are helpful, especially when scholarly or peer-reviewed sources are needed for a paper.” In this passage, the student summarizes the instructions given during the research project but does not attempt to connect this experience to the student’s learning or cite from their own writing.

In sum, this qualitative analysis identified three higher-order features related to higher scores in reflection as well as a number of lower-order features related to writing.

§5 Discussion

We discuss the contributions from our study in three areas: Writing Studies assessment (5.1), theory (5.2), and pedagogy (5.3).

5.1 Contributions to FYW Assessment

Our research team’s previous work (Barton et al., Thin-Slice Methods; Pruchnic et al., Slouching) suggests that thin-slice sampling of student work combined with structured norming protocols produces more reliable data in less time than traditional assessment scoring. This allows assessors to read larger samples of student work, to assess more features and outcomes of interest, and, ideally, to spend less time on gathering data and more time on using data to improve instruction. The results of this study demonstrate that thin-slice assessment can also be used to identify meaningful correlations between features of student writing and course learning outcomes.

We have two further implications for writing assessment. First, our findings identify potential efficiencies for assessment teams tasked with assessing multiple course outcomes at a given time. Knowing that student achievement on particular outcomes correlates provides a warrant for making claims based on a single stream of data, thereby allowing teams to cover more assessment ground in the same amount of time. While this study suggested a positive relationship between reflection and lower-order features, we hope that further research will identify interesting correlations between other elements of student writing.

Second, while our work confirms our field’s assumptions that features of writing are interrelated, our methods and findings also open lines of inquiry into how raters and administrators perceive and interpret connections between elements of student writing, as inscribed in course learning outcomes, composition assignments, pedagogies, and innovative mixed methods for assessment.

5.2 Implications for Writing Studies Theory

The end-of-semester reflective essay described in our FYW composition program tasked students with composing an evaluative, reflective argument about their growth and achievement over the duration of their FYW course using sentence-level conventions of writing. As suggested by Scott and Levy and extended by Gorzelsky et al., being able to evaluate one’s own learning and use of writing skills is one of several metacognitive behaviors that can be the target of reflective writing. Argumentative evaluation with evidentiary support, however, is also a dimension of writing skill as articulated within guiding documents such as the WPA Outcomes Statement for First-Year Composition (3.0), which has influenced our institution’s FYW learning outcomes. As such, we argue that Writing Studies theory should be more precise when considering the specific characteristics of reflection as opposed to other dimensions of writing skills. Difficulty in separating these two domains has likely been a detriment to the empirical testing of the correlation of skill in reflection and skill in writing, broadly defined, in at least two ways. On the one hand, the more that the outcome of writing is expanded to include metacognitive and self-referential behaviors, the more intuitive the connection between the two domains seems; subsequently, testing the correlation between the two seems unnecessary. On the other hand, blurring the boundaries between the two categories makes it functionally more difficult to determine how they might be assessed separately, as well as how to determine more specific correlations between discrete skills or moves in academic writing that might be more or less aided by training in reflection and reflective writing.

5.3 Implications for Writing Studies Pedagogy

Our study’s focus on a curriculum that emphasizes formal reflective assignments, together with our qualitative analysis of the high-scoring student papers, suggests that instructors would do well to provide explicit instruction around the rhetorical situations for reflective assignments. This includes practicing the key rhetorical moves that we argue are associated with high-scoring reflective evaluations (i.e., summary, application, and evidence), as well as practicing paragraph- and sentence-level concerns involving effective organization and editing/proofreading. Since written reflections are often associated with personal writing and/or used as subsidiary to other projects (e.g., reflections turned in alongside a longer project), these rhetorical features may not always be emphasized appropriately. Even when students use reflective writing in informal journal entries, which are positioned as a strategy for learning, writers should be encouraged to practice these key rhetorical features of reflection as ways of moving beyond superficial observations or narrative descriptions of what happened to more deeply consider what particular experiences, actions, and attitudes mean in the life of one’s writing development. At the same time, writers need to be reminded of the differences between journal writing, which is often written for oneself, and reflective writing used to persuade others. In this way, writers should understand that persuasive reflective writing should be revised, edited, and proofread like other forms of public writing. While informal reflective assignments with less emphasis on the rhetorical situation and audience concerns might also build skills in reflection as well as lower-order writing skills, our study did not allow us to support that conclusion, and it is quite possible that combining reflective activity with informal assignments could weaken the correlation we observed between skills in reflection and lower-order writing skills.

§6 Conclusion

6.1 Limitations

There is one major limitation to our study design: we did not have access to a comparative set of student essays from classes in which reflective writing was not taught and/or assigned. While one of our research questions focuses on the correlation between skills in reflection and lower-order dimensions of writing, and we were able to demonstrate that correlation in our study, the motivation behind that question is to find evidence of a connection between learning and demonstrating skill in reflection and learning and demonstrating lower-order writing skills. Comparing the demonstration of lower-order writing skills as a single measurement in two sets of papers, one in which reflection was emphasized in the classroom and one in which it was not, would have provided additional evidence for the connection between these skills. However, because of the centrality of reflection to the course learning outcomes and pedagogical training of the writing program that was the focus of this study, institutional and ethical barriers prevented us from designing a study that would omit reflection entirely from the relevant course curriculum.

6.2 Future Research

Although this study provides evidence of a correlation between writing skills and reflective skills, research on learning transfer and metacognition suggests a causal relationship between reflection and writing (i.e., that practice with reflective writing improves writing skill), and future studies might use these methods to further explicate that relationship. For example, future studies may use our assessment methods to answer the question: do certain types of reflective assignments, instruction, or feedback support students’ writing development more than others?

Another promising avenue for future research would be a closer evaluation of how the cognitive skills used in reflection overlap with those used in lower-order writing skills. As an early reviewer of this manuscript suggested to us, it is possible that skill in reflection correlates with lower-order writing skills because the cognitive skills demonstrated in making decisions about, for example, sentence-level conventions are similar to those used in the cognitive process of reflection. Such an analysis would be a promising focus for a future study of the relationship between reflection and writing.

Future research that builds on our study of the correlation between skills in reflection and lower-order writing skills might also use some of the same research design to assess the correlation between reflection and other common learning outcomes for FYC curricula. For instance, a recent study by Lindenman et al. analyzing the (dis)connection between students’ reflections and their revision processes is a great example of pathways for researching the nuanced relationships between students’ reflective practices and discrete components of the writing process. There are, of course, a large number of additional traits that could be studied in regard to their correlation (or lack thereof) with reflection, in addition to the editing practices and lower-order writing skills under review in this essay.

Regardless of the possible paths future research might take, the present study speaks to the importance of developing evidence to support our field’s common assumptions about reflection. In her entry on reflection in Keywords in Writing Studies, Yancey argues that reflection has “informed Writing Studies almost from the beginning of the modern iteration of the field” (150) as she traces its role in research on pedagogical practice, assessment, and, most recently, learning transfer (152-153). Similarly, position statements from the National Council of Teachers of English (NCTE) and the Council of Writing Program Administrators (CWPA) recommend that instructors help students develop reflective practices in the writing classroom, suggesting that reflection has an established place in Writing Studies (Framework). The recent explicit attention to reflection in Writing Studies scholarship, however, is likely the result of its position as a prominent threshold concept during a time in which threshold concepts have themselves emerged as key conceptual devices in both research and pedagogical design within Writing Studies. Indeed, we might even go so far as to suggest that reflection is not just a threshold concept but something of a microcosm for threshold concepts: it encompasses the activity of writers discovering their own individual concepts and practices, examining the thresholds and progress in their personal development as writers. As our study suggests, it might also be the case that the benefit of work on reflection and its correlation to other skills in writing would apply to many threshold concepts, and there is much potential for future scholarship that attempts to empirically test the theoretical presuppositions embedded in threshold concepts. As threshold concepts have themselves tended to emerge from observed practice, such research would in many ways close the theory-practice loop and help us not only “name what we know” but also test and correlate the lesser-known and more discrete effects of these concepts and practices as delivered through curricula or combined with other pedagogical activities.

Works Cited

Adler-Kassner, Linda, and Elizabeth Wardle. Naming What We Know: The Project of this Book. Naming What We Know: Threshold Concepts of Writing Studies, Utah State University Press, 2015, pp. 1-11.

Ambady, Nalini, et al. Toward a Histology of Social Behavior: Judgmental Accuracy from Thin Slices of the Behavioral Stream. Advances in Experimental Social Psychology, vol. 32, 2000, pp. 201-57.

Ambady, Nalini, and Robert Rosenthal. Half a Minute: Predicting Teacher Evaluations from Thin Slices of Nonverbal Behavior and Physical Attractiveness. Journal of Personality and Social Psychology, vol. 64, no. 3, 1993, pp. 431-41.

Barton, Ellen. Linguistic Discourse Analysis: How the Language in Texts Works. What Writing Does and How It Does It: An Introduction to Analyzing Texts and Textual Practices, edited by Charles Bazerman and Paul Prior, Routledge, 2004, pp. 57-83.

Barton, Ellen, et al. The Awkward Problem of Awkward Sentences. Written Communication, vol. 15, no. 1, 1998, pp. 57-83.

---, et al. Thin-Slice Methods and Contextualized Norming: Innovative Assessment Methodologies for Austere Times. The Expanded Universe of Writing Studies: Higher Education Writing Research, edited by Kelly Blewett, Tiane Donahue, and Cynthia Monroe, Peter Lang, forthcoming.

Beaufort, Anne. College Writing and Beyond: A New Framework for University Writing Instruction. Utah State University Press, 2007.

---. Reflection: The Metacognitive Move toward Transfer of Learning. A Rhetoric of Reflection, edited by Kathleen Blake Yancey, Utah State University Press, 2016, pp. 23-41.

Broad, Bob. Strategies and Passions in Empirical Research. Writing Studies Research in Practice: Methods and Methodologies, edited by Lee Nickoson and Mary Sheridan, Southern Illinois University Press, 2012, pp. 197-209.

Cicchetti, Domenic. Guidelines, Criteria and Rules of Thumb for Evaluating Normed and Standardized Assessment Instruments in Psychology. Psychological Assessment, vol. 6, no. 4, 1994.

Colombini, Crystal, and Maureen McBride. ‘Storming and Norming’: Exploring the Value of Group Development Models in Addressing Conflict in Communal Writing Assessment. Assessing Writing, vol. 17, no. 4, Oct. 2012, pp. 191-207.

Condon, William, and Diane Kelly-Riley. Assessing and Teaching What We Value: The Relationship Between College-Level Critical Thinking and Writing Abilities. Assessing Writing, vol. 9, no. 1, 2004, pp. 56-75.

Cross, Susan, and Libby Catchings. Consultation Length and Higher Order Concerns: A RAD Study. WLN: A Journal of Writing Center Scholarship, vol. 43, no. 4, 2018, pp. 18-25.

Downs, Douglas, and Elizabeth Wardle. Teaching about Writing, Righting Misconceptions: (Re)Envisioning ‘First-Year Composition’ as ‘Introduction to Writing Studies.’ College Composition and Communication, vol. 58, no. 4, 2007, pp. 552-84.

Elliot, Norbert. You Will Not Be Able to Stay Home: Quantitative Research in Writing Studies. Explanation Points: Publishing in Rhetoric and Composition, edited by John R. Gallagher and Dànielle Nicole DeVoss, Utah State University Press, 2019, pp. 84-89.

Fiscus, Jaclyn. Genre, Reflection, and Multimodality: Capturing Uptake in the Making. Composition Forum, vol. 37, Fall 2017, https://files.eric.ed.gov/fulltext/EJ1162165.pdf. Accessed 15 June 2020.

Framework for Success in Postsecondary Writing. CWPA, 17 July 2019, https://wpacouncil.org/aws/CWPA/pt/sd/news_article/242845/_PARENT/layout_details/false. Accessed 15 June 2020.

Gladwell, Malcolm. Blink: The Power of Thinking without Thinking. Little, Brown and Co., 2005.

Gorzelsky, Gwen, et al. Cultivating Constructive Metacognition: A New Taxonomy for Writing Studies. Critical Transitions: Writing and the Question of Transfer, edited by Chris Anson and Jessie Moore, The WAC Clearinghouse, 2017, https://wac.colostate.edu/docs/books/ansonmoore/chapter8.pdf. Accessed 15 June 2020.

Hallgren, Kevin. Computing Inter-Rater Reliability for Observational Data: An Overview and Tutorial. Tutorials in Quantitative Methods for Psychology, vol. 8, no. 1, 2012, pp. 23-34.

Hinkle, Dennis, et al. Applied Statistics for the Behavioral Sciences. 5th ed., Houghton Mifflin, 2003.

Huckin, Thomas. Context-Sensitive Text Analysis. Methods and Methodology in Composition Research, edited by Gesa Kirsch and Patricia Sullivan, Southern Illinois University Press, 1992, pp. 84-104.

Jankens, Adrienne, and Thomas Trimble. Using Taxonomies of Metacognitive Behaviors to Analyze Student Reflection and Improve Teaching Practice. Pedagogy, vol. 19, no. 3, 2019, pp. 433-454.

Kahneman, Daniel. Thinking, Fast and Slow. Farrar, Straus and Giroux, 2011.

Kelly-Riley, Diane. Validity of Race and Shared Evaluation Practices in a Large-Scale, University-Wide Writing Portfolio Assessment. Journal of Writing Assessment, vol. 4, no. 1, 2011, https://www.journalofwritingassessment.org/article.php?article=53. Accessed 19 October 2021.

Krest, Margie. Monitoring Student Writing: How Not to Avoid the Draft. Journal of Teaching Writing, vol. 7, no. 1, 1988, pp. 27-39.

Kutney, Joshua. Will Writing Awareness Transfer to Writing Performance? Response to Downs and Wardle. College Composition and Communication, vol. 59, no. 2, 2007, p. 276.

Lindenman, Heather, et al. Revision and Reflection: A Study of (Dis)Connections Between Writing Knowledge and Writing Practice. College Composition and Communication, vol. 69, no. 4, 2018, pp. 581-611.

McAndrew, Donald, and Thomas Reigstad. Tutoring Writing: A Practical Guide for Writing Conferences. Heinemann, 2001.

McDonald, W.U. The Revising Process and the Marking of Student Papers. College Composition and Communication, vol. 24, no. 2, 1978, pp. 167-170.

Miles, Matthew B, and A M. Huberman. Qualitative Data Analysis: An Expanded Sourcebook. Sage, 1994.

Moon, Jennifer. A Handbook of Reflective and Experiential Learning: Theory and Practice. Routledge, 2004.

Moussu, Lucie, and Nicholas David. Writing Centers: Finding a Center for ESL Writers. ESL Readers and Writers in Higher Education: Understanding Challenges, Providing Support, edited by Norman Evans, Neil Anderson, and William Eggington, Routledge, 2015, pp. 49-63.

Mukaka, Mavuto. Statistics Corner: A Guide to Appropriate Use of Correlation Coefficient in Medical Research. Malawi Medical Journal, vol. 24, no. 3, 2012, pp. 69-71.

Pagano, Neil, et al. An Inter-Institutional Model for College Writing Assessment. College Composition and Communication, vol. 60, no. 2, 2008, pp. 285-320.

Pruchnic, Jeff, et al. Slouching Toward Sustainability: Mixed Methods in the Direct Assessment of Student Writing. The Journal of Writing Assessment, vol. 11, no. 1, 2018, https://journalofwritingassessment.org/article.php?article=125. Accessed 15 June 2020.

Purdue OWL. Higher Order Concerns (HOCs) and Lower Order Concerns (LOCs). https://owl.purdue.edu/owl/general_writing/mechanics/hocs_and_locs.html. Accessed 15 June 2020.

Reiff, Mary Jo, and Anis Bawarshi. Tracing Discursive Resources: How Students Use Prior Genre Knowledge to Negotiate New Writing Contexts in First-Year Composition. Written Communication, vol. 28, no. 3, 2011, pp. 312-37.

Scott, Brianna, and Matthew Levy. Metacognition: Examining the Components of a Fuzzy Concept. EREJ Educational Research eJournal, vol. 2, no. 2, 2013, pp. 120-31.

Shipka, Jody. Negotiating Rhetorical, Material, Methodological, and Technological Difference: Evaluating Multimodal Designs. College Composition and Communication, vol. 61, no. 1, 2009, pp. 343-66.

WPA Outcomes Statement for First-Year Composition (3.0), Approved July 17, 2014. CWPA, 18 July 2019, https://wpacouncil.org/aws/CWPA/pt/sd/news_article/243055/_PARENT/layout_details/false. Accessed 15 June 2020.

White, Edward. The Scoring of Writing Portfolios: Phase 2. College Composition and Communication, vol. 56, no. 4, 2005, pp. 581-600.

White, Edward, et al. Very Like a Whale: The Assessment of Writing Programs. Utah State University Press, 2015.

Yancey, Kathleen. Reflection. Keywords in Writing Studies, edited by Paul Heilker and Peter Vandenberg, Utah State University Press, 2015, pp. 150-154.

---. Reflection in the Writing Classroom. Utah State University Press, 1998.

Yancey, Kathleen, et al. Writing across Contexts: Transfer, Composition, and Cultures of Writing. Utah State University Press, 2014.
