= Writing assessment =

Writing assessment is an area of study comprising theories and practices that guide the evaluation of a writer's performance or potential through a writing task. It can be considered a combination of scholarship from composition studies and measurement theory within educational assessment. The term can also refer to the technologies and practices used to evaluate student writing and learning. An important consequence of writing assessment is that the type and manner of assessment may affect the character and quality of writing instruction.

==Contexts==
Writing assessment began as a classroom practice during the first two decades of the 20th century, though high-stakes and standardized tests also emerged during this time. During the 1930s, the College Board shifted from direct writing assessment to indirect assessment because indirect tests were more cost-effective and were believed to be more reliable. Starting in the 1950s, more students from diverse backgrounds were attending colleges and universities, so administrators used standardized testing to decide where these students should be placed, what and how to teach them, and how to verify that they had learned what they needed to learn. The large-scale statewide writing assessments that developed during this time combined direct writing assessment with multiple-choice items, a practice that remains dominant today across U.S. large-scale testing programs, such as the SAT and GRE. These assessments usually take place outside of the classroom, at the state and national level.

However, as more and more students were placed into courses based on their standardized test scores, writing teachers began to notice a conflict between what students were being tested on (grammar, usage, and vocabulary) and what the teachers were actually teaching (writing process and revision). Educational measurement scholars valued different kinds of standards for assessment, while writing studies scholars wanted writing assessments to focus on student learning. Because of this divide, educators began pushing for writing assessments that were designed and implemented at the local, programmatic, and classroom levels. As writing teachers began designing local assessments, the methods of assessment began to diversify, resulting in timed essay tests, locally designed rubrics, and portfolios. Beyond the classroom and programmatic levels, writing assessment also strongly influences writing centers, through writing center assessment, and similar academic support centers.

==History==
Because writing assessment is used in multiple contexts, the history of writing assessment can be traced through examining specific concepts and situations that prompt major shifts in theories and practices. Writing assessment scholars do not always agree about the origin of writing assessment.

The history of writing assessment has been described as consisting of three major shifts in the methods used to assess writing. The first wave of writing assessment (1950-1970) sought objective tests with indirect measures of assessment. The second wave (1970-1986) focused on holistically scored tests in which the students' actual writing began to be assessed. The third wave (since 1986) shifted toward assessing a collection of student work (i.e. portfolio assessment) and toward programmatic assessment.

The 1961 publication of Factors in Judgments of Writing Ability by Diederich, French, and Carlton has also been characterized as marking the birth of modern writing assessment. Diederich et al. based much of their book on research conducted through the Educational Testing Service (ETS) during the previous decade. The book was an attempt to standardize the assessment of writing and is credited with establishing a base of research in writing assessment.

==Major concepts==

===Validity and reliability===
The concepts of validity and reliability have been offered as a kind of heuristic for understanding shifts in priorities in writing assessment, as well as for interpreting what is understood as best practice in writing assessment.

In the first wave of writing assessment, the emphasis was on reliability, which concerns the consistency of a test. In this wave, the central concern was to assess writing with the greatest predictability at the least cost and effort. Some scholars, such as David Slomp, raised concerns about the definition of reliability and about how interrater reliability could affect the way writing was assessed. He argued that raters often lacked enough context about the learner to judge their writing appropriately.

The second wave marked a move toward considering principles of validity. Validity concerns a test's appropriateness and effectiveness for a given purpose. Methods in this wave were more concerned with a test's construct validity: whether the material elicited by a test is an appropriate measure of what the test purports to measure. Teachers began to see an incongruence between the material used to measure writing and the material they were asking students to write. Holistic scoring, championed by writing scholar Edward M. White, emerged in this wave. It is a method of assessment in which students produce a piece of writing that is then scored to measure their writing ability.

The third wave of writing assessment emerged with continued interest in the validity of assessment methods. This wave began to consider an expanded definition of validity that includes how portfolio assessment contributes to learning and teaching. In this wave, portfolio assessment emerged to emphasize theories and practices in Composition and Writing Studies such as revision, drafting, and process.

===Direct and indirect assessment===
Indirect writing assessments typically consist of multiple-choice tests on grammar, usage, and vocabulary. Examples include high-stakes standardized tests such as the ACT, SAT, and GRE, which are most often used by colleges and universities for admissions purposes. Other indirect assessments, such as Compass, are used to place students into remedial or mainstream writing courses. Direct writing assessments, like Writeplacer ESL (part of Accuplacer) or a timed essay test, require at least one sample of student writing and are viewed by many writing assessment scholars as more valid than indirect tests because they assess actual samples of writing. Portfolio assessment, which generally consists of several pieces of student writing produced over the course of a semester, began to replace timed essays during the late 1980s and early 1990s. Portfolio assessment is viewed as even more valid than timed essay tests because it draws on multiple samples of student writing composed in the authentic context of the classroom. Portfolios enable assessors to examine multiple samples of student writing as well as multiple drafts of a single essay.

==Methods==
Methods of writing assessment vary depending on the context and type of assessment. The following is an incomplete list of writing assessments frequently administered:

===Portfolio===
Portfolio assessment is typically used to assess what students have learned at the end of a course or over a period of several years. Course portfolios consist of multiple samples of student writing and a reflective letter or essay in which students describe their writing and work for the course. "Showcase portfolios" contain final drafts of student writing, and "process portfolios" contain multiple drafts of each piece of writing. Both print and electronic portfolios can be either showcase or process portfolios, though electronic portfolios typically contain hyperlinks from the reflective essay or letter to samples of student work and, sometimes, outside sources.

===Timed-essay===
Timed essay tests were developed as an alternative to multiple-choice indirect writing assessments. They are often used to place students into writing courses appropriate for their skill level. These tests are usually proctored: testing takes place in a specific location, where students are given a prompt and must write a response within a set time limit. The SAT and GRE both contain timed essay portions.

===Rubric===
A rubric is a tool that can be used in several writing assessment contexts. It consists of a set of criteria or descriptions that guide a rater in scoring or grading a writer's work. The origins of rubrics can be traced to attempts in education to standardize and scale writing in the early 20th century. In November 1912, Ernest C. Noyes argued for a shift toward assessment practices that were more science-based. One of the original scales used in education was developed by Milo B. Hillegas in A Scale for the Measurement of Quality in English Composition by Young People. This scale is commonly referred to as the Hillegas Scale. Administrators used the Hillegas Scale and other scales to compare the progress of schools.

In 1961, Diederich, French, and Carlton of the Educational Testing Service (ETS) published Factors in Judgments of Writing Ability, which presented a rubric compiled from the comments of a series of raters. Those comments were categorized and condensed into a five-factor rubric:

- Ideas: relevance, clarity, quantity, development, persuasiveness
- Form: organization and analysis
- Flavor: style, interest, sincerity
- Mechanics: specific errors in punctuation, grammar, etc.
- Wording: choice and arrangement of words
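
As an illustration only, the five factors above can be represented as a simple scoring structure. The following Python sketch is not the procedure used by Diederich, French, and Carlton: the 1-to-5 rating scale, the equal weighting of factors, and the function name `score_essay` are all assumptions made for the example.

```python
# The five factors named in the rubric above.
FACTORS = ("ideas", "form", "flavor", "mechanics", "wording")

# Assumption: each factor is rated on a 1-5 scale; the source does not specify a scale.
MIN_RATING, MAX_RATING = 1, 5


def score_essay(ratings):
    """Return the total of one rater's per-factor ratings (equal weights assumed)."""
    missing = [f for f in FACTORS if f not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {', '.join(missing)}")
    for factor in FACTORS:
        rating = ratings[factor]
        if not MIN_RATING <= rating <= MAX_RATING:
            raise ValueError(
                f"{factor} rating {rating} is outside {MIN_RATING}-{MAX_RATING}"
            )
    return sum(ratings[factor] for factor in FACTORS)


# Hypothetical ratings for a single essay from a single rater.
example = {"ideas": 4, "form": 3, "flavor": 5, "mechanics": 2, "wording": 4}
total = score_essay(example)
print(total)
```

A single total is only one possible design. Reporting the five ratings separately would preserve more information about where a piece of writing is strong or weak.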

As rubrics began to be used in the classroom, teachers began to advocate for criteria to be negotiated with students, so that students would have a stake in how they were assessed. Scholars such as Chris Gallagher and Eric Turley, Bob Broad, and Asao Inoue (among many others) have argued that effective use of rubrics comes from local, contextual, and negotiated criteria.

==== Criticisms ====

The introduction of the rubric has stirred debate among scholars. Some educators have argued that rubrics rest on false claims of objectivity and are therefore ultimately subjective. Eric Turley and Chris Gallagher argued that state-imposed rubrics are a tool for accountability rather than improvement. Rubrics often originate outside the classroom, from authors with no relation to the students themselves, and are then interpreted and adapted by other educators. Turley and Gallagher note that "the law of distal diminishment says that any educational tool becomes less instructionally useful -- and more potentially damaging to educational integrity -- the further away from the classroom it originates or travels to." They add that a rubric should be treated as a tool for writers to measure a set of consensus values, not as a substitute for an engaged response.

A study by Stellmack et al. evaluated the perception and application of rubrics with agreed-upon criteria. The study found that when different graders evaluated the same draft, the grader who had previously given feedback was more likely to note improvement. The researchers concluded that a rubric with higher reliability would produce better results in their "review-revise-resubmit procedure".

Anti-rubric: Rubrics both measure the quality of writing and reflect an individual's beliefs about a department's or institution’s rhetorical values. However, rubrics lack detail on how an instructor may diverge from these values. Bob Broad offers “dynamic criteria mapping” as one example of an alternative to the rubric.

The single standard of assessment raises further questions, and Peter Elbow touches on the social construction of value itself. He proposes that a communal process, stripped of the requirement for agreement, would allow the class to “see potential agreements – unforced agreements in their thinking – while helping them articulate where they disagree.” He suggests that grading could take a multidimensional lens that opens up the potential for ‘good writing’, and he points out that a single-dimensional rubric attempts to assess a multidimensional performance.

===Multiple-choice test===
Multiple-choice tests contain questions about usage, grammar, and vocabulary. Arthur Applebee adds that multiple-choice tests are a valid and reliable way of testing specific aspects of writing, such as vocabulary. These tests should be used in conjunction with other written assessments rather than independently. Standardized tests like the SAT, ACT, and GRE are typically used for college or graduate school admission. Other tests, such as Compass and Accuplacer, are typically used to place students into remedial or mainstream writing courses.

===Automated essay scoring===

Automated essay scoring (AES) is the use of non-human, computer-assisted assessment practices to rate, score, or grade writing tasks. Early systems used syntax as the basis for judging writing, which proved to be a simple way for computers to evaluate essays. Automated scoring made assessing and grading essays more efficient in both time and money.

Some software programs use essays previously submitted in response to the same prompt as a baseline against which new essays are compared. Other programs, like the Intelligent Essay Assessor, judge how correct and factual the information within an essay is. The purpose of such programs is to score and test knowledge rather than other complexities of writing, such as syntax and tone.

==== Criticisms ====
Word processors like Google Docs and Microsoft Word are used for automated essay scoring and simplify the writing process. Their tools for fixing grammar and spelling make editing easier than it is with traditional pencil and paper. However, students' levels of Internet access differ. Not every student is able to complete computer-based writing assessments equally well: some are well-versed in using computers, while others may have difficulty typing.

Computerized scoring is criticized for being unable to understand the intricacies and nuances of writing. It often breaks essays down into smaller parts and grades those parts mathematically rather than evaluating the essay as a whole. As a result, essays such as narratives receive much lower scores, because the software cannot take into account the choices and personality behind the writing.

==Race==
Some scholars in writing assessment focus their research on the influence of race on performance in writing assessments. Scholarship on race and writing assessment seeks to study how categories of race and perceptions of race continue to shape writing assessment outcomes. Some scholars recognize that racism in the 21st century is no longer explicit, but argue that a 'silent' racism persists in writing assessment practices, in which racial inequalities are typically justified with non-racial reasons. Some scholars argue that the current grading and education systems are ingrained with whiteness. These scholars advocate for new developments in writing assessment in which the intersections of race and writing assessment are brought to the forefront of assessment practices, and for new writing assessments that consider the circumstances and context of both the learner and the examiner.

==See also==

- Assessing Writing
- Concurrent validity
- Content validity
- External validity
- Face validity
- Predictive validity
- Rhetoric
