Before taking this course (ED 615), I
had never heard the terms “authentic assessment,” “formative assessment,” or
“summative assessment.” However, I do remember the first time I heard the
phrase “performance task.” It was my third year working with the Saturday Math
Academy and my “co-teacher” was telling me that she was in a Master’s program
and she wanted to give our students a performance task to complete. She had
been learning all about these performance tasks in her night classes and she
insisted that it was the next “big thing” that the students would be learning
in their home schools. I was hesitant because I did not know anything about
performance tasks and felt “out of the loop.” In fact, from what she
explained, it seemed like a waste of time, and I was eager to present the
material using the traditional methods we had been relying on. That semester
was my last semester with the Saturday Math Academy. Soon after, I started
hearing about the “common core” and once again, the phrase “performance task”
reemerged.
In authentic assessment, the teacher
bases the students’ grades on how well they perform a real-world (or
“authentic”) task which demonstrates their learning through their application
of knowledge. Authentic assessment differs from traditional assessment in
several aspects. Traditional assessment usually entails a student selecting a
response (such as a multiple-choice, true-false, or matching exercise), whereas
authentic assessment is based on students performing a complex task that
demonstrates meaningful application. Traditional assessment stems from
contrived scenarios that the teacher makes up for the convenience of assigning
grades; authentic assessment is rooted in real-life situations that people
actually encounter. Traditional assessment relies on students recognizing,
recalling, and repeating information that the teacher has presented in the past.
Authentic assessment, however, focuses on the students’ construction and
application of knowledge. Traditional assessments are largely
teacher-structured, whereas authentic assessments are student-driven. Lastly,
traditional assessments provide indirect evidence of student learning, while
authentic assessments give direct evidence that students understand. One model
defines four types of questions that an instructor should use in authentic
assessments: (1) analytical, (2) creative, (3) practical, and (4) wisdom-based
(Sternberg, 2007). By asking students questions with these
themes in mind, the teacher can elicit a deeper understanding from the students
and cause them to think on a more critical or higher level. Instead of simply
asking the student to give the correct answer to a math problem, the instructor
might ask the student to create his own problem using the knowledge he gained
from a previous activity (creative) or he might question the student about how
arriving at an incorrect answer would affect someone in the real world
(wisdom).
There are various alternative names
for authentic assessment. Many people call it performance-based assessment because
teachers grade students on how well they “perform” the assigned task. Direct
assessment gets its name from the fact that the students give direct evidence
that they can apply their learning. This contrasts with traditional forms of
assessment, such as a multiple-choice test, which give only indirect evidence
of student learning. We might infer that the student arrived at the correct
answer through knowledge, but he could also have guessed. When we observe a
student performing a task, we draw conclusions about his knowledge from direct
evidence. Lastly, some people
refer to authentic assessment as alternative assessment because it is an
alternative to traditional assessment (Mueller, 2014).
Formative assessment describes the
majority of “low-stakes” tests that students receive in class, which give the
teacher a chance to provide students with immediate feedback on their work.
These may include a short quiz, a class discussion, or a sentence demonstrating
understanding of a main topic. The results of the formative assessment help the
teacher and students to measure student progress as well as give the teacher an
idea of how to modify the lesson in terms of what needs “re-teaching” and what
concepts the students have already mastered.
Summative assessment is the opposite
of formative assessment. It summarizes the level of proficiency students have
reached at the end of an instructional unit, measured against a standard or
benchmark. It includes high-stakes tests such as the SAT and the
PARCC exam. Other examples include grading a final exam or assessing
a student at a senior recital. A summative assessment can serve a
formative purpose, however, if students use its results to guide their future
learning.
I believe authentic assessment would
be beneficial to student learning because in the real world, we do not always have
multiple choices from which we can choose. We typically have to do something or
perform a task to prove that we know a topic. Music and athletics are great
examples of this. Typically, a coach will not allow his players to perform in
a real game unless they have demonstrated their competency through practice of
the sport. It is very rare that a coach gives a player a written test and
then makes him a first-string player without any evidence of the
player’s performance. In addition, music schools usually require a performance
or audition for acceptance into the school. There are instances where we
must complete both traditional and authentic assessments. One example of
this is with the driving test. There is a written portion and a driving
portion. We must pass both parts of the test in order to earn our driver’s
license. However, what would our roads look like if there were not a
performance-based test portion of the driving exam? The roads would be a lot
less safe. The portion of the test where the driver has to prove his knowledge
of driving a car through actual driving is crucial to demonstrating mastery of
the subject. Teacher education programs incorporate student-teaching
experiences, nurses and counselors must pass clinical exams, and doctors complete
residencies. There are many situations in real life in which people must apply
their knowledge to show evidence of their competency. By starting this practice
in K-12 schools, we are better preparing our students for what they will
encounter as adults.
Authentic assessment could be helpful
in achieving educational standards such as the common core for several reasons.
The common core curriculum holds students to a higher level of thinking by
requiring them to engage in real-world situations and problem-based learning. The
PARCC exam, which was created to correspond to the common core, incorporates
authentic assessment. Instead of a traditional test which might ask a student,
“What is the color of Mary’s hat?” the PARCC exam would ask the students, “Based
on your knowledge of Mary from your reading of the text, what type of hat would
Mary wear?” This type of question gives students a chance to demonstrate their
creativity and the open-endedness of the question promotes critical thinking.
Studies have shown that authentic
assessment actually increases the performance of minority and low-income students (Sternberg, 2007). Some students tend
to do better on multiple-choice questions because of test-taking strategies they
have learned, not necessarily because they are demonstrating their knowledge;
traditional assessments are thus biased in favor of these students. Authentic
assessment better predicts students’ success once they get to college: by using
higher-level thinking early on, students grow accustomed to thinking in this
manner and it becomes more comfortable for them. Since different groups learn in
different ways, authentic assessment draws on cultural and ethnic strengths
(e.g., Native American students scoring higher on the storytelling portions of
a test). By removing ethnic bias, more minority students will gain
acceptance into selective colleges (Sternberg, 2007). We will be assessing students in the
ways in which they learn best. More students will have a chance to do well
instead of a select few because traditional assessment gauges such a narrow
“window” of student achievement.
Some people say that authentic
assessment is like planning backwards because you start with the end
result in mind, but to me it seems like simple logic and planning
forward. After researching authentic
assessment, it is clear to me that this is what we should be doing to best
prepare our students for success in their futures. In fact, we probably should
have started testing this way about 30 years ago. The world is changing and in
order to succeed in our current world as well as the future, students will need
to broaden their thinking, know how to solve problems, and work collaboratively
to achieve solutions. Authentic assessments emphasize and cultivate this type
of thinking.
In
creating any type of assessment, two very important topics that one should
consider and address are the reliability and the validity of the assessments.
When I was in high school, my teacher told us that the car mechanic service
called “Precision Tune” should actually have the name “Accuracy Tune.”
Precision, much like reliability in the field of educational assessment, just
means that someone completes an activity the same way every time, whether or
not it is the correct way of doing it. Accuracy, like validity in the field of
educational assessment, refers to getting something such as an answer or a
procedure as close as possible to the desired outcome. When getting one’s car
fixed, one would prefer to have the car fixed properly each time instead of
repeatedly having it fixed incorrectly.
Reliability
falls into the same category as precision. Reliability is synonymous with
consistency. When a teacher administers an assessment, she will know the scores
are reliable when she gets the same results over time. If a test generates one
outcome on Tuesdays and a different outcome on Fridays or varies from class to
class or year to year, then the scores are not reliable. This measure is termed
stability reliability. Another kind of reliability is alternate-form reliability,
which means that different versions or forms of one test give the same results.
This is important because many times teachers will administer different forms
of a test to prevent cheating. Different students within one class may receive
two different forms of the same test or different classes may receive different
forms. The last kind of reliability is internal consistency, which compares
the responses to each test item with the results of the test as a whole. The
students who scored the highest on the test overall should also tend to perform
best on each individual item. In one instance, a teacher used an internal
consistency analysis to discover that the students who scored the highest
overall did not score as well as the rest of the class on three particular test
items, which suggested to her that she must have confused the brightest students
on one particular subtopic of the test. Another aspect of reliability is the
standard error of measurement (SEM). Rather than describing group consistency,
the SEM estimates how much an individual student’s score would vary if the
teacher could give the same test to that student repeatedly. The teacher can use
the SEM to gauge this variation instead of actually administering and
re-administering the test; the smaller the SEM, the better. One distinction that
David Frisbie notes in his article, “Measurement 101: Some Fundamentals
Revisited,” is that “Reliability is a property of a set of scores, not of the
assessment that produced the scores” (Frisbie, 2005).
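To make these reliability ideas concrete, here is a minimal sketch in Python. It is my own illustration with invented scores, not an example from Popham or Frisbie: a test-retest correlation for stability reliability, Cronbach’s alpha (one common internal-consistency estimate) for a small set of right/wrong items, and the standard SEM formula, SEM = SD * sqrt(1 - reliability).

```python
import statistics
from math import sqrt

def pearson(xs, ys):
    """Correlation between two lists of scores (e.g., two administrations of a test)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def cronbach_alpha(item_scores):
    """Internal consistency: item_scores holds one list per student, one score per item."""
    k = len(item_scores[0])                                   # number of items
    totals = [sum(student) for student in item_scores]        # whole-test scores
    item_vars = [statistics.variance([s[i] for s in item_scores]) for i in range(k)]
    return k / (k - 1) * (1 - sum(item_vars) / statistics.variance(totals))

def sem(scores, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability); smaller is better."""
    return statistics.stdev(scores) * sqrt(1 - reliability)

# Hypothetical data: the same five students tested on a Tuesday and a Friday.
tuesday = [78, 85, 62, 90, 71]
friday  = [75, 88, 60, 93, 70]
r = pearson(tuesday, friday)                 # stability (test-retest) reliability
print(f"stability reliability: {r:.2f}")
print(f"SEM: {sem(tuesday, r):.2f} points")

# Hypothetical right/wrong (1/0) answers: five students on a four-item quiz.
items = [[1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
print(f"internal consistency (alpha): {cronbach_alpha(items):.2f}")
```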
Validity, by contrast, determines whether a test measures what it claims to
measure. There are three types of validity evidence: content, criterion, and
construct validity. Content validity refers to whether the test given actually
measures the content that the teacher intended. One example of a lack of
content validity occurred last year, when students across the state of Maryland
took the MSA test even though their curriculum had changed to the Common Core
State Standards. The content the students learned did not match the test they
took, so there was no content-related evidence of validity.
Criterion validity measures how well a predictor test, such as the SAT or ACT,
forecasts how a student might fare on some specific criterion in the future. The
usual criteria include grades, GPA, on-the-job performance ratings,
citizenship, and the like. However, these predictions are not always accurate
and can sometimes account for only a portion of the reasons why a student
performs a certain way.
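Criterion validity is commonly summarized with a validity coefficient, the correlation between predictor scores and the later criterion. Here is a small hypothetical sketch (all numbers invented) showing that computation, and how squaring the coefficient expresses the “portion” of the criterion the predictor actually accounts for:

```python
import statistics
from math import sqrt

# Hypothetical predictor scores (an SAT-style test) and the later criterion
# we hope the test forecasts (say, first-year college GPA).
predictor = [1100, 1350, 980, 1250, 1420, 1050]
criterion = [2.9, 3.6, 2.5, 3.1, 3.8, 3.0]

mp, mc = statistics.mean(predictor), statistics.mean(criterion)
cov = sum((p - mp) * (c - mc) for p, c in zip(predictor, criterion))
r = cov / sqrt(sum((p - mp) ** 2 for p in predictor)
               * sum((c - mc) ** 2 for c in criterion))

print(f"validity coefficient r = {r:.2f}")
# r squared is the share of variation in the criterion that the predictor
# accounts for; the remainder reflects everything the test does not capture.
print(f"variance accounted for: {r ** 2:.0%}")
```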
Construct validity is established when a teacher makes a hypothesis about how a
student will perform on a test, collects empirical data on that student, and
then determines whether the hypothesis was correct. Teachers gather this
evidence through a series of tests rather than one big test. The three types of
strategies for this include (1) intervention studies (a test given after a major
intervention), (2) differential-population studies (comparing populations that
differ on one variable), and (3) related-measures studies (giving a second test
whose results either support or contradict a different but related, previously
administered test).
Another aspect
of assessments on which I had not previously reflected is the absence-of-bias
in classroom assessments that teachers personally create. I have always heard
about bias in high-stakes tests like the SAT. I have read articles about how
the vocabulary in the SAT and similar standardized tests favors
middle-class students. For example, there may be an analogy question on the SAT
using the word “mast” where knowing the definition of the word is crucial to
getting the correct answer on the analogy. I can see how a student who frequently
goes on boats with his parents, who may even own a boat, will have an advantage
over another student who has never been on a boat before in his life, has never
had a reason to learn the word “mast” and thus has no idea what the word means.
Therefore, it follows that a teacher could create that same bias with a test
that she created for her students’ use. I like the suggestion that Popham gives
in the text. He says that teachers should first proofread their own tests from
the perspective of the students in the class. Then, especially for major
assessments, they should ask other teachers who ethnically resemble the students
in the class to review the assessments for bias. This is, of course, keeping in
mind that even though the teacher and the students may share a culture, they
still may have had completely different experiences (Popham, 2014). Another
challenging part of assessment bias for a new teacher is administering
performance tasks. Because bias can appear in paper tests as well as other types
of assessments, it is important for teachers to be aware of every type of
assessment they give in class. When we give performance tasks, we typically are asking students to make
decisions based on their background knowledge. In doing so, we make assumptions
about what the students have learned before. There is a large window of
opportunity for bias to creep in if we are not careful. I am definitely more
aware now and will be on the lookout for bias when I create my assessments in
the future.
Since I began working in education, I have always been sensitive to students
with disabilities and English Language Learners (ELLs), or, as we call them in
Howard County, ESOL students (English Speakers of Other Languages). Yet I had
not thought about the presence of bias in their test items. It is clearer to me
now that allowing these students the same accommodations during testing that
they use in the classroom during regular instruction is a good way to reduce
bias. These accommodations might include giving students a dictionary to use
during the test, allowing them more time to complete it, and possibly even
letting them take the test in a different classroom. All of these measures
are positive steps toward giving the students a more level playing field.
The last major topic I have learned about assessments in this course is
improving teacher-created assessments via judgmentally based and empirically
based methods. Judgmentally based improvement is very straightforward: teachers
use either their own judgment, a colleague’s judgment, or their students’
judgment of the assessment. Popham gives systematic procedures and criteria
teachers can use to implement the first two strategies. With the last strategy,
he cautions teachers to give students the assessment first and then have them
take a short survey about how they would improve it for next time, as opposed
to asking the students to do both simultaneously. Judgmentally based improvement
procedures work best with selected-response or constructed-response test items.
Empirically based improvement, used mostly with selected-response tests, is a
little tedious but gives good information. Using some quick formulas, a
teacher can determine the difficulty (the p value) of each item on her
assessment, as well as how well an item separates stronger students from weaker
ones (the discrimination index, D). This information can show a teacher how well
her students understood her instruction. After calculating the p value and the
discrimination index, she can employ a distractor analysis to dig deeper into
the students’ answers and identify which test items need revamping.
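As a sketch of those quick formulas (with invented data for one multiple-choice item, and using a simple high-half/low-half split; published indices sometimes use the top and bottom 27% instead): p is the proportion of all students answering correctly, D is the proportion correct in the upper group minus the proportion correct in the lower group, and the distractor analysis tallies which wrong answers attracted each group.

```python
from collections import Counter

# Hypothetical results for one multiple-choice item whose correct answer is "B".
# Each tuple: (student's total test score, answer chosen on this item).
responses = [
    (95, "B"), (91, "B"), (88, "B"), (84, "C"), (80, "B"),
    (74, "B"), (70, "A"), (65, "C"), (61, "C"), (55, "D"),
]
correct = "B"

# Item difficulty: the proportion of all students answering correctly.
p = sum(1 for _, ans in responses if ans == correct) / len(responses)

# Discrimination index: proportion correct in the high-scoring half minus
# the proportion correct in the low-scoring half.
ranked = sorted(responses, key=lambda r: r[0], reverse=True)
half = len(ranked) // 2
upper, lower = ranked[:half], ranked[half:]

def frac_correct(group):
    return sum(1 for _, ans in group if ans == correct) / len(group)

D = frac_correct(upper) - frac_correct(lower)
print(f"difficulty p = {p:.2f}, discrimination D = {D:.2f}")

# Distractor analysis: which answers attracted the high vs. low scorers?
print("upper group chose:", Counter(ans for _, ans in upper))
print("lower group chose:", Counter(ans for _, ans in lower))
```

An item with p near 0.5 and a clearly positive D is pulling its weight; a D near zero or below, or a distractor that lures mostly high scorers, flags an item worth revamping.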
Before
taking this Assessment class, I had never considered how much effort goes into
creating assessments and analyzing their results. Assessments come in many
different forms and have layers and complexities that I had never recognized
before. This is especially true of the empirical side, with its numerous
formulas and calculations that I never knew existed. One can determine so much
from the results of an assessment. The field of education is certainly a
dynamic one. I give lots of credit to teachers and other educators who create
and analyze the assessments that students take every day. I feel that my
philosophy and my perspective have broadened quite a bit since the beginning of
this course. I am more aware of all of the possibilities with testing, and I am
even more determined to do the right thing for my students by minimizing bias
toward them and giving them the fairest chance that I can.
References
Frisbie, D. A. (2005). Measurement 101: Some fundamentals revisited. Educational Measurement: Issues and Practice, 24(3), 21-28.
Holler, E. W., Gareis, C. R., Martin, J., Clouser, A., & Miller, S. (2008). Teacher-made assessments: Getting them right. Principal Leadership, 60-64.
Mueller, J. (2014). What is authentic assessment? Retrieved from Authentic Assessment Toolbox: http://jfmueller.faculty.noctrl.edu/toolbox/whatisit.htm#names
Popham, W. J. (2014). Classroom assessment. Upper Saddle River, NJ: Pearson Education, Inc.
Sternberg, R. J. (2007). Assessing what matters. Educational Leadership, 20-26.