Sunday, December 14, 2014

Mini Research Project - The Effect of Positive Cultural Images on Test Scores of Low-Income Minority Students

View my Mini-Research Project: http://prezi.com/57aqrjejb1br/?utm_campaign=share&utm_medium=copy&rc=ex0share

Personal Assessment Philosophy





     Before taking this course (ED 615), I had never heard the terms “authentic assessment,” “formative assessment,” or “summative assessment.” However, I do remember the first time I heard the phrase “performance task.” It was my third year working with the Saturday Math Academy, and my “co-teacher” told me that she was in a Master’s program and wanted to give our students a performance task to complete. She had been learning all about performance tasks in her night classes, and she insisted they were the next “big thing” that the students would be learning in their home schools. I was hesitant because I did not know anything about performance tasks and felt “out of the loop.” In fact, from what she explained, it seemed like a waste of time, and I was eager to present the material using the traditional methods we had been using. That semester was my last with the Saturday Math Academy. Soon after, I started hearing about the “common core,” and once again the phrase “performance task” reemerged.
     In authentic assessment, the teacher bases students’ grades on how well they perform a real-world (or “authentic”) task that demonstrates their learning through the application of knowledge. Authentic assessment differs from traditional assessment in several respects. Traditional assessment usually entails a student selecting a response (such as a multiple-choice, true-false, or matching exercise), whereas authentic assessment is based on students performing a complex task that demonstrates meaningful application. Traditional assessment stems from a contrived scenario that the teacher makes up for the convenience of assigning grades; authentic assessment is rooted in real-life situations that people actually encounter. Traditional assessment relies on students recognizing, recalling, and regurgitating information that the teacher has presented in the past. Authentic assessment, however, focuses on the students’ construction and application of knowledge. Traditional assessments are largely teacher-structured, whereas authentic assessments are student-driven. Lastly, traditional assessments provide indirect evidence of student learning, while authentic assessments give direct evidence that students understand. One model defines four types of questions that an instructor should use in authentic assessments: (1) analytical, (2) creative, (3) practical, and (4) wisdom-based. (Sternberg, 2007) By asking questions with these themes in mind, the teacher can elicit a deeper understanding from the students and prompt them to think on a more critical, higher level. Instead of simply asking the student to give the correct answer to a math problem, the instructor might ask the student to create his own problem using the knowledge he gained from a previous activity (creative), or he might question the student about how arriving at an incorrect answer would affect someone in the real world (wisdom).
     There are various alternative names for authentic assessment. Many people call it performance-based assessment because teachers grade students on how well they “perform” the assigned task. Direct assessment gets its name from the fact that students give direct evidence that they can apply their learning. This contrasts with traditional forms of assessment, such as a multiple-choice test, which give only indirect evidence of student learning. We might infer that a student arrived at the correct answer through knowledge, but he could also have guessed. When we observe a student performing a task, we draw conclusions about his knowledge from direct evidence. Lastly, some people refer to authentic assessment as alternative assessment because it is an alternative to traditional assessment. (Mueller, 2014)
     Formative assessment describes the majority of “low-stakes” tests that students receive in class, which give the teacher a chance to provide immediate feedback on student work. These may include a short quiz, a class discussion, or a sentence demonstrating understanding of a main topic. The results of a formative assessment help the teacher and students measure student progress, and they give the teacher an idea of how to modify the lesson in terms of what needs “re-teaching” and what concepts the students have already mastered.
     Summative assessment is the opposite of formative assessment. It essentially summarizes the level of proficiency students have reached at the end of an instructional unit, measured against a standard or benchmark. It includes high-stakes tests such as the SAT and the PARCC exam. Other examples include assigning a grade on a final exam or assessing a student at a senior recital. A summative assessment can serve a formative purpose, however, if students use its results to guide their future learning.
      I believe authentic assessment would benefit student learning because in the real world, we do not always have multiple choices from which to choose. We typically have to do something, or perform a task, to prove that we know a topic. Music and athletics are great examples of this. Typically, a coach will not allow his players to perform in a real game unless they have demonstrated their competency through practice of the sport. It is very rare that a coach gives a player a written test and then makes him a first-string player without any evidence of his performance. In addition, music schools usually require a performance or audition for acceptance into the school. There are instances where we must complete both traditional and authentic assessments. One example is the driving test. There is a written portion and a driving portion, and we must pass both parts in order to earn a driver’s license. However, what would our roads look like if there were no performance-based portion of the driving exam? The roads would be a lot less safe. The portion of the test where the driver proves his knowledge by actually driving a car is crucial to demonstrating mastery of the subject. Teacher education programs incorporate student-teaching experiences, nurses and counselors must complete clinical exams, and doctors pursue residencies. There are many situations in real life in which people must apply their knowledge to show evidence of their competency. By starting this practice in K-12 schools, we better prepare our students for what they will encounter as adults.
     Authentic assessment could help in achieving educational standards such as the common core for several reasons. The common core curriculum holds students to a higher level of thinking by requiring them to engage in real-world situations and problem-based learning. The PARCC exam, which was created to correspond to the common core, incorporates authentic assessment. Instead of a traditional test question such as, “What is the color of Mary’s hat?” the PARCC exam would ask, “Based on your knowledge of Mary from your reading of the text, what type of hat would Mary wear?” This type of question gives students a chance to demonstrate their creativity, and the open-endedness of the question promotes critical thinking.
     Studies have shown that authentic assessment actually increases the performance of minority and low-income students. (Sternberg, 2007) Some students do better on multiple-choice questions because of test-taking strategies they have learned, not necessarily because of their knowledge; thus, traditional assessments are biased in favor of these students. Authentic assessment better predicts student success in college because students who use higher-level thinking early on grow accustomed to thinking in this manner, and it becomes more comfortable for them. Since different groups learn in different ways, authentic assessment draws on cultural and ethnic strengths (e.g., Native American students scoring higher on the storytelling portions of a test). By removing ethnic bias, more minority students will successfully gain acceptance into selective colleges. (Sternberg, 2007) We will be assessing students in the ways in which they learn best. More students will have a chance to do well, instead of a select few, because traditional assessment gauges such a narrow “window” of student achievement.
     Some people say that authentic assessment is like planning backwards, because you start with the end result in mind, but to me it seems more like using logic and planning forward. After researching authentic assessment, it is clear to me that this is what we should be doing to best prepare our students for success in their futures. In fact, we probably should have started testing this way about 30 years ago. The world is changing, and in order to succeed now and in the future, students will need to broaden their thinking, know how to solve problems, and work collaboratively to achieve solutions. Authentic assessments emphasize and cultivate this type of thinking.
     In creating any type of assessment, two very important topics one should consider and address are the reliability and the validity of the assessment. When I was in high school, my teacher told us that the car mechanic service called “Precision Tune” should actually have the name “Accuracy Tune.” Precision, much like reliability in the field of educational assessment, just means that someone completes an activity the same way every time, whether or not it is the correct way of doing it. Accuracy, like validity in the field of educational assessment, refers to getting something, such as an answer or a procedure, as close as possible to the desired outcome. When getting one’s car fixed, one would prefer to have it fixed correctly each time rather than consistently fixed the same wrong way.
     Reliability falls into the same category as precision; reliability is synonymous with consistency. When a teacher administers an assessment, she knows the scores are reliable when she gets the same results over time. If a test generates one outcome on Tuesdays and a different outcome on Fridays, or varies from class to class or year to year, then the scores are not reliable. This measure is termed stability reliability. Another kind of reliability is alternate-form reliability, which means that different versions or forms of one test give the same results. This is important because teachers often administer different forms of a test to prevent cheating: different students within one class may receive two different forms of the same test, or different classes may receive different forms. The last kind of reliability is internal consistency, which compares the answers on each test item to the results of the test as a whole. Thus, the students who scored highest on the test overall should also tend to answer each individual item correctly. In one instance, a teacher used an internal consistency analysis to determine that the students who scored the highest overall did not score as well as the rest of the class on three particular test items. This showed the teacher that she must have confused her brightest students on one particular subtopic of the test.
     Another aspect of reliability is the standard error of measurement (SEM), which allows the teacher to gauge an individual student’s consistency of performance rather than group performance. If the teacher gave the same test to the same student repeatedly, she should be able to estimate how that student would fare; the teacher can use the SEM as a predictor instead of actually administering and re-administering the test. The smaller the SEM, the better. Validity, by contrast, determines whether a test measures what it claims to measure. One distinction that David Frisbie notes in his article “Measurement 101: Some Fundamentals Revisited” is that “Reliability is a property of a set of scores, not of the assessment that produced the scores.” (Frisbie, 2005)
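     To make these reliability ideas concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the two sets of class scores are made up, and the helper names are hypothetical, not from Frisbie or Popham. It estimates stability reliability as the correlation between two administrations of the same test and then computes the classical test theory SEM, where SEM = SD × √(1 − reliability).

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def pearson_r(x, y):
    """Correlation between scores from two administrations of the
    same test: a simple estimate of stability (test-retest) reliability."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def standard_error_of_measurement(scores, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability).
    The smaller the SEM, the more consistent an individual's score."""
    m = mean(scores)
    sd = sqrt(sum((s - m) ** 2 for s in scores) / len(scores))
    return sd * sqrt(1 - reliability)

# Hypothetical scores for the same eight students on a Tuesday
# administration and a Friday administration of the same test.
tuesday = [72, 85, 90, 64, 78, 88, 70, 95]
friday  = [70, 88, 91, 66, 75, 90, 72, 93]

r = pearson_r(tuesday, friday)
sem = standard_error_of_measurement(tuesday, r)
print(f"stability reliability r = {r:.2f}, SEM = {sem:.1f} points")
```

     A teacher could then report a student’s score as a band (score ± SEM) rather than a single number, which is the “predictor” use of the SEM described above.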
     There are three types of validity evidence: content, criterion, and construct validity. Content validity refers to whether the test actually measures the content that the teacher intended. One example of missing content validity occurred last year, when students across the state of Maryland took the MSA test even though their curriculum had changed to the Common Core State Standards. The content that the students learned did not match the test they took, so there was no content-related evidence of validity.
     Criterion validity measures how well a test, such as the SAT or ACT, predicts how a student will fare on some specific criterion in the future. The usual criteria include grades, GPA, on-the-job performance ratings, citizenship, etc. However, these predictions are not always accurate and can sometimes account for only a portion of the reasons a student performs a certain way.
     Construct validity comes into play when a teacher makes a hypothesis about how a student will perform on a test, collects empirical data on that student, and then determines whether the hypothesis was correct. Teachers do this via a series of tests as opposed to one big test. The three types of strategies include (1) intervention studies (a test given after a major intervention), (2) differential-population studies (comparing populations based on one changing variable), and (3) related-measures studies (giving a second test whose results either support or contradict a different but related, previously administered test).
     Another aspect of assessments on which I had not previously reflected is absence-of-bias in the classroom assessments that teachers personally create. I have always heard about bias in high-stakes tests like the SAT, and I have read articles about how the vocabulary in the SAT and similar standardized tests favors middle-class students. For example, there may be an analogy question on the SAT using the word “mast,” where knowing the definition of the word is crucial to getting the correct answer. I can see how a student who frequently goes on boats with his parents, who may even own a boat, will have an advantage over another student who has never been on a boat in his life, has never had a reason to learn the word “mast,” and thus has no idea what the word means. It follows that a teacher could create that same bias in a test she writes for her own students. I like the suggestion that Popham gives in the text: teachers should first proofread their own tests from the perspective of the students in the class. Then, especially for major assessments, they should select other teachers who ethnically resemble the students in the class and allow those teachers to review the assessments for bias. This is, of course, keeping in mind that even though the teacher and the students may share a culture, they still may have had completely different experiences. (Popham, 2014) Another challenging part of assessment bias for a new teacher is administering performance tasks. Because bias can span paper tests as well as other types of assessments, it is important for teachers to be aware of it in every type of assessment we give in class. When we give performance tasks, we typically ask students to make decisions based on their background knowledge, and in doing so we make assumptions about what the students have learned before. There is a large window of opportunity for bias to creep in if we are not careful. I am definitely more aware now and will be on the lookout for bias when I create my own assessments in the future.
     Since working in education, I have always been sensitive to students with disabilities and English Language Learners (ELLs), or as we call them in Howard County, ESOL students (English Speakers of Other Languages). Yet I had not thought about the presence of bias in their test items. It is clearer to me now that allowing these students the same accommodations during testing that they use in the classroom during regular instruction is a good way to reduce bias. These accommodations might include giving the student a dictionary to use during the test, allowing more time to complete the test, and possibly even letting the student take the test in a different classroom. All of these measures are positive steps toward giving the students a more level playing field.
     The last major topic I have learned about assessments in this course is improving teacher-created assessments via judgmentally based and empirically based methods. Judgmentally based improvement is straightforward: teachers use either their own judgment, a colleague’s judgment, or their students’ judgment of the assessment. Popham gives systematic procedures for the criteria teachers can use to implement the first two strategies. With the last strategy, he cautions teachers to give students the assessment first and then a short survey about how they would improve the assessment for next time, as opposed to asking the students to do both simultaneously. Judgmentally based improvement procedures work best with selected-response or constructed-response test items.
     Empirically based improvement, used mostly with selected-response tests, is a little tedious but yields good information. Using some quick formulas, a teacher can determine the difficulty (p value) of each item on her assessment, as well as how well each item distinguishes stronger from weaker students (the discrimination index, D). This information can show a teacher how well her students understood her instruction. After calculating the p value and the discrimination index, teachers can employ a distractor analysis to delve deeper into the student answers and investigate which test items need revamping.
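     As a rough illustration of those quick formulas, here is a short Python sketch; the response data and function names are my own hypothetical example, not Popham’s procedure verbatim. The p value is the proportion of students answering an item correctly, and D compares that proportion between the highest- and lowest-scoring groups (a common convention uses the top and bottom 27 percent of the class).

```python
def item_difficulty(item_scores):
    """p value: the proportion of students who answered the item correctly."""
    return sum(item_scores) / len(item_scores)

def discrimination_index(responses, item, group_frac=0.27):
    """D: proportion correct in the high-scoring group minus the
    proportion correct in the low-scoring group for one item.
    `responses` is a list of per-student lists of 0/1 item scores."""
    ranked = sorted(responses, key=sum, reverse=True)
    n = max(1, round(len(ranked) * group_frac))
    high, low = ranked[:n], ranked[-n:]
    p_high = sum(r[item] for r in high) / n
    p_low = sum(r[item] for r in low) / n
    return p_high - p_low

# Hypothetical class of eight students and five items (1 = correct).
responses = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 1],
]

for i in range(5):
    p = item_difficulty([r[i] for r in responses])
    d = discrimination_index(responses, i)
    print(f"item {i + 1}: p = {p:.2f}, D = {d:+.2f}")
```

     An item with a negative D, meaning weaker students outperformed stronger ones, is a prime candidate for the distractor analysis mentioned above, in which the teacher tallies which wrong answer choice each group selected.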
     Before taking this assessment class, I had never considered how much effort goes into creating assessments and analyzing their results. Assessments come in so many different forms and have so many layers and complexities that I had never realized. This is especially true for the empirical side, which involves numerous formulas and calculations I never knew existed. One can determine so much from the results of an assessment. The field of education is certainly a dynamic one, and I give a lot of credit to the teachers and other educators who create and analyze the assessments that students take every day. My philosophy and my perspective have broadened quite a bit since the beginning of this course. I am more aware of all of the possibilities with testing, and I am even more committed to doing the right thing for my students: minimizing bias and giving them the fairest chance that I can.



References

Frisbie, D. A. (2005). Measurement 101: Some Fundamentals Revisited. Educational Measurement: Issues and Practice, 24(3), 21-28.
Holler, E. W., Gareis, C. R., Martin, J., Clouser, A., & Miller, S. (2008). Teacher-Made Assessments: Getting Them Right. Principal Leadership, 60-64.
Mueller, J. (2014). What is Authentic Assessment? Retrieved from Authentic Assessment Toolbox: http://jfmueller.faculty.noctrl.edu/toolbox/whatisit.htm#names
Popham, W. J. (2014). Classroom Assessment. Upper Saddle River, NJ: Pearson Education, Inc.
Sternberg, R. J. (2007). Assessing What Matters. Educational Leadership, 20-26.



Sunday, October 26, 2014

ED 615: Reflective Essay Blog Post




I have taken several standardized tests over the course of my life. In elementary school, for several years in a row, I took a test called the California Achievement Test. Even though I went to school in Maryland, we all took the California Achievement Test because Maryland did not have its own state test at the time. This test was our equivalent of the MSA or PARCC tests that elementary students take now.
In high school, I took three Advanced Placement tests, one each in Calculus, English, and French, and earned a score of “3” out of “5” on each. These tests translated to 14 college credits toward my undergraduate degree at UMBC and actually allowed me to graduate on time. In high school, I also took the PSAT and the ACT once each, in addition to the SAT, which I took at least twice. Because the College Board (makers of the SAT) combined the highest scores from the English and Math sections, across different test dates, to arrive at a student’s final score, I was able to maximize my score by taking the test multiple times. I have also taken the GRE for entrance into graduate school, but I did not fare as well on that test; I only took it once.
I do not feel that taking these tests fully demonstrated evidence of my learning. This is especially true of the SAT. In high school, after my first attempt and before my second attempt at the SAT, I enrolled in an after-school coaching class that my homeroom teacher (who was also a math teacher) provided. What I learned most from the SAT prep course were strategies for how to approach each type of test question. It was very effective, but it did not help me display the content that I had learned during my regular math classes.
            This example shows the problem with traditional standardized tests. Teachers can begin to “teach to the test.” At the opposite end of the spectrum, variability in teachers’ instruction can impose limitations on a student’s performance. The instructional sensitivity article shows that not all standardized tests actually generate the results they set out to produce. (D'Agostino, Corson, & Welsh, 2007) The study in the article demonstrated that students whose teachers taught them in the same way in which the standardized tests were structured scored higher on those tests. This research stemmed from a 1981 case (Debra P. v. Turlington) in which a student brought a claim against the state of Florida asserting that students of color were not taught the material included on the state’s minimum basic achievement test. I have seen a similar example firsthand in the middle school where I work. Two years ago, when I first started working there, I co-proctored a test for a class of 6th grade students who were taking the Maryland State Assessment (MSA). Several of the students either asked me to read the word “diagonal” for them from the test or asked me what the word “diagonal” meant. If so many of the students had trouble with this one very important word, how many other points were they missing? Clearly, these students had not learned everything they needed in order to be prepared for the standardized test they were taking. The fact that the school had low test scores for several years in a row made me question how well the classroom instruction matched up with the test content. This occurred again last year on a wider scale, when all of the students in Maryland had to take the MSA even though the curriculum had changed to the common core. Thus, the test did not reflect what the students were learning in their curriculum.
            I think the development of standardized testing was motivated by teachers, parents, policy-makers, and community members wanting to know how students from different regions compared in terms of acquired knowledge. States could then use that data to make funding decisions based on how much help students needed to reach pre-set standards. By having one organization make a test for everyone, states could reduce the bias that each teacher might have imposed had she created her own tests.
The assumption is that giving the same test under the same conditions to everyone normalizes everything. However, we now know that various other factors come into play, including the knowledge base of the teacher giving the instruction, the teacher’s teaching style, and the prior knowledge of the students. For these reasons, standardized tests do not always produce the results their administrators set out to achieve.





References

Balf, T. (2014, November). A Smarter, Fairer SAT. Popular Science, p. 30.
Black, P., & Wiliam, D. (1998, October). Inside the Black Box: Raising standards through classroom assessment. Phi Delta Kappan, 139-148.
D'Agostino, J. V., Corson, N. M., & Welsh, M. E. (2007). Instructional Sensitivity of a State's Standards-Based Assessment. Educational Assessment, 12(1), 1-22.
Popham, W. J. (2014). Classroom Assessment. Upper Saddle River, NJ: Pearson Education, Inc.