MarkBook® Appendix A-1
The Assessment & Evaluation Processes 

As teachers, we frequently take the assessment and evaluation process for granted. Many of us were never trained in A&E methodologies. Or, we received crude instruction at best. This introductory chapter looks at types and models of assessment & evaluation, and makes some recommendations about how to use them to best advantage. Throughout this introductory chapter and in your current practices, we strongly recommend that you seek answers to the following questions.

What is your personal view of assessment & evaluation?
What reporting requirements does your educational system impose on you?
What plans do you have to improve your assessment & evaluation practices?

Hopefully, you will have answers BEFORE you begin the A&E process. Appendix A-3 looks at several A&E models that suggest ways of using MarkBook to meet mandated requirements and to satisfy your personal approach.


WHY DO WE EDUCATE PEOPLE?

We believe that education promotes future success. And we have lots of evidence supporting that belief. Research demonstrates that well-educated people live longer, are happier, earn greater incomes, need less social assistance, and contribute more to society. All of these are measures of "social success". Certainly, there are examples of poorly-educated persons who became very successful. However, the correlation between education and future success is so strong that society mandates education for all individuals. We want our children to be successful. By promoting personal growth in all individuals, society can reliably predict that future success.

By law, children are educated up to a certain age. But they're not the only ones being educated. Educational processes are very common outside the formal education system. For instance, employers require employees to pass training courses. A bank knows that a teller who has been trained on how to use the bank's software will be far more efficient and competent than one who cannot use the software. Similarly, immigrants must pass language courses and sometimes "culture/history" courses before being granted citizenship and receiving the attendant social benefits. In many jurisdictions, new drivers must complete a training course before being allowed to write a licensing exam and take a driving test. And so on!

Since society will be better if all citizens are well-educated, most societies provide the financial means to educate citizens of all ages. For twelve or more years of childhood education, the total cost is often a multiple of the parents' annual salaries. Obviously, societies want to educate individuals efficiently and as completely as possible. Unlike a century ago, the issues in education today are not about whether or not we should educate our citizens; they're about what we should teach them and how the funding for this process should work.

THE OBJECTIVE OF THE EDUCATION SYSTEM

Unlike the private business sector, which seeks profit, the objective of the education system is growth. We expect learners to grow in Cognitive Knowledge, in Cognitive Skills, in Affect, and in Psychomotor abilities. This is Bloom's Taxonomy, of course!

HOW DO WE MEASURE AND REPORT STUDENT GROWTH?

Profits are easy to measure. They're in clearly-defined units called currency. Measuring profit is straightforward: Income less Expenses = Profit, as measured on a currency scale. Growth also has a simple formula: Finishing Position less Starting Position = Growth. Growth in a person's height is an example: 167cm this year less 152cm a year ago = 15cm of growth over the year. Growth in mass is similar: 81kg this year less 71kg a year ago = 10kg of growth over the year. Note in both cases that there is a standard measurement unit or scale.
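The growth formula above can be sketched in a few lines of Python. The function name `growth` is purely illustrative; the figures are the height and mass examples from the text.

```python
def growth(start, finish):
    """Return growth as finishing position less starting position."""
    return finish - start

# Height example from the text: 152 cm a year ago, 167 cm this year.
height_growth_cm = growth(start=152, finish=167)

# Mass example from the text: 71 kg a year ago, 81 kg this year.
mass_growth_kg = growth(start=71, finish=81)

print(height_growth_cm, "cm")  # 15 cm
print(mass_growth_kg, "kg")    # 10 kg
```

The subtraction only makes sense because both readings are taken on the same scale, which is exactly what educational measurement lacks.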

Educational growth is not as easily measured. There are no standard scale units. We can't see it or count it. Instead, we look for evidence that growth has taken place using reliable and valid measurements. Such indices as the mark or score on a curriculum-based test, a portfolio of work, or a performance provide that evidence. Once a certain amount of growth has happened, we say that a learner has reached a defined stage (a large artificial 'milepost') and is given proof like a Graduation Certificate, a Degree or a Diploma. Consequently, we have the general understanding that a person with a Master's Degree should have more knowledge and skills than the holder of a High School Graduation Diploma, who in turn has more than a person who just passed the last year of Primary School. We also create smaller measures of growth like 'credits'. Like all artificial measures of student growth, large and small, these have different meanings from one place to another and from one time period to another.

To measure any kind of growth, we have to have a reference. The height and mass examples above used centimeters and kilograms as references. Student growth should be measured against a curriculum reference. That is, growth should be measured as the degree of acquisition of the curriculum. Each course should have a well-defined curriculum listing specifying objectives (aka 'expectations'). For a measurement scale, we have created several artificial ones to quantify how well a learner has acquired those objectives.

One such scale uses letter grades, A B C D F. A learner who has a good grasp of the course's curriculum gets an A. Conversely, one who did not acquire enough of the curriculum to warrant a pass (often a professional judgment) gets an F. Similarly, there are level scales like R 1 2 3 4 with 4 highest, 7 6 5 4 3 2 1 with 1 highest, E G S N (excellent, good, satisfactory, needs improvement), a Grade Point Average with 4.0 as the highest, and P F (pass fail). There is a widely-used percentage scale with 100% as the highest. There is a percentile scale (not the same as a percentage) with 99+ as the highest. This last scale is normally used on standardized tests.

A grade of A or 100% doesn't mean that the learner has acquired 100% of the objectives. A grade of F doesn't mean that the learner hasn't learned anything. Instead, a number or letter is assigned that really communicates the quality of growth as opposed to the quantity. Unfortunately, not all jurisdictions interpret educational growth scales the same way. For instance, many systems use 50% as a pass. New York State in the USA uses 65% as a pass. Some others use 80% as a pass. Consequently, a learner who receives a 66% grade is viewed as a satisfactory/competent learner in the first jurisdiction, as a marginal learner in New York, and as a non-learner in the third. The way around this problem is to use levels and a criterion-referenced grading system.
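The 66% example can be worked through as a small Python sketch. The pass marks are the ones named in the text. The jurisdiction labels, the function name `passes`, and the assumption that a mark exactly equal to the pass mark counts as a pass are illustrative only.

```python
# Pass marks named in the text; the jurisdiction labels are placeholders.
PASS_MARKS = {
    "first jurisdiction": 50,
    "New York State": 65,
    "third jurisdiction": 80,
}

def passes(mark, pass_mark):
    """Return True when a percentage mark meets or exceeds the pass mark (assumed rule)."""
    return mark >= pass_mark

mark = 66
for name, pass_mark in PASS_MARKS.items():
    result = "pass" if passes(mark, pass_mark) else "fail"
    # The signed margin shows how far the same mark sits from each pass line.
    print(f"{name}: {result} ({mark - pass_mark:+d} relative to the pass mark)")
```

The same 66% clears the first pass mark by 16 points, clears New York's by 1, and misses the third by 14, which is why a bare percentage says little without knowing the scale behind it.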

ASSESSMENT vs EVALUATION

"Assessment" has two meanings. The first refers to a specific instrument of measure. Any test is an assessment. The second refers to the process of designing measurement instruments and then using them to gather critical appraisal data about each learner's progress. 

"Evaluation" is sometimes used as a synonym for assessment. However, "evaluation" implies an estimate of overall value whereas "assessment" implies a specific measure. Educators must perform both processes. Assess during the collection of achievement data and evaluate when determining the meaning or significance of the body of data collected. A teacher who is grading Mary's latest test to come up with a percentage score is assessing. At report card time, the teacher will gather all of Mary's assessment data and evaluate Mary's overall performance. An "assessment" is an individually measured item whereas an "evaluation" is an estimate of the overall merit of the collection of assessed items.

Evaluation requires professional judgment. It's not enough to look at a student's calculated arithmetic mean and blindly assign that number as the overall grade. Instead, educators must analyze the body of growth evidence to look for patterns of 'central tendency'. When evaluating, the educator must answer a question: "How well has this student acquired the curriculum?" If the student has acquired it very well, a top grade from any of the scales above should be assigned. Conversely, if the student has not acquired the curriculum, a low grade should be assigned.
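A short Python sketch shows why the arithmetic mean alone can mislead. The marks are invented for illustration, and the code is not a description of how MarkBook calculates anything; it only compares two common measures of central tendency using Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical marks (percent) for one student; the 0 is a single missed item.
marks = [82, 85, 0, 88, 84, 86]

print(f"mean:   {mean(marks):.1f}")    # 70.8
print(f"median: {median(marks):.1f}")  # 84.5
```

One missed item pulls the mean down to about 71%, while the median stays near the mid-80s where most of the evidence sits. Which figure better answers "How well has this student acquired the curriculum?" remains the educator's judgment.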

WHY DO WE ASSESS AND EVALUATE LEARNERS?

The first and most important reason is to encourage learning or growth. There are four kinds of assessments described below and all should be used to promote growth.

Secondly, we assess to provide a reliable and valid measure of student achievement or attainment. We assess to prove that learning has taken place. Future placement in school, employment, etc. may flow from such a measure. As a society, we need to know which individuals are judged to be capable of certain future tasks and which individuals are not (yet?) capable.

Thirdly, we assess and evaluate for a host of political reasons. For instance, politicians and bureaucrats may have a need to confirm that the system is working, that reform is needed, that recent innovations are effective, that individual employees are working properly, or that their financial distributions have merit. Additionally, each education system needs feedback for its own long term planning. These measures have little to do with directly encouraging individual learning. However, they may contribute to general improvements in quality provided by the system.

THE FOUR KINDS OF ASSESSMENTS

There are four kinds of assessments and they should all be used to encourage individual learning and to prove that learning has taken place.

Diagnostic Assessment should be used to determine each learner's starting level of achievement.

Recall the general formula for growth above. This evaluation process requires that the educator figure out, at or near the beginning of each course or unit, and always prior to instruction, what the learners know and don't know as measured against what they should know upon completion of the curriculum objectives for that course or unit. A diagnostic assessment of knowledge and skills done early in a course will provide evidence of each learner's present status and needs. The scores earned on these tests should NEVER contribute towards the final overall grade in a course! Instead, they should form a guide for the educator in creating lessons which will advance the learners from their present positions. Diagnostic assessments should also measure student abilities on any prerequisite skills. Additionally, if recorded, these diagnostic measurements provide proof that the students have learned because the scores should be much lower than the ones achieved on equivalent assessment instruments at the end of the course or unit.

Using a Diagnostic assessment, the educator may find that the students already know a concept in the current course and can demonstrate good skills with that concept before receiving instruction. In this circumstance, it's not necessary to re-teach the concept. However, the opposite is also true: learners may have unexpected gaps in their knowledge and skills. The educator must provide appropriate lessons to fill in those gaps if they are prerequisites for the current learnings. Otherwise, the current objectives/expectations cannot be met. For instance, an elementary arithmetic course expects the learners to master long division. However, a diagnostic assessment prior to teaching division determines that the learners cannot do subtraction. This is a prerequisite skill for learning how to divide. Any immediate attempt to teach long division will be fruitless. Clearly, the diagnostic assessment has identified a need which must be met.

MarkBook records diagnostic scores. Give these a weight of zero so that they're not factored into calculations. Or, isolate them for reference in a separate Mark Set.
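The effect of a zero weight can be illustrated with a generic weighted mean in Python. This is a minimal sketch and not MarkBook's internal calculation; the function name `weighted_mean`, the scores, and the weights are all hypothetical.

```python
def weighted_mean(entries):
    """Weighted mean of (score_percent, weight) pairs; zero-weight entries drop out."""
    total_weight = sum(weight for _, weight in entries)
    if total_weight == 0:
        return None  # nothing counts toward the grade yet
    return sum(score * weight for score, weight in entries) / total_weight

# Hypothetical entries: (score in percent, weight).
entries = [
    (35, 0),  # diagnostic pre-test, recorded for reference only
    (78, 2),  # unit test
    (84, 3),  # final exam
]

print(weighted_mean(entries))  # 81.6
```

The diagnostic score of 35% stays on record as evidence of the starting position, yet the result is the same as if only the unit test and exam had been entered.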


Formative Assessment should be used to assist and encourage the growth process.

This assessment process provides feedback and direction to the learners so that they may improve their learnings. Again, marks or scores earned during Formative Assessment should not contribute significantly towards the overall final grade in the course. Instead, Formative Assessment should provide an opportunity for learners to experiment, to ask questions, to take risks, to receive analytical feedback, and to get a good idea of how well they personally understand the current concepts. Some examples:

    Mr. H. is teaching genetics using Punnett Squares. As each variation is introduced with a new sample problem, he has four students go to the blackboard to write a solution to the current problem while the rest of the students try it in their notebooks. He then "marks" the blackboard work while discussing each student's solution to the problem with the entire class. A grade is given, deficiencies are pointed out, and the class is expected to provide a fix if any is required. However, the grade is not recorded. Instead, it is presented verbally in a simulation scenario: "If this solution was presented on the upcoming unit test, I'd give it 4 out of 7. Three marks were deducted because...". Mr. H wants every possible Punnett Square mistake made on the blackboard. Through trial, error, and follow-up discussion, students will learn what these mistakes look like and how to avoid/correct them.

    Next door, Mr. R. is teaching how to find the roots of a quadratic equation. He uses exactly the same process of board work, trials, and follow-up discussion. Additionally, he gives a take-home assignment which will be graded in class the next day. This assignment is set at the skill level taught that day. The following day, the students grade each other's work under his direction. If this grade is recorded, it's not given significant weight because it's still in the trial-and-error experimental stage, i.e. it's formative.

    In the primary school down the street, Mrs. P. is teaching spelling. She is conducting a team spelling bee. The objective is to encourage the learners to spell this week's word list properly. If a mistake is made, the team gets "gonged". To prevent students from making intentional mistakes and getting a laugh from the gong, the winning team will earn a prize. No data is recorded whether students spell their words properly or not.

Note that communication was inherent to each example. Constant communication and feedback about the quality of student work provides clear direction for learners to improve. Good communication is a powerful motivator! Frequent feedback is the most important personal growth tool that teachers have available! MarkBook was designed to maximize communication and growth through printed reports such as those in section 8-7 and section 9-6, and through on-screen summaries as in section 9-1. If recorded, formative assessments will appear on MarkBook reports. However, the educator can delete them or lower their weight at any future date.

Summative Assessment measures achievement relative to the course objectives.

Once learners have had an opportunity to learn, then proof of that learning comes in the form of a summative measure. This may be an exam, a unit test, an assignment, a performance, etc. Of course, any summative assessment must measure the items taught instead of items which were not taught. Summative assessments DO count significantly in the final grade assigned to each learner. In fact, the bulk of the final grade should be based on the aggregate of the summative measures.

Self Assessment and Peer Assessment encourage each learner to accurately judge their own work.

Frequently, students are unable to accurately gauge the quality of their own work or to judge how well they are doing. Learners should be given an opportunity to grade their own work, often with a rubric, and to evaluate themselves in the course. Such measures, if recorded, should have a very low weight. However, self-assessment provides a powerful feedback tool to the student about the quality of their own performance. Some top students are very self-deprecating. They think that everything they do is below standard. After graduation, these individuals will have a tough time meeting deadlines or completing all work. Conversely, some poor students unrealistically believe that everything they do is top-notch. Again, there will be future problems. Self Assessment requires a critical examination of one's own work, which helps students get a better picture of themselves.

"MOTHERHOOD" PRINCIPLES ABOUT ASSESSMENT AND EVALUATION

To best promote learning and to give a valid picture of individual achievement, the A&E process should incorporate the following principles:

    assessment processes should be planned and communicated to learners and parents prior to instruction.
    assessment strategies must align with the prescribed curriculum objectives and with the teaching strategies used.
    assessment strategies must accommodate exceptionalities as well as variations in culture and language. In other words, assessment must be fair and designed to enable each student to demonstrate the full extent of their own learning.
    assessments should measure how well students learn as well as what they have learned.
    assessment instruments should be highly varied in type.
    assessments should cover a full range of instructional objectives including knowledge, skills, and affective items.

    assessment should be continuous.
    students should peer-assess, self-assess, and set personal achievement goals.
    students must receive clear instructions for improvement in what they have learned as well as how they learn.

NORM-REFERENCED GRADING vs. CRITERION-REFERENCED GRADING

All educators plan their assessment methodologies to some degree. In this process, we
    decide what kind of assessment data should be collected,
    design various instruments to collect that data,
    grade each student's work,
    assemble that graded data in a logical and analytical way, and
    evaluate and report.
MarkBook performs the last two steps magnificently. However, the stumbling block for many educators is the third step because there are two distinct ways to grade student work. One stifles growth and the other promotes it.

Norm-referenced grading is traditional and widely used. Do any of the following statements sound familiar?
    1. Class averages should end up in a certain range e.g. between 65% and 70%.
    2. A fixed percentage of students should fail.
    3. Enriched classes or gifted students should have a higher average and none should fail.
    4. Slow learners should have a lower class average and a higher failure rate.
    5. Class grades should show a bell curve distribution pattern.
    6. Students should compete for marks.

In norm-referenced grading, the ideal is to have student grades spread out with an average or median in a pre-determined range. Marks are assigned by each student's relative placement within the group. Growth is not measured! Instead, relative rank is measured!

If these statements sound familiar, then your grading system is norm-referenced. This grading technique does NOT encourage growth! In fact, norm-referenced grading is devastating for many students. Weaker students quickly recognize that they end up with low grades no matter what quality of work they do! Students who receive constant put-downs and never taste success end up refusing to try. That means no growth! Gifted students recognize that they get top marks because they are good test writers. Many of these students realize that little work is required to maintain their relative standing in class. Again, no growth!

Criterion-referenced grading is the technique that encourages growth. Each piece of student work is graded using a scale that indicates what mark will be assigned for a certain quality of work (this scale may be called a "rubric"). If all students meet the top criteria, all get 100% on that item. In criterion-referenced grading, the ideal is to have all students achieve 100% as a final mark! This is possible because student growth is measured by absolute performance, not by relative performance.
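The difference between the two grading approaches can be sketched in Python. Everything here is hypothetical: the student names, the scores out of 10, and especially the fixed mark spread `rank_marks`, which stands in for whatever pre-determined distribution a norm-referenced scheme might impose.

```python
# Hypothetical scores out of 10 for five students on one item.
scores = {"Ann": 10, "Ben": 10, "Cam": 9, "Dee": 10, "Eli": 9}

def criterion_referenced(scores, max_score=10):
    """Each mark depends only on the student's own work measured against the scale."""
    return {name: round(100 * s / max_score) for name, s in scores.items()}

def norm_referenced(scores, rank_marks=(90, 80, 70, 60, 50)):
    """Marks are handed out by rank, so the spread is fixed however good the work is.

    Ties are broken by dictionary order here, which is arbitrary.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {name: rank_marks[i] for i, name in enumerate(ranked)}

print(criterion_referenced(scores))
print(norm_referenced(scores))
```

Under the criterion-referenced function, Dee's perfect score earns 100%. Under the rank-based function the same work earns 70%, simply because two other perfect papers happened to be listed first.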

Expect compliance and overall marks to go up whenever criterion-referenced grading is introduced. Also expect some novel responses such as students faxing their MarkBook report card (section 9-6) to family and friends or students negotiating with you to fix deficiencies in their academic record by handing in missing items. Nothing promotes success like success!

RUBRICS and EXEMPLARS

Rubrics provide a means of judging student performance. A rubric is a rule or guide. A rubric enables an evaluator to convert (i.e. "grade") a given quality of student work into a letter grade, percentage, or level. Tests involving multiple choice, fill in the blanks, matching, or other "right/wrong" items don't need rubrics. However, complex student work, such as an essay, cannot be properly and fairly graded using a simple "right/wrong" rubric. Instead, the evaluator should devise a rubric chart that enables conversion of the work's quality into a percentage, letter grade, or level. This chart may contain more than one criterion for grading. For instance, the evaluator may be expected to grade an essay on grammar, punctuation, structure, works cited, logic, etc.
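A multi-criterion rubric chart of the kind described above might be sketched in Python as follows. The criteria come from the essay example in the text. The four-level scale, the equal weighting of criteria, and the function name `grade_essay` are assumptions made for illustration.

```python
# Criteria from the essay example; the 1-4 level scale is an assumption.
CRITERIA = ("grammar", "punctuation", "structure", "works cited", "logic")
MAX_LEVEL = 4

def grade_essay(levels):
    """Convert per-criterion levels (1..MAX_LEVEL) into a percentage, weighting all equally."""
    missing = [c for c in CRITERIA if c not in levels]
    if missing:
        raise ValueError(f"no level recorded for: {', '.join(missing)}")
    total = sum(levels[c] for c in CRITERIA)
    return 100 * total / (MAX_LEVEL * len(CRITERIA))

# Hypothetical judgments for one essay.
essay = {"grammar": 4, "punctuation": 3, "structure": 4, "works cited": 2, "logic": 4}
print(grade_essay(essay))  # 85.0
```

Because the conversion is fixed in advance, two evaluators who agree on the level for each criterion arrive at the same percentage, and the per-criterion levels tell the student where to improve.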

Rubrics promote consistency. Consider the example of two English teachers, one known to students as a "tough marker" and the other known as an "easy marker". Given the same piece of student work that both teachers agree was well done, one assigns a grade of 70% and the other assigns a grade of 95%. However, if both teachers were guided by the same rubric, it's very likely that their assigned grades would be much closer, if not identical.

Exemplars are useful in grading to promote growth. These are typical examples of student work demonstrating a given quality or level of performance. Published exemplars should provide samples of real student work at all levels so that evaluators may compare a given student's work with the exemplars for guidance in assigning a level or grade.

Again, exemplars promote consistency. If all evaluators in a system use the same rubrics and the same exemplars, then feedback to students will be consistent as well. That is, each individual will receive "clear instructions for improvement in what they have learned as well as how they learn". This criterion-referenced feedback is far superior to a norm-referenced message about how they performed relative to other students.


Manual: Go to Appendix A-2 for a discussion of your education system's requirements.


