A grade calculator is useful


Barbara Kerbel

born 1978, is a journalist specializing in education, social affairs and society. She studied psychology in Giessen and then completed a journalism traineeship at the "Süddeutsche Zeitung". Since 2010 she has worked as a writer and freelance editor in Berlin, among others for the Berlin "Tagesspiegel", the business magazine "brand eins" and the language learning magazine "Deutsch Perfekt".

Numerical grades from 1 to 6 are as much a part of school as lessons, class trips and lunch. Yet grades have been criticized for decades: they are considered unfair, arbitrary and not comparable. So why are they still given almost everywhere?

A boy holds his mid-year report card in a classroom at the St. Stephan Middle School in Straubing (Lower Bavaria). (© picture-alliance / dpa)

Anyone with a 1 in maths on their report card can be pleased with a top performance; anyone who gets a 3 knows that they are average; and anyone with a 6 knows that their maths performance this school year was "unsatisfactory". Grades condense information into numbers that can be understood at a glance.

But this reduction to numbers has been criticized for many years. Grades are considered unfair, prone to distortion and difficult to compare. Primary school teachers in particular advocate replacing numerical grades with other forms of assessment. Waldorf schools and progressive model schools do without grades until the upper secondary level. Several federal states are trying out alternatives to traditional grades. Are grades as bad as their critics claim? And if so, why are they still given everywhere? Who needs grades? An overview of the most important questions in the debate about numerical grades.


How can measurement methods be assessed - and what does that mean for school grades?

Many questions that interest researchers can only be answered by counting and measuring phenomena. Because measurement plays such a central role in science, researchers have agreed on a set of quality criteria for judging how good a measurement method actually is. School grades are measurements too: they claim to measure pupils' performance, whether in a subject, a class test or an oral exam. The quality criteria established in science can therefore also be applied to school grades. Which conditions would grades have to meet in order to count as a good measurement method?

  • Objectivity: Is a measurement method independent of the person using it? This is what objectivity describes. A measurement is objective if different observers arrive at the same result. School grades would therefore be objective if different teachers gave the same grade for the same performance by a student.

  • Reliability: Does a measurement method measure dependably? This is what reliability describes. A measurement is reliable if, among other things, repeated measurements of the same person give the same result. For school grades this means: they would be reliable if a student received the same grade on two papers with comparable tasks written one after the other.

  • Validity: How well does a measurement method really measure what it is supposed to measure? This is the question that validity addresses. To check it, observations and measurement results from different sources are compared with one another, for example. School grades would therefore be valid if pupils who achieved a good grade in one piece of work also did well in other exams covering the same area of knowledge.
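The first two criteria can be made concrete with a small numeric sketch. The Python below is illustrative only: all grade lists are invented, and the two measures used (share of identical grades, average gap in grade steps) are assumptions about how agreement might be quantified, not methods taken from the studies discussed in this article.

```python
from statistics import mean

def exact_agreement(grades_a, grades_b):
    """Share of cases in which two gradings assign the identical grade."""
    pairs = list(zip(grades_a, grades_b))
    return sum(a == b for a, b in pairs) / len(pairs)

def mean_abs_difference(grades_a, grades_b):
    """Average distance, in grade steps, between two gradings."""
    return mean(abs(a - b) for a, b in zip(grades_a, grades_b))

# Invented data: two teachers grade the same five essays (1 = best, 6 = worst).
teacher_a = [2, 3, 1, 4, 3]
teacher_b = [2, 4, 2, 4, 5]

# Objectivity: do different observers reach the same result?
objectivity_agreement = exact_agreement(teacher_a, teacher_b)
objectivity_gap = mean_abs_difference(teacher_a, teacher_b)

# Invented data: the same five pupils sit two comparable tests.
first_test = [2, 3, 1, 4, 3]
second_test = [2, 3, 2, 4, 3]

# Reliability: does a repeated measurement give the same result?
reliability_agreement = exact_agreement(first_test, second_test)

print(f"Teachers agree on {objectivity_agreement:.0%} of essays")
print(f"Average gap between teachers: {objectivity_gap:.1f} grade steps")
print(f"Repeated tests agree for {reliability_agreement:.0%} of pupils")
```

In this made-up example the two teachers give the same grade for only two of five essays, while the repeated test matches for four of five pupils. Perfect objectivity and reliability would mean full agreement in both comparisons.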

Question 1: What do grades measure?

What exactly does a 2 in German mean? Behind the number lies a wealth of individual performances. Critics of numerical grades say: a grade does not tell you what a child can really do. The German grade covers reading comprehension, written expression, spelling and oral expression, among other things. Perhaps the pupil is excellent at written expression but weak in spelling? The overall grade, which is averaged from several partial performances, evens out such differences - and thus makes them invisible. In addition, there is the fundamental question of whether grades can capture the actual level of knowledge in a subject. To answer it, educational researchers compare school grades with assessments from other sources, for example with children's performance in standardized tests. This was done in the 2006 PISA study, which examined the science competence of 15-year-old schoolchildren.

A connection was found between school grades and science competence: those who had good grades in biology, physics and chemistry also tended to achieve a higher score in the PISA test. However, this relationship was relatively weak. The authors of the German PISA 2006 study explain this by the fact that the PISA test and school grades capture different facets of performance. Report card grades, which are made up of class tests, quizzes and oral questioning during the school year, tend to reflect short-term learning effects, often tied to specific exams. The PISA test, by contrast, primarily tests how lasting the learning is and how flexibly it can be applied.
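How such a comparison between grades and test results works can be sketched with a correlation coefficient. The following Python is a minimal illustration under stated assumptions: the grades and test scores are invented and are not PISA data, and the Pearson coefficient is used here as one common way of expressing such a relationship, not necessarily the statistic the PISA authors reported.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Invented data for eight pupils: science grade (1 = best, 6 = worst)
# and score in a standardized test (higher = better).
science_grades = [1, 2, 2, 3, 3, 4, 4, 5]
test_scores = [560, 590, 480, 530, 450, 520, 500, 430]

r = pearson(science_grades, test_scores)
print(f"Correlation between grade and test score: {r:.2f}")
```

Because a low number is the better grade on the German scale, a negative coefficient means that good grades go together with high test scores. The closer the value is to zero, the weaker the relationship; a value of exactly -1 would mean that grades and test scores rank the pupils identically.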

Be careful with comparisons

Grades make it possible to get an idea of a person's performance without much effort and to compare people with one another. In practice, however, such comparisons are problematic, because the frame of reference for grading is the respective learning group. Grades therefore reflect not an objective level of performance but a ranking within a class. The fact that grades correlate to some degree with performance measured in standardized tests such as PISA does not change this.

The frame of reference is the class

The classic six-point grading scale is based on the assumption that talent and performance follow a normal distribution: most of the class is in the average range, with a few very good and a few particularly weak students on either side. This pattern is supposed to be reflected in the distribution of grades. This means that teachers have to include some particularly difficult tasks in tests that only the best students can solve. How strongly school principals and school authorities urge teachers to follow this scheme in their grading varies.

Here a 2, there a 4

It follows from this orientation towards the normal distribution that the same mediocre performance can lead to different grades in different classes: in a weak class it might earn a 2, in a strong class only a 4. This applies to different classes in the same year at one school as well as to comparisons between schools - and even more so to comparisons of grades from different federal states, where teaching is also based on different curricula.
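The effect can be illustrated with a small sketch of class-referenced grading. Everything in the Python below is a hypothetical assumption made for illustration: the quota of pupils per grade (roughly bell-shaped), the test scores, and the rule of ranking a pupil by how many classmates scored higher. Real grading practice is not this mechanical.

```python
# Hypothetical cumulative shares of a class receiving grade 1, 2, ... 6 or better.
# They correspond to 5%, 20%, 25%, 25%, 20% and 5% of pupils per grade.
CUMULATIVE_SHARES = [0.05, 0.25, 0.50, 0.75, 0.95, 1.00]

def grade_by_class_rank(score, class_scores):
    """Grade a score purely by its rank within the class (1 = best, 6 = worst)."""
    better = sum(other > score for other in class_scores)
    fraction_better = better / len(class_scores)
    for grade, cumulative in enumerate(CUMULATIVE_SHARES, start=1):
        if fraction_better < cumulative:
            return grade
    return 6

# Invented test scores for two classes of ten pupils; both contain one pupil with 60 points.
weak_class = [35, 40, 42, 45, 48, 50, 52, 55, 60, 70]
strong_class = [50, 55, 60, 62, 65, 70, 72, 75, 80, 90]

grade_in_weak_class = grade_by_class_rank(60, weak_class)
grade_in_strong_class = grade_by_class_rank(60, strong_class)

print("60 points in the weak class:", grade_in_weak_class)
print("60 points in the strong class:", grade_in_strong_class)
```

With these invented numbers, the identical score of 60 points ranks near the top of the weak class and in the lower half of the strong class, so it receives a 2 in one and a 4 in the other.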

Question 2: Are grades objective?

When several teachers assess the same piece of work, their grades sometimes differ considerably. Studies have shown this repeatedly. For German essays this is perhaps unsurprising; many scholars consider them very subjective and difficult to evaluate. But studies have also found large differences, in some cases, in supposedly objective areas such as mathematics and spelling (for an example, see Brügelmann and Backhaus, 2006). One explanation is that teachers generally have a great deal of leeway when grading. In most schools, for example, it is up to the individual teacher how many points to award for a correct answer in an exam and how many to deduct from the total for each incorrect one. More and more schools are recognizing the problem and setting uniform evaluation standards; however, these apply only to written work. The same problem naturally arises in assessing oral performance, where teachers usually have even greater leeway. They can decide not only how to evaluate a specific contribution, but also how many oral grades they collect. A student who gets several chances to improve orally after a failed exam, for example, will probably end up with a better grade than a student who was not given this opportunity.
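How much the choice of point scheme alone can matter is shown by the following sketch. The Python is a hedged illustration: the two point schemes, the answer counts and the grade key (the thresholds that turn a share of points into a grade) are all hypothetical and are not taken from any school regulation or from the studies cited.

```python
# Hypothetical grade key: minimum share of the maximum points needed for each grade.
GRADE_KEY = [(0.92, 1), (0.81, 2), (0.67, 3), (0.50, 4), (0.30, 5)]

def share_of_max_points(correct, wrong, points_per_correct, deduction_per_wrong):
    """Share of the maximum points reached under one teacher's point scheme."""
    max_points = (correct + wrong) * points_per_correct
    points = correct * points_per_correct - wrong * deduction_per_wrong
    return max(points, 0) / max_points

def to_grade(share):
    """Translate a share of points into a grade using the hypothetical key."""
    for threshold, grade in GRADE_KEY:
        if share >= threshold:
            return grade
    return 6

# The same invented exam paper: 14 correct and 6 incorrect answers.
correct, wrong = 14, 6

# Teacher A awards 2 points per correct answer and deducts nothing.
grade_teacher_a = to_grade(share_of_max_points(correct, wrong, 2, 0))
# Teacher B also awards 2 points but deducts 1 point per incorrect answer.
grade_teacher_b = to_grade(share_of_max_points(correct, wrong, 2, 1))

print("Teacher A:", grade_teacher_a)
print("Teacher B:", grade_teacher_b)
```

Under these assumptions the same paper reaches 70 percent of the points with teacher A and 55 percent with teacher B, which the hypothetical key turns into a 3 and a 4 respectively.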

What distorts judgments

On top of this, human judgments are often influenced by unconscious psychological processes. For example, a teacher is very likely to rate an average piece of work more favorably if she has just marked several poor ones. Prior impressions of a student can also influence the assessment: if a child has so far written only excellent essays, the teacher may read that child's next German exam with a mental bonus in mind, which can ultimately lead to a better grade. Such distortions have been well documented in psychological studies - and they do not apply only to teachers marking schoolwork (for overviews see, for example, Brügelmann & Backhaus, 2006, and Oelkers, 2001).