
AIOU B.Ed Assessment and Evaluation 8602 Solved Assignment No 2 Spring 2021

 Allama Iqbal Open University Solved Assignments

Course: B.Ed Assessment and Evaluation 8602 Solved Assignment No 2 Spring 2021

This is a plagiarism-free assignment; students can copy it from below and submit it on the Aaghi LMS.


Q.1 What is the relationship between the validity and reliability of a test?

The link between validity and reliability

Reliability and validity are two different standards used to judge the usefulness of a test. Although they are different, they work together. It would not be helpful to develop a test with good reliability that does not measure what you intend to measure. Equally, it is impossible to measure exactly what we want to measure with a test whose results are too inconsistent to be repeated. Reliability is a prerequisite for validity: a test must be reliable in order to be valid, because reliability sets an upper limit on validity, and a test that is not reliable cannot be valid. Establishing good reliability is only the first part of ensuring validity; validity must also be established in its own right. Good reliability does not mean good validity; it simply means that we are measuring something consistently. In short, reliability is necessary, but not sufficient, for validity.
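In classical test theory the "upper limit" mentioned above has a standard form: a test's correlation with any criterion cannot exceed the square root of its reliability coefficient (more precisely, √(r_xx · r_yy) when the criterion's own reliability r_yy is counted). The sketch below is only an illustration of that ceiling, assuming reliability coefficients between 0 and 1; the function name `max_validity` is hypothetical.

```python
import math


def max_validity(test_reliability: float, criterion_reliability: float = 1.0) -> float:
    """Return the classical-test-theory ceiling on a validity coefficient.

    A test's correlation with a criterion cannot exceed
    sqrt(r_xx * r_yy), where r_xx is the test's reliability and r_yy is
    the criterion's reliability (1.0 = a perfectly reliable criterion).
    """
    for r in (test_reliability, criterion_reliability):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliability coefficients must lie between 0 and 1")
    return math.sqrt(test_reliability * criterion_reliability)


print(max_validity(0.64))  # 0.8: reliability 0.64 allows a validity of at most 0.8
print(max_validity(0.0))   # 0.0: a completely unreliable test cannot be valid
```

The bound only says how high validity *could* be; a highly reliable test can still have low validity, which is why reliability is necessary but not sufficient.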

VALIDITY

The validity of an assessment tool is the extent to which it measures what it was designed to measure. For example, if a test is designed to measure three-digit addition in mathematics, but the problems are written in complex language that does not match the students' reading level, it cannot measure their three-digit addition skills. Such a test is not valid. Many measurement experts have defined this concept; some of their definitions are given below. According to the Business Dictionary, "validity is the extent to which a tool, selection process, statistical technique, or test measures what it is supposed to measure." Cook and Campbell (1979) define validity as the appropriateness or accuracy of the conclusions, decisions, or explanations drawn from the test results of individuals, groups, or institutions. According to the standards of the American Psychological Association (APA), validity is the most important consideration in evaluating tests. The term refers to the appropriateness, meaningfulness and usefulness of the specific conclusions drawn from test results, and test validation is the process of gathering evidence to support these conclusions. Validity is nevertheless a single, unitary concept: although evidence can be gathered in a number of ways, validity always refers to the degree to which this evidence supports the conclusions drawn from the results. The conclusions relate to particular uses of the test, not to the test itself.

Howell (1992) holds that a valid test must measure specifically what it is intended to measure. For Messick, validity is a matter of degree: a test is not absolutely valid or absolutely invalid. He argues that evidence of validity continues to accumulate over time, either reinforcing or contradicting previous findings. In general, the validity of an assessment refers to the extent to which the content of the test represents the skills actually learned and whether the test allows accurate conclusions to be drawn about performance. Validity is therefore the extent to which a test measures what it claims to measure, and it is essential for the correct application and interpretation of test results.

RELIABILITY

What does the term reliability mean? Reliability means trustworthiness. A test score is reliable if we have reason to believe that it is consistent and objective. For example, if the same test is given in two classes and scored by different teachers, it can be trusted if it yields similar results. Consistency and reliability depend on the degree to which the score is free from random errors. First, we need to build a conceptual bridge between the question an individual asks (i.e., are my results reliable?) and the way reliability is measured scientifically. This bridge is not as simple as it seems at first glance. When people think about reliability, many things come to mind: my friend is very reliable, my car is very reliable, my online billing process is very reliable, my customer's performance is very reliable, and so on. The qualities implied are consistency, dependability, predictability and variability. Note that these everyday uses of the word refer to behavior, machine performance, data processes and business operations, which can sometimes be unreliable. The question is, "How much do test results vary across different observations?"
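One common way to put a number on "how much results vary across observations" is test-retest reliability: give the same test twice and correlate the two sets of scores. The sketch below computes a Pearson correlation by hand; the function name and the five pairs of scores are hypothetical illustrations, not data from this assignment.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary to compute a correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores of the same five students on two sittings of one test.
first_sitting = [12, 15, 18, 20, 25]
second_sitting = [13, 14, 19, 21, 24]
print(round(pearson_r(first_sitting, second_sitting), 2))  # 0.98
```

A coefficient close to 1 means students kept nearly the same relative standing on both sittings, i.e. the scores are consistent.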

 

Q.2 Define scoring criteria for essay-type test items for 8th grade.

General Considerations in Constructing Essay-Type Test Items

In their book, Robert L. Ebel and David A. Frisbie (1991) note that "teachers are often as interested in measuring students' thinking skills as they are in measuring their knowledge," so tests are needed that enable students to demonstrate these skills. An essay item requires the student to write a response of one or more paragraphs, sometimes several pages long, and it can be used to assess complex learning outcomes such as analysis, synthesis and evaluation.

Types of Essay Tests

Essay tests can be classified into different types. W.S. Monroe and R.I. Carter (1993) divide essay tests into several categories, such as: cause and effect; explanation of the use or exact meaning of a word; summary of a sentence, textbook unit or article; analysis; explanation of relationships; illustrations or examples; classification; application of rules, laws or principles to new situations; discussion; statement of the author's purpose in the selection or organization of material; criticism of the adequacy, correctness or relevance of a printed statement or of a classmate's answer to a question on the lesson; restatement of facts; formulation of new questions (problems and questions raised); and new methods of procedure.

Scoring Rubrics

A rubric, or set of scoring criteria, is developed to evaluate and score essay-type items. A rubric is a scoring guide for subjective assessments: a set of criteria and standards, linked to learning objectives, used to assess a student's performance on assignments, projects, essays and other tasks. Rubrics make scoring easier and more transparent and allow standardized evaluation against defined criteria. A rubric can range from a simple checklist to a combination of detailed checklists and rating scales. How detailed a rubric is depends on what you want to measure. If your essay item is a restricted-response item that assesses only mastery of factual content, a fairly simple list of essential points will do. An example of a rubric for a restricted-response item is given below.

Evaluation key / evaluation criteria:

1. One point for each factor named, to a maximum of 5 points

2. One point for an adequate explanation of each factor, to a maximum of 5 points

3. No penalty for spelling, punctuation or grammatical errors.

4. No extra credit for naming more than five factors.

5. Information outside the topic is ignored.
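Once the teacher has judged how many relevant factors a student named and how many were adequately explained, the key above reduces to simple arithmetic. The sketch below encodes only that arithmetic; the function name is hypothetical, and it assumes a factor can be explained only if it was named.

```python
def score_restricted_response(factors_named: int, factors_explained: int) -> int:
    """Score one answer with the five-factor key (maximum 10 points).

    factors_named: relevant factors the student listed (teacher's judgment).
    factors_explained: how many of those were adequately explained.
    """
    if factors_named < 0 or not 0 <= factors_explained <= factors_named:
        raise ValueError("explained factors must be between 0 and factors named")
    # Rules 1 and 4: one point per factor, no extra credit beyond five.
    points_for_naming = min(factors_named, 5)
    # Rule 2: one point per adequate explanation, also capped at five.
    points_for_explaining = min(factors_explained, 5)
    # Rules 3 and 5: spelling, grammar and off-topic material do not change
    # the score, so nothing else is added or subtracted.
    return points_for_naming + points_for_explaining


print(score_restricted_response(4, 3))  # 7
print(score_restricted_response(7, 6))  # 10 (no credit beyond five factors)
```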

Scoring the Test

Scoring Objective Test Items

If students record their answers on the test paper itself, a scoring key can be made by marking the correct answers on a blank copy of the test. Scoring is then a matter of comparing the answer columns on this master copy with the answer columns on each student's paper. If it is more convenient, a strip key can be used instead; this is simply a strip of paper on which the answer columns are recorded. Strip keys are easily prepared by cutting the answer columns from the master copy of the test and mounting them on strips of cardboard cut from manila folders.
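The comparison with the master copy can be sketched as follows, assuming the key and each student's answers are lists of option letters in item order. The function name and the sample answers are hypothetical, and ignoring case and surrounding spaces is an added assumption.

```python
def score_objective_test(answer_key: list[str], responses: list[str]) -> int:
    """Compare a student's answers with the master key, one point per match."""
    if len(answer_key) != len(responses):
        raise ValueError("the response sheet and the key must have the same length")
    return sum(
        1
        for correct, given in zip(answer_key, responses)
        if given.strip().upper() == correct.strip().upper()  # ignore case/spaces
    )


key = ["B", "D", "A", "C", "B"]
student = ["B", "A", "A", "C", "b"]
print(score_objective_test(key, student))  # 4
```

A blank answer (an empty string) never matches the key, so it simply earns no point.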

When scoring objective tests, each correct answer is usually counted as one point, because arbitrary weighting of items makes little difference to students' final scores. If some items are counted as two points, some as one point and others as half a point, scoring becomes more complicated without any benefit; scores based on such weights are similar to those from the simpler method of counting each item as one point. As we will see later, using upper and lower groups of ten students each simplifies the interpretation of item analysis results. Ten is also a reasonable number for analysis in groups of 20 to 40 students. For example, with a small group of 20 students it is best to use the upper and lower halves to obtain dependable data, while for a larger group of 40 students the upper and lower 25 percent is satisfactory. For more refined analysis, the upper and lower 27 percent is usually recommended, and most statistical guides are based on this percentage.
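The upper and lower groups described above can be formed by ranking students on their total scores and taking a fixed share from each end. The sketch below is one illustration under stated assumptions: the function names, student labels and scores are hypothetical, and for each group it reports only the proportion answering one item correctly.

```python
def upper_lower_groups(total_scores: dict[str, int], fraction: float = 0.27):
    """Return (upper, lower) groups of students ranked by total score.

    fraction is the share of the class placed in each group: for example
    0.5 for a class of about 20, 0.25 for about 40, or 0.27 for finer analysis.
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be greater than 0 and at most 0.5")
    if len(total_scores) < 2:
        raise ValueError("need at least two students")
    ranked = sorted(total_scores, key=total_scores.get, reverse=True)
    # Round to a whole number of students, but never let the groups overlap.
    size = min(max(1, round(len(ranked) * fraction)), len(ranked) // 2)
    return ranked[:size], ranked[-size:]


def proportion_correct(group: list[str], item_correct: dict[str, bool]) -> float:
    """Share of a group that answered one item correctly."""
    return sum(item_correct[name] for name in group) / len(group)


scores = {"S1": 38, "S2": 35, "S3": 30, "S4": 28,
          "S5": 25, "S6": 22, "S7": 18, "S8": 15}
upper, lower = upper_lower_groups(scores, fraction=0.25)
item_1 = {"S1": True, "S2": True, "S3": True, "S4": False,
          "S5": True, "S6": False, "S7": False, "S8": True}
print(upper, lower)  # ['S1', 'S2'] ['S7', 'S8']
print(proportion_correct(upper, item_1), proportion_correct(lower, item_1))  # 1.0 0.5
```

Students with tied totals keep their original order in the ranking here, which is a simplification.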

Q.3 Write a note on mean, median and mode. Also discuss their importance in interpreting test scores.

Measures of Central Tendency


Q.4 Write the procedure for assigning letter grades to test scores.

Calculating CGPA and Assigning Letter Grades

CGPA stands for Cumulative Grade Point Average. It reflects a student's grade point average across all subjects/courses completed. To calculate the CGPA, we need the following information.

• Marks in each subject/course

• Grade point average (GPA) for each subject/course

• Total credit hours (the sum of the credit hours of each subject/course)

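Calculating a CGPA is simple: the total grade points (each course's grade points multiplied by its credit hours) are divided by the total number of credit hours. For example, if a student has completed 12 courses of 3 credit hours each, the total number of credit hours is 36, and the student's total grade points are divided by 36.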
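The calculation can be sketched as a credit-weighted average. In this illustration each course is a (grade points, credit hours) pair; the function name and the grade-point values are hypothetical.

```python
def cgpa(courses: list[tuple[float, float]]) -> float:
    """CGPA = sum(grade points x credit hours) / sum(credit hours).

    Each course is a (grade_points, credit_hours) pair.
    """
    total_hours = sum(hours for _, hours in courses)
    if total_hours <= 0:
        raise ValueError("total credit hours must be positive")
    weighted_points = sum(gp * hours for gp, hours in courses)
    return weighted_points / total_hours


# 12 courses x 3 credit hours = 36 credit hours. If (hypothetically) the
# student earned 3.0 grade points in every course, the weighted total is 108
# and the CGPA is 108 / 36 = 3.0.
print(cgpa([(3.0, 3)] * 12))  # 3.0

# Courses with different credit hours carry different weight.
print(cgpa([(4.0, 3), (3.0, 3), (2.0, 2)]))  # 3.125
```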

Assigning Letter Grades

The letter grade system is the most popular grading system in the world, including in Pakistan. Many teachers face problems in assigning grades. There are four main problems or concerns in this regard: 1) what should be included in a letter grade, 2) how achievement data should be combined in assigning letter grades, 3) which frame of reference should be used in grading, and 4) how the distribution of grades should be determined.

Determining what to include in a grade

Letter grades are most meaningful and useful when they represent achievement only. If they are mixed with other factors or aspects, such as effort, amount of work completed or personal conduct, their interpretation becomes hopelessly confused. For example, a letter grade of C may represent average achievement with extraordinary effort and excellent conduct and behavior, or vice versa.

If letter grades are to be valid indicators of achievement, they must be based on valid measures of achievement. This involves defining objectives as intended learning outcomes and developing or selecting tests and assessments that can measure those outcomes.

 

Combining data in assigning grades

One of the main challenges in grading is to be clear about which aspects of the student are being assessed and what weight each learning outcome should receive.

For example, suppose we decide to give 35 percent weight to the midterm, 40 percent to the final examination, and 25 percent to assignments, presentations, classroom participation, and conduct and behavior. We then need to combine all the components by assigning the appropriate weight to each one and use these composite scores as the basis for grading.
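The weighting in this example can be sketched as a weighted sum. The sketch assumes every component has already been converted to a percentage, and it lumps assignments, presentations, participation and conduct into one hypothetical "coursework" component; the names are illustrative only.

```python
import math

# Weights from the example: midterm 35%, final 40%, other work 25%.
WEIGHTS = {"midterm": 0.35, "final": 0.40, "coursework": 0.25}


def composite_score(percent_scores: dict[str, float],
                    weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted composite of component scores that are already percentages."""
    if set(percent_scores) != set(weights):
        raise ValueError("scores and weights must cover the same components")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must add up to 1")
    return sum(weights[part] * percent_scores[part] for part in weights)


print(round(composite_score({"midterm": 70, "final": 80, "coursework": 90}), 2))  # 79.0
```

Putting every component on the same percentage scale first matters: otherwise a component with a larger raw range would carry more weight than intended.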

Selecting the proper frame of reference for grading

Letter grades are usually assigned on the basis of one of the following frames of reference:

a) Performance in relation to other group members (relative grading)

b) Performance in relation to specified standards (absolute grading)

c) Performance in relation to learning ability (amount of improvement)

Relative grading involves comparing a student's performance with that of a reference group, usually the student's classmates. In this system, the grade is determined by the student's relative position or rank in the group. Although relative grading has the disadvantage of a shifting frame of reference (i.e., grades depend on the ability of the group), it is still widely used in schools, because most of our testing is norm-referenced.
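Relative grading can be sketched by ranking students and giving each grade to a fixed share of the class. The curve below (10% A, 20% B, 40% C, 20% D, 10% F) is purely a hypothetical example, not a recommended distribution, and the student labels and scores are made up.

```python
# Hypothetical curve: percentage of the class receiving each grade, best first.
CURVE = [("A", 10), ("B", 20), ("C", 40), ("D", 20), ("F", 10)]


def grade_on_curve(scores: dict[str, float], curve=CURVE) -> dict[str, str]:
    """Assign letter grades by each student's rank in the group."""
    if sum(share for _, share in curve) != 100:
        raise ValueError("curve percentages must add up to 100")
    n = len(scores)
    grades = {}
    for name, score in scores.items():
        # Rank position = number of students with a strictly higher score,
        # so tied students share the same position and the same grade.
        position = sum(1 for other in scores.values() if other > score)
        cumulative = 0
        for letter, share in curve:
            cumulative += share
            # Integer arithmetic avoids floating-point errors at the cutoffs.
            if position * 100 < cumulative * n:
                grades[name] = letter
                break
    return grades


class_scores = {"S1": 91, "S2": 85, "S3": 84, "S4": 78, "S5": 74,
                "S6": 70, "S7": 66, "S8": 60, "S9": 55, "S10": 40}
print(grade_on_curve(class_scores))
# S1 gets A, S2-S3 get B, S4-S7 get C, S8-S9 get D, S10 gets F
```

The same raw score could earn a different grade in a stronger or weaker class, which is exactly the shifting frame of reference described above.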

Absolute grading involves comparing a student's performance with standards set by the teacher; this is also called criterion-referenced grading. If all students perform poorly against the established standard, they will all receive low grades.
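By contrast, absolute grading compares each score with fixed cutoffs, regardless of how classmates performed. The cutoffs below (80, 70, 60 and 50 percent) are hypothetical standards chosen only for illustration.

```python
# Hypothetical mastery standards: minimum percentage required for each grade.
CUTOFFS = [("A", 80), ("B", 70), ("C", 60), ("D", 50)]


def absolute_grade(percent: float, cutoffs=CUTOFFS, failing: str = "F") -> str:
    """Return the first grade whose minimum standard the score meets."""
    for letter, minimum in cutoffs:
        if percent >= minimum:
            return letter
    return failing


# A class that performs poorly against the standard receives low grades.
print([absolute_grade(p) for p in [45, 48, 52, 38]])  # ['F', 'F', 'D', 'F']
```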

Grading students in relation to their learning ability does not fit a standardized system of assessing and reporting student achievement. Improvement over a short period is difficult to measure, so the low reliability of judgments about achievement relative to ability, and about rates of growth, leads to grades of low reliability. Therefore, such grades are used only sparingly.

Determining the distribution of grades

Relative grading is essentially a matter of ranking students in order of overall achievement and assigning letter grades on the basis of each student's rank in the group. The ranking may be limited to a single class group or may be based on the combined distribution of several class groups that have taken the same course.

If grading is to be done on a curve, the most sensible approach to determining the distribution of grades in a school is for the school staff to set general guidelines for introductory and advanced courses. All staff members should understand the basis for grading, and this basis should be clearly communicated to the users of the grades. If the objectives of a course are clearly stated and the standards of mastery are properly set, the letter grades in an absolute system can be defined by the degree to which those standards are met. This approach complements other grading systems.

Q.5 Discuss the difference between Measures of Central Tendency and Measures of Reliability.

Measures of Central Tendency

For example, suppose a teacher gives the same test in two different classes and the following results are obtained:

Class 1: 80%, 80%, 80%, 80%, 80%

Class 2: 60%, 70%, 80%, 90%, 100%

If you average the two sets of results, you get the same answer: 80%. However, the data from which these means were derived differ greatly between the two classes. Two different data sets can also have the same, or almost the same, measures of central tendency. For example:

Class A: 72 73 76 76 78

Class B: 67 76 76 78 80

Thus, classes A and B have the same median and mode (76) and almost the same mean (75.0 and 75.4).
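Python's standard `statistics` module can check the three measures for Classes A and B. It shows that both medians and both modes are 76, while the means are 75.0 and 75.4.

```python
import statistics

class_a = [72, 73, 76, 76, 78]
class_b = [67, 76, 76, 78, 80]

for label, scores in (("Class A", class_a), ("Class B", class_b)):
    print(label,
          "mean:", float(statistics.mean(scores)),
          "median:", statistics.median(scores),
          "mode:", statistics.mode(scores))
# Class A mean: 75.0 median: 76 mode: 76
# Class B mean: 75.4 median: 76 mode: 76
```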

Statisticians distinguish between such cases by measuring the variability of the sample. As with measures of central tendency, there are several ways to measure variability.

The simplest measure is the range of the sample, which is the difference between the largest and smallest observations. The range is 0% for Class 1 and 40% for Class 2. Knowing this, we can better interpret the data of the second class: Class 1 has a mean of 80% and a range of 0, while Class 2 also has a mean of 80% but a range of 40%.
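The range calculation for the two classes can be written in a few lines of Python. The function name `score_range` is just an illustrative choice.

```python
def score_range(scores: list[float]) -> float:
    """Range = largest observation minus smallest observation."""
    return max(scores) - min(scores)


class_1 = [80, 80, 80, 80, 80]
class_2 = [60, 70, 80, 90, 100]

for label, scores in (("Class 1", class_1), ("Class 2", class_2)):
    print(label, "mean:", sum(scores) / len(scores), "range:", score_range(scores))
# Class 1 mean: 80.0 range: 0
# Class 2 mean: 80.0 range: 40
```

Equal means paired with very different ranges show why a measure of central tendency alone does not describe a set of scores.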

Statisticians use summary measures to describe patterns in data. Measures of central tendency are summary measures used to identify the most typical value in a set of values.

Here we are interested in the typical, most representative score. The three most common measures of central tendency are the mean, the mode and the median, and teachers should be familiar with all three.

Measures of Reliability

What does the term reliability mean? Reliability means trustworthiness. Test results are considered reliable when we have reason to believe that they are stable and objective.

For example, if the same test is given in two classes and scored by different teachers, it is reliable if it yields similar results. Consistency and reliability depend on the extent to which scores are free from random errors. First, we need to build a conceptual bridge between the question an individual asks (i.e., is my result reliable?) and how reliability is measured scientifically. This bridge is not as simple as it seems at first glance. When you think of reliability, many things come to mind: my friend is very reliable, my car is very reliable, my internet bill-payment process is very reliable, my customers' performance is very reliable, and so on. The characteristics considered are consistency, dependability, predictability and variability. Note that these everyday expressions of trust refer to behavior, machine performance, computing processes and business processes, which are sometimes unreliable. The question is, "How much do test results vary between different observations?"

 

Some definitions of reliability:

According to Merriam-Webster's Dictionary:

"Reliability is the degree to which a test, experiment, or measurement process produces the same results when tested repeatedly."

According to Hopkins and Antes (2000):

"Reliability is the consistency of repeated recordings made by a single subject or a group of subjects."

Joppe (2000) defines reliability as follows:

"The extent to which results are consistent over time and an accurate representation of the total population under study is referred to as reliability, and if the results of a study can be reproduced under a similar methodology, then the research instrument is considered to be reliable."

The most common definition of reliability is the degree to which scores are stable and consistent across different points in time (test-retest reliability), across different forms of a test (parallel or alternate forms), or across different items measuring the same construct (internal consistency).
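For internal consistency, one widely used index (not named in the text above) is Cronbach's alpha: α = k/(k−1) · (1 − Σ item variances / variance of total scores), where k is the number of items. The sketch below uses population variances throughout; the function name and the 0/1 item scores are hypothetical.

```python
import statistics


def cronbach_alpha(item_scores: list[list[float]]) -> float:
    """Cronbach's alpha; item_scores[s][i] is student s's score on item i."""
    k = len(item_scores[0])
    if k < 2 or any(len(row) != k for row in item_scores):
        raise ValueError("need at least two items and equal-length rows")
    items = list(zip(*item_scores))  # one tuple of scores per item
    item_variance_sum = sum(statistics.pvariance(col) for col in items)
    total_variance = statistics.pvariance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical right (1) / wrong (0) answers of five students on four items.
answers = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(answers), 2))  # 0.8
```

Values closer to 1 indicate that the items rank students in a similar way, i.e. that they consistently measure the same thing.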
