Question 3. Is it possible for a test with high reliability to have low validity?
Yes. A test can be reliable (i.e. produce consistent results) yet measure the wrong construct (have low or zero validity), in which case it is not useful. Thus, a highly reliable test may still have low validity.
Question 4. Overall, is “validity” or “reliability” more important when evaluating an instrument?
Overall, validity is the more important characteristic of an instrument, since it more strongly determines the usefulness of a test. A test with high validity but low reliability may still be of some use (its results can be analyzed under certain assumptions); however, a test with low validity and high reliability is of no use, since it does not measure the intended construct.
Topic 32. Measures of reliability.
Question 1. Researchers need to use at least how many observers to determine interobserver reliability?
For measuring interobserver reliability, there should be at least two observers.
Question 2. When there are two quantitative scores per participant, researchers can compute what statistic to describe reliability?
When there are two quantitative scores per participant, the correlation coefficient between the two measurements can be calculated; correlation coefficients calculated for reliability purposes are called reliability coefficients.
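As a concrete illustration, here is a minimal sketch (in Python, with invented scores) of computing such a reliability coefficient as a Pearson correlation between two sets of scores:

```python
# A minimal sketch: reliability coefficient as the Pearson correlation
# between two quantitative scores per participant. All scores are invented.
from scipy.stats import pearsonr

scores_first = [12, 15, 9, 20, 18, 14, 11, 17]    # e.g. first observer or first administration
scores_second = [13, 14, 10, 19, 18, 15, 10, 16]  # e.g. second observer or second administration

reliability, _ = pearsonr(scores_first, scores_second)
print(f"Reliability coefficient: {reliability:.2f}")
```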
Topic 33. Internal consistency and reliability
Question 2. Does the “split-half” method require one or two administrations of the test?
The “split-half” method requires only one administration of the test; it belongs to the class of methods that use the scores from a single administration to estimate internal consistency.
Question 3. What is meant by “odd-even” split?
An “odd-even” split is a form of the “split-half” method in which all even-numbered items of the test are scored as one test and all odd-numbered items are scored as another test; the correlation between the two halves (odd and even) is then calculated in order to estimate internal consistency.
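A minimal sketch of the odd-even split on hypothetical 0/1 item scores; the Spearman-Brown correction at the end is a common, optional adjustment for estimating full-length reliability and is not required by the split itself:

```python
# Odd-even split-half estimate on hypothetical item scores (1 = correct, 0 = incorrect).
import numpy as np
from scipy.stats import pearsonr

items = np.array([            # rows = examinees, columns = items 1..8
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 0, 1],
])

odd_half = items[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7 scored as one test
even_half = items[:, 1::2].sum(axis=1)  # items 2, 4, 6, 8 scored as another test

r_halves, _ = pearsonr(odd_half, even_half)
full_length = 2 * r_halves / (1 + r_halves)  # Spearman-Brown correction (optional)
print(f"Half-test correlation: {r_halves:.2f}, corrected full-length estimate: {full_length:.2f}")
```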
Topic 34. Norm- and criterion-referenced tests.
Question 5. In which type of tests are items typically selected on the basis of content they cover with regard to item difficulty?
Item difficulty matters for norm-referenced tests; selecting items that are too easy or too difficult for this type of test reduces its validity. Thus, for norm-referenced tests, items are selected according to both content and difficulty; items that are appropriate in content but do not have medium difficulty are eliminated.
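As an illustration of the difficulty screen, here is a minimal sketch with invented responses; the 0.3-0.7 “medium difficulty” band is an illustrative assumption, not a fixed rule:

```python
# Screening items by difficulty (proportion of examinees answering correctly).
import numpy as np

responses = np.array([   # rows = examinees, columns = items (1 = correct)
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 0, 1, 1],
])

difficulty = responses.mean(axis=0)                       # item difficulty = proportion correct
medium = [i for i, p in enumerate(difficulty) if 0.3 <= p <= 0.7]
print("Item difficulties:", difficulty)
print("Items retained as medium difficulty:", medium)
```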
Question 6. Which type of test should be used in research where the purpose is to describe specifically what examinees can and cannot do?
For describing and measuring what examinees can and cannot do, criterion-referenced tests are used; in such tests, items are selected according to the criterion of what students must be able to do, regardless of the items' difficulty.
Topic 35. Measures of optimum performance
Question 3. A test designed to measure how much students learn in a particular course of school is what type of test?
A test measuring the knowledge and skills that students have acquired is called an achievement test. Such tests are likely to have high validity and reliability, provided they are properly designed.
Question 4. A test designed to predict success in learning a new type of skills is what type of test?
Tests designed to predict a specific type of achievement, e.g. success in learning a new type of skill, are called aptitude tests. These tests tend to have lower validity than, for example, achievement tests, since they can measure only a limited number of the skills required for the predicted achievement.
Question 6. How can researchers increase the reliability of scoring essays, products and performances?
Reliable scoring of items more complex than multiple-choice questions, such as essays, products and performances, is difficult, since there are no predefined scoring criteria for these items. To increase the reliability of scoring, the researcher should define the desirable characteristics of the essay, product or performance and turn them into a checklist or a rating scale, which can then be used to produce a quantitative score for the essay, product or performance.
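A minimal sketch of this approach, with hypothetical checklist criteria and ratings from two raters, whose agreement on the resulting scores can then be checked:

```python
# Hypothetical 1-5 ratings on four checklist criteria for five essays, from two raters.
from scipy.stats import pearsonr

criteria = ["clear thesis", "supporting evidence", "organization", "mechanics"]

rater_a = [[4, 3, 5, 4], [2, 3, 3, 2], [5, 4, 4, 5], [1, 2, 2, 3], [3, 4, 3, 3]]
rater_b = [[5, 3, 4, 4], [3, 2, 3, 2], [5, 5, 4, 4], [2, 2, 1, 3], [3, 3, 3, 3]]

totals_a = [sum(essay) for essay in rater_a]   # quantitative score per essay, rater A
totals_b = [sum(essay) for essay in rater_b]   # quantitative score per essay, rater B

agreement, _ = pearsonr(totals_a, totals_b)
print(f"Interrater reliability of the rating-scale scores: {agreement:.2f}")
```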
Topic 36. Measures of typical performance.
Question 3. What do researchers reduce by observing behaviour unobtrusively?
Unobtrusive observation of behaviour is one of three methods that allow researchers to reduce the influence of social desirability on the measure, and thus increase its validity.
Question 4. Loosely structured stimuli are used in which type of personality measure?
Loosely structured stimuli are used in projective techniques; in these techniques, the participant is unaware of the traits the researcher is looking for and thus does not adjust their behaviour toward a socially desirable one.
Question 6. What is the range of choice in a Likert-type scale?
In Likert-type scales, the choices for each statement range from values such as “strongly agree” to values such as “strongly disagree”.
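A minimal sketch of how such responses are typically coded numerically; the five labels and the 1-5 mapping are the conventional ones, while the responses themselves are invented:

```python
# Coding Likert-type responses to attitude statements on a 1-5 scale.
likert = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

# One participant's responses to four attitude statements
responses = ["agree", "strongly agree", "neutral", "agree"]

attitude_score = sum(likert[r] for r in responses)
print(f"Attitude score: {attitude_score} out of {5 * len(responses)}")
```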
Question 7. Is content validity a relevant concern when assessing the validity of a Likert-type scale for measuring attitudes?
Yes. When measuring an attitude with Likert-type scales, it is necessary to check whether the statements cover all components of the attitude; for the scale to be valid, the statements should relate to all areas of the attitude relevant to the studied field.
Part 2. Quantitative
1. Declare your research question in its final form.
Research question: Will teenagers in Westchester County perform better on tests and engage more in classroom participation with a later school start time?
2. Declare what instrument you have chosen to collect your data.
For collecting data, I have chosen the pretest-posttest method, i.e. taking two measurements: one made a year before the experiment and one made a year after the experiment. Both scores (the MAT achievement test and the Harkness classroom engagement score) will be measured according to the pretest-posttest scheme.
3. Discuss in detail the type of reliability associated with your instrument. Give 2-3 specific examples from your literature review.
According to the chosen data collection methods and research design, reliability coefficients will be used to determine overall reliability. The reliability of pretest and posttest scores in the control and experimental groups will be measured, and if the tests prove reliable, the test results for the control and experimental groups will be analyzed and compared. This type of testing (pretest and posttest applied to achievement tests) has been applied by Taras & Potts-Datema (2005) and used widely by Carskadon (1983, 1990, 1999). Evaluation of test validity and of the impact of later school start time on achievement was carried out by Wahlstrom (2001).
It is necessary to take into account that the reliability of Harkness scores is inherently lower than that of MAT scores, so the correlation coefficients for this measure will also be lower. As a first approximation, a reliability coefficient of about 0.8 for MAT scores and of 0.5-0.6 for Harkness classroom engagement scores in the control group should be regarded as acceptable.
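A minimal sketch of the planned reliability check for the control group, using invented pretest/posttest scores purely to show the computation against the working thresholds mentioned above:

```python
# Reliability coefficients for the control group, compared with the working thresholds.
from scipy.stats import pearsonr

mat_pretest = [610, 540, 720, 650, 580, 690]          # hypothetical MAT scores
mat_posttest = [620, 555, 710, 660, 570, 700]
harkness_pretest = [3.2, 2.8, 4.1, 3.5, 2.9, 3.8]     # hypothetical Harkness engagement scores
harkness_posttest = [3.6, 2.5, 4.0, 3.9, 3.1, 3.5]

mat_reliability, _ = pearsonr(mat_pretest, mat_posttest)
harkness_reliability, _ = pearsonr(harkness_pretest, harkness_posttest)

print(f"MAT reliability: {mat_reliability:.2f} (target about 0.8)")
print(f"Harkness reliability: {harkness_reliability:.2f} (target about 0.5-0.6)")
```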