In developing or using any psychometric measuring instrument, the psychologist must answer an important question: does the device really measure what it is intended to measure; that is, is it valid?

A necessary (but not sufficient) condition for validity is that the device gives scores which are consistent so that the same subject would score the same way if he were tested or observed again under the same conditions.

An instrument that gives consistent scores is said to be reliable. But in order to be reliable, a measuring instrument must be objective, so that two or more individuals can score the subjects’ responses and obtain the same result.

Finally, if the subject’s score is to impart any meaningful information, it must be interpretable in relation to the scores of other individuals in a defined group. This is accomplished through standardization. Though we need not go into the techniques by which these aims are accomplished, it is important to understand these four concepts (validity, reliability, objectivity, and standardization) and the relationships among them.


The index of validity is the extent to which the instrument accomplishes the purpose for which it was intended. Thus, if a psychologist devises a test of mechanical aptitude and discovers that people who get high scores on the test are nearly always successful as mechanics, whereas those who get low scores are nearly always unsuccessful, he may be relatively sure that his test is valid. If an instrument’s validity cannot be established, any research that relies on it may well be questionable.

It takes time to determine the validity of such a test. Applicants must be tested, hired, and put to work. After a period of time, their performance on the job is measured and a comparison is made to determine whether those who made the best test scores are the ones who do best on the job.
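The comparison described above is commonly summarized as a correlation between test scores and a later criterion measure, often called a validity coefficient. The sketch below is illustrative only: the scores and ratings are hypothetical, the 1 to 5 rating scale is an assumption, and `pearson_r` is a helper written here, not part of any testing package. A coefficient near 1 means high scorers tend to be the better performers; a coefficient near 0 means the test tells us little about job success.

```python
from statistics import mean, pstdev


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a list has no variation")
    mx, my = mean(xs), mean(ys)
    # Population covariance divided by the product of population SDs.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sx * sy)


# Hypothetical data: mechanical-aptitude scores taken at hiring, and
# supervisor ratings (1 to 5) of the same eight people after months on the job.
test_scores = [42, 55, 61, 48, 70, 66, 39, 58]
job_ratings = [3.1, 3.9, 4.2, 3.5, 4.8, 4.4, 2.9, 4.0]

validity = pearson_r(test_scores, job_ratings)
print(f"validity coefficient r = {validity:.2f}")
```

Because validity is specific to a purpose, the same test scores correlated against a different criterion (for example, success as a scientist) could yield a much lower coefficient.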

Validity is always specific to a particular purpose. That is, a measuring instrument is not “valid” in the abstract but valid only for a specified purpose. For example, a test may be valid for selecting salesmen but invalid for selecting scientists.

When we are making physical measurements, validity usually poses no great problem. It is obvious enough that a yardstick measures length and that a thermometer measures temperature. But in measuring psychological characteristics, it is much harder to establish validity.

For example, a test designed to measure innate or cultural intelligence may fail to eliminate the effects of poor schooling and impoverished conditions in the homes of some subjects, and thus their scores may reflect these factors as well as intelligence.


The reliability of a measuring instrument is the degree to which people earn the same relative scores each time they are measured. If it is a matter of chance whether a subject does well or poorly on the measuring device, we say that it is unreliable. Of course, if the person tested varies between tests because of fatigue, boredom, or worry, such changes in scores are not the fault of the measuring instrument.
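The phrase "same relative scores" matters: reliability concerns whether people keep their standing within the group, not whether every raw score repeats exactly. One common way to check this is a test-retest correlation, sketched below with hypothetical scores for six subjects. The uniform 5-point practice gain and the reshuffled retest are invented purely for illustration.

```python
from statistics import mean, pstdev


def pearson_r(xs, ys):
    """Pearson correlation; assumes equal-length lists that both vary."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))


# Hypothetical scores of the same six subjects at the first sitting.
first_sitting = [12, 18, 25, 30, 22, 15]

# Everyone gains 5 points on retest, but the rank order is unchanged.
shifted_retest = [s + 5 for s in first_sitting]

# Scores reshuffled as if by chance: relative standing is not preserved.
chance_retest = [25, 15, 12, 22, 30, 18]

print(round(pearson_r(first_sitting, shifted_retest), 3))  # 1.0
print(round(pearson_r(first_sitting, chance_retest), 3))
```

The shifted retest is perfectly consistent in the relative sense, while the reshuffled one correlates near zero with the first sitting, which is what an unreliable instrument looks like.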

A measuring device cannot be valid unless it is first of all reliable, but reliability in itself does not guarantee validity. That is, the fact that a subject makes the same score on a given test each time he takes it does not necessarily mean that the test is measuring what it purports to measure. Reliability is merely a means to the end of validity.


One of the most common causes of unreliability in a psychological measuring instrument is the inclusion of items which must be scored on the basis of subjective judgment. Such items lack objectivity. To be reliable, a measuring device must be set up in such a way that two or more persons can score the responses and get the same results.
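A simple way to check objectivity is to have two scorers mark the same responses independently and measure how often they agree. The sketch below computes plain percent agreement and Cohen's kappa, a widely used statistic that discounts the agreement two scorers would reach by chance alone. The pass/fail marks are hypothetical, and kappa is a technique swapped in here for illustration; the text itself does not prescribe a particular index.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of responses to which both scorers gave the same mark."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty, equal-length lists of marks")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: both scorers independently pick the same category.
    p_expected = sum(counts_a[c] * counts_b[c] for c in set(a) | set(b)) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical: two scorers independently mark the same ten responses.
scorer_1 = ["pass", "pass", "fail", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
scorer_2 = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]

print(percent_agreement(scorer_1, scorer_2))          # 0.8
print(round(cohens_kappa(scorer_1, scorer_2), 2))     # 0.58
```

The two scorers agree on 80 percent of the responses, but once chance agreement is removed the figure drops considerably, a sign that the scoring rules still leave room for subjective judgment.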

In many methods of personality assessment, people rather than an instrument must, in effect, act as the measuring stick, making it almost impossible to obtain an accurate, objective measurement uncolored by personal feelings and attitudes. This is a weakness of most projective devices.


To be useful, a measuring device must be standardized, that is, administered under standard conditions to a large group of persons who are representative of the individuals for whom it is intended. This is done in order to obtain norms or standards so that individual scores can be compared against the scores of other individuals within a specific group. If, for example, a measuring device is to be used to classify inductees in the army, the representative sample may be drawn from the general population.
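Once norms exist, an individual raw score is interpreted by locating it within the norm group's distribution. Two common ways to express that location are a standard (z) score and a percentile rank. The sketch below assumes a small hypothetical norm group; real norms come from a large representative sample. It also assumes the convention of counting tied scores as half below, which is one of several definitions of percentile rank in use.

```python
from bisect import bisect_left, bisect_right
from statistics import mean, pstdev


def z_score(raw, norm_scores):
    """Distance of a raw score from the norm-group mean, in SD units."""
    return (raw - mean(norm_scores)) / pstdev(norm_scores)


def percentile_rank(raw, norm_scores):
    """Percent of the norm group scoring below raw, counting ties as half."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, raw)           # scores strictly below raw
    equal = bisect_right(ordered, raw) - below  # scores equal to raw
    return 100 * (below + 0.5 * equal) / len(ordered)


# Hypothetical norm group: raw scores from a representative sample.
norm_group = [20, 22, 25, 25, 28, 30, 31, 33, 35, 40]

raw = 30
print(f"z = {z_score(raw, norm_group):.2f}")
print(f"percentile rank = {percentile_rank(raw, norm_group)}")  # 55.0
```

A raw score of 30 means little on its own; against this norm group it sits slightly above the mean, ahead of roughly half of the sample.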

But an instrument that is going to be used to select applicants for admission to college must be standardized on college students. If sex differences influence results, norms for the two sexes should be provided separately. In some instances, age-group norms are required. Finally, the statistical tabulation of scores will obviously be meaningless unless all subjects understood the directions and worked on the test under the same time limits.
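Separate norms matter because the same raw score can mean very different things in different reference groups. The sketch below keeps one hypothetical norm table per age group and interprets a single raw score against each; the group labels, the scores, and the helper `interpret` are all invented for illustration. The same lookup would apply to separate norms for the two sexes.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(raw, norm_scores):
    """Percent of the norm group scoring below raw, counting ties as half."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, raw)
    equal = bisect_right(ordered, raw) - below
    return 100 * (below + 0.5 * equal) / len(ordered)


# Hypothetical norm tables, one per age group.
norms_by_group = {
    "age 10-11": [14, 16, 18, 19, 21, 22, 24, 25, 27, 30],
    "age 12-13": [19, 22, 24, 26, 27, 29, 31, 33, 35, 38],
}


def interpret(raw, group):
    """Percentile rank of a raw score within the chosen group's norms."""
    if group not in norms_by_group:
        raise KeyError(f"no norms available for group {group!r}")
    return percentile_rank(raw, norms_by_group[group])


for group in norms_by_group:
    print(group, interpret(25, group))
```

Under these made-up tables, a raw score of 25 places a child in the 75th percentile of the younger group but only the 30th of the older one, so pooling both ages into a single table would misrepresent either child.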