Monday, March 2, 2015

A gentle intro to cross-cultural equivalence - or how can we measure across cultures?

Psychology is the study of human behaviour and mental processes through scientific methods. Psychology often claims to be universal, that is, applicable to all of humanity. Using scientific methods, we psychologists rely on a systematic and objective process of proposing and testing hypotheses and making predictions about the state of human nature. Ever since psychology began as an academic discipline, one of its ultimate goals has been to quantify natural occurrences in order to better understand and predict them. Of course, this often requires extensive qualitative research, but ultimately the hope was and is that we can understand a behaviour or mental process so precisely that we can quantitatively measure it and also change it.



The application of such quantitative methods is now often taken for granted, even though the levels of quantification may vary. For example, we may want to select the most able person for a particular job, refer a child with learning problems to a specialist or help a person with mental health problems to fully function in society again. Even though all these problems can be phrased in qualitative terms (a good person for the job, a child who has problems learning, a person who is not well), they are essentially quantitative problems because they always refer to implicit or explicit standards. A person might be BETTER qualified than another to take up a job, or a person may have GREATER problems understanding concepts or material than 75% of children her age. Therefore, in many day-to-day situations we make implicit and intuitive quantitative statements.

If we want to make quantitative statements about a scientific concept, we run into one of the central problems in psychology. Namely, WHAT do we want to make a comparison about? Or in other words, how do we define a psychological construct so that we can measure it? A geographer, chemist or physicist is unlikely to face the problems that psychologists have… after all, we can easily measure distances (e.g., how far is Auckland from Wellington), we have ways of dating a piece of rock and we can measure the energy of particles when we collide them at near the speed of light. Psychologists, on the other hand, are dealing with intangible concepts that are difficult to specify. Most of you are familiar with concepts such as intelligence, attitudes, personality traits, depression or identity. However, if we were to ask you to pinpoint any of these concepts in the real world, you would be unable to do so. Our psychological terminology refers to unobserved mental constructs that we create in our community of fellow psychologists to indicate a particular set of problems or to describe a particular set of behaviours or mental representations. I would argue that underlying many of these psychological terms are assumptions about relative coherence, stability, generalizability and potentially even some general biological foundations that lead to the emergence of such a syndrome. Therefore, we don’t just invent these terms on a whim; we think that there is something meaningful to them that is important enough to look into and tell other people about.

Therefore, the first issue in any psychological study, even though it may not seem obvious anymore, is to clearly and unambiguously define and specify what we want to study. What is our construct or process of interest? It is at this point that culture will throw the first curve ball at any psychologist attempting to address this question. How can we make sure that our definition or mental construct of our psychological term or process is actually valid or has some meaning in another cultural context? How does our upbringing in a highly developed Western society influence how we think about psychological constructs? Can we assume that identity is a concept that is meaningful in a village in the lowland Amazon basin? Is our definition of depression applicable to refugees coming from Syria or Iraq? Is conscientiousness a useful term to screen out applicants for jobs in an international organization? In short, the first problem in any psychological study is to unambiguously define and describe the psychological process for all the populations that we are interested in. We could think of this as a mental bubble that we draw around some problem or process. Does this bubble ‘exist’ in all the different cultures that we want to include in our study? How can we find out whether this bubble is meaningful and has some value or relevance for all the local populations? We will discuss this as the question of functional equivalence.

If we are confident that there is some value to this mental bubble of ours (let’s say, depression, personality or identity) and that the terms are meaningful in two or more cultures, then we need to find good indicators for it. In psychological terms, this is called operationalization. How can we empirically say that one person has more of the latent quality that we just created with our mental bubble compared to another person? What would be a good indicator to tell us that one person is better for a job compared to another person or that one person is a better learner than another, who in turn may need some help? Here again, culture will throw lots of beautiful little challenges at us. We need to find indicators that are meaningful and relevant in each cultural context, but obviously we would still need to be able to compare the results across contexts. Indicators that are relevant and meaningful in each context but cannot be compared across cultures are of no use to us; we want to aim for some level of comparability. For example, is staying late at your desk a good indicator of being conscientious? Or could it be seen as being disorganized and incompetent? What if people are unfamiliar with office jobs? Is the number of times that you circled the temple this morning before going to work a better indicator of your conscientiousness? Is the ability to track animals over long distances and varied terrain a good indicator of concentration? Or should we give people lots of d’s and b’s and p’s and q’s and then ask them to count how many p’s and q’s were together in each line? Should we measure intelligence by asking people to name as many types of medicinal plants for diarrhoea as they can? Or give them complex questions about history and philosophy? This problem of identifying good measurement indicators will be called structural equivalence. Obviously, how we define and how we operationalize a construct are very much dependent on each other. For this reason, some researchers lump the two terms together as construct equivalence. For reasons that we will discuss later, I prefer to keep them separate.

So, we now have a mental bubble and we have a number of indicators that give us some clue about the latent bubble. However, we don’t actually know how good each of these indicators is at representing that latent bubble. We need to find a way to show how well each indicator works in each of our cultures. In other words, is the same indicator better at capturing a key aspect of our construct in one culture compared to another? For example, is going to parties and having lots of friends a good indicator of extraversion? Is having many wives a good indicator of social status in all cultures? Is staying late at work to finish a task a good indicator of high conscientiousness in all cultures? This problem is called metric equivalence. It is the question about the relative strength of the indicator-latent variable relationship. In technical terms, we are concerned with the equivalence of factor loadings or item slopes in classical test theory or the item discrimination in item response theory.
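To make the idea of unequal loadings a bit more concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration: the function names, the loadings of 0.8 and 0.3, the noise level and the sample size are invented, not taken from any real study. It also cheats in one important way: because the data are simulated, we can regress the item directly on the latent score, which we never observe in real data (there we would fit something like a multi-group factor model instead).

```python
import random

def simulate_item(n, loading, intercept, rng):
    """Simulate (latent, observed) pairs: observed = intercept + loading * latent + noise."""
    pairs = []
    for _ in range(n):
        latent = rng.gauss(0.0, 1.0)       # person's standing on the construct
        noise = rng.gauss(0.0, 0.5)        # measurement error
        pairs.append((latent, intercept + loading * latent + noise))
    return pairs

def ols_slope(pairs):
    """Least-squares slope of the observed item score on the latent score."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    return cov / var

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical item "stays late at work": assumed to be a strong indicator of
# conscientiousness in culture A and a weak one in culture B.
culture_a = simulate_item(5000, loading=0.8, intercept=3.0, rng=rng)
culture_b = simulate_item(5000, loading=0.3, intercept=3.0, rng=rng)

slope_a = ols_slope(culture_a)
slope_b = ols_slope(culture_b)
print(f"slope in culture A: {slope_a:.2f}")
print(f"slope in culture B: {slope_b:.2f}")
```

The two recovered slopes differ clearly, which is what a violation of metric equivalence looks like: the same one-unit difference on the item corresponds to very different amounts of the underlying construct in the two groups.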

Finally, we may be convinced that our indicators work equally well in all contexts. Each questionnaire or test item is really giving us a good and reliable insight into the construct. But there may still be problems. Some items, even though they have the same relationship with the latent construct in all cultures, may still be a bit more difficult or easier in one context compared to another. If I asked you to name the capital of Benin, most of you would probably struggle to find the correct answer. Benin is a country that is quite far from our thoughts, and most of us will never set foot there or may not have heard about it in the media. However, if I asked you about the capital city of one of your neighbouring countries, you would probably be able to name it quite easily. Therefore, asking about the capital of Benin would be easier for somebody living in Togo or Nigeria compared to somebody living in NZ or Denmark. This is the issue of full score or scalar equivalence. Technically, we would look at the invariance of item intercepts (in a multi-group CFA) or the differential item difficulty (in IRT).
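A small simulation sketch in Python can show why unequal intercepts matter for comparing scores. All names and numbers here are assumptions chosen for illustration (the intercepts of 3.0 and 3.5, the shared loading of 0.8, the noise level and the sample size). Both simulated groups have exactly the same latent mean and the same loading, so metric equivalence holds, yet the item is "easier" in one group.

```python
import random

def simulate_scores(n, intercept, loading, latent_mean, rng):
    """Simulate observed item scores: intercept + loading * latent + noise."""
    scores = []
    for _ in range(n):
        latent = rng.gauss(latent_mean, 1.0)   # true standing on the construct
        noise = rng.gauss(0.0, 0.5)            # measurement error
        scores.append(intercept + loading * latent + noise)
    return scores

rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Same latent mean (0.0) and same loading (0.8) in both hypothetical groups;
# only the item intercept differs.
group_a = simulate_scores(5000, intercept=3.0, loading=0.8, latent_mean=0.0, rng=rng)
group_b = simulate_scores(5000, intercept=3.5, loading=0.8, latent_mean=0.0, rng=rng)

mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)
observed_gap = mean_b - mean_a

print(f"observed mean, group A: {mean_a:.2f}")
print(f"observed mean, group B: {mean_b:.2f}")
print(f"observed gap: {observed_gap:.2f} (true latent gap is 0)")
```

The observed gap of roughly half a scale point comes entirely from the intercept difference. If we compared raw means here, we would wrongly conclude that the groups differ on the construct itself.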


In summary, measuring psychological attributes or processes across cultural contexts is quite difficult. I have used some fairly simple examples to keep this introduction non-technical. We need to define our construct, that is, draw our mental bubble around what we want to study. The first step in any cultural study then is to make sure that this construct or mental bubble is meaningful and functional in all cultures that we want to study. Once we think this is the case, we need to find good indicators that are observable and give us some insight into the position or state of an individual in relation to our mental bubble. We then need to discuss whether the indicators are equally good in all contexts or whether some are better at telling us something about a person or process in one cultural context compared to another. Finally, we need to find out whether all indicators are equally easy or difficult. Only once we have fulfilled this last criterion can we actually make any comparisons between individuals or groups across cultures. This is a tough task and unfortunately, most studies that you will see in the literature fall well short of it. But this is the challenge that we really need to meet in order to develop a meaningful and universal psychological science.