Variable Types

This page does just one thing: it emphasizes (and explains) the two main levels of variables you will encounter. As usual, it should serve as a supplement to, not a replacement for, taking good notes, studying those notes, reading the assigned sections in the text, and watching the assigned videos.

The proportion of people in each province whose mother tongue is Turkish (1965). While `mother tongue` is a nominal variable, the variable `proportion of speakers` is numeric. Why the difference? What is actually being measured in each case?

This topic is not well covered in the text, but it does help you better understand data and variables. Not all data can be treated the same. For instance, one can subtract temperatures; one cannot subtract hair colors. While both temperature and hair color are variables, they contain different levels (or types) of information. The level of information contained in the data (the possible relationships among its values) determines which mathematical operations we can perform. This, in turn, determines which summary statistics make sense.

In the next two sections, I cover two main types of variables, categorical and numeric. The reason for this partitioning is that different measures of center and different measures of spread make sense for different types of data.

Categorical (Qualitative)

Possible values for a categorical variable can be explicitly enumerated. The following are the only possible values of the eye color variable: {blue, brown, green, other}. Since we were able to list out all possible values, eye color is a categorical variable.

The variable car make is also categorical: {foreign, domestic}. So is level in school: {freshman, sophomore, junior, senior, graduate}. So is employment level: {unemployed, employed}. So is mother tongue: {English, German, Spanish, Swedish, French, Turkish, Japanese, Chinese, Other}.

Note that these examples have something in common: It does not make sense to talk about the distance between the values. English minus Turkish is…? The very operation does not make sense. As such, any statistic calculated from such data must forgo addition, subtraction, multiplication, and so on. In fact, the only mathematical operation that makes sense is counting.
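To make the idea of counting concrete, here is a minimal sketch in Python. The list of mother tongues is made-up illustration data (it is not the 1965 data), and the variable names are my own. It shows that counting gives us the mode, and that a proportion is just a count divided by a total, which is why the proportion of speakers is numeric even though mother tongue is not.

```python
from collections import Counter

# Hypothetical sample of a categorical variable (illustration only).
mother_tongue = ["Turkish", "English", "Turkish", "German", "Turkish", "English"]

# Counting is the one operation that makes sense for categorical values.
counts = Counter(mother_tongue)

# The mode is the value that occurs most often.
mode_value, mode_count = counts.most_common(1)[0]

# A proportion is a count divided by the total number of observations.
# The result is a number, so the proportion is a numeric variable.
proportion_turkish = counts["Turkish"] / len(mother_tongue)

print(counts)
print(mode_value, mode_count)
print(proportion_turkish)
```

Notice that nothing here adds or subtracts the values themselves. Only the counts are used in arithmetic.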

Numeric (Quantitative)

If a variable is quantitative, its values are numbers and those numbers meaningfully measure something. Height is quantitative. Age is quantitative. Temperature is quantitative. IQ is quantitative. Number of hours taken is quantitative.

Since the measurements are numbers, addition and subtraction make logical sense. One can meaningfully discuss the difference (or distance) between two values. The difference between temperatures of 280 K and 300 K is 20 K. The difference between an IQ of 90 and one of 100 is 10. The difference between a person who took 60 hours and one who took 6 is 54 hours. The distance between a person with blue eyes and one with brown eyes is…? I don’t know either. Eye color is not a numeric variable.
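The same contrast can be sketched in Python, under the assumption that numeric values are stored as numbers and categorical values as text. The variable names are my own. Python happens to raise a TypeError when asked to subtract text values, which mirrors the point: the operation itself has no meaning.

```python
# Numeric values: subtraction gives a meaningful distance.
temperature_difference = 300 - 280  # in kelvin
iq_difference = 100 - 90
hours_difference = 60 - 6

print(temperature_difference, iq_difference, hours_difference)

# Categorical values stored as text: Python refuses to subtract them.
try:
    eye_color_difference = "blue" - "brown"
except TypeError as error:
    eye_color_difference = None
    print("Cannot subtract eye colors:", error)
```

Be careful with the reverse trick. Coding blue as 1 and brown as 2 would let the subtraction run, but the answer would still mean nothing.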

Note that not only does subtraction make sense, so does its sister, addition. That fact will be used when determining appropriate statistics to summarize the variable.

Why it Matters

This mini-lecture covered two classes of variables, categorical and numeric. It emphasized that the two types of variables contain different information, which means that different arithmetic can be done for each. Since addition does not make sense for categorical variables, no summary statistic on categorical variables can use addition.

With that in mind, we have the following allowable statistics for each variable type:

	Categorical	Numeric
Measure of Center:	Mode	Mean, Median
Measure of Spread:	None	Standard Deviation, Variance, Interquartile Range

Notice that the statistics for the categorical variables are based on counts. The statistics for the numeric variables include addition and subtraction in their formulas. Why? Because addition and subtraction make sense for numeric variables.
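As a rough sketch, the statistics in the table above can be calculated with Python's standard library. The eye colors and heights are made-up illustration data, and the variable names are my own. Note that `statistics.variance` and `statistics.stdev` give the sample versions, and that quartiles can be defined in several ways, so other software may report a slightly different interquartile range.

```python
import statistics
from collections import Counter

# Hypothetical data (illustration only).
eye_color = ["brown", "blue", "brown", "green", "brown", "blue"]
height_cm = [160, 165, 170, 175, 180]

# Categorical: the only summary is based on counts.
eye_color_mode = Counter(eye_color).most_common(1)[0][0]

# Numeric: measures of center.
mean_height = statistics.mean(height_cm)
median_height = statistics.median(height_cm)

# Numeric: measures of spread (sample variance and standard deviation).
variance_height = statistics.variance(height_cm)
stdev_height = statistics.stdev(height_cm)

# Interquartile range: third quartile minus first quartile.
q1, q2, q3 = statistics.quantiles(height_cm, n=4)
iqr_height = q3 - q1

print(eye_color_mode)
print(mean_height, median_height)
print(variance_height, stdev_height, iqr_height)
```

Trying `statistics.mean(eye_color)` would fail, because adding eye colors makes no sense to Python either.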

Check It

To check your understanding of this mini-lecture, please classify the following variables as either categorical or numeric. The answer and explanation appear directly beneath each variable.

age
Numeric. Because age is measured with numbers and the difference between two ages is meaningful, it is a numeric variable.
time of day
Numeric. Because it makes sense to calculate the difference in times (elapsed time), this is a numeric variable.
nationality
Categorical. Because it does not make sense to calculate the difference in nationalities (Welsh − Russian = ???), this is a categorical variable.
car make
Categorical. Because it does not make sense to calculate the difference in car makes (Ford − Toyota = ???), this is a categorical variable.
number of hairs on head
Numeric. Because you are counting something, this is a numeric variable.
amount of money in my wallet
Numeric. Because you are counting something, this is a numeric variable.

That is it. In this mini-lecture, we looked at different variable types. The variable type signifies the information contained in the variable. As such, it determines which summary statistics you can and cannot calculate.