2 Descriptive Statistics: Tabular and Graphical Presentations

2.1 Summarizing Categorical Data

2.1.1 Frequency Distribution

A frequency distribution is a tabular summary of data showing the number (frequency) of items in each of several non-overlapping classes.

Example 2.1 Consider the following data shown in Table 2.1.

Table 2.1: Data from a sample of 50 soft drink purchases

Coke Classic	Coke Classic	Coke Classic
Diet Coke	Diet Coke	Coke Classic
Pepsi	Coke Classic	Pepsi
Diet Coke	Diet Coke	Dr. Pepper
Coke Classic	Coke Classic	Coke Classic
Coke Classic	Sprite	Diet Coke
Dr. Pepper	Pepsi	Pepsi
Diet Coke	Coke Classic	Pepsi
Pepsi	Coke Classic	Pepsi
Pepsi	Coke Classic	Pepsi
Coke Classic	Pepsi	Coke Classic
Dr. Pepper	Coke Classic	Dr. Pepper
Sprite	Sprite	Pepsi
Coke Classic	Dr. Pepper	Sprite
Diet Coke	Pepsi	Coke Classic
Coke Classic	Diet Coke	Sprite
Coke Classic	Pepsi

Now we construct a frequency distribution by counting the number of purchases of each type of soft drink.

Table 2.2: Frequency distribution of Soft Drink Purchases

Soft Drink	Frequency
Coke Classic	19
Diet Coke	8
Dr. Pepper	5
Pepsi	13
Sprite	5
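The counting step can also be done programmatically. The following is a minimal Python sketch, assuming the 50 purchases of Table 2.1 are entered row by row using short codes; the codes and variable names are illustrative choices and not part of the original data.

```python
from collections import Counter

# Short codes for the five soft drinks (illustrative abbreviations).
NAMES = {"CC": "Coke Classic", "DC": "Diet Coke", "DP": "Dr. Pepper",
         "P": "Pepsi", "S": "Sprite"}

# The 50 purchases of Table 2.1, entered row by row (three purchases per table row).
purchases = """
CC CC CC  DC DC CC  P CC P   DC DC DP
CC CC CC  CC S DC   DP P P   DC CC P
P CC P    P CC P    CC P CC  DP CC DP
S S P     CC DP S   DC P CC  CC DC S
CC P
""".split()

# Counter tallies how many times each soft drink occurs.
frequency = Counter(NAMES[code] for code in purchases)

for drink in sorted(frequency):
    print(f"{drink:<13}{frequency[drink]:>3}")
```

The printed counts should agree with Table 2.2, which gives a quick check that no purchase was missed while tallying by hand.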

Relative Frequency and Percent Frequency Distributions

Relative Frequency $=\frac{\text{Frequency of the class}}{n}$, where $n$ is the total number of observations.
The percent frequency of a class is the relative frequency multiplied by 100.

2.1.2 Bar Charts and Pie Charts

Bar chart: A graphical device for depicting qualitative data that have been summarized in a frequency, relative frequency, or percent frequency distribution.
Pie chart: A graphical device for presenting data summaries based on subdivision of a circle into sectors that correspond to the relative frequency for each class.

From the frequency distribution of soft drink purchases (Table 2.2), we develop the relative and percent frequency distributions (see Table 2.3) and then construct a bar chart and a pie chart.

Table 2.3: Frequency, Relative And Percent Frequency Distributions Of Soft Drink Purchases

Soft Drink	Frequency (f)	Relative Frequency (Rf)	Percent Frequency (Pf)
Coke Classic	19	0.38	38
Diet Coke	8	0.16	16
Dr. Pepper	5	0.10	10
Pepsi	13	0.26	26
Sprite	5	0.10	10
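The relative and percent frequency columns of Table 2.3 follow directly from the frequencies. A minimal Python sketch, starting from the counts of Table 2.2 (variable names are illustrative):

```python
# Frequencies from Table 2.2.
frequency = {"Coke Classic": 19, "Diet Coke": 8, "Dr. Pepper": 5,
             "Pepsi": 13, "Sprite": 5}

n = sum(frequency.values())  # total number of observations (50)

# Relative frequency = class frequency / n.
relative = {drink: f / n for drink, f in frequency.items()}
# Percent frequency = relative frequency multiplied by 100.
percent = {drink: 100 * f / n for drink, f in frequency.items()}

for drink in frequency:
    print(f"{drink:<13}{frequency[drink]:>3}{relative[drink]:>6.2f}{percent[drink]:>5.0f}")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a useful check on the arithmetic.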

Now we construct a bar chart and pie chart.

Figure 2.1: Bar chart of Soft drink purchases

Figure 2.2: Pie chart of Soft drink purchases
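The quantities behind both charts are simple to compute: the length of each bar is proportional to the class frequency, and the central angle of each pie sector is the relative frequency times 360 degrees. The sketch below uses only the Python standard library and draws the bars as text; it is an illustration of the arithmetic, not the code used to produce Figures 2.1 and 2.2.

```python
# Frequencies from Table 2.2.
frequency = {"Coke Classic": 19, "Diet Coke": 8, "Dr. Pepper": 5,
             "Pepsi": 13, "Sprite": 5}
n = sum(frequency.values())

# Bar chart: one bar per class, with length proportional to its frequency.
bars = {drink: "#" * f for drink, f in frequency.items()}

# Pie chart: each sector's central angle is (frequency / n) * 360 degrees.
angles = {drink: 360 * f / n for drink, f in frequency.items()}

for drink in frequency:
    print(f"{drink:<13}{bars[drink]:<20}{angles[drink]:>6.1f} degrees")
```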

2.1.3 Cross-tabulation and its graphical presentation

Cross-tabulation (also known as crosstabs or contingency tables) is a fundamental data analysis technique used to examine the relationship between two or more categorical variables. For instance, consider the following cross-table between the gender and fitness-level of 30 individuals:

Table 2.4: Cross-Tabulation of Gender by Fitness Level

Gender	Excellent	Fair	Good	Poor	Grand Total
Male	2	7	3	3	15
Female	5	3	5	2	15
Grand Total	7	10	8	5	30
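A cross-tabulation is obtained by counting how often each pair of categories occurs. The sketch below assumes that the 30 individuals of Table 2.4 are those of the case study in Section 2.6, whose Gender and Fitness Level columns give the same counts; the one-letter codes are illustrative abbreviations.

```python
from collections import Counter

# Fitness level of individuals 1..30 in ID order, from the case-study table
# in Section 2.6 (E = Excellent, F = Fair, G = Good, P = Poor).
codes = "F P G F P E F P G E P G F E F G E F F G G E F F P G E G F E".split()
LEVELS = {"E": "Excellent", "F": "Fair", "G": "Good", "P": "Poor"}

# In that table, odd IDs are male and even IDs are female.
records = [("Male" if i % 2 == 1 else "Female", LEVELS[c])
           for i, c in enumerate(codes, start=1)]

# Count each (gender, fitness level) pair.
cells = Counter(records)

levels = ["Excellent", "Fair", "Good", "Poor"]
for gender in ["Male", "Female"]:
    row = [cells[(gender, level)] for level in levels]
    print(f"{gender:<8}{row}  total = {sum(row)}")
```

Each printed row corresponds to one row of the cross-tabulation; the column totals are obtained by adding the two rows entry by entry.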

We can display the information in a cross-tabulation using (a) a clustered column chart or (b) a stacked column chart (see Figure 2.3).

Figure 2.3: Visualization of relationship between two categorical variables

2.2 Summarizing Quantitative Data

2.2.1 Frequency Distribution of quantitative data

Consider the following data.

Year-end audit times (in days): 12, 14, 19, 18, 15, 15, 18, 17, 20, 27, 22, 23, 22, 21, 33, 28, 14, 18, 16, 13

To construct a frequency distribution for quantitative data, we have to:

Determine the number of non-overlapping classes ($k$).
Determine the width of each class.
Determine the class limits.

2.2.2 Frequency Distribution of Audit time data

Here, $n=20$, smallest value $=12$, and largest value $=33$.

Determine the number of classes $k$: $k=\sqrt{n}=\sqrt{20}=4.47$, which we round up to $5$. So we use $5$ classes.
Determine the class width $w$: $w=\frac{\text{Largest}-\text{Smallest}}{k}=\frac{33-12}{5}=4.2$, which we round up to the convenient value $5$.
Determine the class limits: starting from a convenient value just below the smallest value (12), say $10$, we have the following classes (exclusive method, where the upper bound of each class is excluded):

[10,15), [15,20), [20,25), [25,30), and [30,35)
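The three steps above can be sketched in Python. Rounding the width up to the next multiple of 5 and starting at the multiple of 5 just below the smallest value are assumptions made here to mimic the "convenient" choices in the text; other sensible choices are possible.

```python
import math

audit_times = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
               22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

n = len(audit_times)

# Step 1: number of classes, k = sqrt(n) rounded up.
k = math.ceil(math.sqrt(n))

# Step 2: raw class width, (largest - smallest) / k.
raw_width = (max(audit_times) - min(audit_times)) / k

# Assumption: a "convenient" value is a multiple of 5.
step = 5
width = math.ceil(raw_width / step) * step      # raw width rounded up
start = (min(audit_times) // step) * step       # just below the smallest value

# Step 3: class limits as (lower, upper) pairs; the upper limit is excluded.
classes = [(start + i * width, start + (i + 1) * width) for i in range(k)]
print(k, raw_width, width, classes)
```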

Now count the data values falling in each class to obtain the frequency distribution. Once we have the frequency distribution, we can also produce the relative and percent frequency distributions (Table 2.5).

Table 2.5: Frequency, relative frequency (rf) and percent frequency (pf) distribution for the audit time data (n=20)

Audit Time (days)	Frequency (f)	Relative frequency (rf)	Percent frequency (pf)
[10,15)	4	0.20	20
[15,20)	8	0.40	40
[20,25)	5	0.25	25
[25,30)	2	0.10	10
[30,35)	1	0.05	5
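Table 2.5 can be reproduced by counting, for each class, the values that satisfy lower limit $\le x <$ upper limit. A minimal Python sketch (variable names are illustrative):

```python
audit_times = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
               22, 23, 22, 21, 33, 28, 14, 18, 16, 13]
classes = [(10, 15), (15, 20), (20, 25), (25, 30), (30, 35)]
n = len(audit_times)

# Exclusive method: x belongs to [lower, upper) when lower <= x < upper.
rows = []
for lower, upper in classes:
    f = sum(lower <= x < upper for x in audit_times)
    rows.append((f"[{lower},{upper})", f, f / n, 100 * f / n))

for label, f, rf, pf in rows:
    print(f"{label:<9}{f:>3}{rf:>6.2f}{pf:>5.0f}")
```

Because the classes do not overlap and cover the whole range of the data, the frequencies must add up to $n=20$.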

2.2.3 Histogram

A common graphical presentation of quantitative data is a histogram. This graphical summary can be prepared for data previously summarized in either a frequency, relative frequency, or percent frequency distribution.

Figure 2.4: Histogram for the Audit Time data

Important Note: A relative frequency (or percent frequency) histogram is ideal for comparing distributions across groups of different sizes, because it displays proportions instead of raw counts, allowing fair and meaningful comparisons.

Illustration (see Figure 2.5)

A dataset of Marks (out of 100) was collected from two student groups:

70 Female students
40 Male students

The goal is to compare the distribution of marks between these two groups.

The frequency histogram shows how many students fall into each marks range (bin). However, since there are more female students (70) than male students (40), the female bars are naturally taller, even if the relative performance is similar. This makes a direct comparison of raw counts unfair and misleading.

A relative frequency histogram shows the proportion of students in each bin within each group. Dividing the counts by the total number in the group:

Normalizes the data,
Allows for fair comparisons between groups of different sizes,
Highlights true differences in distribution, not just differences in group size.

Example Observation

From the relative frequency histogram:

Around 40% of males scored between 50 and 60.
Around 39% of females scored between 50 and 60.

This comparison is valid only because the histograms show relative frequency, not raw counts.
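The effect of normalizing can be seen with a small calculation. The raw marks are not listed in the text, so the bin counts below are hypothetical, chosen only to be consistent with the group sizes and the percentages quoted above.

```python
# Group sizes as given in the illustration.
group_size = {"Female": 70, "Male": 40}

# Hypothetical counts for the 50-60 marks bin (not taken from the text).
count_50_60 = {"Female": 27, "Male": 16}

# Relative frequency = count in the bin / size of the group.
relative = {g: count_50_60[g] / group_size[g] for g in group_size}

for g in group_size:
    print(f"{g}: count = {count_50_60[g]}, relative frequency = {relative[g]:.2f}")
```

With these assumed counts the female bar would be much taller in a frequency histogram (27 against 16), while the two relative frequencies are almost equal.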

Figure 2.5: Comparison between Frequency histogram and relative frequency histogram

2.2.4 Histogram and Shape of the Distribution

See Figure 2.6.

Figure 2.6: Histograms Showing Differing Levels of Skewness

2.2.5 Cumulative Distributions and Ogive

A variation of the frequency distribution that provides another tabular summary of quantitative data is the cumulative frequency distribution. Table 2.6 shows the cumulative relative frequency of Audit Time data.

Table 2.6: Frequency, relative frequency and Cumulative relative frequency distribution of Audit Time Data

Audit Time (days)	Frequency (f)	Relative frequency (rf)	Cumulative relative frequency (crf)
[10,15)	4	0.20	0.20
[15,20)	8	0.40	0.60
[20,25)	5	0.25	0.85
[25,30)	2	0.10	0.95
[30,35)	1	0.05	1.00
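The cumulative column is a running total. The sketch below accumulates the frequencies first and divides by $n$ afterwards, which avoids adding rounded relative frequencies; variable names are illustrative.

```python
from itertools import accumulate

classes = ["[10,15)", "[15,20)", "[20,25)", "[25,30)", "[30,35)"]
frequency = [4, 8, 5, 2, 1]          # from Table 2.5
n = sum(frequency)

# Running total of the frequencies: the cumulative frequency.
cumulative = list(accumulate(frequency))

# Cumulative relative frequency = cumulative frequency / n.
crf = [c / n for c in cumulative]

for label, f, c, r in zip(classes, frequency, cumulative, crf):
    print(f"{label:<9}{f:>3}{c:>4}{r:>6.2f}")
```

The last cumulative frequency always equals $n$, so the last cumulative relative frequency is always 1.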

2.2.6 Ogive

Another way of presenting this information is the ogive, which is a graphical representation of the cumulative relative frequencies. Figure 2.7 shows the ogive of the cumulative relative frequencies for the Audit Time data.

For instance, from both Table 2.6 and Figure 2.7 we can estimate that 60% of audits took less than 20 days. Similarly, 95% of audits took less than 30 days, and so on.
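Reading a value off an ogive amounts to linear interpolation between the plotted points. The following sketch assumes the usual construction in which each upper class limit is plotted against its cumulative relative frequency and neighbouring points are joined by straight lines; the function name `ogive` is an illustrative choice. Values between class limits are only estimates, since they assume the data are spread evenly within each class.

```python
# Ogive points: (upper class limit, cumulative relative frequency) from Table 2.6.
# The point (10, 0.0) is added because no audit took less than 10 days.
points = [(10, 0.0), (15, 0.20), (20, 0.60), (25, 0.85), (30, 0.95), (35, 1.00)]

def ogive(x):
    """Estimate the proportion of audits that took less than x days."""
    if x <= points[0][0]:
        return 0.0
    if x >= points[-1][0]:
        return 1.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            # Linear interpolation along the segment joining the two points.
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0)

for days in (20, 30, 22.5):
    print(f"less than {days} days: {ogive(days):.3f}")
```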

2.2.7 The Stem-and-Leaf Display

The techniques of exploratory data analysis consist of simple arithmetic and easy-to-draw graphs that can be used to summarize data quickly. One technique—referred to as a stem-and-leaf display—can be used to show both the rank order and shape of a data set simultaneously (Anderson and Sweeney 2011).

Steps to Construct a Stem-and-Leaf Diagram

(1) Divide each number into two parts: a stem, consisting of one or more of the leading digits, and a leaf, consisting of the remaining digit.

(2) List the stem values in a vertical column.

(3) Record the leaf for each observation beside its stem.

(4) Write the units for stems and leaves on the display.

Example 2.2 The following data are the numbers of questions answered correctly on an aptitude test given to 50 individuals recently interviewed for a position at Haskens Manufacturing.

112, 72, 69, 97, 107, 73, 92, 76, 86, 73, 126, 128, 118, 127, 124, 82, 104, 132, 134, 83, 92, 108, 96, 100, 92, 115, 76, 91, 102, 81, 95, 141, 81, 80, 106, 84, 119, 113, 98, 75, 68, 98, 115, 106, 95, 100, 85, 94, 106, 119


  The decimal point is 1 digit(s) to the right of the |

   6 | 89
   7 | 233566
   8 | 01123456
   9 | 12224556788
  10 | 002466678
  11 | 2355899
  12 | 4678
  13 | 24
  14 | 1
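The construction steps can be sketched in a few lines of Python. Here the stem is everything except the last digit and the leaf is the last digit, as in the display above; the function name `stem_and_leaf` is an illustrative choice, and the output format only imitates the display shown.

```python
from collections import defaultdict

scores = [112, 72, 69, 97, 107, 73, 92, 76, 86, 73,
          126, 128, 118, 127, 124, 82, 104, 132, 134, 83,
          92, 108, 96, 100, 92, 115, 76, 91, 102, 81,
          95, 141, 81, 80, 106, 84, 119, 113, 98, 75,
          68, 98, 115, 106, 95, 100, 85, 94, 106, 119]

def stem_and_leaf(values):
    """Return display lines: stem = leading digits, leaf = last digit."""
    leaves = defaultdict(list)
    for v in sorted(values):            # sorting puts leaves in rank order
        stem, leaf = divmod(v, 10)      # e.g. 112 -> stem 11, leaf 2
        leaves[stem].append(str(leaf))
    # List every stem from smallest to largest, including empty ones.
    return [f"{stem:>2} | {''.join(leaves.get(stem, []))}"
            for stem in range(min(leaves), max(leaves) + 1)]

lines = stem_and_leaf(scores)
print("\n".join(lines))
```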

Exception

In some data sets, providing more classes or stems may be desirable. One way to do this is to split each original stem in two. For example, divide stem 5 into two new stems, 5L and 5U: stem 5L has leaves 0, 1, 2, 3, and 4, and stem 5U has leaves 5, 6, 7, 8, and 9. This doubles the number of original stems. However, data in practical situations come in many forms, so we have to work out which stem-and-leaf display suits the data at hand.
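The L/U splitting rule can be sketched as follows. The data values are hypothetical, chosen only so that stem 5 appears; the function name `split_stem_display` is likewise illustrative, and for brevity the sketch lists only the half-stems that actually contain data.

```python
def split_stem_display(values):
    """Stem-and-leaf lines with each stem split into L (leaves 0-4) and U (leaves 5-9)."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = "L" if leaf <= 4 else "U"
        rows.setdefault((stem, half), []).append(str(leaf))
    # Sorting the (stem, half) keys puts L before U within each stem.
    return [f"{stem}{half} | {''.join(leaves)}"
            for (stem, half), leaves in sorted(rows.items())]

# Hypothetical two-digit values used only for illustration.
values = [50, 58, 52, 55, 61, 59, 54, 67, 53, 56]

lines = split_stem_display(values)
print("\n".join(lines))
```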

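Example 2.3: Construct a stem-and-leaf plot from the following data:

88.5, 98.8, 89.6, 92.2, 92.7, 88.4, 87.5, 90.9, 94.7, 88.3, 90.4, 83.4, 87.9, 92.6, 87.8, 89.9, 84.3, 90.4, 91.6, 91.0

Two possible displays follow. In the first, each value is rounded to the nearest whole number, the stem is the tens digit, and each stem is split into a lower half (leaves 0 to 4) and an upper half (leaves 5 to 9). In the second, each stem covers two units (for example, stem 88 holds the values from 88.0 to 89.9) and the leaf is the tenths digit.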


  The decimal point is 1 digit(s) to the right of the |

  8 | 34
  8 | 888889
  9 | 0000112233
  9 | 59


  The decimal point is at the |

  82 | 4
  84 | 3
  86 | 589
  88 | 34569
  90 | 44906
  92 | 267
  94 | 7
  96 | 
  98 | 8
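The second display above, with stems two units wide and the tenths digit as leaf, can be reproduced with the sketch below. Working in tenths as integers is a choice made here to avoid floating-point rounding issues; the output format only imitates the display shown.

```python
values = [88.5, 98.8, 89.6, 92.2, 92.7, 88.4, 87.5, 90.9, 94.7, 88.3,
          90.4, 83.4, 87.9, 92.6, 87.8, 89.9, 84.3, 90.4, 91.6, 91.0]

# Convert to whole tenths (88.5 -> 885) and sort.
tenths = sorted(round(v * 10) for v in values)

leaves = {}
for t in tenths:
    stem = (t // 20) * 2          # each stem covers two units: 88 holds 88.0 to 89.9
    leaf = t % 10                 # the tenths digit
    leaves.setdefault(stem, []).append(str(leaf))

# List every even stem from smallest to largest, including empty ones.
lines = [f"{stem} | {''.join(leaves.get(stem, []))}"
         for stem in range(min(leaves), max(leaves) + 1, 2)]
print("\n".join(lines))
```

Note that within a stem the leaves follow the order of the sorted values, so stem 90 reads 44906 (90.4, 90.4, 90.9, 91.0, 91.6) rather than 04469.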

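Example 2.4 (Another example): Construct a stem-and-leaf plot from the following data: 7, 8, 2, 1, 8, 3, 5, 7, 1, 2, 2, 5, 8, 5, 5, 7, 8, 7, 5, 3

Solution: Each value is a whole number, so the value itself is the stem and the leaf (the tenths digit) is 0.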


  The decimal point is at the |

  1 | 00
  2 | 000
  3 | 00
  4 | 
  5 | 00000
  6 | 
  7 | 0000
  8 | 0000

2.3 Exercises

2.1 A doctor’s office staff studied the waiting times for patients who arrive at the office with a request for emergency service. The following data with waiting times in minutes were collected over a one-month period.

2, 5, 10, 12, 4, 4, 5, 17, 11, 8, 9, 8, 12, 21, 6, 8, 7, 13, 18, 3

Use a class width of 5 in the following (start your class limits from 0):

Show the frequency distribution.
Show the relative frequency distribution.
Show the cumulative frequency distribution.
Show the cumulative relative frequency distribution.
What proportion of patients needing emergency service wait less than 10 minutes?

2.2 A shortage of candidates has required school districts to pay higher salaries and offer extras to attract and retain school district superintendents. The following data show the annual base salary ($1000s) for superintendents in 20 districts in the greater Rochester, New York, area (The Rochester Democrat and Chronicle, February 10, 2008).

187, 184, 174, 185, 175, 172, 202, 197, 165, 208, 215, 164, 162, 172, 182, 156, 172, 175, 170, 183

Use an appropriate number of classes and class width in the following.

Show the frequency distribution.
Show the percent frequency distribution.
Show the cumulative percent frequency distribution.
Develop a histogram for the annual base salary.
Do the data appear to be skewed? Explain.
Which salary range contains the highest percentage of superintendents?

2.3 NRF/BIG research provided results of a consumer holiday spending survey (USA Today, December 20, 2005). The following data provide the dollar amount of holiday spending for a sample of 25 consumers.

1200, 850, 740, 590, 340, 450, 890, 260, 610, 350, 1780, 180, 850, 2050, 770, 800, 1090, 510, 520, 220, 1450, 280, 1120, 200, 350

What is the lowest holiday spending? The highest?
Use a class width of $250 to prepare a frequency distribution and a percent frequency distribution for the data.
Prepare a histogram and comment on the shape of the distribution.
What observations can you make about holiday spending?

2.4 Construct a stem-and-leaf display for the following data.

70, 72, 75, 64, 58, 83, 80, 82, 76, 75, 68, 65, 57, 78, 85, 72

2.5 Construct a stem-and-leaf display for the following data.

11.3, 9.6, 10.4, 7.5, 8.3, 10.5, 10.0, 9.3, 8.1, 7.7, 7.5, 8.4, 6.3, 8.8

2.6 A psychologist developed a new test of adult intelligence. The test was administered to 20 individuals, and the following data were obtained.

114, 99, 131, 124, 117, 102, 106, 127, 119, 115, 98, 104, 144, 151, 132, 106, 125, 122, 118, 118

Construct a stem-and-leaf display for the data.

2.4 Line Chart

2.5 Scatter diagram

2.6 Case Study: Lifestyle Indicators and Preferences

You are given data from a cross-sectional survey of 30 individuals. Variables include Age, Gender, Monthly Income, Hours of Exercise/Week, Fitness Level, Favorite Fruit, and Temperature Preference. This dataset enables practice with data visualization, scale classification, and relationship analysis using tools such as bar charts, scatter plots, and histograms.

ID	Age	Gender	Monthly Income (USD)	Hours of Exercise/Week	Favorite Fruit	Fitness Level	Temperature Preference (°C)
1	25	Male	1520.75	2.5	Mango	Fair	23
2	32	Female	2280.50	1.2	Apple	Poor	22
3	20	Male	925.10	4.8	Banana	Good	19
4	29	Female	1980.90	3.3	Orange	Fair	22
5	40	Male	3055.45	0.7	Apple	Poor	21
6	23	Female	1425.00	5.9	Mango	Excellent	20
7	37	Male	3625.80	2.1	Banana	Fair	18
8	31	Female	2520.00	1.5	Apple	Poor	27
9	27	Male	1685.25	3.7	Orange	Good	22
10	22	Female	1190.00	6.2	Mango	Excellent	19
11	34	Male	3180.45	0.0	Banana	Poor	24
12	26	Female	1830.60	4.3	Orange	Good	19
13	39	Male	3999.00	1.4	Apple	Fair	22
14	24	Female	1335.75	5.6	Banana	Excellent	26
15	28	Male	2125.35	2.2	Mango	Fair	27
16	33	Female	2885.50	3.1	Orange	Good	22
17	21	Male	960.00	6.0	Apple	Excellent	18
18	30	Female	2190.00	1.7	Banana	Fair	27
19	36	Male	3425.90	2.3	Mango	Fair	27
20	29	Female	1920.40	4.5	Orange	Good	20
21	35	Male	3095.25	3.0	Apple	Good	23
22	22	Female	1125.00	5.1	Banana	Excellent	27
23	38	Male	4180.60	1.8	Mango	Fair	22
24	30	Female	2590.00	2.4	Orange	Fair	20
25	41	Male	3540.85	0.9	Apple	Poor	22
26	28	Female	1875.50	4.0	Banana	Good	24
27	24	Male	1275.00	6.7	Mango	Excellent	21
28	31	Female	2390.25	3.5	Orange	Good	25
29	36	Male	2965.80	2.6	Banana	Fair	22
30	21	Female	985.00	5.4	Apple	Excellent	25

Tasks

Identify the scale of measurement of each variable.
Construct a suitable graph or chart for categorical variables such as Gender and Fitness Level.
Construct a relative frequency histogram for quantitative variables such as Age and Monthly Income (USD). Describe what you have learned.
Plot Hours of Exercise/Week vs Temperature Preference (°C). What is your conclusion?
Draw the scatter plot of Monthly Income vs Hours of Exercise/Week. Is there any relation?
Cross-tabulation

Make a cross-tabulation between Gender and Favorite Fruit. Show the result in a stacked bar-chart.

Make a cross-tabulation between Favorite Fruit and Fitness Level. Show the result in a stacked bar-chart.