Statistics and Probability - CSEC Mathematics

Introduction to Statistics

Statistics is the science of collecting, organizing, analyzing, interpreting, and presenting data.

Types of Data

Data Type    | Description                               | Examples
Qualitative  | Descriptive, non-numerical                | Colors, gender, types of fruit
Quantitative | Numerical, can be measured                | Height, weight, test scores
Discrete     | Countable, whole numbers                  | Number of students, cars in a parking lot
Continuous   | Measurable, can take any value in a range | Time, temperature, distance

Data Collection and Presentation

Methods of Data Collection

Data Presentation

Data can be presented in various ways:

Frequency Tables

Score Range | Frequency
0-10        | 2
11-20       | 5
21-30       | 8
31-40       | 10
41-50       | 5
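
A grouped frequency table like this one can be built by counting how many values fall in each class interval. The sketch below is only an illustration: the `scores` list is made up for the example (it is not the data behind the table), and the helper name `interval_label` is hypothetical.

```python
from collections import Counter

# Hypothetical test scores, invented purely for illustration.
scores = [4, 9, 12, 15, 18, 23, 27, 29, 33, 35, 38, 40, 44, 50]

# Class intervals in the same style as the table: 0-10, 11-20, ..., 41-50.
intervals = [(0, 10), (11, 20), (21, 30), (31, 40), (41, 50)]


def interval_label(score):
    """Return the label of the class interval that contains the score."""
    for low, high in intervals:
        if low <= score <= high:
            return f"{low}-{high}"
    raise ValueError(f"score {score} is outside every interval")


# Count how many scores fall into each interval.
frequency = Counter(interval_label(s) for s in scores)

# Print the table in interval order.
for low, high in intervals:
    label = f"{low}-{high}"
    print(label, frequency[label])
```

The frequencies always add up to the number of data values, which is a quick check that no value was missed or counted twice.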

Bar Charts

[Figure: bar chart comparing categories A, B, C, D, and E]

Pie Charts

[Figure: pie chart showing Category A (30%), Category B (30%), and Category C (40%)]

Line Graphs

[Figure: line graph of sales by month, January to May]

Measures of Central Tendency

These describe the center of a data set:

Mean

The average (sum of all values divided by number of values)

Mean = Σx / n

Median

The middle value when data is ordered (or average of two middle values for even n)

Mode

The most frequently occurring value(s)

Example: Calculating Measures

Find the mean, median, and mode of: 5, 7, 3, 5, 8, 2, 5

Solution:

Ordered data: 2, 3, 5, 5, 5, 7, 8

Mean = (5+7+3+5+8+2+5)/7 = 35/7 = 5

Median = 5 (4th value)

Mode = 5 (appears most frequently)
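
The same calculation can be checked with a few lines of Python. This is a minimal sketch that assumes Python's standard `statistics` module is available; the variable names are chosen only for this illustration.

```python
import statistics

data = [5, 7, 3, 5, 8, 2, 5]

mean = statistics.mean(data)      # sum of the values divided by how many there are
median = statistics.median(data)  # middle value once the data is sorted
mode = statistics.mode(data)      # most frequently occurring value

print("Ordered data:", sorted(data))
print("Mean:", mean)
print("Median:", median)
print("Mode:", mode)
```

If a data set has more than one mode, `statistics.multimode` (Python 3.8 or later) returns all of them as a list.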

Measures of Dispersion

These describe how spread out the data is:

Range

Difference between highest and lowest values

Range = Max - Min

Variance

Average of squared differences from the mean (the population form, which divides by n)

σ² = Σ(x - μ)² / n

Standard Deviation

Square root of variance (shows spread in original units)

σ = √σ²

Example: Calculating Dispersion

Find the range and standard deviation of: 2, 4, 6, 8, 10

Solution:

Mean = (2+4+6+8+10)/5 = 6

Range = 10 - 2 = 8

Variance = [(2-6)² + (4-6)² + (6-6)² + (8-6)² + (10-6)²]/5 = (16+4+0+4+16)/5 = 8

Standard deviation = √8 ≈ 2.83
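
As a cross-check, here is a short Python sketch of the same working. It assumes the population formulas used in this section (dividing by n), which correspond to `pvariance` and `pstdev` in the standard `statistics` module.

```python
import math
import statistics

data = [2, 4, 6, 8, 10]

data_range = max(data) - min(data)     # highest value minus lowest value
variance = statistics.pvariance(data)  # population variance: divides by n
std_dev = statistics.pstdev(data)      # square root of the population variance

print("Range:", data_range)
print("Variance:", variance)
print("Standard deviation:", round(std_dev, 2))
```

Note that `statistics.variance` and `statistics.stdev` divide by n - 1 instead (the sample form), so they give different answers from the formula above.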

Introduction to Probability

Probability is the measure of how likely an event is to occur.

[Figure: probability scale running from 0 (impossible) through 0.5 to 1 (certain)]

Basic Probability Concepts

Probability Formula

P(E) = Number of favorable outcomes / Total number of possible outcomes (when all outcomes are equally likely)

Example: Simple Probability

What is the probability of rolling a 5 on a standard die?

Solution:

P(5) = 1 (favorable outcome) / 6 (possible outcomes) = 1/6 ≈ 0.1667 or 16.67%
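
The counting in this example can be sketched in Python by listing the outcomes and dividing. The sketch assumes a fair six-sided die and uses `fractions.Fraction` so the answer stays as an exact fraction.

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]  # equally likely faces of a fair die
favorable = [face for face in outcomes if face == 5]

# Probability = favorable outcomes / total possible outcomes
p_five = Fraction(len(favorable), len(outcomes))

print("P(5) =", p_five)
print("As a decimal:", round(float(p_five), 4))
```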

Probability Rules

Complement Rule

P(not E) = 1 - P(E)

Addition Rule (Mutually Exclusive Events)

P(A or B) = P(A) + P(B)

Addition Rule (Non-Mutually Exclusive Events)

P(A or B) = P(A) + P(B) - P(A and B)

Multiplication Rule (Independent Events)

P(A and B) = P(A) × P(B)

Example: Probability Rules

In a deck of 52 cards, what is the probability of drawing a heart or a king?

Solution:

P(Heart) = 13/52

P(King) = 4/52

P(King of Hearts) = 1/52

P(Heart or King) = 13/52 + 4/52 - 1/52 = 16/52 ≈ 0.3077 or 30.77%
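
One way to convince yourself that the addition rule gives the right answer is to build the deck and count directly. The sketch below does this with Python sets; the helper name `prob` and the way cards are represented are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

hearts = {card for card in deck if card[1] == "hearts"}
kings = {card for card in deck if card[0] == "K"}


def prob(event):
    """Probability of an event when every card is equally likely."""
    return Fraction(len(event), len(deck))


# Count the cards that are a heart or a king directly (set union).
direct = prob(hearts | kings)

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
by_rule = prob(hearts) + prob(kings) - prob(hearts & kings)

print(direct, by_rule, round(float(direct), 4))
```

Both routes give the same fraction, because subtracting P(A and B) removes the King of Hearts that would otherwise be counted twice.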

Probability Distributions

Discrete Probability Distribution

A table or function that lists all possible values of a discrete random variable with their probabilities.

Example: number of heads in two tosses of a fair coin

Number of Heads (x) | P(x)
0                   | 0.25
1                   | 0.50
2                   | 0.25
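
These probabilities match the number of heads in two tosses of a fair coin, and a distribution like this can be generated by listing every outcome and counting. The following is a small sketch under that assumption; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every equally likely outcome of two tosses: HH, HT, TH, TT
tosses = list(product("HT", repeat=2))

# Count how many outcomes give 0, 1, or 2 heads.
head_counts = Counter(outcome.count("H") for outcome in tosses)

# Turn the counts into probabilities.
distribution = {
    x: Fraction(count, len(tosses)) for x, count in sorted(head_counts.items())
}

for x, p in distribution.items():
    print(x, float(p))
```

The probabilities in any discrete distribution must add up to 1, which the sketch can confirm with `sum(distribution.values())`.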

Binomial Distribution

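For experiments with a fixed number of independent trials (n), two possible outcomes per trial (success or failure), and the same probability of success (p) on every trial: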

P(x successes) = C(n,x) × pˣ × (1-p)ⁿ⁻ˣ

Example: Binomial Probability

What's the probability of getting exactly 3 heads in 5 coin tosses?

Solution:

n = 5, x = 3, p = 0.5

P(3) = C(5,3) × (0.5)³ × (0.5)² = 10 × 0.125 × 0.25 = 0.3125 or 31.25%
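
The binomial formula translates almost directly into code. This sketch assumes Python 3.8 or later for `math.comb`, which computes C(n, x); the function name `binomial_pmf` is chosen for this illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials with success probability p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Exactly 3 heads in 5 tosses of a fair coin
p_three_heads = binomial_pmf(3, 5, 0.5)
print("C(5,3) =", comb(5, 3))
print("P(3) =", p_three_heads)

# The probabilities for x = 0, 1, ..., 5 should add up to 1.
total = sum(binomial_pmf(x, 5, 0.5) for x in range(6))
print("Total probability:", total)
```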

Normal Distribution

A continuous probability distribution that is symmetric about the mean. Values near the mean occur more often than values far from it.

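[Figure: normal curve centered on the mean, with bands at 1σ, 2σ, and 3σ]

Empirical Rule (68-95-99.7 Rule)

For a normal distribution, about 68% of values lie within 1σ of the mean, about 95% within 2σ, and about 99.7% within 3σ.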
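
The 68-95-99.7 percentages can be checked numerically. The sketch below assumes Python 3.8 or later, where the standard `statistics` module provides `NormalDist`; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Fraction of values within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"Within {k} standard deviation(s): {share_within(k) * 100:.1f}%")
```

Because the rule depends only on how many standard deviations a value is from the mean, the same percentages hold for any normal distribution, whatever its mean and standard deviation.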

Correlation and Regression

Correlation Coefficient (r)

Measures the strength and direction of the linear relationship between two variables (-1 ≤ r ≤ 1)

Simple Linear Regression

Finds the line of best fit: y = a + bx

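Where:

a is the y-intercept (the value of y when x = 0)
b is the slope (the change in y for each unit increase in x)

[Figure: scatter plot with line of best fit]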
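
To make the line of best fit concrete, here is a minimal sketch that computes a, b, and r for a small data set. The paired values are invented for illustration, and the sketch assumes the standard least-squares formulas (which this section does not state): b = Sxy / Sxx, a = mean of y - b × mean of x, and r = Sxy / √(Sxx × Syy).

```python
import math

# Hypothetical paired data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of products and squares of the deviations from the means
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)

b = s_xy / s_xx                    # slope of the line of best fit
a = mean_y - b * mean_x            # y-intercept
r = s_xy / math.sqrt(s_xx * s_yy)  # correlation coefficient

print(f"y = {a:.2f} + {b:.2f}x")
print(f"r = {r:.4f}")
```

A positive b and a positive r both indicate that y tends to increase as x increases. The closer r is to 1 or -1, the more tightly the points cluster around the line.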

Glossary of Terms

Correlation: A statistical relationship between two variables.
Event: A set of outcomes from an experiment.
Mean: The average of a set of numbers.
Median: The middle value in an ordered data set.
Mode: The most frequently occurring value in a data set.
Normal Distribution: A symmetric, bell-shaped distribution of data.
Probability: A measure of the likelihood of an event occurring.
Range: The difference between the highest and lowest values.
Standard Deviation: A measure of how spread out numbers are.
Variance: The average of the squared differences from the mean.

Self-Assessment Questions

  1. Calculate the mean, median, and mode of: 7, 3, 5, 9, 5, 2, 5

    Solution:

    Ordered: 2, 3, 5, 5, 5, 7, 9

    Mean = (7+3+5+9+5+2+5)/7 = 36/7 ≈ 5.14

    Median = 5 (4th value)

    Mode = 5 (appears most frequently)

  2. Find the range and standard deviation of: 10, 12, 14, 16, 18

    Solution:

    Mean = (10+12+14+16+18)/5 = 14

    Range = 18 - 10 = 8

    Variance = [(10-14)² + (12-14)² + (14-14)² + (16-14)² + (18-14)²]/5 = (16+4+0+4+16)/5 = 8

    Standard deviation = √8 ≈ 2.83

  3. What is the probability of drawing a red card or a queen from a standard deck?

    Solution:

    P(Red) = 26/52

    P(Queen) = 4/52

    P(Red Queen) = 2/52

    P(Red or Queen) = 26/52 + 4/52 - 2/52 = 28/52 ≈ 0.5385 or 53.85%

  4. If the probability of rain tomorrow is 0.3, what is the probability it won't rain?

    Solution:

    P(No rain) = 1 - P(Rain) = 1 - 0.3 = 0.7 or 70%

  5. Calculate the probability of getting exactly 2 heads in 3 coin tosses.

    Solution:

    Possible outcomes: HHH, HHT, HTH, THH, HTT, THT, TTH, TTT

    Favorable outcomes: HHT, HTH, THH (3 outcomes)

    Total outcomes: 8

    P(2 heads) = 3/8 = 0.375 or 37.5%

  6. For a normal distribution with mean 100 and standard deviation 15, what percentage of values lie between 85 and 115?

    Solution:

    85 = μ - σ (100 - 15)

    115 = μ + σ (100 + 15)

    ≈68% of values lie within 1 standard deviation of the mean

  7. Create a frequency table for: A, B, B, C, A, B, D, A, C, B

    Solution:

    Category | Frequency
    A        | 3
    B        | 4
    C        | 2
    D        | 1

  8. If P(A) = 0.4 and P(B) = 0.3, and A and B are independent, find P(A and B).

    Solution:

    P(A and B) = P(A) × P(B) = 0.4 × 0.3 = 0.12 or 12%

  9. Find the median of: 12, 7, 15, 9, 3, 18, 6

    Solution:

    Ordered: 3, 6, 7, 9, 12, 15, 18

    Median = 9 (4th value)

  10. Calculate the probability of rolling a sum of 7 with two standard dice.

    Solution:

    Possible combinations: (1,6), (2,5), (3,4), (4,3), (5,2), (6,1) → 6 outcomes

    Total possible outcomes: 6 × 6 = 36

    P(sum=7) = 6/36 = 1/6 ≈ 0.1667 or 16.67%
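
Answers that rely on listing outcomes, such as the two-dice question, can be double-checked by letting a program do the listing. This is a small sketch assuming two fair six-sided dice; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) results
rolls = list(product(range(1, 7), repeat=2))

# Keep only the results whose faces add up to 7.
sevens = [roll for roll in rolls if sum(roll) == 7]

p_seven = Fraction(len(sevens), len(rolls))

print("Favorable outcomes:", sevens)
print("P(sum = 7) =", p_seven)
```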