What is Entropy?
Entropy is a fundamental concept in Information Theory that measures the uncertainty or unpredictability of a random variable. It quantifies the expected amount of information or “surprise” inherent in the possible outcomes of the variable.
Definition of Entropy
For a discrete random variable $X$ with possible outcomes $x_1, x_2, \dots, x_n$ occurring with probabilities $p(x_1), p(x_2), \dots, p(x_n)$, the entropy $H(X)$ is defined as:

$$H(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i)$$
- $H(X)$: Entropy of the random variable $X$.
- $p(x_i)$: Probability that $X$ takes the value $x_i$.
- $\log$: Logarithm function (commonly base 2 for bits or base $e$ for nats).
This formula calculates the expected value of the information content of the outcomes.
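As a quick illustration, here is a minimal Python sketch of this expected-value calculation (the helper name `entropy` and the example distributions are our own choices, not part of the definition above):

```python
import math

def entropy(probs, base=2):
    """Shannon entropy H = -sum(p * log(p)) over a discrete distribution.

    `probs` is a sequence of probabilities that should sum to 1.
    Terms with p == 0 contribute 0 by convention (lim p->0 of p*log p = 0).
    """
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair coin has maximum uncertainty for two outcomes: 1 bit.
print(entropy([0.5, 0.5]))   # 1.0
# A heavily biased coin is more predictable, so its entropy is lower.
print(entropy([0.9, 0.1]))   # ~0.469
```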
Information Content $I(x)$
The information content (or self-information) of an outcome $x_i$ is given by:

$$I(x_i) = -\log p(x_i) = \log \frac{1}{p(x_i)}$$

- $I(x_i)$ represents the amount of “surprise” associated with the occurrence of $x_i$.
- Lower-probability events yield higher information content.
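A small sketch of self-information in code (the probability values below are arbitrary illustration values):

```python
import math

def self_information(p, base=2):
    """I(x) = -log(p(x)): the 'surprise' of an outcome with probability p."""
    return -math.log(p, base)

# The rarer the outcome, the larger the surprise.
for p in (0.5, 0.25, 0.01):
    print(f"p = {p:<5} -> I = {self_information(p):.2f} bits")
# p = 0.5   -> I = 1.00 bits
# p = 0.25  -> I = 2.00 bits
# p = 0.01  -> I = 6.64 bits
```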
Why Use the Logarithm?
- Monotonic Decrease: As probability increases, information content decreases logarithmically.
- Additivity: Logarithms convert multiplication into addition, which is essential for combining independent events.
Derivation of Entropy
Entropy is the expected information content across all possible outcomes of $X$:

$$H(X) = \mathbb{E}\big[I(X)\big] = \sum_{i=1}^{n} p(x_i)\, I(x_i)$$

Substituting $I(x_i) = -\log p(x_i)$:

$$H(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i)$$
This formula sums the product of each outcome’s probability and its information content.
Why Use Logarithms in Entropy?
Properties of the Logarithm in Entropy
1. Log Converts Multiplication into Addition
For independent events $A$ and $B$:

$$I(A \cap B) = -\log\big(p(A)\,p(B)\big) = -\log p(A) - \log p(B) = I(A) + I(B)$$
- Additivity of Information: The total information content of independent events is the sum of their individual information contents.
2. Logarithm Reflects Information Content
- Lower Probabilities Yield Higher Information Content: Rare events provide more information when they occur.
- Continuous Scale: Logarithms provide a smooth and continuous measure of information content.
Handling Independent Events
For independent events, the joint probability is the product of the individual probabilities:

$$p(A \cap B) = p(A) \times p(B)$$

Using logarithms:

$$\log p(A \cap B) = \log p(A) + \log p(B)$$
- Additive Information: This property aligns with our intuitive understanding that knowing two independent events provides the sum of their individual information.
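A short numeric check of this additivity (the probabilities of $A$ and $B$ below are assumed illustration values):

```python
import math

def info(p):
    """Self-information in bits."""
    return -math.log2(p)

p_a, p_b = 0.25, 0.1          # assumed probabilities of independent events A and B
p_joint = p_a * p_b           # independence: p(A and B) = p(A) * p(B)

# Information of the joint event equals the sum of the individual informations.
print(info(p_joint))          # ~5.32 bits
print(info(p_a) + info(p_b))  # ~5.32 bits
```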
Measuring Surprise
- Logarithmic Scale: Captures the diminishing returns of additional information from more probable events.
- Consistency: Provides a consistent method to quantify information across different probability distributions.
Interpretation of Entropy
- High Entropy: Indicates a high level of uncertainty or unpredictability in the outcomes.
- Low Entropy: Indicates that the outcomes are more predictable or certain.
Entropy answers the question:
“How surprised should we expect to be about the outcome of an event?”
Real-World Examples
1. Coin Toss
Consider a fair coin toss:
- Possible Outcomes: Heads (H) or Tails (T).
- Probabilities: $p(H) = 0.5$, $p(T) = 0.5$.
Entropy calculation:

$$H(X) = -\left(0.5 \log_2 0.5 + 0.5 \log_2 0.5\right) = 1 \text{ bit}$$
So each toss provides 1 bit of information.
Biased Coin Toss with 80% Probability of Heads
- Toss a biased coin 4 times.
- Objective: Determine how many bits are needed to store the outcome on average, and demonstrate with an example encoding that uses fewer than 4 bits on average.
Consider a biased coin where:

$$p(H) = 0.8, \quad p(T) = 0.2$$

For this biased coin:

$$H = -\left(0.8 \log_2 0.8 + 0.2 \log_2 0.2\right)$$

Calculating each term:

$$0.8 \log_2 0.8 \approx 0.8 \times (-0.3219) \approx -0.2575$$
$$0.2 \log_2 0.2 \approx 0.2 \times (-2.3219) \approx -0.4644$$

Substitute the values:

$$H \approx -(-0.2575 - 0.4644) \approx 0.7219 \text{ bits}$$

- The entropy of a single toss with $p(H) = 0.8$ is approximately $0.72$ bits.
- Assuming independence, the total entropy for tossing the coin 4 times is:

$$H_4 = 4 \times 0.7219 \approx 2.89 \text{ bits}$$
This means we can encode the 4 outcomes using fewer than 4 bits on average. One simple scheme:
- If all outcomes are heads, send a single bit ‘1’.
- If any toss is tails, send ‘0’ followed by four additional bits to record the exact sequence (where heads = 1, tails = 0).
Examples: HHHH is encoded as “1” (1 bit); a sequence such as HHTH is encoded as “0” followed by “1101”, i.e. “01101” (5 bits).
- The probability of all heads is $p(\text{HHHH}) = 0.8^4 = 0.4096$.
- For all heads, we use 1 bit.
- For any other sequence (probability $1 - 0.4096 = 0.5904$), we use 5 bits.
- The expected number of bits is:

$$\mathbb{E}[\text{bits}] = 0.4096 \times 1 + 0.5904 \times 5$$

Simplifying this expression:

$$\mathbb{E}[\text{bits}] = 0.4096 + 2.952 = 3.3616 \approx 3.36 \text{ bits}$$
Therefore, on average, this scheme needs only about 3.36 bits to encode the outcome of the 4 tosses, which is fewer than 4 bits. The entropy of roughly 2.89 bits is the theoretical lower bound that no encoding can beat on average.
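The arithmetic above can be double-checked with a short script; this is a sketch that enumerates all $2^4$ sequences under the encoding rule described above:

```python
import math
from itertools import product

p_heads = 0.8

# Entropy of a single biased toss, and of 4 independent tosses.
h1 = -(p_heads * math.log2(p_heads) + (1 - p_heads) * math.log2(1 - p_heads))
print(h1, 4 * h1)   # ~0.7219, ~2.8877

# Expected code length of the simple scheme:
# '1' (1 bit) for HHHH, otherwise '0' + 4 bits for the exact sequence (5 bits).
expected_bits = 0.0
for seq in product("HT", repeat=4):
    p_seq = math.prod(p_heads if s == "H" else 1 - p_heads for s in seq)
    bits = 1 if seq == ("H", "H", "H", "H") else 5
    expected_bits += p_seq * bits
print(expected_bits)  # ~3.3616: above the 2.89-bit entropy bound, below 4 bits
```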
2. Weather Forecast
Suppose the probability of it raining on a given day is $p$, and of not raining is $1 - p$, where one outcome is much more likely than the other (for example, rain is rare).
Entropy calculation:

$$H = -\big(p \log_2 p + (1 - p) \log_2 (1 - p)\big) < 1 \text{ bit for } p \neq 0.5$$
- Interpretation: There’s less uncertainty because one outcome is much more likely.
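As a concrete sketch (the 10% chance of rain below is an assumed value for illustration, since the exact probability is left open here):

```python
import math

def binary_entropy(p):
    """Entropy of a two-outcome distribution (p, 1 - p), in bits."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

p_rain = 0.1   # assumed probability of rain, for illustration
print(binary_entropy(p_rain))  # ~0.469 bits, well below the 1 bit of a fair coin
```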
3. Language Text Compression
In English text:
- The letter “e” appears frequently, with a high probability.
- The letter “z” appears less frequently, with a low probability.
Entropy helps in designing compression algorithms:
- High-frequency letters: Less information per occurrence, can be encoded with shorter codes.
- Low-frequency letters: More information per occurrence, require longer codes.
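A rough sketch of this idea (the letter frequencies below are approximate, commonly cited figures for English text, used only for illustration):

```python
import math

# Approximate relative frequencies in English text (illustrative values).
freq = {"e": 0.127, "t": 0.091, "z": 0.0007}

for letter, p in freq.items():
    # Ideal code length for a symbol with probability p is -log2(p) bits.
    print(f"'{letter}': p = {p:<7} ideal code length = {-math.log2(p):.2f} bits")
# 'e' needs ~3 bits, while 'z' needs ~10.5 bits: frequent letters get short codes.
```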
Conclusion
Entropy is a crucial concept that quantifies the expected amount of information or surprise from a random variable’s outcomes. By using logarithms, we can:
- Add Information Content: Simplify the calculation of combined information from independent events.
- Reflect Uncertainty: Accurately represent how likely or surprising an event is.
Understanding entropy and its properties is essential for fields like information theory, data compression, communication systems, and machine learning.