Huffman Coding

Background
- Developed by David A. Huffman in 1952 while he was a Ph.D. student at MIT (Huffman, 1952)
- Created as a more efficient alternative to other variable-length encoding schemes of the time
Definition:
- Huffman coding is a lossless data compression algorithm that assigns variable-length codes to characters based on their frequency of occurrence (Cormen et al., 2009)
- It creates a binary tree (Huffman tree) where more frequent characters are assigned shorter codes and less frequent characters longer codes (Sayood, 2017)
- The algorithm guarantees a prefix-free code, meaning no codeword is a prefix of another codeword (Blelloch, 2013)
Practical uses:
- Text compression in file compression utilities like zip and gzip (Salomon & Motta, 2010)
- Image compression in formats like JPEG and PNG (Salomon & Motta, 2010)
- Part of more complex compression algorithms like DEFLATE used in ZIP files (Salomon & Motta, 2010)
Common Misconceptions
- Huffman coding is not always the most efficient compression method for all types of data (Sayood, 2017)
- It does not work well for sources with uniform symbol distribution (Sayood, 2017)

In JPEG compression, Huffman coding is applied to the quantized DCT coefficients
More frequent coefficients (often zeros) get shorter codes
This contributes to JPEG’s efficient compression of photographic images (Salomon & Motta, 2010)

Literature Review

A Method for the Construction of Minimum-Redundancy Codes
Key points:
- Introduces the Huffman coding algorithm
- Proves that the algorithm produces an optimal prefix code
- Demonstrates how to construct the Huffman tree

Introduction to Algorithms, Third Edition
Key points:
- Provides a detailed explanation of the Huffman coding algorithm
- Analyzes the time complexity of the algorithm
- Proves the optimality of Huffman codes

Lossless CompressionDataCompression
- Huffman coding is a type of lossless compression, meaning the original data can be perfectly reconstructed from the compressed data
Entropy CodingInformationTheory
- Huffman coding is a type of entropy coding, which aims to compress data close to its information-theoretic limit
Shannon-Fano CodingDataCompression
- Another variable-length encoding scheme that predates Huffman coding but is generally less efficient

2024-10-17 23:28 최근에는 특히 adaptive models에서 Huffman Coding보다 arithmetic coding이 더 많이 사용됩니다. 그 이유는 adaptive 모델에서 기호의 확률이 지속적으로 업데이트되기 때문에, arithmetic coding이 더 효율적으로 연속적인 확률 분포를 처리할 수 있기 때문입니다. 반면, static Huffman은 정적 확률 분포에서 여전히 빠르고 효율적인 방식입니다.