Text Compression
You Can Say That Again! #
Many computer users are familiar with compressed formats such as zip, gzip, or gif images. These are based on a method called Ziv-Lempel coding, which turns out to be an interesting exercise in finding patterns in text.
Children’s rhymes and stories are good examples for text compression, because they often involve repeated words and sequences.
Activity description (PDF) #
Translations and other versions #
- Chinese language version
- French language version
- Greek language version
- Italian language version
- Persian - Farsi language version
- Polish language version
- Portuguese language version
- Russian language version
- Slovenian language version
- Turkish language version
Related resources #
An older version of this activity can be downloaded in PDF format here. The content is similar to the current version, but there’s some extra technical information.
More lessons and activities #
- The Computer Science Field Guide has a chapter on Compression Coding which includes an interactive version of this activity where you can enter your own text to be compressed.
- The Royal Institution UK and Microsoft Research together have produced an activity in Information Theory explaining the concept of compression for the classroom called Information Salvation.
- The Mathmaniacs web site has a similar activity (lesson 2).
- Centre for Innovation in Mathematics Teaching has the following teaching packages in compression related topics developed to teach Codes and Ciphers in their Maths Curriculum:
- Huffman Codes: Teacher Guide, Student Guide, OHP Slides, Lesson Plans
- Arithmetic Coding: Teacher Guide, Student Guide, OHP Slides, Lesson Plans
- [Registration and software download required] Wolfram Demonstrations Project has the following demonstration activities. Note: You will need to install the Wolfram CDF Player in order to use these activities. You can either download each demonstration or use your browser to run it.
- Audio File Size Calculator which calculates the uncompressed file size, in bytes, required for a variety of digital audio file size output configurations
- Binary Run-Length Encoding which splits data into runs of zeros and ones. A list of binary distinctions can then be encoded as a list of run-lengths
- CSFN has the following articles that demonstrate compression:
If you want to find out more #
Videos #
- The Compressor Head series has a number of humorous videos aimed at programmers, explaining how a number of compression methods work.
- Compression – Computerphile
- Entropy in Compression – Computerphile
- EXTRA BITS – Text Compression Meets Probabilities – Computerphile
- Elegant Compression in Text (The LZ 77 Method) – Computerphile
Additional resources #
- Mordechai (Moti) Ben-Ari from the Weizmann Institute of Science, Israel has programmed the Text Compression Unplugged activity in Scratch which can be downloaded in a zip file of the complete set of activities.
Curriculum Links #
Great Principles of Computer Science #
- Recollection
- Communication
ACM K12 Curriculum #
- Level I (Grades 3-5) Topic 11: develop a simple understanding of an algorithm