You Can Say That Again!
Many computer users are familiar with compressed formats such as zip, gzip, or gif images. These are based on a method called Ziv-Lempel coding, which turns out to be an interesting exercise in finding patterns in text.
Children’s rhymes and stories are good examples for text compression, because they often involve repeated words and sequences.
Activity description (PDF)
- Instructions for Text Compression activity (English)
- Text Compression (French)
- Text Compression (Greek)
- Text Compression (Italian)
- Text Compression (Persian -Farsi)
- Text Compression (Polish)
- Text Compression (Portugese)
- Text Compression (Russian)
- Text Compression (Slovenian)
- Text Compression (Turkish)
- An older version of this activity can be downloaded in PDF format here. The content is similar to the current version, but there’s some extra technical information.
- The Computer Science Field Guide has a chapter on Compression Coding which includes an interactive version of this activity where you can enter your own text to be compressed.
- Mordechai (Moti) Ben-Ari from the Weizmann Institute of Science, Israel has programmed the Text Compression Unplugged activity in Scratch which can be downloaded in a zip file of the complete set of activities.
- The Royal Institution UK and Microsoft Research together have produced an activity in Information Theory explaining the concept of compression for the classroom called Information Salvation.
- The Compressor Head series has a number of humorous videos aimed at programmers, explaining how a number of compression methods work.
- Wikipedia: LZ77 and LZ78 algorithms
- The Mathmaniacs web site has a similar activity (lesson 2).
- Howstuffworks.com article on file compression
- CSFN has the following articles that demonstrate compression:
- Centre for Innovation in Mathematics Teaching has the following teaching packages in compression related topics developed to teach Codes and Ciphers in their Maths Curriculum:
- Wolfram Demonstrations Project has the following demonstration activities. Note: You will need to install the Wolfram CDF Player in order to use these activities. You can either download each demonstration or use your browser to run it.
- Audio File Size Calculator which calculates the uncompressed file size, in bytes, required for a variety of digital audio file size output configurations
- Binary Run-Length Encoding which splits data into runs of zeros and ones. A list of binary distinctions can then be encoded as a list of run-lengths
- Compression – Computerphile
- Entropy in Compression – Computerphile
- EXTRA BITS – Text Compression Meets Probabilities – Computerphile
- Elegant Compression in Text (The LZ 77 Method) – Computerphile
Great Principles of Computer Science [info]
- Recollection, Communication
ACM K12 Curriculum [info]Expand
- Level I (Grades 3-5) Topic 11: develop a simple understanding of an algorithm
New Zealand Curriculum [info]Expand
- Technology Level 1: Technological systems
- Understand that technological systems have inputs, controlled transformations, and outputs.
- Technology Level 3: Technological systems
- Understand that technological systems are represented by symbolic language tools and understand the role played by the black box in technological systems.