Let's sort out the data compression methods...
Roughly, it comes out as follows. Far too many formats I have no interest in got tacked on, so let's cut it down to the important ones and look at it again.
Algorithm | Compression Type | Compression Ratio | Compression Speed | Decompression Speed | Use Cases | License | Notes |
---|---|---|---|---|---|---|---|
DEFLATE | Lossless | Moderate (20-50%) | Fast | Very Fast | General-purpose, ZIP, gzip files | BSD-like | Widely used, good balance between compression and speed. |
LZMA (xz) | Lossless | High (30-70%) | Slow | Moderate | High compression, Linux packages | Public domain/0BSD | Excellent compression but slower than DEFLATE. |
Brotli | Lossless | High (30-70%) | Slow | Fast | Web assets, static content | BSD-like | Optimized for text and HTML; slower to compress but fast to decompress. |
Zstandard (zstd) | Lossless | High (30-70%) | Fast | Very Fast | General-purpose, real-time compression | BSD-like | Modern, high-speed algorithm with adjustable compression levels. |
LZ4 | Lossless | Moderate (20-50%) | Very Fast | Very Fast | Real-time data, databases, gaming | BSD-like | Extremely fast, suitable for scenarios where speed is critical. |
Snappy (Google) | Lossless | Moderate (20-50%) | Fast | Very Fast | Google protocols, real-time systems | BSD-like | Designed for speed, slightly lower compression ratio than LZ4. |
Gzip | Lossless | Moderate (20-50%) | Fast | Very Fast | Web servers, backups | GPL | Uses DEFLATE under the hood; older but still widely used. |
XZ | Lossless | High (30-70%) | Slow | Moderate | Linux packages, high compression | Public domain/0BSD | Uses LZMA; excellent for archiving but slower than other algorithms. |
BZIP2 | Lossless | High (30-70%) | Slow | Moderate | Tarballs, backups | BSD-like | Older algorithm with good compression but slower than modern alternatives. |
Zlib | Lossless | Moderate (20-50%) | Fast | Very Fast | General-purpose, games, networking | BSD-like | Lightweight and widely used, but less efficient than newer algorithms. |
RLE (Run-Length Encoding) | Lossless | Low (10-30%) | Fast | Fast | Simple data, images, audio | Public domain | Simple and fast but limited to repetitive data. |
Huffman Coding | Lossless | Moderate (20-50%) | Slow | Fast | Text, images, audio | Public domain | Simple and effective for small datasets but not ideal for large data. |
LZW | Lossless | Moderate (20-50%) | Fast | Fast | GIF, TIFF images | Patents expired | Older algorithm with moderate compression; formerly restricted by Unisys patents, which have since expired. |
LZSS | Lossless | Moderate (20-50%) | Fast | Fast | General-purpose | Public domain | Simple and efficient for small-scale compression. |
LZ5 | Lossless | High (30-70%) | Moderate | Moderate | High compression, archives | BSD-like | Modern algorithm with excellent compression but slower than LZ4 or Snappy. |
APNG | Lossless | Moderate (20-50%) | Fast | Fast | Animated images | BSD-like | Optimized for PNG animations. |
WebP | Lossy/Lossless | High (30-80%) | Moderate | Fast | Web images | BSD-like | Developed by Google for web images, supports both lossy and lossless compression. |
JPEG | Lossy | High (30-80%) | Slow | Fast | Images | Public domain | Widely used for images, lossy compression. |
PNG | Lossless | Moderate (20-50%) | Moderate | Fast | Images | Public domain | Lossless compression, widely used for web images. |
HEIF | Lossy | High (30-80%) | Slow | Fast | Images | Proprietary | Modern image format with better compression than JPEG but patent-encumbered. |
MP3 | Lossy | High (30-80%) | Slow | Fast | Audio | Various | Popular for audio compression. |
AAC | Lossy | High (30-80%) | Moderate | Fast | Audio | Various | High-quality audio compression, widely used in multimedia. |
AV1 | Lossy | High (30-80%) | Slow | Fast | Video | BSD-like | Modern, royalty-free video compression standard. |
H.264/AVC | Lossy | High (30-80%) | Slow | Fast | Video | MPEG LA | Widely used for video compression, but patent-encumbered. |
H.265/HEVC | Lossy | High (30-80%) | Slow | Moderate | Video | MPEG LA | Better compression than H.264 but slower and patent-encumbered. |
VP9 | Lossy | High (30-80%) | Slow | Fast | Video | BSD-like | Google’s royalty-free video compression standard. |
FLAC | Lossless | Moderate (20-50%) | Slow | Fast | Audio | BSD-like | Lossless audio compression, popular for high-quality audio. |
OGG Vorbis | Lossy | High (30-70%) | Moderate | Fast | Audio | BSD-like | Popular for lossy audio compression. |
ZIP | Lossless | Moderate (20-50%) | Moderate | Fast | General-purpose archiving | Public domain | Uses DEFLATE in most cases; the format also defines other methods (e.g., BZIP2, LZMA); widely used for file archiving. |
Tar | None (archiver only) | None | Fast | Fast | Data archiving | Public domain | Bundles files without compressing them; usually combined with a compressor (e.g., tar.gz). |
RAR | Lossless | High (30-70%) | Moderate | Moderate | Archiving, backups | Proprietary | Popular but proprietary format; compression is lossless. |
7-Zip | Lossless | High (30-70%) | Slow | Moderate | Archiving, backups | LGPL | Supports multiple algorithms (e.g., LZMA, DEFLATE). |
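WebAssembly (Wasm) | N/A | N/A | N/A | N/A | Web applications | W3C standard | Not a compression algorithm: a binary instruction format in which compression libraries can run inside browsers. |
AEC (Adaptive Entropy Coding) | Lossless | Moderate (20-50%) | Moderate | Fast | Audio, video | Various | Entropy coding that adapts its probability model to the data; codecs such as Opus (range coder) and VP9 (arithmetic coder) use this kind of final stage. |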
Opus | Lossy | High (30-70%) | Moderate | Fast | Audio | BSD-like | High-quality audio compression for VoIP and streaming. |
Speex | Lossy | Moderate (20-50%) | Moderate | Fast | Audio | BSD-like | Older audio codec, widely used in VoIP. |
Theora | Lossy | Moderate (20-50%) | Slow | Fast | Video | BSD-like | Older video codec, widely used in web applications. |
VP8 | Lossy | Moderate (20-50%) | Moderate | Fast | Video | BSD-like | Google’s older video codec, used in WebM. |
AVIF | Lossy/Lossless | High (30-70%) | Slow | Fast | Images | BSD-like | Next-generation image format, based on AV1. |
TIFF | Lossy/Lossless | Moderate (20-50%) | Moderate | Fast | Images | Public domain | Flexible format, supports lossless (e.g., LZW, DEFLATE) and lossy (JPEG) compression. |
PDF | Lossy/Lossless | Moderate (20-50%) | Moderate | Fast | Documents | Open standard (ISO 32000) | Supports multiple compression algorithms (e.g., DEFLATE, JPEG). |
SVG | Lossless | Moderate (20-50%) | Fast | Fast | Vector graphics | Public domain | Text-based format, not optimized for compression. |
MPEG-4 | Lossy | High (30-70%) | Slow | Fast | Video | MPEG LA | Older video compression standard, widely used. |
MPEG-2 | Lossy | Moderate (20-50%) | Slow | Fast | Video, DVDs | MPEG LA | Older standard, widely used in broadcasting and storage. |
MPEG-7 | N/A | N/A | N/A | N/A | Multimedia metadata | ISO/IEC standard | Not a compression standard: it describes multimedia content (metadata) for search and indexing. |
MPEG-H | Lossy | High (30-70%) | Slow | Fast | Audio, video | MPEG LA | Modern audio and video compression standard. |
MP4 | Lossy | High (30-70%) | Slow | Fast | Video | MPEG LA | Popular container format; typically carries H.264/AVC video and AAC audio. |
AVCHD | Lossy | High (30-70%) | Slow | Fast | Video | MPEG LA | High-definition video compression for Blu-ray and HD camcorders. |
VP6 | Lossy | Moderate (20-50%) | Moderate | Fast | Video | Proprietary (On2) | Older On2 video codec, widely used in Flash video and video streaming. |
VP7 | Lossy | High (30-70%) | Slow | Fast | Video | Proprietary (On2) | On2’s proprietary predecessor to VP8; On2 was later acquired by Google. |
VP10 | Lossy | High (30-70%) | Slow | Fast | Video | BSD-like | Google’s planned successor to VP9; it was never released as a separate codec, and its work was folded into AV1. No later VPx versions exist. |
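For now I have re-listed only the ones I actually use these days. zip is probably the most common and has by far the longest history, but lately I find myself reaching for bzip2 and xz more and more, both of which I used to dismiss as far too slow. With multithreading, bzip2 is fast enough to be practical, and zlib is quietly embedded in all sorts of apps, so I end up using it often whether I notice or not. As the table shows, LZ4 is the fastest at both compression and decompression while still compressing reasonably well. In practice it really is extremely fast, and the ratio is not bad at all.

Images and video are already compressed, so compressing them again gains almost nothing. What shrinks a lot is text and uncompressed user data, and for that kind of input the differences between methods mostly come down to a trade-off against speed. The basic idea (Huffman coding) is to scan the entire file, count how often each symbol occurs, and size the codes accordingly, so that frequent symbols get short codes. With large volumes of data, scanning everything is impractical, so you either slide a window over the data or split it into blocks first. Another option is to transform the data into a form that compresses better. But the more of these stages you stack up, the more the result depends on the input: it may compress very well in some cases and barely help in others, so the practical benefit inevitably varies.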
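To make the frequency-counting idea concrete, here is a minimal Python sketch of how Huffman code lengths fall out of symbol counts. The function names `huffman_code_lengths` and `encoded_bits` are made up for this illustration, and the sketch only computes code lengths over a whole in-memory buffer; real compressors such as DEFLATE work block by block and add a match-finding stage in front.

```python
import heapq
from collections import Counter
from typing import Dict


def huffman_code_lengths(data: bytes) -> Dict[int, int]:
    """Return {byte value: code length in bits} for a Huffman code built from data."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries: (total count, unique tie-breaker, {symbol: depth so far}).
    heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every symbol inside moves one level deeper.
        c1, _, a = heapq.heappop(heap)
        c2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in a.items()}
        merged.update({s: d + 1 for s, d in b.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


def encoded_bits(data: bytes) -> int:
    """Total payload size in bits if data were coded with those lengths (code table excluded)."""
    lengths = huffman_code_lengths(data)
    return sum(lengths[b] for b in data)


sample = b"aaaaaaaabbbc"
print(huffman_code_lengths(sample))
print(encoded_bits(sample), "bits instead of", 8 * len(sample))
```

The frequent byte `a` ends up with a shorter code than the rare `b` and `c`, which is the whole source of the saving. The sketch also shows why the full scan is costly: the counts must be known before a single symbol can be encoded.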
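Compression pays off in hardware as well: if data is compressed before it is passed around, bus occupancy and memory usage drop substantially. Data sitting on a bus or in temporary memory has to be processed quickly, so compressing it can be very effective, but for the same reason the compression itself has to be extremely fast. On the OS side, I have heard that when memory pressure is high, inactive apps are quickly compressed and swapped to disk to keep the system stable.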
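The tension between speed and ratio is easy to observe in software too. The following is a rough sketch using Python's standard `zlib` module at different compression levels. The helper `measure` and both sample inputs are invented for this illustration, and timings will vary by machine, so treat the output as indicative only.

```python
import random
import time
import zlib


def measure(data: bytes, level: int):
    """Compress data with zlib at the given level; return (compressed size, seconds)."""
    start = time.perf_counter()
    out = zlib.compress(data, level)
    return len(out), time.perf_counter() - start


# Hypothetical sample inputs: repetitive text versus pseudo-random bytes,
# the latter standing in for data that is already compressed.
text = b"the quick brown fox jumps over the lazy dog. " * 5000
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(text)))

for name, data in (("text", text), ("noise", noise)):
    for level in (1, 6, 9):
        size, secs = measure(data, level)
        print(f"{name:5s} level={level}: {size / len(data):7.2%} of original, {secs * 1000:.2f} ms")
```

The repetitive text shrinks to a small fraction of its size at every level, while the random bytes do not shrink at all, which matches the earlier point that already-compressed images and video gain almost nothing.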
Algorithm | Compression Type | Compression Ratio | Compression Speed | Decompression Speed | Use Cases | License | Notes |
---|---|---|---|---|---|---|---|
DEFLATE | Lossless | Moderate (20-50%) | Fast | Very Fast | General-purpose, ZIP, gzip files | BSD-like | Widely used, good balance between compression and speed. |
LZMA (xz) | Lossless | High (30-70%) | Slow | Moderate | High compression, Linux packages | Public domain/0BSD | Excellent compression but slower than DEFLATE. |
Brotli | Lossless | High (30-70%) | Slow | Fast | Web assets, static content | BSD-like | Optimized for text and HTML; slower to compress but fast to decompress. |
Zstandard (zstd) | Lossless | High (30-70%) | Fast | Very Fast | General-purpose, real-time compression | BSD-like | Modern, high-speed algorithm with adjustable compression levels. |
LZ4 | Lossless | Moderate (20-50%) | Very Fast | Very Fast | Real-time data, databases, gaming | BSD-like | Extremely fast, suitable for scenarios where speed is critical. |
Snappy (Google) | Lossless | Moderate (20-50%) | Fast | Very Fast | Google protocols, real-time systems | BSD-like | Designed for speed, slightly lower compression ratio than LZ4. |
Gzip | Lossless | Moderate (20-50%) | Fast | Very Fast | Web servers, backups | GPL | Uses DEFLATE under the hood; older but still widely used. |
XZ | Lossless | High (30-70%) | Slow | Moderate | Linux packages, high compression | Public domain/0BSD | Uses LZMA; excellent for archiving but slower than other algorithms. |
BZIP2 | Lossless | High (30-70%) | Slow | Moderate | Tarballs, backups | BSD-like | Older algorithm with good compression but slower than modern alternatives. |
Zlib | Lossless | Moderate (20-50%) | Fast | Very Fast | General-purpose, games, networking | BSD-like | Lightweight and widely used, but less efficient than newer algorithms. |
RLE (Run-Length Encoding) | Lossless | Low (10-30%) | Fast | Fast | Simple data, images, audio | Public domain | Simple and fast but limited to repetitive data. |