Compressing a file using Huffman coding when the characters all have similar repetitions?
I have been implementing Huffman compression on a bunch of different file types (.jpg, .txt, .docx), and I've noticed that the compressed file is sometimes almost the same size as the original (e.g. 251,339 KB -> 250,917 KB, without the header). I'm pretty sure my code is solid, but I'm not sure whether this result is expected. What I've noticed is that the character frequencies are very similar: for example, I'll have 10 characters that each occur 65 times, another 10 that occur 66 times, another 10 that occur 67 times, and so on. Because the files are big, the Huffman codes for the characters end up being the same length as the original representation, or even longer (9 bits). Is this normal when compressing with Huffman?
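(For illustration only, not the implementation from the question: a minimal Java sketch of the kind of byte-frequency check being described. The zero-order entropy of the byte counts is the lower bound, in bits per byte, on what any per-byte code such as Huffman can achieve, so if it is close to 8 the file is essentially incompressible this way. The file path is a placeholder taken from the command line.)

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class EntropyCheck {
    public static void main(String[] args) throws IOException {
        // Read the whole file to test; the path is whatever you pass on the command line.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));

        // Count how often each of the 256 possible byte values occurs.
        long[] counts = new long[256];
        for (byte b : data) {
            counts[b & 0xFF]++;
        }

        // Zero-order entropy: the theoretical lower bound, in bits per byte,
        // for any code (Huffman included) that encodes one byte at a time.
        double bitsPerByte = 0.0;
        for (long c : counts) {
            if (c == 0) continue;
            double p = (double) c / data.length;
            bitsPerByte -= p * (Math.log(p) / Math.log(2));
        }

        System.out.printf("bytes: %d%n", data.length);
        System.out.printf("entropy: %.3f bits/byte%n", bitsPerByte);
        System.out.printf("approx. best-case compressed size: %.0f bytes%n",
                bitsPerByte * data.length / 8.0);
    }
}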
java huffman-code
asked Nov 9 at 18:24
TheHaruWhoCodes
207
It is normal that data which is already pretty random doesn't get smaller but could get larger. What do you expect to happen when you try to compress an already compressed file?
– Peter Lawrey
Nov 10 at 9:09
@PeterLawrey's answer is spot on. Additional things to think about: in an uncompressed image file such as an uncompressed TIFF, the byte values 0 to 255 tend to be evenly distributed, the exception being an image that is mostly in a single color family (for instance, something that is mostly bright blue). For text files, 7 bits are enough to represent English (A-Z, a-z, 0-9, plus punctuation and symbols), and Huffman will do reasonably well on normal text because English favors certain characters much more than others.
– Jason Armstrong
Nov 10 at 14:10
An example: say you have lots of clothes and you put them in a vacuum bag; you suck all the air out and it gets small. Now put this in another vacuum bag. Does it get smaller the second time? Possibly, if the first bag wasn't very good, but it's more likely to just end up larger.
– Peter Lawrey
Nov 10 at 19:45
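(A rough, self-contained Java sketch of the point made in the comments about recompressing already-compressed data; it uses java.util.zip.Deflater rather than Huffman, purely to show the general effect: the first pass shrinks repetitive data a lot, while the second pass gains almost nothing and can even grow the data.)

import java.util.Arrays;
import java.util.zip.Deflater;

public class DoubleCompressDemo {

    // Compress a byte array with DEFLATE and return the compressed bytes.
    // Assumes the output fits in input.length + 1024 bytes, which holds here.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[input.length + 1024];
        int length = deflater.deflate(buffer);
        deflater.end();
        return Arrays.copyOf(buffer, length);
    }

    public static void main(String[] args) {
        // Highly repetitive input: compresses very well on the first pass.
        byte[] original = new byte[100_000];
        for (int i = 0; i < original.length; i++) {
            original[i] = (byte) "abcabcabc".charAt(i % 9);
        }

        byte[] once = deflate(original);
        byte[] twice = deflate(once);   // second pass: little or no gain, may grow

        System.out.println("original size:    " + original.length);
        System.out.println("compressed once:  " + once.length);
        System.out.println("compressed twice: " + twice.length);
    }
}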