Compressing a file using Huffman coding when the characters all have similar repetitions?









up vote
0
down vote

favorite












So i have been implementing huffman compression on a bunch of different types of files (.jpg, .txt, .docx) and I've noticed a lot that sometimes the compressed file is sometimes almost the same as the original file (ex: 251,339kb -> 250,917kb(without header!)) I'm pretty sure my code is solid, though i'm not sure if this is right or not. The thing that i've noticed is that the character frequencies are very similar, so for example I'll have 10 characters that all have for example 65 repetitions and then another 10 characters that have 66 repetitions, and then another 10 characters that have 67 repetitions, etc etc. And because the files are big, the compressed character representation code end up having the same size as the original, or even larger (9 bits). Is this a normal thing when compressing using huffman?










share|improve this question

















  • 1




    It is normal that data which is already pretty random doesn't get smaller but could get larger. What do you expect to happen when you try to compress an already compressed file?
    – Peter Lawrey
    Nov 10 at 9:09






  • 1




    @PeterLawrey's answer is spot on. Additional things to think about-if you have an uncompressed image file like an uncomp. TIFF, the frequency of using 0 to 255 in each byte will tend to become evenly distributed. The exception being an image that is mostly in a single color family. For instance, something that is mostly bright blue. For text files, it takes 7 bits to represent English (A-Z,a-z,0-9, plus punctuation and symbols). For normal text, Huffman will do reasonably well because English tends to favor certain characters more than others.
    – Jason Armstrong
    Nov 10 at 14:10










  • An example, say you have lots of clothes and you put them in a vaccum bag, you suck all the air out of it and it gets small. Now put this in another vaccum bag, does it get smaller then second time, possibly if the first bag wasn't very good, but it's more likely to just make it larger.
    – Peter Lawrey
    Nov 10 at 19:45














up vote
0
down vote

favorite












So i have been implementing huffman compression on a bunch of different types of files (.jpg, .txt, .docx) and I've noticed a lot that sometimes the compressed file is sometimes almost the same as the original file (ex: 251,339kb -> 250,917kb(without header!)) I'm pretty sure my code is solid, though i'm not sure if this is right or not. The thing that i've noticed is that the character frequencies are very similar, so for example I'll have 10 characters that all have for example 65 repetitions and then another 10 characters that have 66 repetitions, and then another 10 characters that have 67 repetitions, etc etc. And because the files are big, the compressed character representation code end up having the same size as the original, or even larger (9 bits). Is this a normal thing when compressing using huffman?










share|improve this question

















  • 1




    It is normal that data which is already pretty random doesn't get smaller but could get larger. What do you expect to happen when you try to compress an already compressed file?
    – Peter Lawrey
    Nov 10 at 9:09






  • 1




    @PeterLawrey's answer is spot on. Additional things to think about-if you have an uncompressed image file like an uncomp. TIFF, the frequency of using 0 to 255 in each byte will tend to become evenly distributed. The exception being an image that is mostly in a single color family. For instance, something that is mostly bright blue. For text files, it takes 7 bits to represent English (A-Z,a-z,0-9, plus punctuation and symbols). For normal text, Huffman will do reasonably well because English tends to favor certain characters more than others.
    – Jason Armstrong
    Nov 10 at 14:10










  • An example, say you have lots of clothes and you put them in a vaccum bag, you suck all the air out of it and it gets small. Now put this in another vaccum bag, does it get smaller then second time, possibly if the first bag wasn't very good, but it's more likely to just make it larger.
    – Peter Lawrey
    Nov 10 at 19:45












up vote
0
down vote

favorite









up vote
0
down vote

favorite











So i have been implementing huffman compression on a bunch of different types of files (.jpg, .txt, .docx) and I've noticed a lot that sometimes the compressed file is sometimes almost the same as the original file (ex: 251,339kb -> 250,917kb(without header!)) I'm pretty sure my code is solid, though i'm not sure if this is right or not. The thing that i've noticed is that the character frequencies are very similar, so for example I'll have 10 characters that all have for example 65 repetitions and then another 10 characters that have 66 repetitions, and then another 10 characters that have 67 repetitions, etc etc. And because the files are big, the compressed character representation code end up having the same size as the original, or even larger (9 bits). Is this a normal thing when compressing using huffman?










share|improve this question













So i have been implementing huffman compression on a bunch of different types of files (.jpg, .txt, .docx) and I've noticed a lot that sometimes the compressed file is sometimes almost the same as the original file (ex: 251,339kb -> 250,917kb(without header!)) I'm pretty sure my code is solid, though i'm not sure if this is right or not. The thing that i've noticed is that the character frequencies are very similar, so for example I'll have 10 characters that all have for example 65 repetitions and then another 10 characters that have 66 repetitions, and then another 10 characters that have 67 repetitions, etc etc. And because the files are big, the compressed character representation code end up having the same size as the original, or even larger (9 bits). Is this a normal thing when compressing using huffman?







java huffman-code






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Nov 9 at 18:24









TheHaruWhoCodes

207




207







  • 1




    It is normal that data which is already pretty random doesn't get smaller but could get larger. What do you expect to happen when you try to compress an already compressed file?
    – Peter Lawrey
    Nov 10 at 9:09






  • 1




    @PeterLawrey's answer is spot on. Additional things to think about-if you have an uncompressed image file like an uncomp. TIFF, the frequency of using 0 to 255 in each byte will tend to become evenly distributed. The exception being an image that is mostly in a single color family. For instance, something that is mostly bright blue. For text files, it takes 7 bits to represent English (A-Z,a-z,0-9, plus punctuation and symbols). For normal text, Huffman will do reasonably well because English tends to favor certain characters more than others.
    – Jason Armstrong
    Nov 10 at 14:10










  • An example, say you have lots of clothes and you put them in a vaccum bag, you suck all the air out of it and it gets small. Now put this in another vaccum bag, does it get smaller then second time, possibly if the first bag wasn't very good, but it's more likely to just make it larger.
    – Peter Lawrey
    Nov 10 at 19:45












  • 1




    It is normal that data which is already pretty random doesn't get smaller but could get larger. What do you expect to happen when you try to compress an already compressed file?
    – Peter Lawrey
    Nov 10 at 9:09






  • 1




    @PeterLawrey's answer is spot on. Additional things to think about-if you have an uncompressed image file like an uncomp. TIFF, the frequency of using 0 to 255 in each byte will tend to become evenly distributed. The exception being an image that is mostly in a single color family. For instance, something that is mostly bright blue. For text files, it takes 7 bits to represent English (A-Z,a-z,0-9, plus punctuation and symbols). For normal text, Huffman will do reasonably well because English tends to favor certain characters more than others.
    – Jason Armstrong
    Nov 10 at 14:10










  • An example, say you have lots of clothes and you put them in a vaccum bag, you suck all the air out of it and it gets small. Now put this in another vaccum bag, does it get smaller then second time, possibly if the first bag wasn't very good, but it's more likely to just make it larger.
    – Peter Lawrey
    Nov 10 at 19:45







1




1




It is normal that data which is already pretty random doesn't get smaller but could get larger. What do you expect to happen when you try to compress an already compressed file?
– Peter Lawrey
Nov 10 at 9:09




It is normal that data which is already pretty random doesn't get smaller but could get larger. What do you expect to happen when you try to compress an already compressed file?
– Peter Lawrey
Nov 10 at 9:09




1




1




@PeterLawrey's answer is spot on. Additional things to think about-if you have an uncompressed image file like an uncomp. TIFF, the frequency of using 0 to 255 in each byte will tend to become evenly distributed. The exception being an image that is mostly in a single color family. For instance, something that is mostly bright blue. For text files, it takes 7 bits to represent English (A-Z,a-z,0-9, plus punctuation and symbols). For normal text, Huffman will do reasonably well because English tends to favor certain characters more than others.
– Jason Armstrong
Nov 10 at 14:10




@PeterLawrey's answer is spot on. Additional things to think about-if you have an uncompressed image file like an uncomp. TIFF, the frequency of using 0 to 255 in each byte will tend to become evenly distributed. The exception being an image that is mostly in a single color family. For instance, something that is mostly bright blue. For text files, it takes 7 bits to represent English (A-Z,a-z,0-9, plus punctuation and symbols). For normal text, Huffman will do reasonably well because English tends to favor certain characters more than others.
– Jason Armstrong
Nov 10 at 14:10












An example, say you have lots of clothes and you put them in a vaccum bag, you suck all the air out of it and it gets small. Now put this in another vaccum bag, does it get smaller then second time, possibly if the first bag wasn't very good, but it's more likely to just make it larger.
– Peter Lawrey
Nov 10 at 19:45




An example, say you have lots of clothes and you put them in a vaccum bag, you suck all the air out of it and it gets small. Now put this in another vaccum bag, does it get smaller then second time, possibly if the first bag wasn't very good, but it's more likely to just make it larger.
– Peter Lawrey
Nov 10 at 19:45

















active

oldest

votes











Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53231383%2fcompressing-a-file-using-huffman-coding-when-the-characters-all-have-similar-rep%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown






























active

oldest

votes













active

oldest

votes









active

oldest

votes






active

oldest

votes















 

draft saved


draft discarded















































 


draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53231383%2fcompressing-a-file-using-huffman-coding-when-the-characters-all-have-similar-rep%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

How to how show current date and time by default on contact form 7 in WordPress without taking input from user in datetimepicker

Syphilis

Darth Vader #20