Compressing a file using Huffman coding when the characters all have similar repetitions?

up vote
0
down vote

favorite

So i have been implementing huffman compression on a bunch of different types of files (.jpg, .txt, .docx) and I've noticed a lot that sometimes the compressed file is sometimes almost the same as the original file (ex: 251,339kb -> 250,917kb(without header!)) I'm pretty sure my code is solid, though i'm not sure if this is right or not. The thing that i've noticed is that the character frequencies are very similar, so for example I'll have 10 characters that all have for example 65 repetitions and then another 10 characters that have 66 repetitions, and then another 10 characters that have 67 repetitions, etc etc. And because the files are big, the compressed character representation code end up having the same size as the original, or even larger (9 bits). Is this a normal thing when compressing using huffman?

asked Nov 9 at 18:24

TheHaruWhoCodes

207

1

It is normal that data which is already pretty random doesn't get smaller but could get larger. What do you expect to happen when you try to compress an already compressed file?
– Peter Lawrey
Nov 10 at 9:09

1

@PeterLawrey's answer is spot on. Additional things to think about-if you have an uncompressed image file like an uncomp. TIFF, the frequency of using 0 to 255 in each byte will tend to become evenly distributed. The exception being an image that is mostly in a single color family. For instance, something that is mostly bright blue. For text files, it takes 7 bits to represent English (A-Z,a-z,0-9, plus punctuation and symbols). For normal text, Huffman will do reasonably well because English tends to favor certain characters more than others.
– Jason Armstrong
Nov 10 at 14:10

An example, say you have lots of clothes and you put them in a vaccum bag, you suck all the air out of it and it gets small. Now put this in another vaccum bag, does it get smaller then second time, possibly if the first bag wasn't very good, but it's more likely to just make it larger.
– Peter Lawrey
Nov 10 at 19:45

add a comment |

up vote
0
down vote

favorite

asked Nov 9 at 18:24

TheHaruWhoCodes

207

1

It is normal that data which is already pretty random doesn't get smaller but could get larger. What do you expect to happen when you try to compress an already compressed file?
– Peter Lawrey
Nov 10 at 9:09

1

@PeterLawrey's answer is spot on. Additional things to think about-if you have an uncompressed image file like an uncomp. TIFF, the frequency of using 0 to 255 in each byte will tend to become evenly distributed. The exception being an image that is mostly in a single color family. For instance, something that is mostly bright blue. For text files, it takes 7 bits to represent English (A-Z,a-z,0-9, plus punctuation and symbols). For normal text, Huffman will do reasonably well because English tends to favor certain characters more than others.
– Jason Armstrong
Nov 10 at 14:10

An example, say you have lots of clothes and you put them in a vaccum bag, you suck all the air out of it and it gets small. Now put this in another vaccum bag, does it get smaller then second time, possibly if the first bag wasn't very good, but it's more likely to just make it larger.
– Peter Lawrey
Nov 10 at 19:45

add a comment |

up vote
0
down vote

favorite

asked Nov 9 at 18:24

TheHaruWhoCodes

207

java huffman-code

asked Nov 9 at 18:24

TheHaruWhoCodes

207

asked Nov 9 at 18:24

TheHaruWhoCodes

207

asked Nov 9 at 18:24

TheHaruWhoCodes

207

asked Nov 9 at 18:24

TheHaruWhoCodes

207

asked Nov 9 at 18:24

TheHaruWhoCodes

207

1

It is normal that data which is already pretty random doesn't get smaller but could get larger. What do you expect to happen when you try to compress an already compressed file?
– Peter Lawrey
Nov 10 at 9:09

1

@PeterLawrey's answer is spot on. Additional things to think about-if you have an uncompressed image file like an uncomp. TIFF, the frequency of using 0 to 255 in each byte will tend to become evenly distributed. The exception being an image that is mostly in a single color family. For instance, something that is mostly bright blue. For text files, it takes 7 bits to represent English (A-Z,a-z,0-9, plus punctuation and symbols). For normal text, Huffman will do reasonably well because English tends to favor certain characters more than others.
– Jason Armstrong
Nov 10 at 14:10

An example, say you have lots of clothes and you put them in a vaccum bag, you suck all the air out of it and it gets small. Now put this in another vaccum bag, does it get smaller then second time, possibly if the first bag wasn't very good, but it's more likely to just make it larger.
– Peter Lawrey
Nov 10 at 19:45

add a comment |

1

It is normal that data which is already pretty random doesn't get smaller but could get larger. What do you expect to happen when you try to compress an already compressed file?
– Peter Lawrey
Nov 10 at 9:09

1

@PeterLawrey's answer is spot on. Additional things to think about-if you have an uncompressed image file like an uncomp. TIFF, the frequency of using 0 to 255 in each byte will tend to become evenly distributed. The exception being an image that is mostly in a single color family. For instance, something that is mostly bright blue. For text files, it takes 7 bits to represent English (A-Z,a-z,0-9, plus punctuation and symbols). For normal text, Huffman will do reasonably well because English tends to favor certain characters more than others.
– Jason Armstrong
Nov 10 at 14:10

An example, say you have lots of clothes and you put them in a vaccum bag, you suck all the air out of it and it gets small. Now put this in another vaccum bag, does it get smaller then second time, possibly if the first bag wasn't very good, but it's more likely to just make it larger.
– Peter Lawrey
Nov 10 at 19:45

It is normal that data which is already pretty random doesn't get smaller but could get larger. What do you expect to happen when you try to compress an already compressed file?
– Peter Lawrey
Nov 10 at 9:09

@PeterLawrey's answer is spot on. Additional things to think about-if you have an uncompressed image file like an uncomp. TIFF, the frequency of using 0 to 255 in each byte will tend to become evenly distributed. The exception being an image that is mostly in a single color family. For instance, something that is mostly bright blue. For text files, it takes 7 bits to represent English (A-Z,a-z,0-9, plus punctuation and symbols). For normal text, Huffman will do reasonably well because English tends to favor certain characters more than others.
– Jason Armstrong
Nov 10 at 14:10

An example, say you have lots of clothes and you put them in a vaccum bag, you suck all the air out of it and it gets small. Now put this in another vaccum bag, does it get smaller then second time, possibly if the first bag wasn't very good, but it's more likely to just make it larger.
– Peter Lawrey
Nov 10 at 19:45

add a comment |

active

oldest

votes

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53231383%2fcompressing-a-file-using-huffman-coding-when-the-characters-all-have-similar-rep%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

active

oldest

votes

draft saved

draft discarded

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Pfthb