one hot encoding of output labels

While I understand the need to one-hot encode features in the input data, how does one-hot encoding of output labels actually help? The TensorFlow MNIST tutorial encourages one-hot encoding of output labels. The first assignment in CS231n (Stanford), however, does not suggest one-hot encoding. What's the rationale behind choosing or not choosing to one-hot encode output labels?
Edit: Not sure about the reason for the downvote, but to elaborate: I forgot to mention the softmax function together with the cross-entropy loss function, which is normally used in multinomial classification. Does it have something to do with the cross-entropy loss function?
Having said that, one can calculate the loss even without the output labels being one-hot encoded.
machine-learning classification one-hot-encoding
I'm voting to close this question as off-topic because it is not about programming, as established in the help center. However, you may consider doing additional research on multi-class classification and logistic regression.
– E_net4
Jul 17 '18 at 22:20
This may help.
– mattdns
Jul 17 '18 at 23:54
asked Jul 17 '18 at 15:13 by lazy python
edited Jul 17 '18 at 15:59
1 Answer
A one-hot vector is used in cases where the output is not ordinal. Let's assume you encode your output as integers, giving each label a number.
Integer values have a natural ordering, and machine learning algorithms may be able to pick up on and exploit this relationship, but your labels may be unrelated, with no similarity between them. For categorical variables where no such ordinal relationship exists, integer encoding is a poor fit.
In fact, using this encoding and allowing the model to assume a natural ordering between categories may produce unexpected results, where model predictions fall halfway between categories.
What do I mean by that?
The idea is that if we train an ML algorithm, for example a neural network, on these integer targets, it’s going to think that a cat (which is 1) is halfway between a dog and a bird, because they are 0 and 2 respectively. We don’t want that; it’s not true, and it’s an extra thing for the algorithm to learn.
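To make the dog/cat/bird point concrete, here is a minimal sketch in plain Python. The `LABELS` mapping and the helper names are only illustrative, and the "torn" prediction assumes a model that regresses on the integer with a squared-error loss, which is an assumption rather than something stated above.

```python
import math

# Hypothetical label mapping, taken from the dog/cat/bird example above.
LABELS = {"dog": 0, "cat": 1, "bird": 2}


def one_hot(index, num_classes):
    """Return a list with a 1.0 at `index` and 0.0 everywhere else."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


dog, cat, bird = LABELS["dog"], LABELS["cat"], LABELS["bird"]

# Integer targets: "bird" is twice as far from "dog" as "cat" is,
# purely because of the arbitrary numbering.
int_dog_cat = abs(dog - cat)
int_dog_bird = abs(dog - bird)

# A model regressing on the integer and torn between "dog" and "bird"
# minimizes squared error with the mean, which is the code for "cat".
torn_prediction = (dog + bird) / 2

# One-hot targets: every pair of distinct classes is equally far apart.
n = len(LABELS)
oh_dog_cat = euclidean(one_hot(dog, n), one_hot(cat, n))
oh_dog_bird = euclidean(one_hot(dog, n), one_hot(bird, n))

print(int_dog_cat, int_dog_bird)   # 1 2
print(torn_prediction)             # 1.0
print(oh_dog_cat == oh_dog_bird)   # True
```

With one-hot targets no class sits "between" two others, so the model has no spurious ordering to learn.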
The same may happen when labels are encoded in an n-dimensional space and the vector takes continuous values: the result may be hard to interpret and map back to labels.
In this case, a one-hot encoding can be applied to the label representation: it has a clear interpretation, and its values are separated, each in its own dimension.
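As for the loss function, the following sketch (plain Python, with made-up logits and hypothetical helper names) suggests why some courses skip one-hot labels: with softmax and cross-entropy, a one-hot target only selects the log-probability of the true class, so an integer label used purely as an index gives the same loss. The integer here is never treated as a numeric target, so the ordering problem described above does not arise.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(index, num_classes):
    """Return a list with a 1.0 at `index` and 0.0 everywhere else."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def cross_entropy_one_hot(probs, target):
    """Cross-entropy against a one-hot target: -sum(t_i * log(p_i))."""
    return -sum(t * math.log(p) for t, p in zip(target, probs))


def cross_entropy_index(probs, label):
    """Same loss from the integer class index: -log(p[label])."""
    return -math.log(probs[label])


logits = [2.0, 0.5, -1.0]  # made-up scores for dog, cat, bird
label = 1                  # "cat", used only as an index

probs = softmax(logits)
loss_one_hot = cross_entropy_one_hot(probs, one_hot(label, len(logits)))
loss_index = cross_entropy_index(probs, label)

print(math.isclose(loss_one_hot, loss_index))  # True
```

Which form to use is then mostly a matter of what the library's loss function expects; the index form avoids materializing a mostly-zero vector per example.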
If you need more information, or would like to see the reason for one-hot encoding from the perspective of the loss function, see https://www.linkedin.com/pulse/why-using-one-hot-encoding-classifier-training-adwin-jahn/
answered Nov 14 '18 at 1:00 by RKO