One-hot encoding of output labels

While I understand the need to one-hot encode features in the input data, how does one-hot encoding of output labels actually help? The TensorFlow MNIST tutorial encourages one-hot encoding of output labels. The first assignment in CS231n (Stanford), however, does not suggest one-hot encoding. What's the rationale behind choosing or not choosing to one-hot encode output labels?

Edit: I'm not sure about the reason for the downvote, but to elaborate: I forgot to mention the softmax function along with the cross-entropy loss function, which is normally used in multinomial classification. Does it have something to do with the cross-entropy loss function?
Having said that, one can calculate the loss even without the output labels being one-hot encoded.

edited Jul 17 '18 at 15:59

asked Jul 17 '18 at 15:13

lazy python

397

I'm voting to close this question as off-topic because it is not about programming, as established in the help center. However, you may consider doing additional research on multi-class classification and logistic regression.

– E_net4
Jul 17 '18 at 22:20

1

This may help.

– mattdns
Jul 17 '18 at 23:54

machine-learning classification one-hot-encoding

1 Answer

A one-hot vector is used when the output is not ordinal. Let's assume you encode your output as integers, giving each label a number.

Integer values have a natural ordered relationship with each other, and machine learning algorithms may be able to understand and exploit this relationship. Your labels, however, may be unrelated, with no similarity between them. For categorical variables where no such ordinal relationship exists, integer encoding is a poor fit.

In fact, using this encoding and allowing the model to assume a natural ordering between categories may lead to unexpected results, where model predictions fall halfway between categories.

What do I mean by that?

The idea is that if we train an ML algorithm - for example a neural network - it’s going to think that a cat (which is 1) is halfway between a dog and a bird, because they are 0 and 2 respectively. We don’t want that; it’s not true and it’s an extra thing for the algorithm to learn.
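
To make the "halfway" problem concrete, here is a minimal sketch in plain Python. The label list and the helper names `one_hot` and `distance` are made up for this illustration and are not from any library. It compares how far apart the classes are under each encoding:

```python
import math

# Hypothetical label set for illustration; indices match the example above.
labels = ["dog", "cat", "bird"]  # dog=0, cat=1, bird=2


def one_hot(index, num_classes):
    """Return a list with 1.0 at `index` and 0.0 everywhere else."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Integer encoding: cat sits exactly between dog and bird.
int_dog, int_cat, int_bird = 0, 1, 2
int_distances = {
    "dog-cat": abs(int_dog - int_cat),
    "cat-bird": abs(int_cat - int_bird),
    "dog-bird": abs(int_dog - int_bird),
}

# One-hot encoding: every pair of classes is equally far apart.
vectors = [one_hot(i, len(labels)) for i in range(len(labels))]
onehot_distances = {
    "dog-cat": distance(vectors[0], vectors[1]),
    "cat-bird": distance(vectors[1], vectors[2]),
    "dog-bird": distance(vectors[0], vectors[2]),
}

print(int_distances)
print(onehot_distances)
```

With integers, dog and bird are twice as far apart as dog and cat. With one-hot vectors, all three pairs are the same distance apart, so the encoding itself does not suggest any ordering.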

The same may happen when the output is encoded in an n-dimensional space as a vector of continuous values. The result may be hard to interpret and map back to labels.

In this case, one-hot encoding can be applied to the label representation: it has a clear interpretation, and its values are separated, each in its own dimension.

If you need more information, or would like to see the reason for one-hot encoding from the perspective of the loss function, see https://www.linkedin.com/pulse/why-using-one-hot-encoding-classifier-training-adwin-jahn/
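
On the loss-function side, here is a small sketch (my own illustration, not taken from the linked article) of softmax followed by cross-entropy in plain Python. The logits are made-up numbers and the function names are hypothetical. It shows that a one-hot target simply picks out the log-probability of the true class, so indexing with an integer label gives the same loss:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_one_hot(probs, target):
    """Cross-entropy against a one-hot target vector."""
    return -sum(t * math.log(p) for t, p in zip(target, probs))


def cross_entropy_index(probs, label):
    """Cross-entropy against an integer class label."""
    return -math.log(probs[label])


logits = [2.0, 0.5, -1.0]  # made-up scores for three classes
probs = softmax(logits)

loss_a = cross_entropy_one_hot(probs, [0.0, 1.0, 0.0])  # true class is 1
loss_b = cross_entropy_index(probs, 1)

print(loss_a, loss_b)
```

Both calls return the same value, which fits the observation in the question that the loss can be computed without one-hot encoding the labels. In that setting the one-hot form is mostly a convenient way to write the target as a probability distribution; an integer label carries the same information as long as it is used only as an index and never as a numeric target.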

answered Nov 14 '18 at 1:00

RKO

265
