one hot encoding of output labels










1















While I understand the need to one hot encode features in the input data, how does one hot encoding of output labels actually help? The tensor flow MNIST tutorial encourages one hot encoding of output labels. The first assignment in CS231n(stanford) however does not suggest one hot encoding. What's the rationale behind choosing / not choosing to one hot encode output labels?



Edit: Not sure about the reason for the downvote, but just to elaborate more, I missed out mentioning the softmax function along with the cross entropy loss function, which is normally used in multinomial classification. Does it have something to do with the cross entropy loss function?
Having said that, one can calculate the loss even without the output labels being one hot encoded.










share|improve this question
























  • I'm voting to close this question as off-topic because it is not about programming, as established in the help center. However, you may consider doing additional research on multi-class classification and logistic regression.

    – E_net4
    Jul 17 '18 at 22:20







  • 1





    This may help.

    – mattdns
    Jul 17 '18 at 23:54















1















While I understand the need to one hot encode features in the input data, how does one hot encoding of output labels actually help? The tensor flow MNIST tutorial encourages one hot encoding of output labels. The first assignment in CS231n(stanford) however does not suggest one hot encoding. What's the rationale behind choosing / not choosing to one hot encode output labels?



Edit: Not sure about the reason for the downvote, but just to elaborate more, I missed out mentioning the softmax function along with the cross entropy loss function, which is normally used in multinomial classification. Does it have something to do with the cross entropy loss function?
Having said that, one can calculate the loss even without the output labels being one hot encoded.










share|improve this question
























  • I'm voting to close this question as off-topic because it is not about programming, as established in the help center. However, you may consider doing additional research on multi-class classification and logistic regression.

    – E_net4
    Jul 17 '18 at 22:20







  • 1





    This may help.

    – mattdns
    Jul 17 '18 at 23:54













1












1








1


1






While I understand the need to one hot encode features in the input data, how does one hot encoding of output labels actually help? The tensor flow MNIST tutorial encourages one hot encoding of output labels. The first assignment in CS231n(stanford) however does not suggest one hot encoding. What's the rationale behind choosing / not choosing to one hot encode output labels?



Edit: Not sure about the reason for the downvote, but just to elaborate more, I missed out mentioning the softmax function along with the cross entropy loss function, which is normally used in multinomial classification. Does it have something to do with the cross entropy loss function?
Having said that, one can calculate the loss even without the output labels being one hot encoded.










share|improve this question
















While I understand the need to one hot encode features in the input data, how does one hot encoding of output labels actually help? The tensor flow MNIST tutorial encourages one hot encoding of output labels. The first assignment in CS231n(stanford) however does not suggest one hot encoding. What's the rationale behind choosing / not choosing to one hot encode output labels?



Edit: Not sure about the reason for the downvote, but just to elaborate more, I missed out mentioning the softmax function along with the cross entropy loss function, which is normally used in multinomial classification. Does it have something to do with the cross entropy loss function?
Having said that, one can calculate the loss even without the output labels being one hot encoded.







machine-learning classification one-hot-encoding






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Jul 17 '18 at 15:59







lazy python

















asked Jul 17 '18 at 15:13









lazy pythonlazy python

397




397












  • I'm voting to close this question as off-topic because it is not about programming, as established in the help center. However, you may consider doing additional research on multi-class classification and logistic regression.

    – E_net4
    Jul 17 '18 at 22:20







  • 1





    This may help.

    – mattdns
    Jul 17 '18 at 23:54

















  • I'm voting to close this question as off-topic because it is not about programming, as established in the help center. However, you may consider doing additional research on multi-class classification and logistic regression.

    – E_net4
    Jul 17 '18 at 22:20







  • 1





    This may help.

    – mattdns
    Jul 17 '18 at 23:54
















I'm voting to close this question as off-topic because it is not about programming, as established in the help center. However, you may consider doing additional research on multi-class classification and logistic regression.

– E_net4
Jul 17 '18 at 22:20






I'm voting to close this question as off-topic because it is not about programming, as established in the help center. However, you may consider doing additional research on multi-class classification and logistic regression.

– E_net4
Jul 17 '18 at 22:20





1




1





This may help.

– mattdns
Jul 17 '18 at 23:54





This may help.

– mattdns
Jul 17 '18 at 23:54












1 Answer
1






active

oldest

votes


















1














One hot vector is used in cases where output is not cardinal. Lets assume you encode your output as integer giving each label a number.



The integer values have a natural ordered relationship between each other and machine learning algorithms may be able to understand and harness this relationship, but your labels may be unrelated. There may be no similarity in your labels. For categorical variables where no such ordinal relationship exists, the integer encoding is not good.



In fact, using this encoding and allowing the model to assume a natural ordering between categories may result in unexpected results where model predictions are halfway between categories categories.



What a mean by that?



The idea is that if we train an ML algorithm - for example a neural network - it’s going to think that a cat (which is 1) is halfway between a dog and a bird, because they are 0 and 2 respectively. We don’t want that; it’s not true and it’s an extra thing for the algorithm to learn.



The same may happen when data is encoded in n dimensional space and vector has a continuous value. The result may be hard to interpret and map back to labels.



In this case, a one-hot encoding can be applied to label representation as it has clear interpretation and its values are separated each is in different dimension.



If you need more information or would like to see the reason for one-hot encoding for the perspective of loss function see https://www.linkedin.com/pulse/why-using-one-hot-encoding-classifier-training-adwin-jahn/






share|improve this answer






















    Your Answer






    StackExchange.ifUsing("editor", function ()
    StackExchange.using("externalEditor", function ()
    StackExchange.using("snippets", function ()
    StackExchange.snippets.init();
    );
    );
    , "code-snippets");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "1"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f51384911%2fone-hot-encoding-of-output-labels%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    1














    One hot vector is used in cases where output is not cardinal. Lets assume you encode your output as integer giving each label a number.



    The integer values have a natural ordered relationship between each other and machine learning algorithms may be able to understand and harness this relationship, but your labels may be unrelated. There may be no similarity in your labels. For categorical variables where no such ordinal relationship exists, the integer encoding is not good.



    In fact, using this encoding and allowing the model to assume a natural ordering between categories may result in unexpected results where model predictions are halfway between categories categories.



    What a mean by that?



    The idea is that if we train an ML algorithm - for example a neural network - it’s going to think that a cat (which is 1) is halfway between a dog and a bird, because they are 0 and 2 respectively. We don’t want that; it’s not true and it’s an extra thing for the algorithm to learn.



    The same may happen when data is encoded in n dimensional space and vector has a continuous value. The result may be hard to interpret and map back to labels.



    In this case, a one-hot encoding can be applied to label representation as it has clear interpretation and its values are separated each is in different dimension.



    If you need more information or would like to see the reason for one-hot encoding for the perspective of loss function see https://www.linkedin.com/pulse/why-using-one-hot-encoding-classifier-training-adwin-jahn/






    share|improve this answer



























      1














      One hot vector is used in cases where output is not cardinal. Lets assume you encode your output as integer giving each label a number.



      The integer values have a natural ordered relationship between each other and machine learning algorithms may be able to understand and harness this relationship, but your labels may be unrelated. There may be no similarity in your labels. For categorical variables where no such ordinal relationship exists, the integer encoding is not good.



      In fact, using this encoding and allowing the model to assume a natural ordering between categories may result in unexpected results where model predictions are halfway between categories categories.



      What a mean by that?



      The idea is that if we train an ML algorithm - for example a neural network - it’s going to think that a cat (which is 1) is halfway between a dog and a bird, because they are 0 and 2 respectively. We don’t want that; it’s not true and it’s an extra thing for the algorithm to learn.



      The same may happen when data is encoded in n dimensional space and vector has a continuous value. The result may be hard to interpret and map back to labels.



      In this case, a one-hot encoding can be applied to label representation as it has clear interpretation and its values are separated each is in different dimension.



      If you need more information or would like to see the reason for one-hot encoding for the perspective of loss function see https://www.linkedin.com/pulse/why-using-one-hot-encoding-classifier-training-adwin-jahn/






      share|improve this answer

























        1












        1








        1







        One hot vector is used in cases where output is not cardinal. Lets assume you encode your output as integer giving each label a number.



        The integer values have a natural ordered relationship between each other and machine learning algorithms may be able to understand and harness this relationship, but your labels may be unrelated. There may be no similarity in your labels. For categorical variables where no such ordinal relationship exists, the integer encoding is not good.



        In fact, using this encoding and allowing the model to assume a natural ordering between categories may result in unexpected results where model predictions are halfway between categories categories.



        What a mean by that?



        The idea is that if we train an ML algorithm - for example a neural network - it’s going to think that a cat (which is 1) is halfway between a dog and a bird, because they are 0 and 2 respectively. We don’t want that; it’s not true and it’s an extra thing for the algorithm to learn.



        The same may happen when data is encoded in n dimensional space and vector has a continuous value. The result may be hard to interpret and map back to labels.



        In this case, a one-hot encoding can be applied to label representation as it has clear interpretation and its values are separated each is in different dimension.



        If you need more information or would like to see the reason for one-hot encoding for the perspective of loss function see https://www.linkedin.com/pulse/why-using-one-hot-encoding-classifier-training-adwin-jahn/






        share|improve this answer













        One hot vector is used in cases where output is not cardinal. Lets assume you encode your output as integer giving each label a number.



        The integer values have a natural ordered relationship between each other and machine learning algorithms may be able to understand and harness this relationship, but your labels may be unrelated. There may be no similarity in your labels. For categorical variables where no such ordinal relationship exists, the integer encoding is not good.



        In fact, using this encoding and allowing the model to assume a natural ordering between categories may result in unexpected results where model predictions are halfway between categories categories.



        What a mean by that?



        The idea is that if we train an ML algorithm - for example a neural network - it’s going to think that a cat (which is 1) is halfway between a dog and a bird, because they are 0 and 2 respectively. We don’t want that; it’s not true and it’s an extra thing for the algorithm to learn.



        The same may happen when data is encoded in n dimensional space and vector has a continuous value. The result may be hard to interpret and map back to labels.



        In this case, a one-hot encoding can be applied to label representation as it has clear interpretation and its values are separated each is in different dimension.



        If you need more information or would like to see the reason for one-hot encoding for the perspective of loss function see https://www.linkedin.com/pulse/why-using-one-hot-encoding-classifier-training-adwin-jahn/







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Nov 14 '18 at 1:00









        RKORKO

        265




        265





























            draft saved

            draft discarded
















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid


            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.

            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f51384911%2fone-hot-encoding-of-output-labels%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            How to how show current date and time by default on contact form 7 in WordPress without taking input from user in datetimepicker

            Darth Vader #20

            Syphilis