Reinforcement Learning or Supervised Learning?

If many iterations in a simulated environment are needed before a reinforcement learning (RL) algorithm works in the real world, why don't we use the same simulated environment to generate labeled data and then use supervised learning methods instead of RL?

reinforcement-learning supervised-learning

asked Nov 13 '18 at 23:35
Ali

3 Answers

The two fields differ in a fundamental way:

One tries to replicate previous results, and the other tries to improve on them.

Machine learning is commonly divided into four fields:

• Supervised learning
• Unsupervised learning
• Semi-supervised learning
• Reinforcement learning

Let's talk about the two fields you asked about, and explore them intuitively with a real-life example: archery.



Supervised Learning

For supervised learning, we would observe a master archer in action for maybe a week and record how far he pulled the bowstring back, the angle of each shot, and so on. Then we go home and build a model. In the ideal scenario, our model becomes as good as the master archer. It cannot get better, because the loss function in supervised learning is usually MSE or cross-entropy, which only measures how closely we reproduce the feature-to-label mapping in the data. After building the model, we deploy it. Say we're extra fancy and make it learn online: we keep taking data from the master archer and keep learning to be exactly like the master archer.

The biggest takeaway:

We're trying to replicate the master archer simply because we think he is the best. Therefore we can never beat him.
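To make that ceiling concrete, here is a minimal sketch of the supervised approach in Python. Everything in it is an assumption made up for illustration: the linear `ideal_angle` rule, a master archer who consistently aims 1 degree too high, and all helper names. A least-squares (MSE) fit to the master's shots reproduces the master's habits, including the systematic error.

```python
def ideal_angle(distance):
    """Hypothetical ground truth: the release angle that hits the bullseye."""
    return 0.5 * distance


def master_angle(distance):
    """Hypothetical master archer: very good, but consistently 1 degree high."""
    return ideal_angle(distance) + 1.0


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (minimises MSE)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


# Labeled data: what the master did at each distance (not what was ideal).
distances = [10.0, 20.0, 30.0, 40.0, 50.0]
labels = [master_angle(d) for d in distances]
slope, intercept = fit_line(distances, labels)


def cloned_angle(distance):
    """The supervised model: a copy of the master's distance-to-angle mapping."""
    return slope * distance + intercept


def miss(angle_fn, distance):
    """How far (in degrees) a shooter is from the ideal angle."""
    return abs(angle_fn(distance) - ideal_angle(distance))


test_distance = 35.0
master_miss = miss(master_angle, test_distance)
clone_miss = miss(cloned_angle, test_distance)
print(f"master misses by {master_miss:.2f}, clone misses by {clone_miss:.2f}")
```

Under these assumptions the clone misses by the same amount as the master: a lower MSE only means a more faithful copy, not a better archer.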



Reinforcement Learning

In reinforcement learning, we simply build a model and let it try many different things, giving it a reward or penalty depending on how far the arrow lands from the bullseye. We are not trying to replicate any behaviour; instead, the model tries to find its own optimal behaviour. Because of this, it is not biased towards what we think the optimal shooting strategy is.

Because RL starts without any prior knowledge, it can struggle to converge on difficult problems. That is why there is a method called apprenticeship learning / imitation learning, where we give the RL agent some trajectories from master archers so that it has a starting point and can begin to converge. After that, the agent still explores by sometimes taking random actions, to try to find better solutions. This is something supervised learning cannot do: if you explore while using supervised learning, every action you record becomes a label, so you are effectively saying that taking this action in this state is optimal, and then you train your model to replicate it. That is wrong; in supervised learning such an exploratory action should instead be seen as an outlier in the data.
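As a rough illustration of learning from reward alone, the sketch below runs an epsilon-greedy search over a handful of candidate release angles. The simulator `shoot`, the value of `IDEAL_ANGLE`, and the candidate list are all hypothetical; the learner never sees a demonstration, only the reward for each shot it tries.

```python
import random

IDEAL_ANGLE = 17.5  # hypothetical bullseye angle, unknown to the learner


def shoot(angle):
    """Hypothetical simulator: reward is the negative distance from the bullseye."""
    return -abs(angle - IDEAL_ANGLE)


def epsilon_greedy(candidate_angles, episodes=500, epsilon=0.1, seed=0):
    """Try angles, track the average reward of each, and return the best one found."""
    rng = random.Random(seed)
    totals = [0.0] * len(candidate_angles)
    counts = [0] * len(candidate_angles)

    def average(i):
        # Untried angles score +inf so that each one is attempted at least once.
        return totals[i] / counts[i] if counts[i] else float("inf")

    for _ in range(episodes):
        if rng.random() < epsilon:
            i = rng.randrange(len(candidate_angles))  # explore: random action
        else:
            i = max(range(len(candidate_angles)), key=average)  # exploit
        totals[i] += shoot(candidate_angles[i])
        counts[i] += 1

    tried = [i for i in range(len(candidate_angles)) if counts[i]]
    best = max(tried, key=lambda i: totals[i] / counts[i])
    return candidate_angles[best]


angles = [15.0, 16.0, 17.0, 17.5, 18.0, 18.5, 19.0]
best_angle = epsilon_greedy(angles)
print("best angle found:", best_angle)
```

This is a one-step (bandit-style) simplification: real RL problems usually involve sequences of states and delayed rewards, but the explore-then-exploit idea is the same.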



Key differences between supervised learning and RL:

• Supervised learning replicates what has already been done.
• Reinforcement learning can explore the state space and take random actions. This allows RL to be potentially better than the current best.


Why we don't use the same simulated environment to generate the labeled data and then use supervised learning methods instead of RL

We do something like this in deep RL: off-policy methods such as DQN keep an experience replay buffer of transitions generated in the environment and train on samples drawn from it. But this is not possible with supervised learning, because supervised learning lacks the concept of reward.
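For readers who have not seen one, an experience replay buffer can be sketched in a few lines. This is a generic illustration rather than the implementation of any particular library; the `Transition` fields, the class name, and the example values are assumptions. Note that every stored sample carries a reward, which is exactly what a plain labeled dataset would be missing.

```python
import random
from collections import deque, namedtuple

# One step of interaction with the (simulated) environment.
Transition = namedtuple("Transition", "state action reward next_state done")


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly for training."""

    def __init__(self, capacity, seed=0):
        self.memory = deque(maxlen=capacity)  # oldest transitions drop out first
        self.rng = random.Random(seed)

    def push(self, *args):
        self.memory.append(Transition(*args))

    def sample(self, batch_size):
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


buffer = ReplayBuffer(capacity=3)
buffer.push(3, "right", 5, 4, False)
buffer.push(3, "left", 0, 2, False)
buffer.push(3, "up", -5, 3, False)
buffer.push(4, "right", 1, 5, True)  # capacity reached: the oldest entry is evicted
batch = buffer.sample(2)
for transition in batch:
    print(transition)
```

In a full agent, the training loop would alternate between pushing fresh transitions from the simulator and updating the model on sampled batches.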



Example: walking in a maze.

Reinforcement Learning

• Taking a right in square 3: Reward = 5
• Taking a left in square 3: Reward = 0
• Going up in square 3: Reward = -5

Supervised Learning

• Taking a right in square 3
• Taking a left in square 3
• Going up in square 3

When it has to make a decision in square 3, RL will know to go right. Supervised learning will be confused, because one example in your data says to go right in square 3, the second says to go left, and the third says to go up. The labels for the same state conflict and there is no reward to rank them, so it will never converge on the right action.
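The maze example can be written out directly. The sketch below is a deliberately simplified one-step view (it treats the immediate reward as the full value of an action, which a real RL method would not do in general); the variable names are made up for illustration.

```python
from collections import Counter

# The three experiences from square 3, as (state, action, reward).
experiences = [(3, "right", 5), (3, "left", 0), (3, "up", -5)]

# Reward-aware view: keep the best-rewarded action seen in each state.
best = {}
for state, action, reward in experiences:
    if state not in best or reward > best[state][1]:
        best[state] = (action, reward)
rl_choice = best[3][0]

# Supervised view: the reward column is gone; only (state -> action) labels remain.
label_counts = Counter(action for state, action, _ in experiences if state == 3)
top_count = max(label_counts.values())
tied_labels = sorted(a for a, c in label_counts.items() if c == top_count)

print("reward-aware choice in square 3:", rl_choice)
print("equally frequent labels in square 3:", tied_labels)
```

With the rewards, "right" stands out; without them, all three labels are equally frequent and nothing distinguishes the good action from the bad ones.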







answered Nov 14 '18 at 0:06
Rui Nian

In supervised learning we have labelled target data, which is assumed to be correct.

In RL that's not the case: we have nothing but rewards. The agent needs to figure out for itself which action to take by interacting with the environment and observing the rewards it gets.
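A small tabular Q-learning sketch shows what "nothing but rewards" looks like in practice. The corridor environment, its reward of +1 at the right end, and the hyperparameters are all assumptions chosen for illustration. The agent is never told which action is correct; here it simply acts at random and learns action values from the rewards it observes (Q-learning is off-policy, so it can learn the greedy policy from random behaviour).

```python
import random

N_STATES = 5        # positions 0..4 in a hypothetical corridor
GOAL = 4            # reaching position 4 pays +1; position 0 is a dead end
ACTIONS = (-1, +1)  # index 0 = step left, index 1 = step right


def step(state, action_index):
    """Hypothetical environment: move one cell, reward only at the goal."""
    next_state = state + ACTIONS[action_index]
    reward = 1.0 if next_state == GOAL else 0.0
    done = next_state in (0, GOAL)
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 2, False  # every episode starts in the middle
        while not done:
            action = rng.randrange(2)  # behave randomly while learning
            next_state, reward, done = step(state, action)
            # Bootstrapped target: reward now plus discounted best value later.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
greedy_policy = {s: ("right" if q[s][1] > q[s][0] else "left") for s in (1, 2, 3)}
print(greedy_policy)
```

No sample in this run is a "correct label"; the preference for stepping right emerges only from where the reward was found.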







answered Nov 28 '18 at 5:59
Karthick vadivel

In short, supervised learning is passive learning: all the data is collected before you start training your model.

Reinforcement learning, however, is active learning. In RL you usually don't have much data at first, and you collect new data as you train your model. Your RL algorithm and model decide which specific data samples you collect during training.
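The point that the model decides which samples get collected can be illustrated with a toy rollout loop. The corridor environment and both policies below are hypothetical; the only thing the sketch is meant to show is that the same environment yields different datasets under different policies.

```python
import random
from collections import Counter


def collect(policy, episodes=200, seed=0):
    """Roll out `policy` in a hypothetical 1-D corridor (positions 0..4,
    start at 2, both ends terminal) and count the (state, action) samples."""
    rng = random.Random(seed)
    samples = Counter()
    for _ in range(episodes):
        state = 2
        while state not in (0, 4):
            action = policy(state, rng)
            samples[(state, action)] += 1
            state += action
    return samples


def random_policy(state, rng):
    """Early in training: step left (-1) or right (+1) at random."""
    return rng.choice((-1, +1))


def always_right(state, rng):
    """Later in training: a policy that has settled on stepping right."""
    return +1


random_data = collect(random_policy)
right_data = collect(always_right)
print("random policy saw:", sorted(random_data))
print("always-right policy saw:", sorted(right_data))
```

A supervised dataset would be fixed up front; here the data distribution shifts as the policy changes, which is one reason a dataset generated once from the simulator is not a drop-in replacement for RL.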







answered Jan 17 at 20:41
yuren zhong