Will enabling CPU scheduling in YARN really improve parallel processing in Spark?










YARN with the Capacity Scheduler takes only memory into account by default when allocating resources for user requests. If I submit a Spark job like this: "--master yarn --deploy-mode client --driver-memory 4g --executor-memory 4g --num-executors 1 --executor-cores 3", YARN allocates an executor with 4 GB of memory and 1 vCPU, but when it runs tasks, it runs 3 tasks in parallel.



Is it using that single core alone to run all the tasks, three at a time?



So if I enable CPU scheduling and CGroups (on an HDP cluster), will YARN assign 3 vCPU cores, and will each of those 3 tasks run on its own CPU? Will it really improve the processing time?



For now, I cannot enable CPU scheduling on my cluster (HDP 2.6.5, CentOS 7.5) because the NodeManager fails to start with the following error: "Not able to enforce cpu weights; cannot write to cgroup at: /sys/fs/cgroup/cpu,cpuacct"










      apache-spark hadoop bigdata yarn hdp






      edited Dec 1 '18 at 3:01









      steveax

      asked Nov 12 '18 at 10:34









      Dharaneesh Vrd

          1 Answer
          No. vcores and vCPUs are logical constructs: they are not tied to the physical CPUs on the system and relate more closely to how many processes are running. The OS (Linux in this case) will spread work across all CPUs if the process is designed for it. Most long-running Java applications are, because they run multiple threads.
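To make that point concrete, here is a minimal sketch (plain Python, not Spark code) of what "--executor-cores 3" amounts to inside one executor process: three task slots backed by threads. The names `EXECUTOR_CORES` and `run_task_slots` are made up for illustration. The barrier only opens if all three tasks are in flight at the same time, which happens regardless of how many vcores a resource manager has recorded for the process.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor

EXECUTOR_CORES = 3  # mirrors --executor-cores 3 from the question (illustrative)


def run_task_slots(slots):
    """Run `slots` tasks in one process; return how many completed together."""
    # The barrier releases only when `slots` tasks are waiting simultaneously.
    # If they could not run concurrently, wait() would time out and raise.
    barrier = threading.Barrier(slots, timeout=5)

    def task(task_id):
        barrier.wait()
        return task_id

    with ThreadPoolExecutor(max_workers=slots) as pool:
        finished = list(pool.map(task, range(slots)))
    return len(finished)


concurrent_tasks = run_task_slots(EXECUTOR_CORES)
print("tasks in flight together:", concurrent_tasks)
print("CPUs visible to this process:", os.cpu_count())
```

One caveat about the model: Spark executors are JVMs, whose threads the kernel can place on separate cores, whereas CPython threads share an interpreter lock. The sketch therefore shows the slot behaviour, not raw CPU throughput.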



          YARN doesn't control CPU usage unless you enable CGroups; without them, the only resource YARN enforces is memory. This usually doesn't matter because typical Hadoop workloads are I/O-bound, not CPU-bound.
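For orientation, the settings usually involved in turning on CPU scheduling and CGroups can be sketched as below. The property names are the ones documented for the Capacity Scheduler and in "Using CGroups with YARN", as far as I recall them; the values, and the helper `render_configuration`, are assumptions for illustration and should be checked against the Hadoop version your HDP release ships.

```python
import xml.etree.ElementTree as ET

# Scheduler side (capacity-scheduler.xml): account for vcores as well as memory.
SCHEDULER_PROPS = {
    "yarn.scheduler.capacity.resource-calculator":
        "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator",
}

# NodeManager side (yarn-site.xml): enforce CPU limits through CGroups.
# The hierarchy and mount values are illustrative, not recommendations.
NODEMANAGER_PROPS = {
    "yarn.nodemanager.container-executor.class":
        "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor",
    "yarn.nodemanager.linux-container-executor.resources-handler.class":
        "org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler",
    "yarn.nodemanager.linux-container-executor.cgroups.hierarchy": "/hadoop-yarn",
    "yarn.nodemanager.linux-container-executor.cgroups.mount": "false",
}


def render_configuration(props):
    """Render a dict of properties as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


scheduler_xml = render_configuration(SCHEDULER_PROPS)
nodemanager_xml = render_configuration(NODEMANAGER_PROPS)
print(scheduler_xml)
print(nodemanager_xml)
```

With the mount option set to false, the CGroups hierarchy is expected to exist already and be writable by the user running the NodeManager. The "cannot write to cgroup" error quoted in the question is likely a permissions problem on that directory, though that is a guess from the message alone.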



          References



          • Using CGroups with YARN





                answered Nov 15 '18 at 22:20









                tk421
