NumPy and other library dependencies for a Spark application on Kubernetes

I am running a PySpark application (Spark v2.4.0) on Kubernetes. My Spark application depends on the numpy and tensorflow modules. Please suggest a way to add these dependencies to the Spark executors.



I have checked the documentation: remote dependencies can be included with --py-files, --jars, etc., but nothing is mentioned about library dependencies such as numpy and tensorflow.

Tags: apache-spark, kubernetes

asked Nov 13 '18 at 9:48
Lakshman Battini

1 Answer

I found a way to add library dependencies to Spark applications on K8s and thought I would share it here.



Add the commands that install the required dependencies to the Dockerfile and rebuild the Spark image. When the Spark job is submitted, the new driver and executor containers are created from this image, so the dependencies are already installed.

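A minimal Python sketch of the rebuild, push and submit steps follows. It only assembles and prints the command lines; it does not run Docker or spark-submit itself. It assumes a Spark 2.4 distribution whose bin/docker-image-tool.sh accepts -r, -t and, optionally, -p with a Python Dockerfile, and that it names the PySpark image <repo>/spark-py:<tag>; check docker-image-tool.sh -h for the options your version supports. REPO, TAG, the API-server URL and the application path are hypothetical placeholders.

```python
import shlex

# Hypothetical placeholders: replace with your own registry and tag.
REPO = "registry.example.com/myteam"
TAG = "v2.4.0-numpy"


def image_tool_cmd(repo, tag, action, python_dockerfile=None):
    """Assemble a docker-image-tool.sh invocation for "build" or "push".

    python_dockerfile is optional: when Spark is built from source, editing
    the default bindings/python/Dockerfile in place is enough.
    """
    cmd = ["./bin/docker-image-tool.sh", "-r", repo, "-t", tag]
    if python_dockerfile and action == "build":
        cmd += ["-p", python_dockerfile]
    cmd.append(action)
    return cmd


def python_image(repo, tag):
    """PySpark image name, assumed to follow <repo>/spark-py:<tag>."""
    return "{}/spark-py:{}".format(repo, tag)


def submit_cmd(api_server, image, app_uri, name="numpy-app", python_version="2"):
    """Assemble a cluster-mode spark-submit whose pods use `image`."""
    return [
        "./bin/spark-submit",
        "--master", "k8s://" + api_server,
        "--deploy-mode", "cluster",
        "--name", name,
        # Driver and executor pods are both started from this image.
        "--conf", "spark.kubernetes.container.image=" + image,
        # Should match the interpreter that numpy was installed for ("2" here).
        "--conf", "spark.kubernetes.pyspark.pythonVersion=" + python_version,
        # Spark 2.4 does not upload client-local files, so the app is
        # referenced inside the image (local://) or via a remote URI.
        app_uri,
    ]


if __name__ == "__main__":
    for cmd in (
        image_tool_cmd(REPO, TAG, "build"),
        image_tool_cmd(REPO, TAG, "push"),
        submit_cmd("https://k8s-apiserver.example.com:6443",  # hypothetical
                   python_image(REPO, TAG),
                   "local:///opt/spark/work-dir/app.py"),     # hypothetical
    ):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Running the script prints three commands to run from the Spark directory: build the image, push it to a registry the cluster nodes can pull from, and submit the job with spark.kubernetes.container.image pointing at the rebuilt image.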


Dockerfile (/spark_folder_path/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile) contents:



    RUN apk add --no-cache python && \
        apk add --no-cache python3 && \
        python -m ensurepip && \
        python3 -m ensurepip && \
        # Remove ensurepip: pip is already installed on the image,
        # so it adds no functionality and only takes up 1.6MB
        rm -r /usr/lib/python*/ensurepip && \
        pip install --upgrade pip setuptools && \
        # Python 3 packages can be installed with pip3.6 instead of pip
        # Added for this application: numpy (on Alpine, pip may have to
        # compile it from source, which needs a C compiler and Python headers)
        pip install numpy && \
        # Remove the pip cache to save space
        rm -r /root/.cache

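To check that the executors really run from the rebuilt image, one option is to import the modules inside executor tasks and collect the results on the driver. This is a hedged sketch, not part of the original answer: probe_modules and probe_executors are hypothetical helper names, and spark is assumed to be the application's existing SparkSession.

```python
import importlib


def probe_modules(names):
    """Map each module name to its __version__ ("unknown" if absent), or None if not importable."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            found[name] = getattr(module, "__version__", "unknown")
    return found


def probe_executors(spark, names, num_partitions=4):
    """Run probe_modules inside executor tasks and return the distinct results.

    One task is scheduled per partition; with enough partitions the probe is
    likely to reach every executor, although Spark does not guarantee it.
    """
    rdd = spark.sparkContext.parallelize(range(num_partitions), num_partitions)
    # Sorted (name, version) tuples are hashable, so distinct() can deduplicate them.
    results = rdd.mapPartitions(
        lambda _: [tuple(sorted(probe_modules(names).items()))]
    ).distinct().collect()
    return [dict(r) for r in results]
```

Inside the application, print(probe_executors(spark, ["numpy", "tensorflow"])) should report a version string for numpy once the rebuilt image is in use, and None for any module that is still missing from the image.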

answered Nov 17 '18 at 10:19
Lakshman Battini