Numpy and other library dependencies for Spark application on Kubernetes
I am running a PySpark application (Spark v2.4.0) on Kubernetes. The application depends on the numpy and tensorflow modules. Please suggest a way to add these dependencies to the Spark executors.
I have checked the documentation: remote dependencies can be included using --py-files, --jars, etc., but nothing is mentioned about library dependencies.
apache-spark kubernetes
asked Nov 13 '18 at 9:48
Lakshman Battini
1 Answer
I found a way to add library dependencies to Spark applications on K8S and thought of sharing it here.
Add the installation commands for the required dependencies to the Dockerfile and rebuild the Spark image. When the Spark job is submitted, the new containers are instantiated from that image, so the dependencies are available in them as well.
Dockerfile (/spark_folder_path/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile) contents:
RUN apk add --no-cache python && \
    apk add --no-cache python3 && \
    python -m ensurepip && \
    python3 -m ensurepip && \
    # We remove ensurepip since it adds no functionality since pip is
    # installed on the image and it just takes up 1.6MB on the image
    rm -r /usr/lib/python*/ensurepip && \
    pip install --upgrade pip setuptools && \
    # You may install python3 packages by using pip3.6
    pip install numpy && \
    # Removed the .cache to save space
    rm -r /root/.cache
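Once the image is rebuilt, it can help to confirm that the modules are actually importable inside the executors rather than only on the driver. The following is a minimal sketch of such a check, not part of the original answer: `probe_module` and `check_executors` are hypothetical helper names, and the sketch assumes the script is submitted as the PySpark application so that a `SparkSession` is available.

```python
import importlib


def probe_module(name):
    """Return (name, version) if `name` can be imported here, else (name, None)."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return (name, None)
    # Not every module exposes __version__, so fall back to a placeholder.
    return (name, getattr(module, "__version__", "unknown"))


def check_executors(spark, modules=("numpy", "tensorflow"), tasks=8):
    """Run probe_module inside executor tasks and return the distinct results.

    `spark` is an existing SparkSession; `tasks` is an arbitrary number of
    partitions chosen so that the probe is likely to run on several executors.
    """
    sc = spark.sparkContext
    return (
        sc.parallelize(range(tasks), tasks)
        .flatMap(lambda _: [probe_module(m) for m in modules])
        .distinct()
        .collect()
    )
```

In the submitted script this could be called as `check_executors(SparkSession.builder.getOrCreate())`. A `None` version for a module suggests that the executors are not running the rebuilt image, for example because `spark.kubernetes.container.image` still points at the old one.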
answered Nov 17 '18 at 10:19
Lakshman Battini