Add new fitted stage to a exitsting PipelineModel without fitting again
I would like to concatenate several trained Pipelines to one, which is similar to
"Spark add new fitted stage to a exitsting PipelineModel without fitting again" however the solution as below is for PySpark.
> pipe_model_new = PipelineModel(stages = [pipe_model , pipe_model2])
> final_df = pipe_model_new.transform(df1)
In Apache Spark 2.0 "PipelineModel"'s constructor is marked as private, hence it can not be called outside. While in "Pipeline" class, only "fit" method creates "PipelineModel"
val pipelineModel = new PipelineModel("randomUID", trainedStages)
val df_final_full = pipelineModel.transform(df)
Error:(266, 26) constructor PipelineModel in class PipelineModel cannot be accessed in class Preprocessor
val pipelineModel = new PipelineModel("randomUID", trainedStages)
apache-spark pipeline apache-spark-ml apache-spark-2.0
add a comment |
I would like to concatenate several trained Pipelines to one, which is similar to
"Spark add new fitted stage to a exitsting PipelineModel without fitting again" however the solution as below is for PySpark.
> pipe_model_new = PipelineModel(stages = [pipe_model , pipe_model2])
> final_df = pipe_model_new.transform(df1)
In Apache Spark 2.0 "PipelineModel"'s constructor is marked as private, hence it can not be called outside. While in "Pipeline" class, only "fit" method creates "PipelineModel"
val pipelineModel = new PipelineModel("randomUID", trainedStages)
val df_final_full = pipelineModel.transform(df)
Error:(266, 26) constructor PipelineModel in class PipelineModel cannot be accessed in class Preprocessor
val pipelineModel = new PipelineModel("randomUID", trainedStages)
apache-spark pipeline apache-spark-ml apache-spark-2.0
add a comment |
I would like to concatenate several trained Pipelines to one, which is similar to
"Spark add new fitted stage to a exitsting PipelineModel without fitting again" however the solution as below is for PySpark.
> pipe_model_new = PipelineModel(stages = [pipe_model , pipe_model2])
> final_df = pipe_model_new.transform(df1)
In Apache Spark 2.0 "PipelineModel"'s constructor is marked as private, hence it can not be called outside. While in "Pipeline" class, only "fit" method creates "PipelineModel"
val pipelineModel = new PipelineModel("randomUID", trainedStages)
val df_final_full = pipelineModel.transform(df)
Error:(266, 26) constructor PipelineModel in class PipelineModel cannot be accessed in class Preprocessor
val pipelineModel = new PipelineModel("randomUID", trainedStages)
apache-spark pipeline apache-spark-ml apache-spark-2.0
I would like to concatenate several trained Pipelines to one, which is similar to
"Spark add new fitted stage to a exitsting PipelineModel without fitting again" however the solution as below is for PySpark.
> pipe_model_new = PipelineModel(stages = [pipe_model , pipe_model2])
> final_df = pipe_model_new.transform(df1)
In Apache Spark 2.0 "PipelineModel"'s constructor is marked as private, hence it can not be called outside. While in "Pipeline" class, only "fit" method creates "PipelineModel"
val pipelineModel = new PipelineModel("randomUID", trainedStages)
val df_final_full = pipelineModel.transform(df)
Error:(266, 26) constructor PipelineModel in class PipelineModel cannot be accessed in class Preprocessor
val pipelineModel = new PipelineModel("randomUID", trainedStages)
apache-spark pipeline apache-spark-ml apache-spark-2.0
apache-spark pipeline apache-spark-ml apache-spark-2.0
edited Nov 28 '18 at 0:55
user10465355
1,7892416
1,7892416
asked Nov 12 '18 at 19:45
user1269298user1269298
152315
152315
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
There is nothing* wrong with using Pipeline
and invoking fit
method. If a stage is a Transfomer
, and PipelineModel
is**, fit
works like identity.
You can check relevant Python:
if isinstance(stage, Transformer):
transformers.append(stage)
dataset = stage.transform(dataset)
and Scala code:
This means that fitting process will only validate the schema and create a new PipelineModel
object.
case t: Transformer =>
t
* The only possible concern is presence of non-lazy Transformers
, though, with exception to deprecated OneHotEncoder
, Spark core API doesn't provide such.
** In Python:
from pyspark.ml import Transformer, PipelineModel
issubclass(PipelineModel, Transformer)
True
In Scala
import scala.reflect.runtime.universe.typeOf
import org.apache.spark.ml._
typeOf[PipelineModel] <:< typeOf[Transformer]
Boolean = true
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53269075%2fadd-new-fitted-stage-to-a-exitsting-pipelinemodel-without-fitting-again%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
There is nothing* wrong with using Pipeline
and invoking fit
method. If a stage is a Transfomer
, and PipelineModel
is**, fit
works like identity.
You can check relevant Python:
if isinstance(stage, Transformer):
transformers.append(stage)
dataset = stage.transform(dataset)
and Scala code:
This means that fitting process will only validate the schema and create a new PipelineModel
object.
case t: Transformer =>
t
* The only possible concern is presence of non-lazy Transformers
, though, with exception to deprecated OneHotEncoder
, Spark core API doesn't provide such.
** In Python:
from pyspark.ml import Transformer, PipelineModel
issubclass(PipelineModel, Transformer)
True
In Scala
import scala.reflect.runtime.universe.typeOf
import org.apache.spark.ml._
typeOf[PipelineModel] <:< typeOf[Transformer]
Boolean = true
add a comment |
There is nothing* wrong with using Pipeline
and invoking fit
method. If a stage is a Transfomer
, and PipelineModel
is**, fit
works like identity.
You can check relevant Python:
if isinstance(stage, Transformer):
transformers.append(stage)
dataset = stage.transform(dataset)
and Scala code:
This means that fitting process will only validate the schema and create a new PipelineModel
object.
case t: Transformer =>
t
* The only possible concern is presence of non-lazy Transformers
, though, with exception to deprecated OneHotEncoder
, Spark core API doesn't provide such.
** In Python:
from pyspark.ml import Transformer, PipelineModel
issubclass(PipelineModel, Transformer)
True
In Scala
import scala.reflect.runtime.universe.typeOf
import org.apache.spark.ml._
typeOf[PipelineModel] <:< typeOf[Transformer]
Boolean = true
add a comment |
There is nothing* wrong with using Pipeline
and invoking fit
method. If a stage is a Transfomer
, and PipelineModel
is**, fit
works like identity.
You can check relevant Python:
if isinstance(stage, Transformer):
transformers.append(stage)
dataset = stage.transform(dataset)
and Scala code:
This means that fitting process will only validate the schema and create a new PipelineModel
object.
case t: Transformer =>
t
* The only possible concern is presence of non-lazy Transformers
, though, with exception to deprecated OneHotEncoder
, Spark core API doesn't provide such.
** In Python:
from pyspark.ml import Transformer, PipelineModel
issubclass(PipelineModel, Transformer)
True
In Scala
import scala.reflect.runtime.universe.typeOf
import org.apache.spark.ml._
typeOf[PipelineModel] <:< typeOf[Transformer]
Boolean = true
There is nothing* wrong with using Pipeline
and invoking fit
method. If a stage is a Transfomer
, and PipelineModel
is**, fit
works like identity.
You can check relevant Python:
if isinstance(stage, Transformer):
transformers.append(stage)
dataset = stage.transform(dataset)
and Scala code:
This means that fitting process will only validate the schema and create a new PipelineModel
object.
case t: Transformer =>
t
* The only possible concern is presence of non-lazy Transformers
, though, with exception to deprecated OneHotEncoder
, Spark core API doesn't provide such.
** In Python:
from pyspark.ml import Transformer, PipelineModel
issubclass(PipelineModel, Transformer)
True
In Scala
import scala.reflect.runtime.universe.typeOf
import org.apache.spark.ml._
typeOf[PipelineModel] <:< typeOf[Transformer]
Boolean = true
edited Nov 13 '18 at 22:01
answered Nov 12 '18 at 20:26
user6910411user6910411
33.4k97499
33.4k97499
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53269075%2fadd-new-fitted-stage-to-a-exitsting-pipelinemodel-without-fitting-again%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown