Add a new fitted stage to an existing PipelineModel without fitting again

























I would like to concatenate several trained Pipelines into one. This is similar to
"Spark add new fitted stage to a exitsting PipelineModel without fitting again"; however, the solution given there, shown below, is for PySpark.



> pipe_model_new = PipelineModel(stages=[pipe_model, pipe_model2])
> final_df = pipe_model_new.transform(df1)


In Apache Spark 2.0, the constructor of "PipelineModel" is marked private, so it cannot be called from outside Spark's own packages. In the "Pipeline" class, only the "fit" method creates a "PipelineModel". Calling the constructor directly, as below, fails to compile:



val pipelineModel = new PipelineModel("randomUID", trainedStages)
val df_final_full = pipelineModel.transform(df)



Error:(266, 26) constructor PipelineModel in class PipelineModel cannot be accessed in class Preprocessor
val pipelineModel = new PipelineModel("randomUID", trainedStages)










      apache-spark pipeline apache-spark-ml apache-spark-2.0














      edited Nov 28 '18 at 0:55 by user10465355
      asked Nov 12 '18 at 19:45 by user1269298






















          1 Answer














          There is nothing* wrong with using a Pipeline and invoking its fit method. If a stage is a Transformer, and a PipelineModel is one**, fit works like the identity function for that stage.



          You can check the relevant Python code:

          if isinstance(stage, Transformer):
              transformers.append(stage)
              dataset = stage.transform(dataset)


          and Scala code:

          case t: Transformer =>
            t

          This means that the fitting process will only validate the schema and create a new PipelineModel object.
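The behaviour described above can be sketched with a small, self-contained model in plain Python. This is not Spark's source code: the classes `Transformer`, `Estimator`, `Pipeline` and `PipelineModel` below are simplified stand-ins that only borrow Spark's names, and `fit_calls` is a hypothetical counter added for illustration. The sketch assumes that fit keeps every stage that is already a Transformer unchanged and trains only Estimators.

```python
class Transformer:
    """Toy stand-in for an already-fitted stage (not Spark's class)."""

    def __init__(self, name):
        self.name = name

    def transform(self, rows):
        # Tag each row so the order of applied stages is visible.
        return [row + [self.name] for row in rows]


class Estimator:
    """Toy stand-in for a stage that still has to be trained."""

    def __init__(self, name):
        self.name = name
        self.fit_calls = 0  # hypothetical counter, for illustration only

    def fit(self, rows):
        self.fit_calls += 1
        return Transformer(self.name + "_model")


class PipelineModel(Transformer):
    """A fitted pipeline is itself a Transformer, so it can act as a stage."""

    def __init__(self, stages):
        super().__init__("pipeline_model")
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        # Index of the last stage that needs training (-1 if there is none).
        last_estimator = max(
            (i for i, s in enumerate(self.stages) if isinstance(s, Estimator)),
            default=-1,
        )
        fitted = []
        for i, stage in enumerate(self.stages):
            if isinstance(stage, Transformer):
                # Already fitted: reuse the very same object (identity).
                fitted.append(stage)
                if i < last_estimator:
                    rows = stage.transform(rows)
            else:
                model = stage.fit(rows)
                fitted.append(model)
                if i < last_estimator:
                    rows = model.transform(rows)
        return PipelineModel(fitted)


indexer, scaler = Estimator("indexer"), Estimator("scaler")
data = [["a"], ["b"]]

# Train two pipelines separately; each estimator is fitted exactly once.
model_1 = Pipeline([indexer]).fit(data)
model_2 = Pipeline([scaler]).fit(data)

# Combine the fitted models through Pipeline.fit: nothing is retrained.
combined = Pipeline([model_1, model_2]).fit(data)

print(combined.stages[0] is model_1, combined.stages[1] is model_2)  # True True
print(indexer.fit_calls, scaler.fit_calls)  # 1 1
print(combined.transform(data))
```

In Scala, the corresponding call on real models would presumably be `new Pipeline().setStages(Array(model1, model2)).fit(df)`, where `model1`, `model2` and `df` are placeholder names. It goes through the public API and so avoids the private constructor.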



          * The only possible concern is the presence of non-lazy Transformers. However, with the exception of the deprecated OneHotEncoder, the Spark core API doesn't provide any.



          ** In Python:



          from pyspark.ml import Transformer, PipelineModel

          issubclass(PipelineModel, Transformer)


          True 


          In Scala:



          import scala.reflect.runtime.universe.typeOf
          import org.apache.spark.ml._

          typeOf[PipelineModel] <:< typeOf[Transformer]




          Boolean = true





                edited Nov 13 '18 at 22:01
                answered Nov 12 '18 at 20:26 by user6910411


























