Linear Regression - Predict ŷ

I'm trying to make a scatter plot of actual sales (y) against predicted sales (ŷ).



I have imported the CSV file, and the code I currently have for the linear regression model is:



import statsmodels.formula.api as smf

result = smf.ols('sales ~ discount + holiday + product', data=data).fit()
print(result.summary())


Since I only have the actual sales values, how do I get the predicted sales (ŷ) values for the scatter plot? While researching, I found both lm.predict() (with lm = LinearRegression()) and result.predict(). Is there a difference between them?
Thank you in advance!
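For reference, a minimal sketch of what result.predict() returns for a statsmodels formula model. It assumes a pandas DataFrame with the columns named in the formula (sales, discount, holiday, product); the data below is synthetic and hypothetical, not the actual CSV.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical synthetic data standing in for the CSV; only the column
# names are taken from the question's formula.
rng = np.random.default_rng(0)
n = 60
data = pd.DataFrame({
    "discount": rng.uniform(0, 30, n),
    "holiday": rng.integers(0, 2, n),
    "product": rng.choice(["A", "B", "C"], n),
})
data["sales"] = (
    100
    + 2.5 * data["discount"]
    + 15 * data["holiday"]
    + data["product"].map({"A": 0, "B": 10, "C": -5})
    + rng.normal(0, 5, n)
)

result = smf.ols("sales ~ discount + holiday + product", data=data).fit()

# With no arguments, predict() uses the data the model was fitted on,
# so these in-sample ŷ values equal result.fittedvalues.
y_hat = result.predict()
print(np.allclose(y_hat, result.fittedvalues))  # True

# For new observations, pass a DataFrame with the same raw columns; the
# formula rebuilds the dummy columns for the categorical "product" itself.
new_rows = pd.DataFrame({
    "discount": [10.0, 20.0],
    "holiday": [0, 1],
    "product": ["A", "C"],
})
print(result.predict(new_rows))
```

The in-sample ŷ values (result.predict() with no arguments, or equivalently result.fittedvalues) are the ones to plot against data["sales"].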

  • Please clarify what you mean by 'predicted sales'. Why are you fitting a regression if you do not consider its output to be the prediction?
    – MisterMiyagi
    Nov 10 at 10:30

  • Predicted sales based on all the x variables in the regression model, so that I can plot the actual sales and predicted sales on a scatter plot.
    – Smile
    Nov 10 at 10:38

  • I don't really understand the downvotes here. You can get your predicted values by calling result.predict(); those are your ŷ (yhat) values.
    – Simon
    Nov 10 at 10:45

  • @Simon The question leaves it entirely unclear what the actual problem is. The problem itself is trivial, and the two variants of 'predict' are not qualified; it is pretty difficult to tell the difference without knowing what the objects even are.
    – MisterMiyagi
    Nov 10 at 10:52
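Following Simon's suggestion, the ŷ values from result.predict() can go straight into the scatter plot the question asks for. A minimal matplotlib sketch; the helper name plot_actual_vs_predicted and the small demo arrays are hypothetical, and the dashed y = ŷ diagonal marks where perfect predictions would fall. The Agg backend line is only there so the sketch runs without a display; drop it to use plt.show() interactively.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np


def plot_actual_vs_predicted(y, y_hat, title="Actual vs predicted sales"):
    """Scatter actual values y against predicted values y_hat (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    fig, ax = plt.subplots()
    ax.scatter(y, y_hat, alpha=0.7)
    # Dashed diagonal y = ŷ: points on it would be perfect predictions.
    lo = min(y.min(), y_hat.min())
    hi = max(y.max(), y_hat.max())
    ax.plot([lo, hi], [lo, hi], color="red", linestyle="--")
    ax.set_xlabel("Actual sales (y)")
    ax.set_ylabel("Predicted sales (ŷ)")
    ax.set_title(title)
    return fig, ax


# Hypothetical demo values; with the question's model this would be
# plot_actual_vs_predicted(data["sales"], result.predict()).
fig, ax = plot_actual_vs_predicted([10, 12, 15, 20], [11, 12.5, 14, 19])
```

Points far from the diagonal are observations the model under- or over-predicts.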
python linear-regression statsmodels predict

edited Nov 10 at 11:50

asked Nov 10 at 10:17

Smile


1 Answer

Without your data it is hard to help, but I assume you have a feature matrix X and a target y from your dataset, since you want to perform linear regression. You can split the data into training and test sets using scikit-learn:



# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3)


Then fit the linear regression model to the training set:



from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)


and afterwards predict the test set results:



y_pred = regressor.predict(X_test)


Finally, you can plot your test or training results:



import matplotlib.pyplot as plt

# Visualising the training set results (X holds a single feature, e.g. discount)
plt.scatter(X_train, y_train, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.title('Discount vs Sales (Training set)')
plt.xlabel('Discount percentage')
plt.ylabel('Sales')
plt.show()

# Visualising the test set results (same fitted line, evaluated at X_test)
plt.scatter(X_test, y_test, color='red')
plt.plot(X_test, y_pred, color='blue')
plt.title('Discount vs Sales (Test set)')
plt.xlabel('Discount percentage')
plt.ylabel('Sales')
plt.show()


(In this scenario we want to predict sales for a specific value of a single feature, e.g. the discount percentage.) If you have more than one X variable, things are more complicated: categorical variables need to be encoded as dummy variables, you may need further statistical analysis, etc.
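To illustrate the multiple-feature case, and the question's lm.predict() vs result.predict() comparison, here is a sketch on hypothetical synthetic data with the question's column names. scikit-learn needs the categorical product column turned into dummy variables by hand (here with pd.get_dummies), while the statsmodels formula does this automatically; once both models see the same columns, both are ordinary least squares and produce the same ŷ.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data using the question's column names.
rng = np.random.default_rng(1)
n = 80
data = pd.DataFrame({
    "discount": rng.uniform(0, 30, n),
    "holiday": rng.integers(0, 2, n),
    "product": rng.choice(["A", "B", "C"], n),
})
data["sales"] = 50 + 3 * data["discount"] + 20 * data["holiday"] + rng.normal(0, 4, n)

# scikit-learn needs numeric input, so "product" is turned into dummy
# variables by hand; drop_first=True leaves out one level to avoid
# redundancy with the intercept (as the formula's treatment coding does).
X = pd.get_dummies(
    data[["discount", "holiday", "product"]],
    columns=["product"],
    drop_first=True,
    dtype=float,
)
y = data["sales"]

lm = LinearRegression().fit(X, y)
sk_pred = lm.predict(X)

# The statsmodels formula creates equivalent dummy columns automatically.
result = smf.ols("sales ~ discount + holiday + product", data=data).fit()
sm_pred = result.predict()

# Both are ordinary least squares on the same columns, so ŷ agrees.
print(np.allclose(sk_pred, sm_pred))  # True
```

So for in-sample predictions the two predict() calls compute the same values; the difference is the interface. result.predict() takes the raw columns named in the formula, lm.predict() expects the already-encoded feature matrix, and only the statsmodels result offers summary() statistics.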

        answered Nov 10 at 12:42

        Dejan Marić