Getting different accuracy on each run of Random Forest, Non-Linear SVC and Multinomial NB in python for text classification

























I am working on a binary text classification problem in Python and have built Random Forest, non-linear SVC, and Multinomial NB models.



However, every run of these models gives a different accuracy and different confusion matrix values on the test set. I have set the random_state parameter in train_test_split and when initializing each model, and I also call random.seed() in the code.
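Not every estimator named above has a random_state to set. The quick check below is a sketch that assumes a recent scikit-learn and uses the default RBF kernel as the "non-linear" SVC. It inspects each constructor's parameters. RandomForestClassifier and SVC accept random_state. MultinomialNB does not, because its fit is deterministic. According to the scikit-learn documentation, SVC uses random_state only for the internal shuffling it performs when probability=True. So with default settings, neither MultinomialNB nor SVC should vary between runs on identical input.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC

# One instance of each model type from the question (hyperparameters left at defaults).
models = {
    "RandomForestClassifier": RandomForestClassifier(),
    "SVC": SVC(kernel="rbf"),  # RBF kernel, i.e. a non-linear SVC
    "MultinomialNB": MultinomialNB(),
}

# get_params() lists every constructor argument an estimator accepts.
has_random_state = {name: "random_state" in est.get_params() for name, est in models.items()}
print(has_random_state)
# {'RandomForestClassifier': True, 'SVC': True, 'MultinomialNB': False}
```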



Is there anything else I am missing?



Thanks.



Code Sample:



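from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# X: list of raw documents, Y: binary labels (loaded earlier, not shown)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.15, stratify=Y, random_state=42)

tfidf_vectorizer = TfidfVectorizer(analyzer='word', stop_words='english', max_df=0.8, min_df=0.05, ngram_range=(1, 3))
tfidf_train = tfidf_vectorizer.fit_transform(X_train)  # learn vocabulary and IDF on training text only
tfidf_test = tfidf_vectorizer.transform(X_test)

rfc = RandomForestClassifier(random_state=42)  # default hyperparameters apart from the seed
rfc.fit(tfidf_train, Y_train)
predictions = rfc.predict(tfidf_test)

score = metrics.accuracy_score(Y_test, predictions)  # test-set accuracy
print("accuracy: %0.3f" % score)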
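One way to narrow the problem down is to run the whole pipeline twice in the same process and compare the scores. The sketch below does this on a small made-up corpus. The corpus, the run_once helper, and the seed value are all hypothetical. It keeps the question's split and TF-IDF settings and fits all three models. With the input in the same order and random_state fixed, the scores should come out identical. If the real pipeline still varies, one possibility worth checking is that X and Y arrive in a different order on each run. This is an assumption the question does not confirm. For example, data built from a Python set of strings can change iteration order between interpreter runs because of hash randomization, and train_test_split with a fixed random_state then produces a different split.

```python
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC

# Hypothetical toy corpus standing in for the real X and Y.
X = [f"good great fine text number {i}" if i % 2 == 0 else f"bad awful poor text number {i}"
     for i in range(40)]
Y = [i % 2 for i in range(40)]


def run_once(X, Y, seed=42):
    """Split, vectorize, fit all three models and return their test accuracies."""
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=0.15, stratify=Y, random_state=seed)
    vectorizer = TfidfVectorizer(analyzer="word", stop_words="english",
                                 max_df=0.8, min_df=0.05, ngram_range=(1, 3))
    tfidf_train = vectorizer.fit_transform(X_train)
    tfidf_test = vectorizer.transform(X_test)
    models = {
        "rf": RandomForestClassifier(random_state=seed),
        "svc": SVC(kernel="rbf", random_state=seed),
        "nb": MultinomialNB(),  # deterministic, no random_state
    }
    scores = {}
    for name, model in models.items():
        model.fit(tfidf_train, Y_train)
        scores[name] = metrics.accuracy_score(Y_test, model.predict(tfidf_test))
    return scores


first = run_once(X, Y)
second = run_once(X, Y)
print(first == second)  # same input order + fixed seeds -> same scores
```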









python machine-learning classification random-forest text-classification














edited Nov 14 '18 at 18:22 by Geeocode

asked Nov 14 '18 at 14:57 by Biplab Ghosal







  • Please provide a code example that shows the data split, the learner initialization, and the fits.

    – Geeocode
    Nov 14 '18 at 15:13











  • The other steps too, please, e.g. RandomForestClassifier(...), model.fit(...).

    – Geeocode
    Nov 14 '18 at 15:20











  • You need to include a real code sample in your question. That is the standard way to get better (and sometimes any) answers on SO.

    – Geeocode
    Nov 14 '18 at 15:24












  • Some tools use numpy.random.seed(), not random.seed(), for example.

    – Geeocode
    Nov 14 '18 at 15:28












  • Try using numpy.random.seed() in your code; there might be some random operation in one of the utilities you used.

    – Geeocode
    Nov 14 '18 at 15:58
























1 Answer
































Some of the utilities you used might contain hidden randomness.



Because some libraries draw their random numbers from NumPy (numpy.random) rather than from Python's built-in random module, you should also call numpy.random.seed().
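Below is a minimal sketch of what this answer suggests. It uses synthetic data, and make_classification and the fit_predict_unseeded helper are only illustrative. When random_state is left as None, scikit-learn falls back to NumPy's global random generator. Resetting numpy.random.seed() before each fit therefore makes an unseeded RandomForestClassifier reproducible. Python's random.seed() alone does not control that generator. In the question every estimator already gets an explicit random_state, so this only matters for components where the seed was left unset.

```python
import random

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data purely for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)


def fit_predict_unseeded():
    """Fit a forest WITHOUT random_state, so it falls back to NumPy's global RNG."""
    clf = RandomForestClassifier(n_estimators=20)
    clf.fit(X, y)
    return clf.predict_proba(X)


random.seed(0)     # seeds Python's random module only
np.random.seed(0)  # seeds NumPy's global RNG, used by scikit-learn when random_state=None
first = fit_predict_unseeded()

random.seed(0)
np.random.seed(0)
second = fit_predict_unseeded()

print(np.array_equal(first, second))  # identical once the NumPy seed is reset before each fit
```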






answered Nov 14 '18 at 16:34 by Geeocode




























