Getting different accuracy on each run of Random Forest, Non-Linear SVC and Multinomial NB in Python for text classification

I am working on a binary text classification problem in Python and have built models with Random Forest, non-linear SVC and Multinomial NB.

But on each run of these models, I get a different accuracy and different confusion-matrix values on the test set. I have set the random_state parameter in train_test_split and when initializing each of these models, and I also call random.seed in the code.

Is there anything else I am missing?

Thanks.

Code Sample:

from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# X: list of documents, Y: binary labels (loaded elsewhere)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.15, stratify=Y, random_state=42)

tfidf_vectorizer = TfidfVectorizer(analyzer='word', stop_words='english', max_df=0.8, min_df=0.05, ngram_range=(1, 3))
tfidf_train = tfidf_vectorizer.fit_transform(X_train)  # learn vocabulary and IDF on the training set only
tfidf_test = tfidf_vectorizer.transform(X_test)

rfc = RandomForestClassifier(random_state=42)  # default hyperparameters

rfc.fit(tfidf_train, Y_train)
predictions = rfc.predict(tfidf_test)

score = metrics.accuracy_score(Y_test, predictions)  # accuracy on the test set
print("accuracy: %0.3f" % score)

edited Nov 14 '18 at 18:22

Geeocode

asked Nov 14 '18 at 14:57

Biplab Ghosal
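A quick way to tell whether the variation comes from the pipeline itself or from somewhere upstream is to run the code sample twice in one process and compare the results. The sketch below wraps it in a hypothetical `run_pipeline` function and feeds it a hypothetical toy corpus from `make_toy_corpus` in place of the real `X` and `Y`. If the reruns agree, the code above is deterministic, and one thing worth checking (an assumption, not something the question confirms) is whether `X` and `Y` arrive in the same order on every run: `train_test_split` with a fixed `random_state` picks positions, so it only reproduces the same split for identically ordered input.

```python
import random

from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split


def make_toy_corpus(n_per_class=30, seed=0):
    """Hypothetical toy corpus standing in for the real X and Y."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    pos_words = ["excellent", "wonderful", "loved", "brilliant", "superb", "charming"]
    neg_words = ["terrible", "awful", "boring", "hated", "dreadful", "tedious"]
    shared = ["movie", "plot", "acting", "story", "ending", "cast"]
    X, Y = [], []
    for label, words in ((1, pos_words), (0, neg_words)):
        for _ in range(n_per_class):
            tokens = rng.sample(words, 3) + rng.sample(shared, 3)
            rng.shuffle(tokens)
            X.append(" ".join(tokens))
            Y.append(label)
    return X, Y


def run_pipeline(X, Y, seed=42):
    """The question's code sample as a function; returns (test docs, predictions, accuracy)."""
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=0.15, stratify=Y, random_state=seed)
    tfidf_vectorizer = TfidfVectorizer(analyzer='word', stop_words='english',
                                       max_df=0.8, min_df=0.05, ngram_range=(1, 3))
    tfidf_train = tfidf_vectorizer.fit_transform(X_train)
    tfidf_test = tfidf_vectorizer.transform(X_test)
    rfc = RandomForestClassifier(random_state=seed)
    rfc.fit(tfidf_train, Y_train)
    predictions = rfc.predict(tfidf_test)
    return X_test, list(predictions), metrics.accuracy_score(Y_test, predictions)


X, Y = make_toy_corpus()
test_a, preds_a, acc_a = run_pipeline(X, Y)
test_b, preds_b, acc_b = run_pipeline(X, Y)
print("same predictions on rerun:", preds_a == preds_b)
print("accuracy: %0.3f vs %0.3f" % (acc_a, acc_b))

# Same documents and same random_state, but a different row order (for example,
# files listed in a different order on each run): the split itself can change.
order = list(range(len(X)))
random.Random(1).shuffle(order)
test_c, _, acc_c = run_pipeline([X[i] for i in order], [Y[i] for i in order])
print("same test documents after reordering:", sorted(test_a) == sorted(test_c))
```

With reordered input, a different set of documents will typically land in the test set, so the accuracy can move even though every `random_state` is fixed.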

Please provide a code example covering the data split, learner initialization and fitting.

– Geeocode
Nov 14 '18 at 15:13

The other parts too, please, e.g. randomforest(....), model.fit(....)

– Geeocode
Nov 14 '18 at 15:20

You have to provide a real code sample in your question. This is the standard way to help yourself get a better (sometimes any) answer to your question on SO.

– Geeocode
Nov 14 '18 at 15:24

Some tools, for example, draw from NumPy's generator (seeded with numpy.random.seed()) rather than Python's random module (seeded with random.seed()).

– Geeocode
Nov 14 '18 at 15:28
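The point in this comment can be checked directly: Python's built-in `random` module and NumPy keep separate global generators, so `random.seed()` has no effect on `numpy.random`. (The standard-library call is `random.seed()`; `random.random` is a function and has no `seed` attribute.) A minimal sketch:

```python
import random

import numpy as np

# Seeding only the standard-library generator...
random.seed(42)
first = np.random.rand(3)
random.seed(42)
second = np.random.rand(3)
# ...leaves NumPy's global generator untouched, so these two draws differ.
print("random.seed makes numpy repeat:", np.array_equal(first, second))

# Seeding NumPy's global generator is what makes numpy.random repeatable.
np.random.seed(42)
third = np.random.rand(3)
np.random.seed(42)
fourth = np.random.rand(3)
print("np.random.seed makes numpy repeat:", np.array_equal(third, fourth))
```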

Try using numpy.random.seed() in your code; there might be some random behaviour in one of the utilities you used.

– Geeocode
Nov 14 '18 at 15:58
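As an illustration of this kind of hidden randomness: many scikit-learn functions fall back to NumPy's global generator when `random_state` is left at its default of `None`. `sklearn.utils.shuffle` is one such helper, used here only as an example; it is not a claim about which utility is responsible in the question's code. Seeding NumPy, not Python's `random`, is what makes its output repeat, and passing `random_state` explicitly removes the dependence on global state entirely.

```python
import random

import numpy as np
from sklearn.utils import shuffle

items = list(range(50))  # stand-in for the rows of a dataset

# random_state is left at None, so shuffle() draws from NumPy's global generator.
random.seed(0)
run1 = shuffle(items)
random.seed(0)
run2 = shuffle(items)
print("repeatable after random.seed:", run1 == run2)

np.random.seed(0)
run3 = shuffle(items)
np.random.seed(0)
run4 = shuffle(items)
print("repeatable after np.random.seed:", run3 == run4)

# An explicit random_state makes the call reproducible without any global seeding.
print("repeatable with random_state=0:",
      shuffle(items, random_state=0) == shuffle(items, random_state=0))
```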

python machine-learning classification random-forest text-classification


1 Answer

Some of the utilities you used might contain hidden randomness.

Some libraries draw from NumPy's generator (numpy.random) rather than Python's random module, so random.seed() alone is not enough; call numpy.random.seed() as well.

answered Nov 14 '18 at 16:34

Geeocode
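A minimal sketch of this advice applied to all three models from the question: seed both global generators once at the top of the script, and still pass `random_state` explicitly wherever an estimator accepts it. Two details from the scikit-learn documentation are relevant here: `SVC` only uses its `random_state` when `probability=True`, and `MultinomialNB` has no `random_state` parameter because its fit has no random component. The toy data, the `SEED` constant and the `fit_all` helper are hypothetical, chosen for illustration.

```python
import random

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC

SEED = 42  # hypothetical constant; any fixed integer works

# Seed both global generators once, before anything else runs.
random.seed(SEED)
np.random.seed(SEED)

# Hypothetical toy data standing in for the real X and Y (36 documents).
X = ["great fun film", "loved the acting", "superb plot twist", "wonderful cast",
     "brilliant ending", "charming story", "awful boring film", "hated the acting",
     "dreadful plot twist", "terrible cast", "tedious ending", "poor story"] * 3
Y = ([1] * 6 + [0] * 6) * 3


def fit_all(X, Y):
    """Fit the three models from the question and return their test-set predictions."""
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=0.15, stratify=Y, random_state=SEED)
    vectorizer = TfidfVectorizer(analyzer='word', stop_words='english',
                                 max_df=0.8, min_df=0.05, ngram_range=(1, 3))
    train_vec = vectorizer.fit_transform(X_train)
    test_vec = vectorizer.transform(X_test)
    models = {
        "random_forest": RandomForestClassifier(random_state=SEED),
        # Non-linear SVC; random_state only matters if probability=True is set.
        "svc_rbf": SVC(kernel='rbf', random_state=SEED),
        # No random_state parameter: MultinomialNB's fit is deterministic.
        "multinomial_nb": MultinomialNB(),
    }
    return {name: list(model.fit(train_vec, Y_train).predict(test_vec))
            for name, model in models.items()}


first = fit_all(X, Y)
second = fit_all(X, Y)
for name in first:
    print(name, "identical on rerun:", first[name] == second[name])
```

If a check like this is stable but separate script runs still disagree, the remaining randomness is likely outside these calls, for example in how the data is loaded or ordered.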
