Getting different accuracy on each run of Random Forest, Non-Linear SVC and Multinomial NB in python for text classification
I am working on a binary text classification problem in Python and have built models with Random Forest, non-linear SVC and Multinomial NB.
On each run of these models, however, I get different accuracy and confusion-matrix values on the test set. I have set the random_state parameter in train_test_split and when initializing each of these models. I have also added random.seed() to the code.
Is there anything else I am missing?
Thanks.
Code Sample:
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# X (raw documents) and Y (binary labels) are assumed to be loaded already
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.15, stratify=Y, random_state=42)
tfidf_vectorizer = TfidfVectorizer(analyzer='word', stop_words='english', max_df=0.8, min_df=0.05, ngram_range=(1, 3))
tfidf_train = tfidf_vectorizer.fit_transform(X_train)
tfidf_test = tfidf_vectorizer.transform(X_test)
# Default hyperparameters
rfc = RandomForestClassifier(random_state=42)
rfc.fit(tfidf_train, Y_train)
predictions = rfc.predict(tfidf_test)
score = metrics.accuracy_score(Y_test, predictions)  # get scores
print("accuracy: %0.3f" % score)  # printing score
python machine-learning classification random-forest text-classification
2
Please provide some code example regarding the data split, learner initialization and fits.
– Geeocode
Nov 14 '18 at 15:13
Please add the others too, i.e. randomforest(....), model.fit(....)
– Geeocode
Nov 14 '18 at 15:20
You have to provide a real code sample in your question. This is the standard way to help yourself get a better (sometimes any) answer to your question on SO.
– Geeocode
Nov 14 '18 at 15:24
Some tools use numpy.random.seed(), not random.seed(), for example.
– Geeocode
Nov 14 '18 at 15:28
Try using numpy.random.seed() in your code; there might be some random behavior in one of the utilities you used.
– Geeocode
Nov 14 '18 at 15:58
edited Nov 14 '18 at 18:22 by Geeocode
asked Nov 14 '18 at 14:57 by Biplab Ghosal
1 Answer
Some of the utilities you use may contain hidden randomness.
Because some libraries draw from NumPy's global random number generator (numpy.random) rather than Python's built-in random module, seeding only random.seed() is not enough; you should also call numpy.random.seed().
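A minimal sketch of the idea, assuming the remaining randomness comes from one of the two global generators. The names SEED, seed_everything and draw_sample are made up for illustration; draw_sample only stands in for library code that pulls random numbers behind the scenes.

```python
import random

import numpy as np

SEED = 42  # arbitrary value chosen for this sketch


def seed_everything(seed):
    """Seed both Python's built-in RNG and NumPy's global RNG."""
    random.seed(seed)
    np.random.seed(seed)


def draw_sample():
    """Hypothetical stand-in for code that silently uses either global RNG."""
    return random.random(), np.random.rand(3).tolist()


seed_everything(SEED)
first = draw_sample()

seed_everything(SEED)
second = draw_sample()

print(first == second)  # True: reseeding both generators reproduces the draws
```

Call the seeding function once at the top of the script, before the split and before any model is created. If the results still vary between runs, the cause is probably something other than these two generators.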
answered Nov 14 '18 at 16:34 by Geeocode