Storing the topic models in a list also considering the maximum occurrences
I am performing topic modelling and using functions to get the top keywords in the topic models as shown below.
    def getTopKWords(self, K):
        """
        returns top K discriminative words for topic t
        ie words v for which p(v|t) is maximum
        """
        results = []
        index = []
        key_terms = []
        pseudocounts = np.copy(self.n_vt)
        normalizer = np.sum(pseudocounts, (0))
        pseudocounts /= normalizer[np.newaxis, :]
        for t in range(self.numTopics):
            topWordIndices = pseudocounts[:, t].argsort()[-1:-(K+1):-1]
            vocab = self.vectorizer.get_feature_names()
            print (t, [vocab[i] for i in topWordIndices])
            ## Code for storing the values in a single list
        return results
The above function prints the following output:
0 ['computer', 'laptop', 'mac', 'use', 'bought', 'like', 'warranty', 'screen', 'way', 'just']
1 ['laptop', 'computer', 'use', 'just', 'like', 'time', 'great', 'windows', 'macbook', 'months']
2 ['computer', 'great', 'laptop', 'mac', 'buy', 'just', 'macbook', 'use', 'pro', 'windows']
3 ['laptop', 'computer', 'great', 'time', 'battery', 'use', 'apple', 'love', 'just', 'work']
These four lines come from the four iterations of the loop; each one prints the topic index followed by the top keywords for that topic.
Now, I want the function to return a single list like this:
return [keyword1, keyword2, keyword3, keyword4]
where keyword1 to keyword4 are the words that occur most often across the vocab lists with index 0, 1, 2, 3 in the output.
python python-3.x
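For readers unfamiliar with the slicing used in the loop, here is a minimal sketch of what the normalisation and `argsort()[-1:-(K+1):-1]` do. The small `n_vt` matrix and `vocab` list are made-up stand-ins, not the asker's data.

```python
import numpy as np

# Hypothetical word-by-topic count matrix: 5 words (rows) x 2 topics (columns).
vocab = ["battery", "laptop", "mac", "screen", "windows"]
n_vt = np.array([
    [1.0, 6.0],
    [8.0, 2.0],
    [4.0, 1.0],
    [2.0, 7.0],
    [5.0, 4.0],
])

# Normalise each column so that it sums to 1, i.e. column t holds p(v|t).
p_v_given_t = n_vt / n_vt.sum(axis=0)[np.newaxis, :]

K = 3
top_words_per_topic = []
for t in range(p_v_given_t.shape[1]):
    # argsort is ascending; the slice walks backwards from the last index,
    # so it yields the indices of the K largest entries, largest first.
    top_idx = p_v_given_t[:, t].argsort()[-1:-(K + 1):-1]
    top_words_per_topic.append([vocab[i] for i in top_idx])

print(top_words_per_topic)
# [['laptop', 'windows', 'mac'], ['screen', 'battery', 'windows']]
```

Collecting the per-topic lists in `top_words_per_topic` instead of only printing them is the piece the question's loop is missing.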
edited Nov 10 at 18:49
asked Nov 10 at 18:21
Shivam Panchal
388
1 Answer
accepted
You can use collections.Counter:

    from collections import Counter

    a = ['computer', 'laptop', 'mac', 'use', 'bought', 'like',
         'warranty', 'screen', 'way', 'just']
    b = ['laptop', 'computer', 'use', 'just', 'like', 'time',
         'great', 'windows', 'macbook', 'months']
    c = ['computer', 'great', 'laptop', 'mac', 'buy', 'just',
         'macbook', 'use', 'pro', 'windows']
    d = ['laptop', 'computer', 'great', 'time', 'battery', 'use',
         'apple', 'love', 'just', 'work']

    def get_most_common(*iterables):
        """Accepts iterables, feeds all into Counter and returns the Counter instance"""
        counter = Counter()
        for words in iterables:
            counter.update(words)
        return counter

    # get the most common ones
    mc = get_most_common(a, b, c, d).most_common()

    # print top 4 keys
    top4 = [k for k, v in mc[0:4]]
    print(top4)
Output:
['computer', 'laptop', 'use', 'just']
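As a possible variation (not part of the original answer), the same count can be written without a helper loop by flattening the lists with `itertools.chain.from_iterable` and passing `n` directly to `most_common`. The function name `most_common_words` is made up for this sketch.

```python
from collections import Counter
from itertools import chain

topic_words = [
    ['computer', 'laptop', 'mac', 'use', 'bought', 'like',
     'warranty', 'screen', 'way', 'just'],
    ['laptop', 'computer', 'use', 'just', 'like', 'time',
     'great', 'windows', 'macbook', 'months'],
    ['computer', 'great', 'laptop', 'mac', 'buy', 'just',
     'macbook', 'use', 'pro', 'windows'],
    ['laptop', 'computer', 'great', 'time', 'battery', 'use',
     'apple', 'love', 'just', 'work'],
]


def most_common_words(word_lists, n):
    """Return the n words that appear most often across all lists."""
    counts = Counter(chain.from_iterable(word_lists))
    # most_common(n) returns (word, count) pairs, highest count first.
    return [word for word, _ in counts.most_common(n)]


top4 = most_common_words(topic_words, 4)
print(top4)
```

Here 'computer', 'laptop', 'use' and 'just' each occur in all four lists, so they are the four selected words. Their relative order is a tie; per the Counter documentation, elements with equal counts are ordered by first occurrence on Python 3.7 and later.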
Applied to the loop in your function:

    some_results = []  # store stuff
    for t in range(self.numTopics):
        topWordIndices = pseudocounts[:, t].argsort()[-1:-(K+1):-1]
        vocab = self.vectorizer.get_feature_names()
        print (t, [vocab[i] for i in topWordIndices])
        some_results.append( [vocab[i] for i in topWordIndices] )
    mc = get_most_common(*some_results).most_common()
    return [k for k,v in mc[0:4]]
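Since the snippet above is only a fragment of a method, here is a self-contained sketch of how the pieces might fit together. The class `TopicModelSketch`, its `vocab` attribute (standing in for `self.vectorizer.get_feature_names()`), the `n_keywords` parameter and the toy counts are all assumptions made for illustration; only `n_vt`, `numTopics` and `getTopKWords` come from the question.

```python
from collections import Counter

import numpy as np


class TopicModelSketch:
    """Hypothetical stand-in exposing only what getTopKWords needs."""

    def __init__(self, n_vt, vocab):
        self.n_vt = np.asarray(n_vt, dtype=float)  # words x topics counts
        self.vocab = list(vocab)  # assumed replacement for the vectorizer
        self.numTopics = self.n_vt.shape[1]

    def getTopKWords(self, K, n_keywords=4):
        """Return the words found in the most per-topic top-K lists."""
        # Column t of pseudocounts holds p(v|t).
        pseudocounts = self.n_vt / self.n_vt.sum(axis=0)[np.newaxis, :]
        counts = Counter()
        for t in range(self.numTopics):
            top_idx = pseudocounts[:, t].argsort()[-1:-(K + 1):-1]
            # Count each top-K word once for this topic.
            counts.update(self.vocab[i] for i in top_idx)
        return [word for word, _ in counts.most_common(n_keywords)]


# Toy data: 5 words x 3 topics.
vocab = ["apple", "battery", "laptop", "mac", "screen"]
n_vt = [
    [1, 2, 1],
    [2, 8, 1],
    [9, 7, 8],
    [8, 1, 9],
    [3, 3, 3],
]
model = TopicModelSketch(n_vt, vocab)
keywords = model.getTopKWords(K=2, n_keywords=2)
print(keywords)  # ['laptop', 'mac']
```

With K=2, 'laptop' is in the top two of all three topics and 'mac' in two of them, which is why they are returned in that order.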
@ShivamPanchal what? It is one function that you provide your lists to. .most_common() is explained in the documentation of Counter - read it. top4 is just list slicing of the (key, count) tuples provided by most_common(). Your code above uses list slicing - so that's nothing new to you - is it?
– Patrick Artner
Nov 10 at 18:57
I am trying to use it in my code, but it is not working. Can you add it to my code? That would be great.
– Shivam Panchal
Nov 10 at 19:22
really sorry, but I got this. TypeError: unhashable type: 'slice'
– Shivam Panchal
Nov 10 at 19:51
@ShivamPanchal forgot a .most_common()
– Patrick Artner
Nov 10 at 19:51
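To illustrate the error discussed in these comments, here is a small sketch with made-up words showing what happens when a Counter is sliced directly, and the fix of slicing the list returned by `.most_common()` instead. On the Python versions current when this was asked, the message is "unhashable type: 'slice'". As far as I know, slices became hashable in Python 3.12, where the same mistake still raises a TypeError but with a different message, so the sketch only checks the exception type.

```python
from collections import Counter

counts = Counter(["laptop", "computer", "laptop", "mac", "laptop", "computer"])

# A Counter is a dict subclass, so counts[0:2] is a key lookup with a
# slice object as the key, not a positional slice.
caught = None
try:
    broken = [word for word, _ in counts[0:2]]
except TypeError as exc:
    caught = exc
    print("TypeError:", exc)

# most_common() returns a plain list of (word, count) tuples, which can be sliced.
top_two = [word for word, _ in counts.most_common()[0:2]]

# Passing n to most_common() avoids the slice altogether.
also_top_two = [word for word, _ in counts.most_common(2)]

print(top_two, also_top_two)  # ['laptop', 'computer'] ['laptop', 'computer']
```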
edited Nov 10 at 19:51
answered Nov 10 at 18:37
Patrick Artner