Querying and writing keywords to a new column
I have a script I wrote with the help of some online resources but it isn't quite doing what I need it to. I am taking an existing csv file, querying 4 textfields within that file for a group of keywords and than writing the results to a new CSV file. Right now the script only looks at any given textfield and than if it finds any of the keywords within the text file, it writes each keyword to a separate column for that record. I need to be able to do 2 things with the script.
1) Query more than one textfield. So instead of just 'essay1', I need to query 'essay1','essay2','essay3','essay4' for all those keywords.
2) If the script finds any of those keywords, it should write whatever keyword it finds into one column versus individual columns for that particular record.
import csv
import time
#Read data from word enrichment
f = open('word_enrichment_approved.txt', 'r')
allKeywords = f.read().lower().split("n")
f.close()
#Read in raw data from csv file
#Import the 'essay' column from the CSV file
allTexts =
fullRow =
with open('Join_train_and_resources_edited.csv') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
#the full row for each entry, which will be used to recreate the improved CSV file
fullRow.append((row['record_id'],row['teacher_id'],row['teacher_prefix'],row['school_state'],row['project_submitted_datetime'],
row['project_grade_category'], row['project_subject_categories'], row['project_subject_subcategories'],
row['project_title'],row['project_essay_1'],row['project_essay_2'],row['project_essay_3'],row['project_essay_4'],
row['project_resource_summary'],row['teacher_number_of_previously_posted_projects'],row['project_is_approved'],
row['id-1'],row['quantity'],row['price']))
#the column we want to parse for our keywords (change to mactch actual name of the column)
row = row['project_essay_1'].lower()
allTexts.append(row)
#NEW! a flag used to keep track of which row is being printed to the CSV file
counter = 0
#NEW! use the current date and time to create a unique output filename
timestr = time.strftime("%Y-%m-%d-(%H-%M-%S)")
filename = 'output-' + str(timestr) + '.csv'
#NEW! Open the new output CSV file to append ('a') rows one at a time.
with open(filename, 'a') as csvfile:
#NEW! define the column headers and write them to the new file
fieldnames = ['id', 'teacher_id','teacher_prefix ','school_state','project_submitted_datetime','project_grade_category',
'project_subject_categories','project_subject_subcategories','project_title','project_essay_1','project_essay_2',
'project_essay_3','project_essay_4','project_resource_summary','teacher_number_of_previously_posted_projects',
'project_is_approved','id-1','quantity','price']
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
writer.writeheader()
#NEW! define the output for each row and then print to the output csv file
writer = csv.writer(csvfile)
for entry in allTexts:
matches = 0
storedMatches =
#for each entry:
allWords = entry.split(' ')
for words in allWords:
#remove punctuation that will interfere with matching
words = words.replace(',', '')
words = words.replace('.', '')
words = words.replace(';', '')
#if a keyword match is found, store the result.
if words in allKeywords:
if words in storedMatches:
continue
else:
storedMatches.append(words)
matches += 1
#CHANGED! send any matches to a new row of the csv file.
if matches == 0:
newRow = fullRow[counter]
else:
matchTuple = tuple(storedMatches)
newRow = fullRow[counter] + matchTuple
#NEW! write the result of each row to the csv file
writer.writerows([newRow])
counter += 1
python
add a comment |
I have a script I wrote with the help of some online resources but it isn't quite doing what I need it to. I am taking an existing csv file, querying 4 textfields within that file for a group of keywords and than writing the results to a new CSV file. Right now the script only looks at any given textfield and than if it finds any of the keywords within the text file, it writes each keyword to a separate column for that record. I need to be able to do 2 things with the script.
1) Query more than one textfield. So instead of just 'essay1', I need to query 'essay1','essay2','essay3','essay4' for all those keywords.
2) If the script finds any of those keywords, it should write whatever keyword it finds into one column versus individual columns for that particular record.
import csv
import time
#Read data from word enrichment
f = open('word_enrichment_approved.txt', 'r')
allKeywords = f.read().lower().split("n")
f.close()
#Read in raw data from csv file
#Import the 'essay' column from the CSV file
allTexts =
fullRow =
with open('Join_train_and_resources_edited.csv') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
#the full row for each entry, which will be used to recreate the improved CSV file
fullRow.append((row['record_id'],row['teacher_id'],row['teacher_prefix'],row['school_state'],row['project_submitted_datetime'],
row['project_grade_category'], row['project_subject_categories'], row['project_subject_subcategories'],
row['project_title'],row['project_essay_1'],row['project_essay_2'],row['project_essay_3'],row['project_essay_4'],
row['project_resource_summary'],row['teacher_number_of_previously_posted_projects'],row['project_is_approved'],
row['id-1'],row['quantity'],row['price']))
#the column we want to parse for our keywords (change to mactch actual name of the column)
row = row['project_essay_1'].lower()
allTexts.append(row)
#NEW! a flag used to keep track of which row is being printed to the CSV file
counter = 0
#NEW! use the current date and time to create a unique output filename
timestr = time.strftime("%Y-%m-%d-(%H-%M-%S)")
filename = 'output-' + str(timestr) + '.csv'
#NEW! Open the new output CSV file to append ('a') rows one at a time.
with open(filename, 'a') as csvfile:
#NEW! define the column headers and write them to the new file
fieldnames = ['id', 'teacher_id','teacher_prefix ','school_state','project_submitted_datetime','project_grade_category',
'project_subject_categories','project_subject_subcategories','project_title','project_essay_1','project_essay_2',
'project_essay_3','project_essay_4','project_resource_summary','teacher_number_of_previously_posted_projects',
'project_is_approved','id-1','quantity','price']
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
writer.writeheader()
#NEW! define the output for each row and then print to the output csv file
writer = csv.writer(csvfile)
for entry in allTexts:
matches = 0
storedMatches =
#for each entry:
allWords = entry.split(' ')
for words in allWords:
#remove punctuation that will interfere with matching
words = words.replace(',', '')
words = words.replace('.', '')
words = words.replace(';', '')
#if a keyword match is found, store the result.
if words in allKeywords:
if words in storedMatches:
continue
else:
storedMatches.append(words)
matches += 1
#CHANGED! send any matches to a new row of the csv file.
if matches == 0:
newRow = fullRow[counter]
else:
matchTuple = tuple(storedMatches)
newRow = fullRow[counter] + matchTuple
#NEW! write the result of each row to the csv file
writer.writerows([newRow])
counter += 1
python
Hi, to add code you need to paste the code into the question and place control + k or command + k (mac) to format it properly. Please do this as it will be easier for people to answer.
– user
Nov 14 '18 at 21:04
1
Awesome, thank you for the tip!
– V Chau
Nov 14 '18 at 21:09
add a comment |
I have a script I wrote with the help of some online resources but it isn't quite doing what I need it to. I am taking an existing csv file, querying 4 textfields within that file for a group of keywords and than writing the results to a new CSV file. Right now the script only looks at any given textfield and than if it finds any of the keywords within the text file, it writes each keyword to a separate column for that record. I need to be able to do 2 things with the script.
1) Query more than one textfield. So instead of just 'essay1', I need to query 'essay1','essay2','essay3','essay4' for all those keywords.
2) If the script finds any of those keywords, it should write whatever keyword it finds into one column versus individual columns for that particular record.
import csv
import time
#Read data from word enrichment
f = open('word_enrichment_approved.txt', 'r')
allKeywords = f.read().lower().split("n")
f.close()
#Read in raw data from csv file
#Import the 'essay' column from the CSV file
allTexts =
fullRow =
with open('Join_train_and_resources_edited.csv') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
#the full row for each entry, which will be used to recreate the improved CSV file
fullRow.append((row['record_id'],row['teacher_id'],row['teacher_prefix'],row['school_state'],row['project_submitted_datetime'],
row['project_grade_category'], row['project_subject_categories'], row['project_subject_subcategories'],
row['project_title'],row['project_essay_1'],row['project_essay_2'],row['project_essay_3'],row['project_essay_4'],
row['project_resource_summary'],row['teacher_number_of_previously_posted_projects'],row['project_is_approved'],
row['id-1'],row['quantity'],row['price']))
#the column we want to parse for our keywords (change to mactch actual name of the column)
row = row['project_essay_1'].lower()
allTexts.append(row)
#NEW! a flag used to keep track of which row is being printed to the CSV file
counter = 0
#NEW! use the current date and time to create a unique output filename
timestr = time.strftime("%Y-%m-%d-(%H-%M-%S)")
filename = 'output-' + str(timestr) + '.csv'
#NEW! Open the new output CSV file to append ('a') rows one at a time.
with open(filename, 'a') as csvfile:
#NEW! define the column headers and write them to the new file
fieldnames = ['id', 'teacher_id','teacher_prefix ','school_state','project_submitted_datetime','project_grade_category',
'project_subject_categories','project_subject_subcategories','project_title','project_essay_1','project_essay_2',
'project_essay_3','project_essay_4','project_resource_summary','teacher_number_of_previously_posted_projects',
'project_is_approved','id-1','quantity','price']
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
writer.writeheader()
#NEW! define the output for each row and then print to the output csv file
writer = csv.writer(csvfile)
for entry in allTexts:
matches = 0
storedMatches =
#for each entry:
allWords = entry.split(' ')
for words in allWords:
#remove punctuation that will interfere with matching
words = words.replace(',', '')
words = words.replace('.', '')
words = words.replace(';', '')
#if a keyword match is found, store the result.
if words in allKeywords:
if words in storedMatches:
continue
else:
storedMatches.append(words)
matches += 1
#CHANGED! send any matches to a new row of the csv file.
if matches == 0:
newRow = fullRow[counter]
else:
matchTuple = tuple(storedMatches)
newRow = fullRow[counter] + matchTuple
#NEW! write the result of each row to the csv file
writer.writerows([newRow])
counter += 1
python
I have a script I wrote with the help of some online resources but it isn't quite doing what I need it to. I am taking an existing csv file, querying 4 textfields within that file for a group of keywords and than writing the results to a new CSV file. Right now the script only looks at any given textfield and than if it finds any of the keywords within the text file, it writes each keyword to a separate column for that record. I need to be able to do 2 things with the script.
1) Query more than one textfield. So instead of just 'essay1', I need to query 'essay1','essay2','essay3','essay4' for all those keywords.
2) If the script finds any of those keywords, it should write whatever keyword it finds into one column versus individual columns for that particular record.
import csv
import time
#Read data from word enrichment
f = open('word_enrichment_approved.txt', 'r')
allKeywords = f.read().lower().split("n")
f.close()
#Read in raw data from csv file
#Import the 'essay' column from the CSV file
allTexts =
fullRow =
with open('Join_train_and_resources_edited.csv') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
#the full row for each entry, which will be used to recreate the improved CSV file
fullRow.append((row['record_id'],row['teacher_id'],row['teacher_prefix'],row['school_state'],row['project_submitted_datetime'],
row['project_grade_category'], row['project_subject_categories'], row['project_subject_subcategories'],
row['project_title'],row['project_essay_1'],row['project_essay_2'],row['project_essay_3'],row['project_essay_4'],
row['project_resource_summary'],row['teacher_number_of_previously_posted_projects'],row['project_is_approved'],
row['id-1'],row['quantity'],row['price']))
#the column we want to parse for our keywords (change to mactch actual name of the column)
row = row['project_essay_1'].lower()
allTexts.append(row)
#NEW! a flag used to keep track of which row is being printed to the CSV file
counter = 0
#NEW! use the current date and time to create a unique output filename
timestr = time.strftime("%Y-%m-%d-(%H-%M-%S)")
filename = 'output-' + str(timestr) + '.csv'
#NEW! Open the new output CSV file to append ('a') rows one at a time.
with open(filename, 'a') as csvfile:
#NEW! define the column headers and write them to the new file
fieldnames = ['id', 'teacher_id','teacher_prefix ','school_state','project_submitted_datetime','project_grade_category',
'project_subject_categories','project_subject_subcategories','project_title','project_essay_1','project_essay_2',
'project_essay_3','project_essay_4','project_resource_summary','teacher_number_of_previously_posted_projects',
'project_is_approved','id-1','quantity','price']
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
writer.writeheader()
#NEW! define the output for each row and then print to the output csv file
writer = csv.writer(csvfile)
for entry in allTexts:
matches = 0
storedMatches =
#for each entry:
allWords = entry.split(' ')
for words in allWords:
#remove punctuation that will interfere with matching
words = words.replace(',', '')
words = words.replace('.', '')
words = words.replace(';', '')
#if a keyword match is found, store the result.
if words in allKeywords:
if words in storedMatches:
continue
else:
storedMatches.append(words)
matches += 1
#CHANGED! send any matches to a new row of the csv file.
if matches == 0:
newRow = fullRow[counter]
else:
matchTuple = tuple(storedMatches)
newRow = fullRow[counter] + matchTuple
#NEW! write the result of each row to the csv file
writer.writerows([newRow])
counter += 1
python
python
edited Nov 14 '18 at 21:09
V Chau
asked Nov 14 '18 at 20:57
V ChauV Chau
63
63
Hi, to add code you need to paste the code into the question and place control + k or command + k (mac) to format it properly. Please do this as it will be easier for people to answer.
– user
Nov 14 '18 at 21:04
1
Awesome, thank you for the tip!
– V Chau
Nov 14 '18 at 21:09
add a comment |
Hi, to add code you need to paste the code into the question and place control + k or command + k (mac) to format it properly. Please do this as it will be easier for people to answer.
– user
Nov 14 '18 at 21:04
1
Awesome, thank you for the tip!
– V Chau
Nov 14 '18 at 21:09
Hi, to add code you need to paste the code into the question and place control + k or command + k (mac) to format it properly. Please do this as it will be easier for people to answer.
– user
Nov 14 '18 at 21:04
Hi, to add code you need to paste the code into the question and place control + k or command + k (mac) to format it properly. Please do this as it will be easier for people to answer.
– user
Nov 14 '18 at 21:04
1
1
Awesome, thank you for the tip!
– V Chau
Nov 14 '18 at 21:09
Awesome, thank you for the tip!
– V Chau
Nov 14 '18 at 21:09
add a comment |
0
active
oldest
votes
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53308602%2fquerying-and-writing-keywords-to-a-new-column%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
0
active
oldest
votes
0
active
oldest
votes
active
oldest
votes
active
oldest
votes
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53308602%2fquerying-and-writing-keywords-to-a-new-column%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Hi, to add code you need to paste the code into the question and place control + k or command + k (mac) to format it properly. Please do this as it will be easier for people to answer.
– user
Nov 14 '18 at 21:04
1
Awesome, thank you for the tip!
– V Chau
Nov 14 '18 at 21:09