Querying and writing keywords to a new column










1















I have a script I wrote with the help of some online resources but it isn't quite doing what I need it to. I am taking an existing csv file, querying 4 textfields within that file for a group of keywords and than writing the results to a new CSV file. Right now the script only looks at any given textfield and than if it finds any of the keywords within the text file, it writes each keyword to a separate column for that record. I need to be able to do 2 things with the script.



1) Query more than one textfield. So instead of just 'essay1', I need to query 'essay1','essay2','essay3','essay4' for all those keywords.



2) If the script finds any of those keywords, it should write whatever keyword it finds into one column versus individual columns for that particular record.



import csv
import time

#Read data from word enrichment
f = open('word_enrichment_approved.txt', 'r')
allKeywords = f.read().lower().split("n")
f.close()

#Read in raw data from csv file
#Import the 'essay' column from the CSV file
allTexts =
fullRow =
with open('Join_train_and_resources_edited.csv') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
#the full row for each entry, which will be used to recreate the improved CSV file
fullRow.append((row['record_id'],row['teacher_id'],row['teacher_prefix'],row['school_state'],row['project_submitted_datetime'],
row['project_grade_category'], row['project_subject_categories'], row['project_subject_subcategories'],
row['project_title'],row['project_essay_1'],row['project_essay_2'],row['project_essay_3'],row['project_essay_4'],
row['project_resource_summary'],row['teacher_number_of_previously_posted_projects'],row['project_is_approved'],
row['id-1'],row['quantity'],row['price']))

#the column we want to parse for our keywords (change to mactch actual name of the column)
row = row['project_essay_1'].lower()
allTexts.append(row)
#NEW! a flag used to keep track of which row is being printed to the CSV file
counter = 0
#NEW! use the current date and time to create a unique output filename
timestr = time.strftime("%Y-%m-%d-(%H-%M-%S)")
filename = 'output-' + str(timestr) + '.csv'

#NEW! Open the new output CSV file to append ('a') rows one at a time.
with open(filename, 'a') as csvfile:

#NEW! define the column headers and write them to the new file
fieldnames = ['id', 'teacher_id','teacher_prefix ','school_state','project_submitted_datetime','project_grade_category',
'project_subject_categories','project_subject_subcategories','project_title','project_essay_1','project_essay_2',
'project_essay_3','project_essay_4','project_resource_summary','teacher_number_of_previously_posted_projects',
'project_is_approved','id-1','quantity','price']
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
writer.writeheader()

#NEW! define the output for each row and then print to the output csv file
writer = csv.writer(csvfile)

for entry in allTexts:

matches = 0
storedMatches =

#for each entry:
allWords = entry.split(' ')
for words in allWords:

#remove punctuation that will interfere with matching
words = words.replace(',', '')
words = words.replace('.', '')
words = words.replace(';', '')

#if a keyword match is found, store the result.
if words in allKeywords:
if words in storedMatches:
continue
else:
storedMatches.append(words)
matches += 1

#CHANGED! send any matches to a new row of the csv file.
if matches == 0:
newRow = fullRow[counter]
else:
matchTuple = tuple(storedMatches)
newRow = fullRow[counter] + matchTuple

#NEW! write the result of each row to the csv file
writer.writerows([newRow])
counter += 1









share|improve this question
























  • Hi, to add code you need to paste the code into the question and place control + k or command + k (mac) to format it properly. Please do this as it will be easier for people to answer.

    – user
    Nov 14 '18 at 21:04






  • 1





    Awesome, thank you for the tip!

    – V Chau
    Nov 14 '18 at 21:09
















1















I have a script I wrote with the help of some online resources but it isn't quite doing what I need it to. I am taking an existing csv file, querying 4 textfields within that file for a group of keywords and than writing the results to a new CSV file. Right now the script only looks at any given textfield and than if it finds any of the keywords within the text file, it writes each keyword to a separate column for that record. I need to be able to do 2 things with the script.



1) Query more than one textfield. So instead of just 'essay1', I need to query 'essay1','essay2','essay3','essay4' for all those keywords.



2) If the script finds any of those keywords, it should write whatever keyword it finds into one column versus individual columns for that particular record.



import csv
import time

#Read data from word enrichment
f = open('word_enrichment_approved.txt', 'r')
allKeywords = f.read().lower().split("n")
f.close()

#Read in raw data from csv file
#Import the 'essay' column from the CSV file
allTexts =
fullRow =
with open('Join_train_and_resources_edited.csv') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
#the full row for each entry, which will be used to recreate the improved CSV file
fullRow.append((row['record_id'],row['teacher_id'],row['teacher_prefix'],row['school_state'],row['project_submitted_datetime'],
row['project_grade_category'], row['project_subject_categories'], row['project_subject_subcategories'],
row['project_title'],row['project_essay_1'],row['project_essay_2'],row['project_essay_3'],row['project_essay_4'],
row['project_resource_summary'],row['teacher_number_of_previously_posted_projects'],row['project_is_approved'],
row['id-1'],row['quantity'],row['price']))

#the column we want to parse for our keywords (change to mactch actual name of the column)
row = row['project_essay_1'].lower()
allTexts.append(row)
#NEW! a flag used to keep track of which row is being printed to the CSV file
counter = 0
#NEW! use the current date and time to create a unique output filename
timestr = time.strftime("%Y-%m-%d-(%H-%M-%S)")
filename = 'output-' + str(timestr) + '.csv'

#NEW! Open the new output CSV file to append ('a') rows one at a time.
with open(filename, 'a') as csvfile:

#NEW! define the column headers and write them to the new file
fieldnames = ['id', 'teacher_id','teacher_prefix ','school_state','project_submitted_datetime','project_grade_category',
'project_subject_categories','project_subject_subcategories','project_title','project_essay_1','project_essay_2',
'project_essay_3','project_essay_4','project_resource_summary','teacher_number_of_previously_posted_projects',
'project_is_approved','id-1','quantity','price']
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
writer.writeheader()

#NEW! define the output for each row and then print to the output csv file
writer = csv.writer(csvfile)

for entry in allTexts:

matches = 0
storedMatches =

#for each entry:
allWords = entry.split(' ')
for words in allWords:

#remove punctuation that will interfere with matching
words = words.replace(',', '')
words = words.replace('.', '')
words = words.replace(';', '')

#if a keyword match is found, store the result.
if words in allKeywords:
if words in storedMatches:
continue
else:
storedMatches.append(words)
matches += 1

#CHANGED! send any matches to a new row of the csv file.
if matches == 0:
newRow = fullRow[counter]
else:
matchTuple = tuple(storedMatches)
newRow = fullRow[counter] + matchTuple

#NEW! write the result of each row to the csv file
writer.writerows([newRow])
counter += 1









share|improve this question
























  • Hi, to add code you need to paste the code into the question and place control + k or command + k (mac) to format it properly. Please do this as it will be easier for people to answer.

    – user
    Nov 14 '18 at 21:04






  • 1





    Awesome, thank you for the tip!

    – V Chau
    Nov 14 '18 at 21:09














1












1








1








I have a script I wrote with the help of some online resources but it isn't quite doing what I need it to. I am taking an existing csv file, querying 4 textfields within that file for a group of keywords and than writing the results to a new CSV file. Right now the script only looks at any given textfield and than if it finds any of the keywords within the text file, it writes each keyword to a separate column for that record. I need to be able to do 2 things with the script.



1) Query more than one textfield. So instead of just 'essay1', I need to query 'essay1','essay2','essay3','essay4' for all those keywords.



2) If the script finds any of those keywords, it should write whatever keyword it finds into one column versus individual columns for that particular record.



import csv
import time

#Read data from word enrichment
f = open('word_enrichment_approved.txt', 'r')
allKeywords = f.read().lower().split("n")
f.close()

#Read in raw data from csv file
#Import the 'essay' column from the CSV file
allTexts =
fullRow =
with open('Join_train_and_resources_edited.csv') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
#the full row for each entry, which will be used to recreate the improved CSV file
fullRow.append((row['record_id'],row['teacher_id'],row['teacher_prefix'],row['school_state'],row['project_submitted_datetime'],
row['project_grade_category'], row['project_subject_categories'], row['project_subject_subcategories'],
row['project_title'],row['project_essay_1'],row['project_essay_2'],row['project_essay_3'],row['project_essay_4'],
row['project_resource_summary'],row['teacher_number_of_previously_posted_projects'],row['project_is_approved'],
row['id-1'],row['quantity'],row['price']))

#the column we want to parse for our keywords (change to mactch actual name of the column)
row = row['project_essay_1'].lower()
allTexts.append(row)
#NEW! a flag used to keep track of which row is being printed to the CSV file
counter = 0
#NEW! use the current date and time to create a unique output filename
timestr = time.strftime("%Y-%m-%d-(%H-%M-%S)")
filename = 'output-' + str(timestr) + '.csv'

#NEW! Open the new output CSV file to append ('a') rows one at a time.
with open(filename, 'a') as csvfile:

#NEW! define the column headers and write them to the new file
fieldnames = ['id', 'teacher_id','teacher_prefix ','school_state','project_submitted_datetime','project_grade_category',
'project_subject_categories','project_subject_subcategories','project_title','project_essay_1','project_essay_2',
'project_essay_3','project_essay_4','project_resource_summary','teacher_number_of_previously_posted_projects',
'project_is_approved','id-1','quantity','price']
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
writer.writeheader()

#NEW! define the output for each row and then print to the output csv file
writer = csv.writer(csvfile)

for entry in allTexts:

matches = 0
storedMatches =

#for each entry:
allWords = entry.split(' ')
for words in allWords:

#remove punctuation that will interfere with matching
words = words.replace(',', '')
words = words.replace('.', '')
words = words.replace(';', '')

#if a keyword match is found, store the result.
if words in allKeywords:
if words in storedMatches:
continue
else:
storedMatches.append(words)
matches += 1

#CHANGED! send any matches to a new row of the csv file.
if matches == 0:
newRow = fullRow[counter]
else:
matchTuple = tuple(storedMatches)
newRow = fullRow[counter] + matchTuple

#NEW! write the result of each row to the csv file
writer.writerows([newRow])
counter += 1









share|improve this question
















I have a script I wrote with the help of some online resources but it isn't quite doing what I need it to. I am taking an existing csv file, querying 4 textfields within that file for a group of keywords and than writing the results to a new CSV file. Right now the script only looks at any given textfield and than if it finds any of the keywords within the text file, it writes each keyword to a separate column for that record. I need to be able to do 2 things with the script.



1) Query more than one textfield. So instead of just 'essay1', I need to query 'essay1','essay2','essay3','essay4' for all those keywords.



2) If the script finds any of those keywords, it should write whatever keyword it finds into one column versus individual columns for that particular record.



import csv
import time

#Read data from word enrichment
f = open('word_enrichment_approved.txt', 'r')
allKeywords = f.read().lower().split("n")
f.close()

#Read in raw data from csv file
#Import the 'essay' column from the CSV file
allTexts =
fullRow =
with open('Join_train_and_resources_edited.csv') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
#the full row for each entry, which will be used to recreate the improved CSV file
fullRow.append((row['record_id'],row['teacher_id'],row['teacher_prefix'],row['school_state'],row['project_submitted_datetime'],
row['project_grade_category'], row['project_subject_categories'], row['project_subject_subcategories'],
row['project_title'],row['project_essay_1'],row['project_essay_2'],row['project_essay_3'],row['project_essay_4'],
row['project_resource_summary'],row['teacher_number_of_previously_posted_projects'],row['project_is_approved'],
row['id-1'],row['quantity'],row['price']))

#the column we want to parse for our keywords (change to mactch actual name of the column)
row = row['project_essay_1'].lower()
allTexts.append(row)
#NEW! a flag used to keep track of which row is being printed to the CSV file
counter = 0
#NEW! use the current date and time to create a unique output filename
timestr = time.strftime("%Y-%m-%d-(%H-%M-%S)")
filename = 'output-' + str(timestr) + '.csv'

#NEW! Open the new output CSV file to append ('a') rows one at a time.
with open(filename, 'a') as csvfile:

#NEW! define the column headers and write them to the new file
fieldnames = ['id', 'teacher_id','teacher_prefix ','school_state','project_submitted_datetime','project_grade_category',
'project_subject_categories','project_subject_subcategories','project_title','project_essay_1','project_essay_2',
'project_essay_3','project_essay_4','project_resource_summary','teacher_number_of_previously_posted_projects',
'project_is_approved','id-1','quantity','price']
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
writer.writeheader()

#NEW! define the output for each row and then print to the output csv file
writer = csv.writer(csvfile)

for entry in allTexts:

matches = 0
storedMatches =

#for each entry:
allWords = entry.split(' ')
for words in allWords:

#remove punctuation that will interfere with matching
words = words.replace(',', '')
words = words.replace('.', '')
words = words.replace(';', '')

#if a keyword match is found, store the result.
if words in allKeywords:
if words in storedMatches:
continue
else:
storedMatches.append(words)
matches += 1

#CHANGED! send any matches to a new row of the csv file.
if matches == 0:
newRow = fullRow[counter]
else:
matchTuple = tuple(storedMatches)
newRow = fullRow[counter] + matchTuple

#NEW! write the result of each row to the csv file
writer.writerows([newRow])
counter += 1






python






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 14 '18 at 21:09







V Chau

















asked Nov 14 '18 at 20:57









V ChauV Chau

63




63












  • Hi, to add code you need to paste the code into the question and place control + k or command + k (mac) to format it properly. Please do this as it will be easier for people to answer.

    – user
    Nov 14 '18 at 21:04






  • 1





    Awesome, thank you for the tip!

    – V Chau
    Nov 14 '18 at 21:09


















  • Hi, to add code you need to paste the code into the question and place control + k or command + k (mac) to format it properly. Please do this as it will be easier for people to answer.

    – user
    Nov 14 '18 at 21:04






  • 1





    Awesome, thank you for the tip!

    – V Chau
    Nov 14 '18 at 21:09

















Hi, to add code you need to paste the code into the question and place control + k or command + k (mac) to format it properly. Please do this as it will be easier for people to answer.

– user
Nov 14 '18 at 21:04





Hi, to add code you need to paste the code into the question and place control + k or command + k (mac) to format it properly. Please do this as it will be easier for people to answer.

– user
Nov 14 '18 at 21:04




1




1





Awesome, thank you for the tip!

– V Chau
Nov 14 '18 at 21:09






Awesome, thank you for the tip!

– V Chau
Nov 14 '18 at 21:09













0






active

oldest

votes











Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53308602%2fquerying-and-writing-keywords-to-a-new-column%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























0






active

oldest

votes








0






active

oldest

votes









active

oldest

votes






active

oldest

votes















draft saved

draft discarded
















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid


  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53308602%2fquerying-and-writing-keywords-to-a-new-column%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

How to how show current date and time by default on contact form 7 in WordPress without taking input from user in datetimepicker

Syphilis

Darth Vader #20