Querying and writing keywords to a new column

I have a script I wrote with the help of some online resources but it isn't quite doing what I need it to. I am taking an existing csv file, querying 4 textfields within that file for a group of keywords and than writing the results to a new CSV file. Right now the script only looks at any given textfield and than if it finds any of the keywords within the text file, it writes each keyword to a separate column for that record. I need to be able to do 2 things with the script.

1) Query more than one textfield. So instead of just 'essay1', I need to query 'essay1','essay2','essay3','essay4' for all those keywords.

2) If the script finds any of those keywords, it should write whatever keyword it finds into one column versus individual columns for that particular record.

import csv
import time

#Read data from word enrichment
f = open('word_enrichment_approved.txt', 'r')
allKeywords = f.read().lower().split("n")
f.close()

#Read in raw data from csv file
#Import the 'essay' column from the CSV file
allTexts = 
fullRow = 
with open('Join_train_and_resources_edited.csv') as csvfile:
 reader = csv.DictReader(csvfile)
 for row in reader:
 #the full row for each entry, which will be used to recreate the improved CSV file 
 fullRow.append((row['record_id'],row['teacher_id'],row['teacher_prefix'],row['school_state'],row['project_submitted_datetime'],
 row['project_grade_category'], row['project_subject_categories'], row['project_subject_subcategories'],
 row['project_title'],row['project_essay_1'],row['project_essay_2'],row['project_essay_3'],row['project_essay_4'],
 row['project_resource_summary'],row['teacher_number_of_previously_posted_projects'],row['project_is_approved'],
 row['id-1'],row['quantity'],row['price']))

 #the column we want to parse for our keywords (change to mactch actual name of the column)
 row = row['project_essay_1'].lower()
 allTexts.append(row)
#NEW! a flag used to keep track of which row is being printed to the CSV file 
counter = 0
#NEW! use the current date and time to create a unique output filename
timestr = time.strftime("%Y-%m-%d-(%H-%M-%S)")
filename = 'output-' + str(timestr) + '.csv'

#NEW! Open the new output CSV file to append ('a') rows one at a time.
with open(filename, 'a') as csvfile:

 #NEW! define the column headers and write them to the new file
 fieldnames = ['id', 'teacher_id','teacher_prefix ','school_state','project_submitted_datetime','project_grade_category',
 'project_subject_categories','project_subject_subcategories','project_title','project_essay_1','project_essay_2',
 'project_essay_3','project_essay_4','project_resource_summary','teacher_number_of_previously_posted_projects',
 'project_is_approved','id-1','quantity','price']
 writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
 writer.writeheader()

 #NEW! define the output for each row and then print to the output csv file
 writer = csv.writer(csvfile)

 for entry in allTexts:

 matches = 0
 storedMatches = 

 #for each entry:
 allWords = entry.split(' ')
 for words in allWords:

 #remove punctuation that will interfere with matching
 words = words.replace(',', '')
 words = words.replace('.', '')
 words = words.replace(';', '')

 #if a keyword match is found, store the result.
 if words in allKeywords:
 if words in storedMatches:
 continue
 else:
 storedMatches.append(words)
 matches += 1

 #CHANGED! send any matches to a new row of the csv file.
 if matches == 0:
 newRow = fullRow[counter]
 else:
 matchTuple = tuple(storedMatches)
 newRow = fullRow[counter] + matchTuple

 #NEW! write the result of each row to the csv file
 writer.writerows([newRow])
 counter += 1

edited Nov 14 '18 at 21:09

asked Nov 14 '18 at 20:57

V Chau

Hi, to add code you need to paste the code into the question and place control + k or command + k (mac) to format it properly. Please do this as it will be easier for people to answer.

– user
Nov 14 '18 at 21:04

1

Awesome, thank you for the tip!

– V Chau
Nov 14 '18 at 21:09

add a comment |

1) Query more than one textfield. So instead of just 'essay1', I need to query 'essay1','essay2','essay3','essay4' for all those keywords.

2) If the script finds any of those keywords, it should write whatever keyword it finds into one column versus individual columns for that particular record.

import csv
import time

#Read data from word enrichment
f = open('word_enrichment_approved.txt', 'r')
allKeywords = f.read().lower().split("n")
f.close()

#Read in raw data from csv file
#Import the 'essay' column from the CSV file
allTexts = 
fullRow = 
with open('Join_train_and_resources_edited.csv') as csvfile:
 reader = csv.DictReader(csvfile)
 for row in reader:
 #the full row for each entry, which will be used to recreate the improved CSV file 
 fullRow.append((row['record_id'],row['teacher_id'],row['teacher_prefix'],row['school_state'],row['project_submitted_datetime'],
 row['project_grade_category'], row['project_subject_categories'], row['project_subject_subcategories'],
 row['project_title'],row['project_essay_1'],row['project_essay_2'],row['project_essay_3'],row['project_essay_4'],
 row['project_resource_summary'],row['teacher_number_of_previously_posted_projects'],row['project_is_approved'],
 row['id-1'],row['quantity'],row['price']))

 #the column we want to parse for our keywords (change to mactch actual name of the column)
 row = row['project_essay_1'].lower()
 allTexts.append(row)
#NEW! a flag used to keep track of which row is being printed to the CSV file 
counter = 0
#NEW! use the current date and time to create a unique output filename
timestr = time.strftime("%Y-%m-%d-(%H-%M-%S)")
filename = 'output-' + str(timestr) + '.csv'

#NEW! Open the new output CSV file to append ('a') rows one at a time.
with open(filename, 'a') as csvfile:

 #NEW! define the column headers and write them to the new file
 fieldnames = ['id', 'teacher_id','teacher_prefix ','school_state','project_submitted_datetime','project_grade_category',
 'project_subject_categories','project_subject_subcategories','project_title','project_essay_1','project_essay_2',
 'project_essay_3','project_essay_4','project_resource_summary','teacher_number_of_previously_posted_projects',
 'project_is_approved','id-1','quantity','price']
 writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
 writer.writeheader()

 #NEW! define the output for each row and then print to the output csv file
 writer = csv.writer(csvfile)

 for entry in allTexts:

 matches = 0
 storedMatches = 

 #for each entry:
 allWords = entry.split(' ')
 for words in allWords:

 #remove punctuation that will interfere with matching
 words = words.replace(',', '')
 words = words.replace('.', '')
 words = words.replace(';', '')

 #if a keyword match is found, store the result.
 if words in allKeywords:
 if words in storedMatches:
 continue
 else:
 storedMatches.append(words)
 matches += 1

 #CHANGED! send any matches to a new row of the csv file.
 if matches == 0:
 newRow = fullRow[counter]
 else:
 matchTuple = tuple(storedMatches)
 newRow = fullRow[counter] + matchTuple

 #NEW! write the result of each row to the csv file
 writer.writerows([newRow])
 counter += 1

edited Nov 14 '18 at 21:09

asked Nov 14 '18 at 20:57

V Chau

Hi, to add code you need to paste the code into the question and place control + k or command + k (mac) to format it properly. Please do this as it will be easier for people to answer.

– user
Nov 14 '18 at 21:04

1

Awesome, thank you for the tip!

– V Chau
Nov 14 '18 at 21:09

add a comment |

1) Query more than one textfield. So instead of just 'essay1', I need to query 'essay1','essay2','essay3','essay4' for all those keywords.

2) If the script finds any of those keywords, it should write whatever keyword it finds into one column versus individual columns for that particular record.

import csv
import time

#Read data from word enrichment
f = open('word_enrichment_approved.txt', 'r')
allKeywords = f.read().lower().split("n")
f.close()

#Read in raw data from csv file
#Import the 'essay' column from the CSV file
allTexts = 
fullRow = 
with open('Join_train_and_resources_edited.csv') as csvfile:
 reader = csv.DictReader(csvfile)
 for row in reader:
 #the full row for each entry, which will be used to recreate the improved CSV file 
 fullRow.append((row['record_id'],row['teacher_id'],row['teacher_prefix'],row['school_state'],row['project_submitted_datetime'],
 row['project_grade_category'], row['project_subject_categories'], row['project_subject_subcategories'],
 row['project_title'],row['project_essay_1'],row['project_essay_2'],row['project_essay_3'],row['project_essay_4'],
 row['project_resource_summary'],row['teacher_number_of_previously_posted_projects'],row['project_is_approved'],
 row['id-1'],row['quantity'],row['price']))

 #the column we want to parse for our keywords (change to mactch actual name of the column)
 row = row['project_essay_1'].lower()
 allTexts.append(row)
#NEW! a flag used to keep track of which row is being printed to the CSV file 
counter = 0
#NEW! use the current date and time to create a unique output filename
timestr = time.strftime("%Y-%m-%d-(%H-%M-%S)")
filename = 'output-' + str(timestr) + '.csv'

#NEW! Open the new output CSV file to append ('a') rows one at a time.
with open(filename, 'a') as csvfile:

 #NEW! define the column headers and write them to the new file
 fieldnames = ['id', 'teacher_id','teacher_prefix ','school_state','project_submitted_datetime','project_grade_category',
 'project_subject_categories','project_subject_subcategories','project_title','project_essay_1','project_essay_2',
 'project_essay_3','project_essay_4','project_resource_summary','teacher_number_of_previously_posted_projects',
 'project_is_approved','id-1','quantity','price']
 writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
 writer.writeheader()

 #NEW! define the output for each row and then print to the output csv file
 writer = csv.writer(csvfile)

 for entry in allTexts:

 matches = 0
 storedMatches = 

 #for each entry:
 allWords = entry.split(' ')
 for words in allWords:

 #remove punctuation that will interfere with matching
 words = words.replace(',', '')
 words = words.replace('.', '')
 words = words.replace(';', '')

 #if a keyword match is found, store the result.
 if words in allKeywords:
 if words in storedMatches:
 continue
 else:
 storedMatches.append(words)
 matches += 1

 #CHANGED! send any matches to a new row of the csv file.
 if matches == 0:
 newRow = fullRow[counter]
 else:
 matchTuple = tuple(storedMatches)
 newRow = fullRow[counter] + matchTuple

 #NEW! write the result of each row to the csv file
 writer.writerows([newRow])
 counter += 1

edited Nov 14 '18 at 21:09

asked Nov 14 '18 at 20:57

V Chau

1) Query more than one textfield. So instead of just 'essay1', I need to query 'essay1','essay2','essay3','essay4' for all those keywords.

2) If the script finds any of those keywords, it should write whatever keyword it finds into one column versus individual columns for that particular record.

import csv
import time

#Read data from word enrichment
f = open('word_enrichment_approved.txt', 'r')
allKeywords = f.read().lower().split("n")
f.close()

#Read in raw data from csv file
#Import the 'essay' column from the CSV file
allTexts = 
fullRow = 
with open('Join_train_and_resources_edited.csv') as csvfile:
 reader = csv.DictReader(csvfile)
 for row in reader:
 #the full row for each entry, which will be used to recreate the improved CSV file 
 fullRow.append((row['record_id'],row['teacher_id'],row['teacher_prefix'],row['school_state'],row['project_submitted_datetime'],
 row['project_grade_category'], row['project_subject_categories'], row['project_subject_subcategories'],
 row['project_title'],row['project_essay_1'],row['project_essay_2'],row['project_essay_3'],row['project_essay_4'],
 row['project_resource_summary'],row['teacher_number_of_previously_posted_projects'],row['project_is_approved'],
 row['id-1'],row['quantity'],row['price']))

 #the column we want to parse for our keywords (change to mactch actual name of the column)
 row = row['project_essay_1'].lower()
 allTexts.append(row)
#NEW! a flag used to keep track of which row is being printed to the CSV file 
counter = 0
#NEW! use the current date and time to create a unique output filename
timestr = time.strftime("%Y-%m-%d-(%H-%M-%S)")
filename = 'output-' + str(timestr) + '.csv'

#NEW! Open the new output CSV file to append ('a') rows one at a time.
with open(filename, 'a') as csvfile:

 #NEW! define the column headers and write them to the new file
 fieldnames = ['id', 'teacher_id','teacher_prefix ','school_state','project_submitted_datetime','project_grade_category',
 'project_subject_categories','project_subject_subcategories','project_title','project_essay_1','project_essay_2',
 'project_essay_3','project_essay_4','project_resource_summary','teacher_number_of_previously_posted_projects',
 'project_is_approved','id-1','quantity','price']
 writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
 writer.writeheader()

 #NEW! define the output for each row and then print to the output csv file
 writer = csv.writer(csvfile)

 for entry in allTexts:

 matches = 0
 storedMatches = 

 #for each entry:
 allWords = entry.split(' ')
 for words in allWords:

 #remove punctuation that will interfere with matching
 words = words.replace(',', '')
 words = words.replace('.', '')
 words = words.replace(';', '')

 #if a keyword match is found, store the result.
 if words in allKeywords:
 if words in storedMatches:
 continue
 else:
 storedMatches.append(words)
 matches += 1

 #CHANGED! send any matches to a new row of the csv file.
 if matches == 0:
 newRow = fullRow[counter]
 else:
 matchTuple = tuple(storedMatches)
 newRow = fullRow[counter] + matchTuple

 #NEW! write the result of each row to the csv file
 writer.writerows([newRow])
 counter += 1

python

edited Nov 14 '18 at 21:09

asked Nov 14 '18 at 20:57

V Chau

edited Nov 14 '18 at 21:09

asked Nov 14 '18 at 20:57

V Chau

edited Nov 14 '18 at 21:09

asked Nov 14 '18 at 20:57

V Chau

asked Nov 14 '18 at 20:57

V Chau

asked Nov 14 '18 at 20:57

V Chau

Hi, to add code you need to paste the code into the question and place control + k or command + k (mac) to format it properly. Please do this as it will be easier for people to answer.

– user
Nov 14 '18 at 21:04

1

Awesome, thank you for the tip!

– V Chau
Nov 14 '18 at 21:09

add a comment |

Hi, to add code you need to paste the code into the question and place control + k or command + k (mac) to format it properly. Please do this as it will be easier for people to answer.

– user
Nov 14 '18 at 21:04

1

Awesome, thank you for the tip!

– V Chau
Nov 14 '18 at 21:09

Hi, to add code you need to paste the code into the question and place control + k or command + k (mac) to format it properly. Please do this as it will be easier for people to answer.

– user
Nov 14 '18 at 21:04

Awesome, thank you for the tip!

– V Chau
Nov 14 '18 at 21:09

add a comment |

0

active

oldest

votes

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53308602%2fquerying-and-writing-keywords-to-a-new-column%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

0

active

oldest

votes

0

active

oldest

votes

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Pfthb