Pandas - Pipe Delimiter exists in Field Value causing Bad Lines

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty height:90px;width:728px;box-sizing:border-box;

I am using Pandas to import a text file like so:

data = pd.read_csv('filepath.txt', sep='|', quoting=3, 
 error_bad_lines=False, encoding='latin1', low_memory=False)

I'm getting an error on 1 line because the field value has a Pipe found within it. When it attempts to parse the row, it finds that the length of the row is too long throwing an error. It allows the file to process; however, this row is missing.

Example:

Row -

4321|Test|1/2/1900
1234|Test||1/1/1900

Parsing this file will create:

4321 Test 1/2/1900
1234 Test 1/1/1900

I want to eliminate the extra | in the second row "Test|" or allow pandas to understand that it exists to create:

4321 Test 1/2/1900
1234 Test 1/1/1900

or this would be fine:

 1234 Test| 1/1/1900

I have attempted to use converters, other quoting methods (quotchars, etc), but to no avail.

Any ideas on how to get by this? All recommendations welcome.

Eric

edited Nov 15 '18 at 17:12

asked Nov 15 '18 at 16:03

Eric

256

Can you guarantee this is a problem with only the first line?

– jpp
Nov 15 '18 at 16:08

Well, it's not actually the first line. It's line 116950 of the file. It will be random; however, I could potentially strip that string column out because I don't need it. Not sure how to evaluate the file before pandas consumes it.

– Eric
Nov 15 '18 at 16:20

add a comment |

I am using Pandas to import a text file like so:

data = pd.read_csv('filepath.txt', sep='|', quoting=3, 
 error_bad_lines=False, encoding='latin1', low_memory=False)

Example:

Row -

4321|Test|1/2/1900
1234|Test||1/1/1900

Parsing this file will create:

4321 Test 1/2/1900
1234 Test 1/1/1900

I want to eliminate the extra | in the second row "Test|" or allow pandas to understand that it exists to create:

4321 Test 1/2/1900
1234 Test 1/1/1900

or this would be fine:

 1234 Test| 1/1/1900

I have attempted to use converters, other quoting methods (quotchars, etc), but to no avail.

Any ideas on how to get by this? All recommendations welcome.

Eric

edited Nov 15 '18 at 17:12

asked Nov 15 '18 at 16:03

Eric

256

Can you guarantee this is a problem with only the first line?

– jpp
Nov 15 '18 at 16:08

Well, it's not actually the first line. It's line 116950 of the file. It will be random; however, I could potentially strip that string column out because I don't need it. Not sure how to evaluate the file before pandas consumes it.

– Eric
Nov 15 '18 at 16:20

add a comment |

I am using Pandas to import a text file like so:

data = pd.read_csv('filepath.txt', sep='|', quoting=3, 
 error_bad_lines=False, encoding='latin1', low_memory=False)

Example:

Row -

4321|Test|1/2/1900
1234|Test||1/1/1900

Parsing this file will create:

4321 Test 1/2/1900
1234 Test 1/1/1900

I want to eliminate the extra | in the second row "Test|" or allow pandas to understand that it exists to create:

4321 Test 1/2/1900
1234 Test 1/1/1900

or this would be fine:

 1234 Test| 1/1/1900

I have attempted to use converters, other quoting methods (quotchars, etc), but to no avail.

Any ideas on how to get by this? All recommendations welcome.

Eric

edited Nov 15 '18 at 17:12

asked Nov 15 '18 at 16:03

Eric

256

I am using Pandas to import a text file like so:

data = pd.read_csv('filepath.txt', sep='|', quoting=3, 
 error_bad_lines=False, encoding='latin1', low_memory=False)

Example:

Row -

4321|Test|1/2/1900
1234|Test||1/1/1900

Parsing this file will create:

4321 Test 1/2/1900
1234 Test 1/1/1900

I want to eliminate the extra | in the second row "Test|" or allow pandas to understand that it exists to create:

4321 Test 1/2/1900
1234 Test 1/1/1900

or this would be fine:

 1234 Test| 1/1/1900

I have attempted to use converters, other quoting methods (quotchars, etc), but to no avail.

Any ideas on how to get by this? All recommendations welcome.

Eric

python pandas

edited Nov 15 '18 at 17:12

asked Nov 15 '18 at 16:03

Eric

256

edited Nov 15 '18 at 17:12

asked Nov 15 '18 at 16:03

Eric

256

edited Nov 15 '18 at 17:12

asked Nov 15 '18 at 16:03

Eric

256

asked Nov 15 '18 at 16:03

Eric

256

asked Nov 15 '18 at 16:03

Eric

256

Can you guarantee this is a problem with only the first line?

– jpp
Nov 15 '18 at 16:08

Well, it's not actually the first line. It's line 116950 of the file. It will be random; however, I could potentially strip that string column out because I don't need it. Not sure how to evaluate the file before pandas consumes it.

– Eric
Nov 15 '18 at 16:20

add a comment |

Can you guarantee this is a problem with only the first line?

– jpp
Nov 15 '18 at 16:08

Well, it's not actually the first line. It's line 116950 of the file. It will be random; however, I could potentially strip that string column out because I don't need it. Not sure how to evaluate the file before pandas consumes it.

– Eric
Nov 15 '18 at 16:20

Can you guarantee this is a problem with only the first line?

– jpp
Nov 15 '18 at 16:08

Well, it's not actually the first line. It's line 116950 of the file. It will be random; however, I could potentially strip that string column out because I don't need it. Not sure how to evaluate the file before pandas consumes it.

– Eric
Nov 15 '18 at 16:20

add a comment |

1 Answer
1

active

oldest

votes

I think the easiest way would be to remove any instance of "||" then use pandas. An example of this would be:

import pandas as pd
from io import StringIO


buffer= StringIO()
with open(r'filepath.txt', 'r') as f:
 for line in f.readlines():
 if "||" not in line:
 buffer.write(line)




buffer.seek(0)
data = pd.read_csv(buffer, sep='|', quoting=3,
 error_bad_lines=False, encoding='latin1', low_memory=False)

You could also do it outside of python with a find and replace operation.

edited Nov 15 '18 at 21:15

answered Nov 15 '18 at 17:20

krflol

56728

I considered this as an option. Unfortunately, NULL values and Non-NULL values exist within the file inside the same column, so this option would generate the same issue, just on different rows.

– Eric
Nov 15 '18 at 18:32

Are you okay with dropping every row that contains a null value?

– krflol
Nov 15 '18 at 19:03

Absolutely. I'm good with just deleting the whole column, if possible. I'd just like to do it programmatically instead of manually. I'm willing to accept any possible solution.

– Eric
Nov 15 '18 at 21:13

edited. Should remove any line with '||'

– krflol
Nov 15 '18 at 21:16

This causes my file to drop to only 2 rows as most rows have a NULL value found in an individual column. It ignores all the rows where '||' is found. I think I may just have to request the file delimiter to be changed.

– Eric
Nov 16 '18 at 15:43

|
show 1 more comment

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53323388%2fpandas-pipe-delimiter-exists-in-field-value-causing-bad-lines%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

I think the easiest way would be to remove any instance of "||" then use pandas. An example of this would be:

import pandas as pd
from io import StringIO


buffer= StringIO()
with open(r'filepath.txt', 'r') as f:
 for line in f.readlines():
 if "||" not in line:
 buffer.write(line)




buffer.seek(0)
data = pd.read_csv(buffer, sep='|', quoting=3,
 error_bad_lines=False, encoding='latin1', low_memory=False)

You could also do it outside of python with a find and replace operation.

edited Nov 15 '18 at 21:15

answered Nov 15 '18 at 17:20

krflol

56728

I considered this as an option. Unfortunately, NULL values and Non-NULL values exist within the file inside the same column, so this option would generate the same issue, just on different rows.

– Eric
Nov 15 '18 at 18:32

Are you okay with dropping every row that contains a null value?

– krflol
Nov 15 '18 at 19:03

Absolutely. I'm good with just deleting the whole column, if possible. I'd just like to do it programmatically instead of manually. I'm willing to accept any possible solution.

– Eric
Nov 15 '18 at 21:13

edited. Should remove any line with '||'

– krflol
Nov 15 '18 at 21:16

This causes my file to drop to only 2 rows as most rows have a NULL value found in an individual column. It ignores all the rows where '||' is found. I think I may just have to request the file delimiter to be changed.

– Eric
Nov 16 '18 at 15:43

|
show 1 more comment

I think the easiest way would be to remove any instance of "||" then use pandas. An example of this would be:

import pandas as pd
from io import StringIO


buffer= StringIO()
with open(r'filepath.txt', 'r') as f:
 for line in f.readlines():
 if "||" not in line:
 buffer.write(line)




buffer.seek(0)
data = pd.read_csv(buffer, sep='|', quoting=3,
 error_bad_lines=False, encoding='latin1', low_memory=False)

You could also do it outside of python with a find and replace operation.

edited Nov 15 '18 at 21:15

answered Nov 15 '18 at 17:20

krflol

56728

I considered this as an option. Unfortunately, NULL values and Non-NULL values exist within the file inside the same column, so this option would generate the same issue, just on different rows.

– Eric
Nov 15 '18 at 18:32

Are you okay with dropping every row that contains a null value?

– krflol
Nov 15 '18 at 19:03

Absolutely. I'm good with just deleting the whole column, if possible. I'd just like to do it programmatically instead of manually. I'm willing to accept any possible solution.

– Eric
Nov 15 '18 at 21:13

edited. Should remove any line with '||'

– krflol
Nov 15 '18 at 21:16

This causes my file to drop to only 2 rows as most rows have a NULL value found in an individual column. It ignores all the rows where '||' is found. I think I may just have to request the file delimiter to be changed.

– Eric
Nov 16 '18 at 15:43

|
show 1 more comment

I think the easiest way would be to remove any instance of "||" then use pandas. An example of this would be:

import pandas as pd
from io import StringIO


buffer= StringIO()
with open(r'filepath.txt', 'r') as f:
 for line in f.readlines():
 if "||" not in line:
 buffer.write(line)




buffer.seek(0)
data = pd.read_csv(buffer, sep='|', quoting=3,
 error_bad_lines=False, encoding='latin1', low_memory=False)

You could also do it outside of python with a find and replace operation.

edited Nov 15 '18 at 21:15

answered Nov 15 '18 at 17:20

krflol

56728

I think the easiest way would be to remove any instance of "||" then use pandas. An example of this would be:

import pandas as pd
from io import StringIO


buffer= StringIO()
with open(r'filepath.txt', 'r') as f:
 for line in f.readlines():
 if "||" not in line:
 buffer.write(line)




buffer.seek(0)
data = pd.read_csv(buffer, sep='|', quoting=3,
 error_bad_lines=False, encoding='latin1', low_memory=False)

You could also do it outside of python with a find and replace operation.

edited Nov 15 '18 at 21:15

answered Nov 15 '18 at 17:20

krflol

56728

edited Nov 15 '18 at 21:15

answered Nov 15 '18 at 17:20

krflol

56728

answered Nov 15 '18 at 17:20

krflol

56728

answered Nov 15 '18 at 17:20

krflol

56728

I considered this as an option. Unfortunately, NULL values and Non-NULL values exist within the file inside the same column, so this option would generate the same issue, just on different rows.

– Eric
Nov 15 '18 at 18:32

Are you okay with dropping every row that contains a null value?

– krflol
Nov 15 '18 at 19:03

Absolutely. I'm good with just deleting the whole column, if possible. I'd just like to do it programmatically instead of manually. I'm willing to accept any possible solution.

– Eric
Nov 15 '18 at 21:13

edited. Should remove any line with '||'

– krflol
Nov 15 '18 at 21:16

This causes my file to drop to only 2 rows as most rows have a NULL value found in an individual column. It ignores all the rows where '||' is found. I think I may just have to request the file delimiter to be changed.

– Eric
Nov 16 '18 at 15:43

|
show 1 more comment

I considered this as an option. Unfortunately, NULL values and Non-NULL values exist within the file inside the same column, so this option would generate the same issue, just on different rows.

– Eric
Nov 15 '18 at 18:32

Are you okay with dropping every row that contains a null value?

– krflol
Nov 15 '18 at 19:03

Absolutely. I'm good with just deleting the whole column, if possible. I'd just like to do it programmatically instead of manually. I'm willing to accept any possible solution.

– Eric
Nov 15 '18 at 21:13

edited. Should remove any line with '||'

– krflol
Nov 15 '18 at 21:16

This causes my file to drop to only 2 rows as most rows have a NULL value found in an individual column. It ignores all the rows where '||' is found. I think I may just have to request the file delimiter to be changed.

– Eric
Nov 16 '18 at 15:43

I considered this as an option. Unfortunately, NULL values and Non-NULL values exist within the file inside the same column, so this option would generate the same issue, just on different rows.

– Eric
Nov 15 '18 at 18:32

Are you okay with dropping every row that contains a null value?

– krflol
Nov 15 '18 at 19:03

Absolutely. I'm good with just deleting the whole column, if possible. I'd just like to do it programmatically instead of manually. I'm willing to accept any possible solution.

– Eric
Nov 15 '18 at 21:13

edited. Should remove any line with '||'

– krflol
Nov 15 '18 at 21:16

This causes my file to drop to only 2 rows as most rows have a NULL value found in an individual column. It ignores all the rows where '||' is found. I think I may just have to request the file delimiter to be changed.

– Eric
Nov 16 '18 at 15:43

|
show 1 more comment

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Pfthb