Having trouble splitting text due to extra commas










Here are a few examples:



Input



Col
"temp, temp2"
"name, inc., name2"


Output



Col_upd
["temp", "temp2"]
["name, inc.", "name2"]


Right now, I'm using:



Col_upd.apply(lambda x: [i.lower().strip() for i in x.split(',')])


This fails on row 2 of the example above, because the comma inside "name, inc." is treated as a separator. I'm not sure what alternatives I have in this situation aside from using a dictionary.



Any suggestions would be really helpful.











  • What's the expected output?

    – Perdi Estaquel
    Nov 12 '18 at 22:25











  • Updated version is the expected output. Will make that more clear.

    – madsthaks
    Nov 12 '18 at 22:27











  • If you know that all of your problematic values are the same, you can start with something along the lines of "name, inc.".replace(", inc.", " inc.") before you do the split, which would ideally use pandas' built-in str.split()

    – G. Anderson
    Nov 12 '18 at 22:29











  • What if I don't? It looks like there is a wide variety of cases where this occurs.

    – madsthaks
    Nov 12 '18 at 22:32











  • @madsthaks Are the extra commas always in the first part? And do you always have only two parts?

    – Matthias Ossadnik
    Nov 12 '18 at 22:33
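
G. Anderson's replace-then-split suggestion above could be sketched in plain Python roughly as follows. This is only a sketch: `KNOWN_FRAGMENTS` and `split_after_replace` are hypothetical names, and it assumes the problematic fragments can be listed in advance, which the asker says may not hold.

```python
# Hypothetical list of fragments known to contain a non-separator comma.
KNOWN_FRAGMENTS = [", inc."]


def split_after_replace(text):
    """Neutralise known fragments, then split on the remaining commas."""
    text = text.lower()
    for fragment in KNOWN_FRAGMENTS:
        # ", inc." becomes " inc.", so its comma no longer acts as a separator.
        text = text.replace(fragment, fragment.replace(",", "", 1))
    return [part.strip() for part in text.split(",")]


print(split_after_replace("temp, temp2"))        # ['temp', 'temp2']
print(split_after_replace("name, inc., name2"))  # ['name inc.', 'name2']
```

Note that this drops the comma from the fragment, so the result is "name inc." rather than the "name, inc." shown in the expected output. The function can be passed to `Series.apply` in place of the lambda.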
















python string split






edited Nov 12 '18 at 22:27







madsthaks

















asked Nov 12 '18 at 22:15









madsthaks












1 Answer

If we can assume that there are no extra commas in the second part, you can try rsplit().



Col_upd.apply(lambda x: [i.lower().strip() for i in x.rsplit(',', 1)])


str.rsplit() splits from the right, and its second argument (maxsplit) lets you specify how many times to split.
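
To see the effect on the two example rows without pandas, here is a minimal sketch. The helper name `split_last_comma` is made up for illustration, and it assumes, as stated above, that only the first part can contain extra commas.

```python
def split_last_comma(text):
    """Split on the last comma only, then lowercase and strip each part."""
    return [part.lower().strip() for part in text.rsplit(",", 1)]


rows = ["temp, temp2", "name, inc., name2"]
result = [split_last_comma(row) for row in rows]
print(result)  # [['temp', 'temp2'], ['name, inc.', 'name2']]
```

The same function can be passed to `Series.apply` instead of the lambda.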






  • So what if there are more than 2 parts? This is an interesting solution, but what would I do when there could be n splits?

    – madsthaks
    Nov 13 '18 at 1:48











  • @madsthaks Can you be more specific? Do the additional parts have extra commas? If we can assume that only the first part has extra commas, let n = the number of parts, then we can just do Col_upd.apply(lambda x: [i.lower().strip() for i in x.rsplit(',', n-1)])

    – Eric Wang
    Nov 13 '18 at 1:54











  • If we can't assume only the first part has extra commas, then you need to provide more information about the conditions under which an extra comma can appear.

    – Eric Wang
    Nov 13 '18 at 1:56
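
The n-part variant from the comments above might look like this as a standalone function. It is a sketch under the same assumption (only the first part contains extra commas, and n is known for each row); `split_into_n_parts` is a hypothetical name.

```python
def split_into_n_parts(text, n):
    """Split from the right at most n - 1 times, so at most n parts come back."""
    return [part.lower().strip() for part in text.rsplit(",", n - 1)]


print(split_into_n_parts("name, inc., name2", 2))
# ['name, inc.', 'name2']
print(split_into_n_parts("name, inc., name2, name3", 3))
# ['name, inc.', 'name2', 'name3']
```

If n varies per row and is not stored anywhere, this approach has nothing to go on, which is why the comments ask for the rule that decides when an extra comma appears.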










answered Nov 12 '18 at 22:42









Eric Wang











