Extracting partially repeating patterns in lines of text file





















Given a text file of the form:



firstword<number1>,<string1>:<number2>,<string2>:<number3>,<string3>:<number4>...
firstword<number1>,<string1>:<number2>,<string2>:<number3>,<string3>:<number4>...
firstword<number1>,<string1>:<number2>,<string2>:<number3>,<string3>:<number4>...
...


where the lines can differ from one another and each line can have any number of string:number pairs. "firstword" is always the same. The contents of the strings and numbers vary, e.g. a number could be "12345" and a string could be "abc" (without the quotes).



In addition, the same string can occur several times in a line (how many times is unknown and varies from line to line), each time with a different associated number. For example:



firstword123,abc:123,cde:234,abc:345,def:456


How can one extract only the first word with its number (in this case firstword123) together with all string:number pairs in the line that belong to one specific string? In the above example, if one chooses the string "abc", the extracted line should look like:



firstword123,abc:123,abc:345


I am looking for a solution that works in Bash (possibly with the help of other command-line tools).































  • Huh? Surely grep "^firstword123," yourFile
    – Mark Setchell
    Nov 9 at 17:41










  • What have you tried so far? Did you try grep or sed?
    – Nikos M.
    Nov 9 at 17:44










  • @NikosM.: Yes, I have tried grep and sed, but nothing else. The problem is the variable number of repeats per line of the specific string of which I want to get the string:number patterns.
    – Jadzia
    Nov 9 at 17:50











  • @MarkSetchell: The problem with a simple grep is that it does not filter out the string:number pairs which do not match the specific string I want to select. See the example in my post: there the extracted line is shorter than the input line.
    – Jadzia
    Nov 9 at 17:51











  • You match the variable number of repeats in a loop using either grep or sed; there is no single-line solution.
    – Nikos M.
    Nov 9 at 17:53





















regex linux bash command-line
















asked Nov 9 at 17:01 by Jadzia























2 Answers




















accepted










Not a one-liner, but an all-bash solution. If you need faster code we can write something in awk or perl...



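$: cat keyscan
#!/usr/bin/env bash
# Usage: ./keyscan KEY < file
# Prints the first field of each line followed by every KEY:number pair.

key="$1"
while IFS= read -r line
do  start="${line//,*/}"          # first field, e.g. firstword123
    line="${line#"$start"}"       # drop the first field ...
    line="${line#,}"              # ... and the comma that follows it
    while [[ -n "$line" ]]
    do  case "$line" in
        "$key":[0-9]*) lead="${line//,*/}"      # current pair matches the key
                       start="$start,$lead"     # append it to the output
                       line="${line#"$lead"}"
                       line="${line#,}" ;;
        *,*)           line="${line#*,}" ;;     # no match: skip to the next pair
        *)             line='' ;;               # last pair did not match: done
        esac
    done
    printf '%s\n' "$start"
done

$: cat data
firstword123,abc:123,cde:234,abc:345,def:456

$: ./keyscan abc < data
firstword123,abc:123,abc:345

$: ./keyscan def < data
firstword123,def:456

$: ./keyscan cde < data
firstword123,cde:234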


It will not be fast because it has a processing loop on every line of input, but it works on the sample line of data you gave.
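For the faster awk variant mentioned above, a minimal sketch could look like the following. It is not part of the original answer: the function name extract_pairs is made up for illustration, and it assumes fields are separated by commas and that each pair has the form string:digits. The key is compared literally (not as a regular expression), but awk -v still interprets backslash escapes in the value passed to it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name chosen for this sketch): keep the first field
# and every "key:digits" field of each line read from stdin.
extract_pairs() {
    awk -F, -v key="$1" '
    {
        out = $1                          # always keep firstword<number>
        for (i = 2; i <= NF; i++) {
            n = index($i, ":")            # position of the separator, 0 if none
            # keep the field only if the text before ":" equals the key
            # exactly and the text after it consists of digits only
            if (n > 0 && substr($i, 1, n - 1) == key && substr($i, n + 1) ~ /^[0-9]+$/)
                out = out "," $i
        }
        print out
    }'
}

# Example call with the sample line from the question
extract_pairs abc <<'EOF'
firstword123,abc:123,cde:234,abc:345,def:456
EOF
```

With the sample line this prints firstword123,abc:123,abc:345. Because awk does the field loop itself, the shell no longer iterates over each line, which is where the pure-Bash version spends its time.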


























  • Hm. We ought to put together some pages for answers at idownvotedbecau.se ...
    – Paul Hodges
    Nov 9 at 22:21










  • Thank you very much for your answer. This is a nice solution, and it works with the example. That it is relatively slow is no problem, since the input files are not that large.
    – Jadzia
    Nov 9 at 23:10






























You can use Perl for this:



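#!/usr/bin/perl
use strict;
use warnings;

my $first = 'firstword123';   # line prefix to select
my $str   = 'abc';            # key whose key:number pairs are kept

while (<DATA>) {
    next if not /^\Q$first\E/;                 # skip lines with another prefix
    print $first;
    print ",$_" for ($_ =~ /\Q$str\E:\d+/g);   # every key:number match, in order
    print "\n";
}

__DATA__
firstword123,abc:123,cde:234,abc:345,def:456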


Output:



firstword123,abc:123,abc:345
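Since the question asks for Bash, the same match-all idea can be sketched with grep -o instead of Perl. This is an illustration rather than part of the answer: the function name grep_pairs is hypothetical, grep -o is a GNU/BSD extension rather than POSIX, and the key is interpolated into an extended regular expression, so a key containing regex metacharacters would need escaping. Unlike the Perl script, it processes every input line instead of selecting one prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name chosen for this sketch): for each line on stdin,
# print the first field followed by every ",key:digits" match.
grep_pairs() {
    local key=$1 line first matches
    while IFS= read -r line; do
        first=${line%%,*}                 # everything before the first comma
        # grep -o prints each match on its own line; tr removes the newlines
        # so the matches are concatenated again. The leading comma in the
        # pattern anchors the key to the start of a field.
        matches=$(printf '%s\n' "${line#"$first"}" |
                  grep -oE ",$key:[0-9]+" | tr -d '\n')
        printf '%s%s\n' "$first" "$matches"
    done
}

# Example call with the sample line from the question
grep_pairs abc <<'EOF'
firstword123,abc:123,cde:234,abc:345,def:456
EOF
```

For the sample line this prints firstword123,abc:123,abc:345. It starts several processes per input line, so it suits small files only.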

























  • Thank you very much for your nice answer.
    – Jadzia
    Nov 9 at 23:12













































Bash answer (accepted): answered Nov 9 at 18:38 by Paul Hodges

































Perl answer: answered Nov 9 at 21:52 by nlsdkd




























 
