Extracting partially repeating patterns in lines of text file

Given a text file of the form:

firstword<number1>,<string1>:<number2>,<string2>:<number3>,<string3>:<number4>...
firstword<number1>,<string1>:<number2>,<string2>:<number3>,<string3>:<number4>...
firstword<number1>,<string1>:<number2>,<string2>:<number3>,<string3>:<number4>...
...

Lines can differ from one another, and each can contain any number of string:number pairs. "firstword" is always the same. The contents of the strings and numbers vary, e.g. a number could be "12345" and a string could be "abc" (without the quotes).

In addition, a line can contain the same string several times (how many times is unknown and differs per line), each time with a different associated number. For example:

firstword123,abc:123,cde:234,abc:345,def:456

How can one extract only the first word and its number (in this case firstword123), together with all string:number pairs in the line for one specific string? In the above example, if one chooses "abc" as the string, the extracted line should look like:

firstword123,abc:123,abc:345

I am looking for a solution which works with Bash (and possibly other commands).

asked Nov 9 at 17:01

Jadzia

Huh? Surely grep "^firstword123," yourFile
– Mark Setchell
Nov 9 at 17:41

what have you tried so far? did you try grep or sed?
– Nikos M.
Nov 9 at 17:44

@NikosM.: Yes, I have tried grep and sed, but nothing else. The problem is the variable number of repeats per line of the specific string of which I want to get the string:number patterns.
– Jadzia
Nov 9 at 17:50

@MarkSetchell: The problem with a simple grep is that it does not filter out the string:number pairs which do not match the specific string I want to select. See the example in my post, there the extracted line is shorter than the input line.
– Jadzia
Nov 9 at 17:51

you match the variable number of repeats in a loop using either grep or sed, there is no single-line solution
– Nikos M.
Nov 9 at 17:53
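
As a rough illustration of the loop suggested in the comment above, the sketch below reads the input line by line and lets grep -o pull out the matching pairs. The function name extract_with_grep is made up for this example, and it assumes the key contains no regular-expression metacharacters.

```shell
#!/usr/bin/env bash
# extract_with_grep KEY < file   (hypothetical helper name, not from the thread)
# Assumption: KEY is a plain word with no regex metacharacters.
extract_with_grep() {
    local key=$1 line
    while IFS= read -r line; do
        # Print the leading firstword<number> field (everything before the first comma).
        printf '%s' "${line%%,*}"
        # grep -o prints each ",KEY:digits" match on its own line; tr glues them together.
        grep -oE ",${key}:[0-9]+" <<<"$line" | tr -d '\n'
        printf '\n'
    done
}

extract_with_grep abc <<'EOF'
firstword123,abc:123,cde:234,abc:345,def:456
EOF
# prints: firstword123,abc:123,abc:345
```

Matching the leading comma together with the key keeps a key such as "bc" from matching inside "abc:123".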

regex linux bash command-line

2 Answers

accepted

Not a one-liner, but an all-bash solution. If you need faster code we can write something in awk or perl...

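$: cat keyscan
#!/usr/bin/env bash

key="$1"
while IFS= read -r line
do  start=${line//,*/}          # text before the first comma, e.g. firstword123
    line=${line#"$start"}
    line=${line#,}
    while [[ -n "$line" ]]
    do  case "$line" in
        "$key":[0-9]*) lead=${line//,*/}     # matching pair at the front of the line
                       start="$start,$lead"
                       line=${line#"$lead"}
                       line=${line#,}   ;;
                  *,*) line=${line#*,}  ;;   # drop a non-matching pair
                    *) line=''          ;;   # last pair did not match
        esac
    done
    printf '%s\n' "$start"
done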

$: cat data
firstword123,abc:123,cde:234,abc:345,def:456

$: ./keyscan abc < data
firstword123,abc:123,abc:345

$: ./keyscan def < data
firstword123,def:456

$: ./keyscan cde < data
firstword123,cde:234

It will not be fast because it has a processing loop on every line of input, but it works on the sample line of data you gave.
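
For larger inputs, the awk alternative mentioned at the top of this answer might look roughly like the sketch below. The wrapper name extract_with_awk is invented for illustration. It only checks that a field starts with "key:" and does not validate that the rest is a number, and it assumes the key contains no backslashes, since awk -v interprets escape sequences.

```shell
#!/usr/bin/env bash
# extract_with_awk KEY < file   (hypothetical helper name, not from the thread)
extract_with_awk() {
    awk -F, -v key="$1" '{
        out = $1                               # keep the leading firstword<number> field
        for (i = 2; i <= NF; i++)
            if (index($i, key ":") == 1)       # field begins with the literal text "key:"
                out = out "," $i
        print out
    }'
}

extract_with_awk abc <<'EOF'
firstword123,abc:123,cde:234,abc:345,def:456
EOF
# prints: firstword123,abc:123,abc:345
```

Splitting on commas with -F, lets awk do the per-pair loop internally, which is usually much faster than looping in the shell.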

answered Nov 9 at 18:38

Paul Hodges

Hm. We ought to put together some pages for answers at idownvotedbecau.se ...
– Paul Hodges
Nov 9 at 22:21

Thank you very much for your answer. This is a nice solution, and it works with the example. That it is relatively slow is no problem, since the input files are not that large.
– Jadzia
Nov 9 at 23:10

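You can use Perl for this:

#!/usr/bin/perl
use strict;
use warnings;

my $first = 'firstword123';
my $str   = 'abc';

while (<DATA>) {
    next if not /^\Q$first\E\b/;
    print $first;
    # a match must start right after a comma, so 'abc' does not match 'xabc:1'
    print ",$_" for ($_ =~ /(?<=,)\Q$str\E:\d+/g);
    print "\n";
}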

__DATA__
firstword123,abc:123,cde:234,abc:345,def:456

out:

firstword123,abc:123,abc:345
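
The script above hard-codes both the first word and the key. For readers who would rather stay in Bash, a comparable sketch that takes the key as an argument and processes every input line is shown below. The function name extract_with_array is made up for this example, and it assumes fields never contain embedded commas.

```shell
#!/usr/bin/env bash
# extract_with_array KEY < file   (hypothetical helper name, not from the thread)
extract_with_array() {
    local key=$1 line out field
    local -a fields
    while IFS= read -r line; do
        # Split the line on commas into an array.
        IFS=, read -r -a fields <<<"$line"
        out=${fields[0]}                       # the firstword<number> field
        for field in "${fields[@]:1}"; do
            # The quoted "$key" is matched literally; the number must be all digits.
            [[ $field =~ ^"$key":[0-9]+$ ]] && out+=",$field"
        done
        printf '%s\n' "$out"
    done
}

extract_with_array abc <<'EOF'
firstword123,abc:123,cde:234,abc:345,def:456
EOF
# prints: firstword123,abc:123,abc:345
```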

answered Nov 9 at 21:52

nlsdkd

Thank you very much for your nice answer.
– Jadzia
Nov 9 at 23:12
