Extracting partially repeating patterns in lines of text file
Given a text file of the form:
firstword<number1>,<string1>:<number2>,<string2>:<number3>,<string3>:<number4>...
firstword<number1>,<string1>:<number2>,<string2>:<number3>,<string3>:<number4>...
firstword<number1>,<string1>:<number2>,<string2>:<number3>,<string3>:<number4>...
...
where the lines can differ from one another, and each line can have any number of string:number pairs. "firstword" is always the same. The contents of the strings and numbers vary, e.g. a number could be "12345" and a string could be "abc" (without the quotes).
In addition, a line can contain the same string multiple times (how many times is unknown and differs per line), each time with a different associated number. For example:
firstword123,abc:123,cde:234,abc:345,def:456
If one now wants to extract only the first word and number (in this case firstword123), as well as all string:number pairs in a line for a specific string, how can one do this? In the above example, if one chooses "abc" as the string, the extracted line should look like:
firstword123,abc:123,abc:345
I am looking for a solution which works with Bash (and possibly other commands).
regex linux bash command-line
asked Nov 9 at 17:01
Jadzia
Huh? Surely grep "^firstword123," yourFile
– Mark Setchell
Nov 9 at 17:41
What have you tried so far? Did you try grep or sed?
– Nikos M.
Nov 9 at 17:44
@NikosM.: Yes, I have tried grep and sed, but nothing else. The problem is that the specific string whose string:number pairs I want to extract repeats a variable number of times per line.
– Jadzia
Nov 9 at 17:50
@MarkSetchell: The problem with a simple grep is that it does not filter out the string:number pairs which do not match the specific string I want to select. See the example in my post: there, the extracted line is shorter than the input line.
– Jadzia
Nov 9 at 17:51
You match the variable number of repeats in a loop using either grep or sed; there is no single-line solution.
– Nikos M.
Nov 9 at 17:53
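For what it's worth, the loop idea from the last comment could be sketched roughly as follows. This is only a sketch under assumptions: the function name extract_pairs is made up for illustration, grep -o is a GNU/BSD extension rather than POSIX, and the key is assumed to contain no regex metacharacters.

```shell
# extract_pairs KEY < file   (hypothetical helper name, not from the thread)
extract_pairs() {
    local key=$1 line head pairs
    while IFS= read -r line; do
        head=${line%%,*}    # text before the first comma, e.g. firstword123
        # grep -o prints each match on its own line; the leading comma in the
        # pattern keeps "abc" from matching inside a longer string like "xabc".
        # "|| true" guards against grep's non-zero exit when nothing matches.
        pairs=$(printf '%s\n' "$line" | grep -oE ",${key}:[0-9]+" | tr -d '\n' || true)
        printf '%s%s\n' "$head" "$pairs"
    done
}

printf '%s\n' 'firstword123,abc:123,cde:234,abc:345,def:456' | extract_pairs abc
# prints: firstword123,abc:123,abc:345
```

A line with no matching pair is reduced to its first field only.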
2 Answers
accepted
Not a one-liner, but an all-bash solution. If you need faster code we can write something in awk or perl...

    $: cat keyscan
    #!/usr/bin/env bash
    # Usage: ./keyscan KEY < file
    key="$1"
    while IFS= read -r line
    do  start=${line//,*/}            # everything before the first comma
        line=${line#"$start"}         # drop that leading field ...
        line=${line#,}                # ... and the comma after it
        while [[ -n "$line" ]]
        do  case "$line" in
            $key:[0-9]*) lead="${line//,*/}"      # current pair matches the key
                         start="$start,$lead"     # append it to the output
                         line="${line#"$lead"}"
                         line="${line#,}" ;;
            *,*)         line="${line#*,}" ;;     # no match: skip to the next pair
            *)           line='' ;;               # last pair, no match: stop
            esac
        done
        printf '%s\n' "$start"
    done
    $: cat data
    firstword123,abc:123,cde:234,abc:345,def:456
    $: ./keyscan abc < data
    firstword123,abc:123,abc:345
    $: ./keyscan def < data
    firstword123,def:456
    $: ./keyscan cde < data
    firstword123,cde:234
It will not be fast because it has a processing loop on every line of input, but it works on the sample line of data you gave.
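As a rough idea of what the awk variant mentioned above might look like, here is a minimal sketch. The function name keyscan_awk is an assumption made for illustration, and the sketch assumes that pairs never contain commas and that the number part is digits only.

```shell
# keyscan_awk KEY < file   (hypothetical helper name, not from the answer)
keyscan_awk() {
    awk -F, -v key="$1" '
    {
        out = $1                          # first field: firstword<number>
        for (i = 2; i <= NF; i++) {
            split($i, kv, ":")            # kv[1] = string, kv[2] = number
            if (kv[1] == key && kv[2] ~ /^[0-9]+$/)
                out = out "," $i          # keep only pairs for the chosen key
        }
        print out
    }'
}

printf '%s\n' 'firstword123,abc:123,cde:234,abc:345,def:456' | keyscan_awk abc
# prints: firstword123,abc:123,abc:345
```

Because awk splits the line into fields itself, there is no per-line shell loop, which is typically where the bash version spends its time.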
answered Nov 9 at 18:38
Paul Hodges
Hm. We ought to put together some pages for answers at idownvotedbecau.se ...
– Paul Hodges
Nov 9 at 22:21
Thank you very much for your answer. This is a nice solution, and it works with the example. That it is relatively slow is no problem, since the input files are not that large.
– Jadzia
Nov 9 at 23:10
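You can use Perl for this:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $first = 'firstword123';
    my $str   = 'abc';
    while (<DATA>) {
        next if not /^$first/;        # skip lines with a different prefix
        print "$first";
        # (?<=,) requires a preceding comma, so "abc" cannot match inside "xabc"
        print ",$_" for ($_ =~ /(?<=,)\Q$str\E:\d+/g);
        print "\n";
    }
    __DATA__
    firstword123,abc:123,cde:234,abc:345,def:456

out:

    firstword123,abc:123,abc:345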
answered Nov 9 at 21:52
nlsdkd
Thank you very much for your nice answer.
– Jadzia
Nov 9 at 23:12