Concatenating CSV files in bash preserving the header only once
Imagine I have a directory containing many subdirectories each containing some number of CSV files with the same structure (same number of columns and all containing the same header).
I am aware that I can run something like this from the parent folder:
find ./ -name '*.csv' -exec cat {} \; > ~/Desktop/result.csv
This works fine, except that the header is repeated once for each file.
I'm also aware that I can use sed 1d <filename> to skip the first line of a file, or tail -n +<N+1> <filename> to skip the first N lines.
But in my case, it seems a bit more specialised. I want to preserve the header once for the first file and then skip the header for every file after that.
Is anyone aware of a way to achieve this using standard Unix tools (like find, head, tail, sed, awk etc.) and bash?
For example, given the input files:
/folder1
    /file1.csv
    /file2.csv
/folder2
    /file1.csv
where each file has the header:
A,B,C
and one data row:
1,2,3
The desired output would be:
A,B,C
1,2,3
1,2,3
1,2,3
Marked As Duplicate
I feel this is different from the questions this was marked as a duplicate of, specifically because those solutions reference file1 and file2 explicitly. My question is about a directory structure with an arbitrary number of files, where I would not want to type out each file name one by one.
bash awk sed cat unix-head
edited Nov 15 '18 at 9:27 by David
asked Nov 14 '18 at 19:13 by David
"once for the first file": which file is first? Or it makes no difference from which file the header is taken?
– Kamil Cuk
Nov 14 '18 at 19:21
Makes no difference in this case :) all files contain the same header and I don't mind which comes first.
– David
Nov 15 '18 at 9:22
None of the linked questions are exact dup of this problem hence reopening.
– anubhava
Nov 15 '18 at 10:38
2 Answers
You may use this find + xargs + awk pipeline:
find . -name '*.csv' -print0 | xargs -0 awk 'NR==1 || FNR>1'
The condition NR==1 || FNR>1 is true for the very first line of the combined input (the first header) and for every line of each file other than its first.
answered Nov 14 '18 at 19:22 by anubhava
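As a quick check of the NR==1 || FNR>1 condition, here is a minimal sketch that rebuilds the question's three-file example in a scratch directory and runs the same pipeline on it. The directory created by mktemp and the variable names demo_dir and merged are assumptions made for this illustration only.

```shell
#!/usr/bin/env bash
# Recreate the question's example layout in a scratch directory (hypothetical paths).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/folder1" "$demo_dir/folder2"
for f in folder1/file1.csv folder1/file2.csv folder2/file1.csv; do
  printf 'A,B,C\n1,2,3\n' > "$demo_dir/$f"
done

# NR counts lines across all input files; FNR restarts at 1 for each file.
# NR==1 keeps the first header, FNR>1 keeps every data row.
merged=$(find "$demo_dir" -name '*.csv' -print0 | xargs -0 awk 'NR==1 || FNR>1')
printf '%s\n' "$merged"

rm -rf "$demo_dir"
```

With these three small files a single awk call handles everything, so the output is the header followed by three 1,2,3 rows, matching the desired output in the question.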
This assumes that xargs can process all the files using a single call to awk.
– chepner
Nov 14 '18 at 20:12
Yes that's right.
– anubhava
Nov 14 '18 at 20:17
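The caveat in this comment thread can be made visible on a small scale. The sketch below is an illustration, not part of the answer: it uses xargs -n 2 to force more than one awk call on the question's three-file example, imitating what xargs does by itself when the file list is too long for one command line. The names demo_dir, split_output and header_count are hypothetical.

```shell
#!/usr/bin/env bash
# Scratch copy of the question's three-file example (hypothetical paths).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/folder1" "$demo_dir/folder2"
for f in folder1/file1.csv folder1/file2.csv folder2/file1.csv; do
  printf 'A,B,C\n1,2,3\n' > "$demo_dir/$f"
done

# -n 2 caps each awk call at two files, so three files need two awk processes.
split_output=$(find "$demo_dir" -name '*.csv' -print0 | xargs -0 -n 2 awk 'NR==1 || FNR>1')

# Each awk process restarts NR at 1, so each batch emits its own header.
header_count=$(printf '%s\n' "$split_output" | grep -c '^A,B,C$')
echo "headers in output: $header_count"

rm -rf "$demo_dir"
```

Here the header appears twice, once per awk process, which is the failure mode to watch for when the number of files is very large.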
$ {
> cat real-daily-wages-in-pounds-engla.tsv;
> tail -n+2 real-daily-wages-in-pounds-engla.tsv;
> } | cat
You can group multiple commands and pipe their combined output through cat. tail -n+2 selects all lines from a file except the first.
answered Nov 14 '18 at 19:21 by Mark
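One way to stretch this answer's cat/tail grouping to an arbitrary number of files, as the question asks, is to print one header and then let find run tail once per file. This is a sketch under assumptions: concat_csv is a hypothetical helper name, file names are assumed not to contain newlines (because of the find | head step), and every file is assumed to end with a newline.

```shell
#!/usr/bin/env bash
# concat_csv is a hypothetical helper; its argument is the directory to search.
concat_csv() {
  local dir=$1 first
  # Pick any one file to supply the header (the question says all headers match).
  first=$(find "$dir" -name '*.csv' | head -n 1)
  [ -n "$first" ] || return 1    # no CSV files found
  {
    head -n 1 "$first"                                 # the header, once
    find "$dir" -name '*.csv' -exec tail -n +2 {} \;   # each file minus its header
  }
}
```

A call such as concat_csv . > ~/Desktop/result.csv would then write the merged file. Because tail runs once per file, this does not depend on all file names fitting into a single command line; the output file should live outside the searched directory so it is not picked up as input.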