Concatenating CSV files in bash preserving the header only once

























Imagine I have a directory containing many subdirectories each containing some number of CSV files with the same structure (same number of columns and all containing the same header).



I am aware that I can run from the parent folder something like



find ./ -name '*.csv' -exec cat {} \; > ~/Desktop/result.csv


And this will work fine, except for the fact that the header is repeated each time (once for each file).



I'm also aware that I can do something like sed 1d <filename> or tail -n +<N+1> <filename> to skip the first line of a file.



But in my case, it seems a bit more specialised. I want to preserve the header once for the first file and then skip the header for every file after that.



Is anyone aware of a way to achieve this using standard Unix tools (like find, head, tail, sed, awk etc.) and bash?



For example, given the input files:

/folder1
    /file1.csv
    /file2.csv
/folder2
    /file1.csv


Each file has the header A,B,C and a single data row 1,2,3.



The desired output would be:



A,B,C
1,2,3
1,2,3
1,2,3


Marked As Duplicate



I feel this is different to other questions like this and this specifically because those solutions reference file1 and file2 in the solution. My question asks about a directory structure with an arbitrary number of files where I would not want to type out each file one by one.










bash awk sed cat unix-head














edited Nov 15 '18 at 9:27







David

















asked Nov 14 '18 at 19:13









David







  • "once for the first file": which file is first? Or does it make no difference which file the header is taken from?

    – Kamil Cuk
    Nov 14 '18 at 19:21











  • Makes no difference in this case :) all files contain the same header and I don't mind which comes first.

    – David
    Nov 15 '18 at 9:22











  • None of the linked questions are exact dup of this problem hence reopening.

    – anubhava
    Nov 15 '18 at 10:38
























2 Answers
    You may use this find + xargs + awk:



    find . -name '*.csv' -print0 | xargs -0 awk 'NR==1 || FNR>1'


    The condition NR==1 || FNR>1 is true for the very first line of the combined input (NR==1) and for every line of a file other than its first (FNR>1), so the header is printed only once.

















    answered Nov 14 '18 at 19:22









    anubhava
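As an illustration only (not part of the original answer), the following sketch rebuilds the question's sample layout in a scratch directory and runs the same find + xargs + awk pipeline over it. The scratch directory and the variable names workdir and merged are assumptions made for the example.

```shell
#!/bin/sh
# Illustrative sketch: file names and contents mirror the question's example.
set -eu

workdir=$(mktemp -d)    # scratch directory, assumed for this demo only
mkdir -p "$workdir/folder1" "$workdir/folder2"
for f in folder1/file1.csv folder1/file2.csv folder2/file1.csv; do
    printf 'A,B,C\n1,2,3\n' > "$workdir/$f"
done

# NR counts lines across all input files; FNR restarts at 1 for each file.
# NR==1 keeps the first header, FNR>1 keeps every non-header line.
merged=$(find "$workdir" -name '*.csv' -print0 | xargs -0 awk 'NR==1 || FNR>1')
printf '%s\n' "$merged"

rm -rf "$workdir"
```

With only three short file names, xargs passes everything to a single awk process, so NR==1 matches exactly once.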







    • This assumes that xargs can process all the files using a single call to awk.

      – chepner
      Nov 14 '18 at 20:12











    • Yes that's right.

      – anubhava
      Nov 14 '18 at 20:17
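If the file list is long enough for xargs to split it across several awk processes, each process would see its own NR==1 and print another header. One possible way around this, sketched below with find, head, tail and sed only, is to print the header in one pass and the data rows in a second pass. The helper name merge_csv and the scratch directory are hypothetical, and the sketch assumes every file ends with a newline and that the output is written outside the searched directory.

```shell
#!/bin/sh
# Sketch only: merge_csv is a hypothetical helper, not taken from either answer.
set -eu

merge_csv() {
    root=$1
    # Header: print the first line of every file, keep only the first one seen.
    find "$root" -name '*.csv' -exec head -n 1 {} \; | sed -n '1p'
    # Data rows: each file from its second line onward, one tail call per file.
    find "$root" -name '*.csv' -exec tail -n +2 {} \;
}

workdir=$(mktemp -d)    # scratch directory, assumed for this demo only
mkdir -p "$workdir/folder1" "$workdir/folder2"
for f in folder1/file1.csv folder1/file2.csv folder2/file1.csv; do
    printf 'A,B,C\n1,2,3\n' > "$workdir/$f"
done

result=$(merge_csv "$workdir")
printf '%s\n' "$result"

rm -rf "$workdir"
```

This starts one head and one tail process per file, so it is slower than a single awk call, but the result does not depend on how many files fit on one command line.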












        $ {
        > cat real-daily-wages-in-pounds-engla.tsv;
        > tail -n+2 real-daily-wages-in-pounds-engla.tsv;
        > } | cat


        You can group several commands in braces and pipe their combined output through cat. tail -n+2 selects all lines of a file except the first.

















        answered Nov 14 '18 at 19:21









        Mark
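A minimal sketch of the same grouping idea applied to two different files, with the combined output redirected to a file instead of piped through cat. The file names first.csv, second.csv and merged.csv and their contents are made up for the example.

```shell
#!/bin/sh
# Sketch only: file names and contents are hypothetical.
set -eu

workdir=$(mktemp -d)    # scratch directory, assumed for this demo only
printf 'A,B,C\n1,2,3\n' > "$workdir/first.csv"
printf 'A,B,C\n4,5,6\n' > "$workdir/second.csv"

# The braces group both commands so one redirection captures all their output.
{
    cat "$workdir/first.csv"            # whole first file, header included
    tail -n +2 "$workdir/second.csv"    # second file without its header
} > "$workdir/merged.csv"

merged=$(cat "$workdir/merged.csv")
printf '%s\n' "$merged"

rm -rf "$workdir"
```

This still names each file explicitly, so for an arbitrary directory tree it would need to be combined with find or a loop.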


























