Error when converting .gprobs files from Impute2 to PLINK format


















I have a set of .gprobs files that I need to import into PLINK. However, I keep getting the same error, which points to a specific line, even after I removed that line and the lines around it.



The data: I concatenated the .gprobs files for all 22 chromosomes. Before doing so, I replaced the '---' at the beginning of each line in the individual .gprobs files with the corresponding chromosome number (so each line now starts with CHR SNP BP A1 A2 ...). I also removed the SNPs that weren't imputed well (INFO scores below 0.7).



Code:



plink --gen data_chrALL.gprobs_chrcol_below0.7inforemoved --sample data_chr1.sample --out data_chrALL.gprobs_plink


The error message:



--data: 13404k variants converted.
Error: Line 13404781 of .gen file has fewer tokens than expected.


As I said above, I removed that specific line and reran the command, and got the exact same error message. I also tried removing the lines above and below it (in case the numbering was off because of a header line or something similar), but again got the exact same error.



Any thoughts or suggestions would be greatly appreciated!!! I'm not sure if this is the best place to post this, but I'm in desperate need of help.


































  • How did you remove the SNPs that weren't imputed well? You could probably use QCTOOL to remove those SNPs and then try again.

    – Peter Chung
    Dec 13 '18 at 8:21





















































bioinformatics imputation genome














edited Nov 15 '18 at 8:12









oliv











asked Nov 15 '18 at 7:09









womur

























1 Answer














PLINK is telling you that it expects a certain number of items on each line (3N+5 fields, where N is the number of samples) and that on some lines it doesn't find them. So:



(1) First of all, I would compare the lines that cause errors with lines that do not, to check that the number of tokens/columns is actually the same and correct, and that there are no extra spaces or special characters that could cause the line to be escaped or misread. I would also check which variants are causing trouble: maybe they are multiallelic, or indels, or something else that PLINK doesn't know how to handle. Or maybe there are no minor-allele homozygotes at all for that variant and this is encoded incorrectly.



(2) I would check the format specifications for both input files, .gen and .sample, to confirm that your files conform to them. Since the files originate from IMPUTE2, there might be some subtle differences.



(3) I would also update PLINK. From the code it seems that you are using either version 1.07 or 1.9. The 1.x versions cannot represent probabilities and will make hard calls, so you lose a lot of information. PLINK 2.0 can use the probabilities and should also have better support for them. You will still be able to use hard calls if you want.
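
The token-count check from points (1) and (2) can be sketched in a few lines of Python. This is only a rough illustration, not a tested tool: the function names `count_samples` and `check_gen` are made up for this example, and it assumes the usual Oxford layout (two header lines in the .sample file, five leading columns followed by three probabilities per sample in the .gen file). If your files have a different number of leading columns, adjust `lead_cols`.

```python
import sys
from collections import Counter


def count_samples(sample_path):
    """Count samples in a .sample file, assuming two header lines."""
    with open(sample_path, errors="replace") as fh:
        rows = [line for line in fh if line.strip()]
    return len(rows) - 2


def check_gen(gen_path, n_samples, lead_cols=5, max_report=10):
    """Return a histogram of tokens per line and the first few bad lines.

    A line is "bad" if its token count differs from lead_cols + 3 * n_samples.
    Blank lines count as 0 tokens, so a stray empty line is reported too.
    """
    expected = lead_cols + 3 * n_samples
    histogram = Counter()
    bad = []
    with open(gen_path, errors="replace") as fh:
        for line_no, line in enumerate(fh, start=1):
            n_tokens = len(line.split())
            histogram[n_tokens] += 1
            if n_tokens != expected and len(bad) < max_report:
                bad.append((line_no, n_tokens))
    return histogram, bad


# Usage (hypothetical script name): python check_gen.py file.gen file.sample
if __name__ == "__main__" and len(sys.argv) == 3:
    n = count_samples(sys.argv[2])
    hist, bad = check_gen(sys.argv[1], n)
    print(f"samples: {n}, expected tokens per line: {5 + 3 * n}")
    print("token-count histogram:", dict(hist))
    for line_no, n_tokens in bad:
        print(f"line {line_no}: {n_tokens} tokens")
```

If the histogram shows more than one token count, the reported line numbers tell you where to look. A truncated or empty last line is worth checking in particular, since the reported line number would then stay the same when you delete earlier lines.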














        edited Mar 12 at 1:29

























        answered Mar 12 at 0:55









        Oka





























