Error when converting .gprobs files from Impute2 to PLINK format
I have a set of .gprobs files that I need to import into PLINK. However, I keep getting the same error -- a problem in a specific line -- even after I removed that line and the lines around it.
The data: I concatenated the .gprobs files for all 22 chromosomes. Before doing so, I replaced the '---' at the start of each line in the individual .gprobs files with the corresponding chromosome number (so each line now starts CHR SNP BP A1 A2 ...). I also removed the SNPs that weren't imputed well (INFO scores below 0.7).
Code:
plink --gen data_chrALL.gprobs_chrcol_below0.7inforemoved --sample data_chr1.sample --out data_chrALL.gprobs_plink
The error message:
--data: 13404k variants converted.
Error: Line 13404781 of .gen file has fewer tokens than expected.
As I said above, I removed that specific line and reran it, and got the same exact error message. I tried removing the lines above and below (in case the numbering was off by a header or something?) but again, same exact error.
Any thoughts or suggestions would be greatly appreciated!!! I'm not sure if this is the best place to post this, but I'm in desperate need of help.
bioinformatics imputation genome
How did you remove the SNPs that weren't imputed well? You could probably use qctool to remove those SNPs and then try again.
– Peter Chung
Dec 13 '18 at 8:21
edited Nov 15 '18 at 8:12 by oliv
asked Nov 15 '18 at 7:09 by womur
1 Answer
PLINK is telling you that it expects a fixed number of items on each line (3N+5 fields, where N is the number of samples) and that on some lines it doesn't find them. So,
(1) First of all, I would compare the lines that cause errors with the ones that do not, to check that the number of tokens/columns is actually the same, that it is correct, and that there are no extra spaces or special characters that could cause the line to be escaped or misread. I would also check which variants are causing trouble: maybe they are multiallelic, or indels, or something else that PLINK doesn't know how to deal with. Or maybe there are no minor-allele homozygotes at all for that variant and this is encoded incorrectly.
(2) I would check the specifications for the input files, both .gen and .sample, to see that yours conform to them. As the files originate from Impute2, there might be some subtle differences.
(3) I would also update your PLINK version. From the command it seems that you are using either version 1.07 or 1.9. The 1.x versions cannot represent probabilities and will make hard calls, so you lose a lot of information. PLINK 2.0 can use the probabilities and should also have better support for them. You will still be able to use hard calls if you want.
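The check in (1) can be sketched in a few lines of Python. This is only a rough illustration, not a definitive tool: the function names `count_samples`, `find_bad_lines` and `check_files` are made up for this example, and it assumes the layout described in the question (five leading columns, CHR SNP BP A1 A2, followed by three probabilities per sample) and a .sample file with the usual two header lines.

```python
from typing import Iterable, List, Tuple


def count_samples(sample_lines: Iterable[str]) -> int:
    """Count samples in a .sample file: non-empty lines minus the two header lines."""
    rows = [line for line in sample_lines if line.strip()]
    return max(len(rows) - 2, 0)


def find_bad_lines(
    gen_lines: Iterable[str], n_samples: int, lead_cols: int = 5
) -> List[Tuple[int, int, bool]]:
    """Return (line_number, token_count, has_carriage_return) for suspicious lines.

    A line is suspicious if its whitespace-separated token count differs from
    lead_cols + 3 * n_samples, or if it contains a carriage return.
    """
    expected = lead_cols + 3 * n_samples
    bad = []
    for lineno, line in enumerate(gen_lines, start=1):
        n_tokens = len(line.split())
        has_cr = "\r" in line
        if n_tokens != expected or has_cr:
            bad.append((lineno, n_tokens, has_cr))
    return bad


def check_files(gen_path: str, sample_path: str) -> List[Tuple[int, int, bool]]:
    """Run the check on files on disk (paths are supplied by the caller)."""
    with open(sample_path) as sample_file:
        n_samples = count_samples(sample_file)
    # newline="" keeps any "\r" in the data so that it can be detected.
    with open(gen_path, newline="") as gen_file:
        return find_bad_lines(gen_file, n_samples)
```

Calling `check_files("data_chrALL.gprobs_chrcol_below0.7inforemoved", "data_chr1.sample")` would list every line whose token count is off. A blank or truncated final line would show up here too, which may be worth ruling out when the same line number is reported even after that line has been deleted.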
edited Mar 12 at 1:29
answered Mar 12 at 0:55 by Oka