Getting estimate and p-value into dataframe
I am fairly new to R. My data looks something like this (only with 9000 columns and 66 rows)
Time <- c(0, 6.4, 8.6, 15.2, 19.4, 28.1, 42.6, 73, 73, 85, 88, 88, 88, 88, 88)
ID1 <- c(55030, 54539, 54937, 48897, 58160, 54686, 55393, 47191, 39805, 37601, 51328, 28882, 45587, 60061, 31892, 28670)
ID2 <- c(20485, 11907, 10571, 20974, 10462, 11149, 20970, NA, NA, 9295, NA, 8714, 24446, 10748, 9037, 11859)
ID3 <- c(93914, 44482, 43705, 51144, 49485, 43908, 44324, 37342, 18872, 39660,61673, 43837, 36528, 44738, 41648, 11100)
DF <- data.frame (Time, ID1, ID2, ID3)
I want to get a data frame that looks like this :
ID1, rho, p-value
ID2, rho, p-value
...
The rho and the p-value would be the results from a cor.test (spearman) with Time and each ID
Among other things I've tried this:
results <- data.frame(ID="", Estimate="", P.value="")
estimates = numeric(16)
pvalues = numeric(16)
for (i in 2:4) {
  test <- cor.test(DF[,1], DF[,i])
  estimates[i] = test$estimate
  pvalues[i] = test$p.value
}
And R gives me the following error:
Error: object 'test' not found
I've also tried:
result <- do.call(rbind,lapply(2:4, function(x) {
  cor.result<-cor.test(DF[,1],DF[,x])
  pvalue <- cor.result$p.value
  estimate <- cor.result$estimate
  return(data.frame(pvalue = pvalue, estimate = estimate))
})
)
And R gives me a similar error
Error: object 'cor.result' not found
I'm sure it's an easy fix but I can't seem to figure it out. Any help is more than welcome.
This is what I got after running
dput(head(SmallDataset[,1:5]))
structure(list(Species = c("Human.hsapiens", "Chimpanzee.ptroglodytes",
"Gorilla.ggorilla", "Orangutan.pabelii", "Gibbon.nleucogenys",
"Macaque.mmulatta"), Time = c(0, 6.4, 8.61, 15.2, 19.43, 28.1
), ID1 = c(55030, 54539, 54937, 48897, 58160, 54686), ID2 = c(20485,
11907, 10571, 20974, 10462, 11149), ID3 = c(93914, 44482, 43705,
51144, 49485, 43908)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
r
You are trying to calculate correlations between your ID variables in columns 2 to 4 of DF and Time? Is that correct?
– Cleland
Nov 14 '18 at 16:10
I am trying to correlate the first column with the rest, as in 1st and 2nd, 1st and 3rd, 1st and 4th
– Yaiza95
Nov 14 '18 at 16:14
edited Nov 14 '18 at 20:36
asked Nov 14 '18 at 16:01 by Yaiza95
2 Answers
My solution involves defining a function within an lapply call:
library(dplyr)
## Create the data frame
Time <- c(0, 6.4, 8.6, 15.2, 19.4, 28.1, 42.6, 73, 73, 85, 88, 88, 88, 88, 88, 89)
ID1 <- c(55030, 54539, 54937, 48897, 58160, 54686, 55393, 47191, 39805, 37601, 51328, 28882, 45587, 60061, 31892, 28670)
ID2 <- c(20485, 11907, 10571, 20974, 10462, 11149, 20970, NA, NA, 9295, NA, 8714, 24446, 10748, 9037, 11859)
ID3 <- c(93914, 44482, 43705, 51144, 49485, 43908, 44324, 37342, 18872, 39660,61673, 43837, 36528, 44738, 41648, 11100)
DF <- data.frame (Time, ID1, ID2, ID3)
## Run the correlations
l2 <- lapply(2:4, function(i) cor.test(DF$Time, DF[,i]))
## Extract the estimate and p-value from each test
l3 <- lapply(l2, function(i) {
  tibble(estimate = i$estimate,
         p_value = i$p.value)
})
## Bind the rows into one data frame and label each row with its column name
l4 <- bind_rows(l3) %>% mutate(ID = names(DF)[2:4])
l4
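The question asks for Spearman's rho, while cor.test() computes Pearson's correlation unless told otherwise. A minimal base-R sketch of the same idea with method = "spearman" follows. The names spearman_rows and spearman_df are illustrative only, and exact = FALSE is an assumption made here so that the tied Time values do not trigger a warning about exact p-values.

```r
# Same data as in the answer above (Time padded to 16 values).
Time <- c(0, 6.4, 8.6, 15.2, 19.4, 28.1, 42.6, 73, 73, 85, 88, 88, 88, 88, 88, 89)
ID1 <- c(55030, 54539, 54937, 48897, 58160, 54686, 55393, 47191, 39805, 37601, 51328, 28882, 45587, 60061, 31892, 28670)
ID2 <- c(20485, 11907, 10571, 20974, 10462, 11149, 20970, NA, NA, 9295, NA, 8714, 24446, 10748, 9037, 11859)
ID3 <- c(93914, 44482, 43705, 51144, 49485, 43908, 44324, 37342, 18872, 39660, 61673, 43837, 36528, 44738, 41648, 11100)
DF <- data.frame(Time, ID1, ID2, ID3)

spearman_rows <- lapply(2:4, function(i) {
  # exact = FALSE requests the approximate p-value; with ties in the data
  # an exact p-value cannot be computed and cor.test() would warn.
  # Rows with NA in either vector are dropped by cor.test() itself.
  test <- cor.test(DF$Time, DF[[i]], method = "spearman", exact = FALSE)
  data.frame(ID = names(DF)[i],
             rho = unname(test$estimate),
             p_value = test$p.value)
})

# One row per ID column: ID, rho, p_value
spearman_df <- do.call(rbind, spearman_rows)
spearman_df
```

The result has the layout requested in the question: one row per ID column with its rho and p-value.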
You could also use the tidy function from the broom package to extract the estimates and p-values: sapply(2:4, function(i) cor.test(DF$Time, DF[,i]) %>% tidy() %>% select(estimate, p.value) ) %>% t() %>% as.data.frame() %>% mutate(ID = paste0("ID", 1:3))
– Jordo82
Nov 14 '18 at 16:24
Will edit to reflect that. Thanks for flagging @Parfait
– Harro Cyranka
Nov 14 '18 at 16:25
Thank you, it works on the small DF, but when I try to apply it to the larger one I get this error: 'x' and 'y' must have the same length , even though if I ask for the length of both elements it says it's the same length
– Yaiza95
Nov 14 '18 at 20:03
Where specifically is it breaking? When you run the correlations? When you extract the coefficients? Or when you create the last data frame?
– Harro Cyranka
Nov 14 '18 at 20:09
It breaks when I run l2: l2 <- lapply(3:11, function(i)cor.test(SmallDataset$Time, SmallDataset[,i])) Traceback: Error in cor.test.default(SmallDataset$Time, SmallDataset[, i]) : 'x' and 'y' must have the same length 5. stop("'x' and 'y' must have the same length") 4. cor.test.default(SmallDataset$Time, SmallDataset[, i]) 3. cor.test(SmallDataset$Time, SmallDataset[, i]) 2. FUN(X[[i]], ...) 1. lapply(3:11, function(i) cor.test(SmallDataset$Time, SmallDataset[, i]))
– Yaiza95
Nov 14 '18 at 20:39
Consider building a list of data frames with lapply (an iteration function similar to for, but one that returns a list of the same length as its input). Afterwards, row-bind all the data frame elements together:
results <- lapply(2:4, function(i) {
  test <- cor.test(DF[,1], DF[,i])
  data.frame(ID = names(DF)[i],
             estimate = unname(test$estimate),
             pvalues = unname(test$p.value))
})
final_df <- do.call(rbind, results)
final_df
# ID estimate pvalues
# 1 ID1 -0.6238591 0.009805341
# 2 ID2 -0.2270515 0.455676037
# 3 ID3 -0.4964092 0.050481533
NOTE: Your posted data for Time is missing an observation and cannot be passed directly to data.frame() with the other vectors. To resolve this, I added a sixth 88 at the end:
Time <- c(0, 6.4, 8.6, 15.2, 19.4, 28.1, 42.6, 73, 73, 85, 88, 88, 88, 88, 88, 88)
Using posted SmallDataset:
SmallDataset <- structure(...)
results <- lapply(3:5, function(i) {
  test <- cor.test(SmallDataset$Time, SmallDataset[,i])
  data.frame(ID = names(SmallDataset)[i],
             estimate = unname(test$estimate),
             pvalues = unname(test$p.value))
})
final_df <- do.call(rbind, results)
final_df
# ID estimate pvalues
# 1 ID1 0.03251407 0.9512461
# 2 ID2 -0.41733336 0.4103428
# 3 ID3 -0.60732484 0.2010166
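With roughly 9000 columns, as the question mentions, hard-coded positions such as 2:4 become awkward. One possible generalisation, sketched below, picks the columns by name instead. The helper cor_against and its arguments are hypothetical and not part of the original answer; it assumes the target column is called Time and silently skips non-numeric columns such as Species.

```r
# Hypothetical helper: correlate one target column with every other
# numeric column of a data frame and return one row per column.
cor_against <- function(data, target = "Time", method = "pearson") {
  # Keep only numeric columns, then drop the target itself.
  is_num <- vapply(data, is.numeric, logical(1))
  cols <- setdiff(names(data)[is_num], target)
  rows <- lapply(cols, function(col) {
    # [[ ]] extracts the column as a plain vector.
    test <- cor.test(data[[target]], data[[col]], method = method)
    data.frame(ID = col,
               estimate = unname(test$estimate),
               pvalues = test$p.value)
  })
  do.call(rbind, rows)
}

# Data from the answer above (Time padded with a sixth 88).
Time <- c(0, 6.4, 8.6, 15.2, 19.4, 28.1, 42.6, 73, 73, 85, 88, 88, 88, 88, 88, 88)
ID1 <- c(55030, 54539, 54937, 48897, 58160, 54686, 55393, 47191, 39805, 37601, 51328, 28882, 45587, 60061, 31892, 28670)
ID2 <- c(20485, 11907, 10571, 20974, 10462, 11149, 20970, NA, NA, 9295, NA, 8714, 24446, 10748, 9037, 11859)
ID3 <- c(93914, 44482, 43705, 51144, 49485, 43908, 44324, 37342, 18872, 39660, 61673, 43837, 36528, 44738, 41648, 11100)
DF <- data.frame(Time, ID1, ID2, ID3)

final_df <- cor_against(DF)
final_df
```

Passing method = "spearman" should give rank correlations instead, although cor.test() may then warn about ties unless exact = FALSE is also supplied.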
Thank you, but when I try it on the larger dataframe I get this : Error in cor.test.default(SmallDataset[, 2], SmallDataset[, i]) : 'x' must be a numeric vector. Even though all vectors are numeric
– Yaiza95
Nov 14 '18 at 20:04
Please edit your post with a sample of SmallDataset (first few rows and columns): dput(head(SmallDataset[,1:5])). It will look like gobbledygook, but we know how to use it. We can help format it in your post as well.
– Parfait
Nov 14 '18 at 20:20
Done, I edited the original post
– Yaiza95
Nov 14 '18 at 20:37
I am unable to reproduce any issue with the small sample. See update. Did you properly replace all DF with SmallDataset? Be sure names and column numbers are correct.
– Parfait
Nov 14 '18 at 20:44
So DF I made manually from the SmallDataset. So maybe there the type of data changes. SmallDataset is a 66 lines and 11 column frame. I triple checked all names and columns and I still get the same error
– Yaiza95
Nov 14 '18 at 20:51
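A possible explanation for both errors, offered as a guess rather than a confirmed diagnosis: the dput() output in the question shows that SmallDataset has class tbl_df, i.e. it is a tibble. When the tibble package is loaded, SmallDataset[, i] returns a one-column tibble instead of a vector, which would be consistent with both "'x' must be a numeric vector" and "'x' and 'y' must have the same length". Extracting columns with [[ ]] returns a plain vector for data frames and tibbles alike. The sketch below rebuilds the posted rows as an ordinary data.frame so that it runs without extra packages.

```r
# First rows of SmallDataset as posted in the question, rebuilt as a
# plain data.frame (an assumption made to keep the sketch package-free).
SmallDataset <- data.frame(
  Species = c("Human.hsapiens", "Chimpanzee.ptroglodytes", "Gorilla.ggorilla",
              "Orangutan.pabelii", "Gibbon.nleucogenys", "Macaque.mmulatta"),
  Time = c(0, 6.4, 8.61, 15.2, 19.43, 28.1),
  ID1 = c(55030, 54539, 54937, 48897, 58160, 54686),
  ID2 = c(20485, 11907, 10571, 20974, 10462, 11149),
  ID3 = c(93914, 44482, 43705, 51144, 49485, 43908),
  stringsAsFactors = FALSE
)

results <- lapply(3:5, function(i) {
  # [[ ]] always yields the column itself as a vector, so cor.test()
  # receives two numeric vectors of equal length.
  test <- cor.test(SmallDataset[["Time"]], SmallDataset[[i]])
  data.frame(ID = names(SmallDataset)[i],
             estimate = unname(test$estimate),
             pvalues = test$p.value)
})
final_df <- do.call(rbind, results)
final_df
```

On the full 66-row, 11-column data set the column range would be 3:11 rather than 3:5, as in the lapply call quoted in the comments.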
First answer (the dplyr/lapply solution): answered Nov 14 '18 at 16:12 by Harro Cyranka, edited Nov 14 '18 at 16:26
You could also use thetidy
function from thebroom
package to extract the estimates and p.values.sapply(2:4, function(i) cor.test(DF$Time, DF[,i]) %>% tidy() %>% select(estimate, p.value) ) %>% t() %>% as.data.frame() %>% mutate(ID = paste0("ID", 1:3))
– Jordo82
Nov 14 '18 at 16:24
Will edit to reflect that. Thanks for flagging @Parfait
– Harro Cyranka
Nov 14 '18 at 16:25
Thank you, it works on the small DF, but when I try to apply it to the larger one I get this error: 'x' and 'y' must have the same length , even though if I ask for the length of both elements it says it's the same length
– Yaiza95
Nov 14 '18 at 20:03
Where is it specifically breaking? When you run the correlations? When you extract the coefficients? Or when you create the last data frame
– Harro Cyranka
Nov 14 '18 at 20:09
It breaks when I run l2: l2 <- lapply(3:11, function(i)cor.test(SmallDataset$Time, SmallDataset[,i])) Traceback: Error in cor.test.default(SmallDataset$Time, SmallDataset[, i]) : 'x' and 'y' must have the same length 5. stop("'x' and 'y' must have the same length") 4. cor.test.default(SmallDataset$Time, SmallDataset[, i]) 3. cor.test(SmallDataset$Time, SmallDataset[, i]) 2. FUN(X[[i]], ...) 1. lapply(3:11, function(i) cor.test(SmallDataset$Time, SmallDataset[, i]))
– Yaiza95
Nov 14 '18 at 20:39
add a comment |
You could also use thetidy
function from thebroom
package to extract the estimates and p.values.sapply(2:4, function(i) cor.test(DF$Time, DF[,i]) %>% tidy() %>% select(estimate, p.value) ) %>% t() %>% as.data.frame() %>% mutate(ID = paste0("ID", 1:3))
– Jordo82
Nov 14 '18 at 16:24
Will edit to reflect that. Thanks for flagging @Parfait
– Harro Cyranka
Nov 14 '18 at 16:25
Thank you, it works on the small DF, but when I try to apply it to the larger one I get this error: 'x' and 'y' must have the same length , even though if I ask for the length of both elements it says it's the same length
– Yaiza95
Nov 14 '18 at 20:03
Where is it specifically breaking? When you run the correlations? When you extract the coefficients? Or when you create the last data frame
– Harro Cyranka
Nov 14 '18 at 20:09
It breaks when I run l2: l2 <- lapply(3:11, function(i)cor.test(SmallDataset$Time, SmallDataset[,i])) Traceback: Error in cor.test.default(SmallDataset$Time, SmallDataset[, i]) : 'x' and 'y' must have the same length 5. stop("'x' and 'y' must have the same length") 4. cor.test.default(SmallDataset$Time, SmallDataset[, i]) 3. cor.test(SmallDataset$Time, SmallDataset[, i]) 2. FUN(X[[i]], ...) 1. lapply(3:11, function(i) cor.test(SmallDataset$Time, SmallDataset[, i]))
– Yaiza95
Nov 14 '18 at 20:39
You could also use the
tidy
function from the broom
package to extract the estimates and p.values. sapply(2:4, function(i) cor.test(DF$Time, DF[,i]) %>% tidy() %>% select(estimate, p.value) ) %>% t() %>% as.data.frame() %>% mutate(ID = paste0("ID", 1:3))
– Jordo82
Nov 14 '18 at 16:24
You could also use the
tidy
function from the broom
package to extract the estimates and p.values. sapply(2:4, function(i) cor.test(DF$Time, DF[,i]) %>% tidy() %>% select(estimate, p.value) ) %>% t() %>% as.data.frame() %>% mutate(ID = paste0("ID", 1:3))
– Jordo82
Nov 14 '18 at 16:24
Will edit to reflect that. Thanks for flagging @Parfait
– Harro Cyranka
Nov 14 '18 at 16:25
Will edit to reflect that. Thanks for flagging @Parfait
– Harro Cyranka
Nov 14 '18 at 16:25
Thank you, it works on the small DF, but when I try to apply it to the larger one I get this error: 'x' and 'y' must have the same length , even though if I ask for the length of both elements it says it's the same length
– Yaiza95
Nov 14 '18 at 20:03
Thank you, it works on the small DF, but when I try to apply it to the larger one I get this error: 'x' and 'y' must have the same length , even though if I ask for the length of both elements it says it's the same length
– Yaiza95
Nov 14 '18 at 20:03
Where is it specifically breaking? When you run the correlations? When you extract the coefficients? Or when you create the last data frame
– Harro Cyranka
Nov 14 '18 at 20:09
Where is it specifically breaking? When you run the correlations? When you extract the coefficients? Or when you create the last data frame
– Harro Cyranka
Nov 14 '18 at 20:09
It breaks when I run l2: l2 <- lapply(3:11, function(i)cor.test(SmallDataset$Time, SmallDataset[,i])) Traceback: Error in cor.test.default(SmallDataset$Time, SmallDataset[, i]) : 'x' and 'y' must have the same length 5. stop("'x' and 'y' must have the same length") 4. cor.test.default(SmallDataset$Time, SmallDataset[, i]) 3. cor.test(SmallDataset$Time, SmallDataset[, i]) 2. FUN(X[[i]], ...) 1. lapply(3:11, function(i) cor.test(SmallDataset$Time, SmallDataset[, i]))
– Yaiza95
Nov 14 '18 at 20:39
It breaks when I run l2: l2 <- lapply(3:11, function(i)cor.test(SmallDataset$Time, SmallDataset[,i])) Traceback: Error in cor.test.default(SmallDataset$Time, SmallDataset[, i]) : 'x' and 'y' must have the same length 5. stop("'x' and 'y' must have the same length") 4. cor.test.default(SmallDataset$Time, SmallDataset[, i]) 3. cor.test(SmallDataset$Time, SmallDataset[, i]) 2. FUN(X[[i]], ...) 1. lapply(3:11, function(i) cor.test(SmallDataset$Time, SmallDataset[, i]))
– Yaiza95
Nov 14 '18 at 20:39
add a comment |
Consider building a list of data frames witih lapply
(an iteration function similar to for
but builds a list of objects of equal length as input). Afterwards, row bind all data frame elements together:
results <- lapply(2:4, function(i)
test <- cor.test(DF[,1], DF[,i])
data.frame(ID = names(DF)[i],
estimate = unname(test$estimate),
pvalues = unname(test$p.value))
)
final_df <- do.call(rbind, results)
final_df
# ID estimate pvalues
# 1 ID1 -0.6238591 0.009805341
# 2 ID2 -0.2270515 0.455676037
# 3 ID3 -0.4964092 0.050481533
NOTE: Your posted data for Time is missing an observation and cannot immediately be cast into data.frame()
with other vectors. To resolve, I supplemented a 6th 88 at end:
Time <- c(0, 6.4, 8.6, 15.2, 19.4, 28.1, 42.6, 73, 73, 85, 88, 88, 88, 88, 88, 88)
Using posted SmallDataset:
SmallDataset <- structure(...)
results <- lapply(3:5, function(i)
test <- cor.test(SmallDataset$Time, SmallDataset[,i])
data.frame(ID = names(SmallDataset)[i],
estimate = unname(test$estimate),
pvalues = unname(test$p.value))
)
final_df <- do.call(rbind, results)
final_df
# ID estimate pvalues
# 1 ID1 0.03251407 0.9512461
# 2 ID2 -0.41733336 0.4103428
# 3 ID3 -0.60732484 0.2010166
Thank you, but when I try it on the larger dataframe I get this : Error in cor.test.default(SmallDataset[, 2], SmallDataset[, i]) : 'x' must be a numeric vector. Even though all vectors are numeric
– Yaiza95
Nov 14 '18 at 20:04
Please edit your post with a sample of SmallDataset in your post (first few rows and cols):dput(head(SmallDataset[,1:5]))
. It will look like gobbledygook but we know how to use it. We can help format in your post as well.
– Parfait
Nov 14 '18 at 20:20
Done, I edited the original post
– Yaiza95
Nov 14 '18 at 20:37
I am unable to reproduce any issue with the small sample. See update. Did you properly replace allDF
withSmallDataset
? Be sure names and column numbers are correct.
– Parfait
Nov 14 '18 at 20:44
So DF I made manually from the SmallDataset. So maybe there the type of data changes. SmallDataset is a 66 lines and 11 column frame. I triple checked all names and columns and I still get the same error
– Yaiza95
Nov 14 '18 at 20:51
|
show 7 more comments
Consider building a list of data frames witih lapply
(an iteration function similar to for
but builds a list of objects of equal length as input). Afterwards, row bind all data frame elements together:
results <- lapply(2:4, function(i)
test <- cor.test(DF[,1], DF[,i])
data.frame(ID = names(DF)[i],
estimate = unname(test$estimate),
pvalues = unname(test$p.value))
)
final_df <- do.call(rbind, results)
final_df
# ID estimate pvalues
# 1 ID1 -0.6238591 0.009805341
# 2 ID2 -0.2270515 0.455676037
# 3 ID3 -0.4964092 0.050481533
NOTE: Your posted data for Time is missing an observation and cannot immediately be cast into data.frame()
with other vectors. To resolve, I supplemented a 6th 88 at end:
Time <- c(0, 6.4, 8.6, 15.2, 19.4, 28.1, 42.6, 73, 73, 85, 88, 88, 88, 88, 88, 88)
Using posted SmallDataset:
SmallDataset <- structure(...)
results <- lapply(3:5, function(i)
test <- cor.test(SmallDataset$Time, SmallDataset[,i])
data.frame(ID = names(SmallDataset)[i],
estimate = unname(test$estimate),
pvalues = unname(test$p.value))
)
final_df <- do.call(rbind, results)
final_df
# ID estimate pvalues
# 1 ID1 0.03251407 0.9512461
# 2 ID2 -0.41733336 0.4103428
# 3 ID3 -0.60732484 0.2010166
Thank you, but when I try it on the larger dataframe I get this : Error in cor.test.default(SmallDataset[, 2], SmallDataset[, i]) : 'x' must be a numeric vector. Even though all vectors are numeric
– Yaiza95
Nov 14 '18 at 20:04
Please edit your post with a sample of SmallDataset in your post (first few rows and cols):dput(head(SmallDataset[,1:5]))
. It will look like gobbledygook but we know how to use it. We can help format in your post as well.
– Parfait
Nov 14 '18 at 20:20
Done, I edited the original post
– Yaiza95
Nov 14 '18 at 20:37
I am unable to reproduce any issue with the small sample. See update. Did you properly replace allDF
withSmallDataset
? Be sure names and column numbers are correct.
– Parfait
Nov 14 '18 at 20:44
So DF I made manually from the SmallDataset. So maybe there the type of data changes. SmallDataset is a 66 lines and 11 column frame. I triple checked all names and columns and I still get the same error
– Yaiza95
Nov 14 '18 at 20:51
|
show 7 more comments
edited Nov 14 '18 at 20:42
answered Nov 14 '18 at 16:17
Parfait
You are trying to calculate correlations between your ID variables in columns 2 to 4 of DF and Time? Is that correct?
– Cleland
Nov 14 '18 at 16:10
I am trying to correlate the first column with the rest, as in 1st and 2nd, 1st and 3rd, 1st and 4th
– Yaiza95
Nov 14 '18 at 16:14