Read first 1000 lines from very big JSON Lines file (R)
I'm working with a big JSON Lines file (80GB) and want to stream in the first 1000 lines to check the data structure. What worked with a sample file of 100 rows of this dataset is to use the stream_in function of jsonlite.
I've tried the following code for the 80GB file, but it doesn't work:
data <- stream_in(con=file("logfiles-data")[1:1000])
How can I fix this?
Thanks in advance!
r json bigdata logfile
Just a side note: Are you sure it's JSON? From the description, in particular the logfile and your use of the word "rows," it sounds more likely it's JSON Lines.
– T.J. Crowder
Nov 13 '18 at 9:08
You're right, it seems to be JSON LINES. Thank you.
– Scijens
Nov 13 '18 at 9:19
Have you tried setting the pagesize parameter (the number of lines to read/write from/to the connection per iteration) of stream_in?
– R. Schifini
Nov 13 '18 at 14:04
I'd use readLines(n=1000), then use textConnection() to pass that to stream_in()
– Michael Bird
Nov 13 '18 at 14:05
Thank you all! @Schifini: So will it automatically make only one iteration when I run it once?
– Scijens
Nov 14 '18 at 13:30
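The approach suggested in the comments can be sketched as follows. This is a minimal illustration, not the asker's actual setup: it writes a tiny three-line JSON Lines sample to a temporary file to stand in for the 80GB "logfiles-data" file, then reads only the first n lines and parses just those with stream_in().

```r
library(jsonlite)

# Stand-in for the 80GB "logfiles-data" file: a tiny JSON Lines sample.
tmp <- tempfile()
writeLines(c('{"id":1,"msg":"a"}',
             '{"id":2,"msg":"b"}',
             '{"id":3,"msg":"c"}'), tmp)

# readLines(n = ...) pulls only the first n lines from the connection;
# the rest of the file is never read into memory.
con <- file(tmp, open = "r")
first_lines <- readLines(con, n = 2)
close(con)

# Parse just those lines by wrapping them in a text connection.
data <- stream_in(textConnection(first_lines), verbose = FALSE)
str(data)  # a 2-row data frame: enough to inspect the data structure
```

On the real file, replace `tmp` with the path to "logfiles-data" and `n = 2` with `n = 1000`. This avoids the problem in the question: `file(...)[1:1000]` subsets the connection object itself, not the lines of the file.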
asked Nov 13 '18 at 9:06 by Scijens
edited Nov 13 '18 at 13:51 by JJJ