Read first 1000 lines from very big JSON Lines file (R)

























I'm working with a big JSON Lines file (80GB) and want to stream in the first 1000 lines to check the data structure. With a 100-row sample of this dataset, the stream_in function from jsonlite worked fine.



I've tried the following code for the 80GB file, but it doesn't work:



data <- stream_in(con=file("logfiles-data")[1:1000])


How can I fix this?



Thanks in advance!


































  • Just a side note: Are you sure it's JSON? From the description, in particular the logfile and your use of the word "rows," it sounds more likely it's JSON Lines.

    – T.J. Crowder
    Nov 13 '18 at 9:08












  • You're right, it seems to be JSON LINES. Thank you.

    – Scijens
    Nov 13 '18 at 9:19











  • Have you tried setting the pagesize (number of lines to read/write from/to the connection per iteration) parameter to stream_in?

    – R. Schifini
    Nov 13 '18 at 14:04












  • I'd use readLines(n=1000) then use textConnection() to pass that to stream_in()

    – Michael Bird
    Nov 13 '18 at 14:05











  • Thank you all! @Schifini: So will it automatically make only one iteration when I run it once?

    – Scijens
    Nov 14 '18 at 13:30
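A minimal sketch of the pagesize approach suggested above (assuming the file name "logfiles-data" from the question). Note that pagesize only controls how many lines stream_in reads per iteration; it does not stop after one batch, so without intervening this still walks the whole connection, calling the handler once per batch:

```r
# Sketch: stream_in reads the connection in batches of `pagesize` lines
# and passes each batch (a data frame) to the handler. Caveat: this
# still iterates over the entire file; pagesize alone does not cap the
# total number of lines read.
library(jsonlite)

con <- file("logfiles-data", open = "r")
stream_in(con, pagesize = 1000, handler = function(batch) {
  str(batch)  # inspect the structure of each 1000-line batch
})
```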

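The readLines/textConnection suggestion is likely the simplest fix for inspecting just the head of the file: pull exactly 1000 lines off disk, then hand them to stream_in via an in-memory connection. A hedged sketch, again assuming the question's file name:

```r
# Read only the first 1000 lines of the 80GB file, then stream those
# lines through jsonlite; only ~1000 lines are ever held in memory.
library(jsonlite)

first_lines <- readLines("logfiles-data", n = 1000)
data <- stream_in(textConnection(first_lines))
str(data)  # check the inferred data structure
```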














Tags: r, json, bigdata, logfile






asked Nov 13 '18 at 9:06 by Scijens

edited Nov 13 '18 at 13:51 by JJJ











