IO Write errors for sufficiently large files

I'm working on an EC2 instance with 500 GB of RAM, a 500 GB drive mounted as a cache, and S3 buckets mounted through s3fs. I'm trying to merge numerous large (~130 GB) CSVs into a single file in the mounted bucket. No matter what solution I've tried (C, C++, R, bash), once the merged file reaches ~100 GB (with the cache still not full) I get some variant of "Write error: Operation not supported", typically after 2 or 3 of the smaller files have been merged. I've exhausted my know-how and am not sure how to proceed with these file merges.



Sample code (R):



library(data.table)

file1 <- fread('file1.csv', header = TRUE, sep = ',')
fwrite(file1, 'merged.csv', append = TRUE)
# ... and so on, one fread/fwrite pair for each remaining file
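
For completeness, the same approach written as a loop over every CSV in the working directory (an untested sketch; the file pattern and output name are placeholders):

library(data.table)

files <- setdiff(list.files(pattern = '\\.csv$'), 'merged.csv')
for (f in files) {
  dt <- fread(f, header = TRUE, sep = ',')   # read one input file into memory
  fwrite(dt, 'merged.csv', append = TRUE)    # append it to the merged output
}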


Bash:



cat *.csv >> ../merged.csv
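
One possible workaround (sketched here, not verified) would be to bypass the s3fs mount for the write and stream the concatenation straight to S3 with the AWS CLI, which performs its own multipart upload; the bucket name below is a placeholder, and --expected-size is, as far as I know, needed for streamed uploads much larger than ~50 GB so the CLI can choose a big enough part size:

cat *.csv | aws s3 cp - s3://my-bucket/merged.csv --expected-size 400000000000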









r bash amazon-web-services amazon-s3 amazon-ec2

asked Nov 9 at 15:18, edited Nov 10 at 3:55
africabytoto

  • Could the error be related to the file system (FS) used? Some FSs have a maximum file size smaller than the partition size; however, I have never heard of a 100 GB limit.
    – Socowi
    Nov 9 at 16:19

  • Yes, that's what's odd to me. I don't believe there's any such limitation, since I'm able to read files 150+ GB in size into memory for processing, but for some reason merging these files is very fragile. At first I thought that bash might be too fragile for this, which is why I tried heavier-duty options like C++ and R, but they all give similar errors, which I believe are related to writing into the S3 bucket.
    – africabytoto
    Nov 9 at 19:28

  • There is no way to really, properly "mount" a bucket -- there are only hacks, like s3fs and goofys. How are you doing it?
    – Michael - sqlbot
    Nov 10 at 1:12

  • @Michael-sqlbot Good catch; I'm mounting using s3fs. Edited the original post to reflect that addition.
    – africabytoto
    Nov 10 at 3:56

  • iirc, s3fs requires available temp space as large as the file you are creating, to stage the file it is uploading.
    – Michael - sqlbot
    Nov 10 at 4:05
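
Following up on the last comment: if s3fs really does stage the whole object locally before uploading it, it may be worth pointing its cache explicitly at the 500 GB drive and confirming the free space there. A minimal mount sketch, assuming s3fs-fuse's use_cache option and placeholder bucket/mount names:

df -h /mnt/cache                                        # confirm the cache drive has enough free space
s3fs my-bucket /mnt/s3bucket -o use_cache=/mnt/cache    # stage uploads on the local cache drive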













