IO Write errors for sufficiently large files
I'm working on an EC2 instance with 500 GB of RAM, a 500 GB drive mounted as a cache, and S3 buckets mounted through s3fs. I'm trying to merge numerous large (~130 GB) CSVs into a single file in the mounted bucket. No matter what solution I try (C, C++, R, bash), once the merged file reaches ~100 GB (with the cache still not full) I get some variant of "Write error: Operation not supported", typically after 2 or 3 of the smaller files have been merged. I've exhausted my know-how and am not sure how to proceed with these file merges.
Sample code (R):
library(data.table)
file1 <- fread('file1.csv', header = TRUE, sep = ',')  # TRUE, not True: R is case-sensitive
fwrite(file1, 'merged.csv', append = TRUE)
so on and so forth ...
Bash:
cat *.csv >> ../merged.csv
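One alternative worth sketching (not from the original thread): bypass the s3fs write path entirely and stream the concatenation to S3 with the AWS CLI, which manages the multipart upload itself. The bucket name and the --expected-size value are placeholders; --expected-size (in bytes) is needed for large stdin streams so the CLI can choose a part size that stays under S3's 10,000-part limit.
# Stream all CSVs straight to S3; '-' tells aws s3 cp to read from stdin.
cat *.csv | aws s3 cp - s3://my-bucket/merged.csv --expected-size 429496729600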
r bash amazon-web-services amazon-s3 amazon-ec2
Could the error be related to the file system (FS) in use? Some FSs have a maximum file size smaller than the partition size; however, I have never heard of a 100 GB limit.
– Socowi
Nov 9 at 16:19
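A quick way to check that premise (assuming the cache drive and the bucket are mounted at /mnt/cache and /mnt/s3; both paths are placeholders):
# Show filesystem type, total size, and current usage for both mounts.
df -hT /mnt/cache /mnt/s3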
Yes, that's what's odd to me. I don't believe there's any such limitation, as I'm able to read files of 150+ GB into memory for processing, but for some reason merging these files is very fragile. At first I thought that maybe bash was too fragile for this, which is why I tried heavier-duty options like C++ and R, but they all give similar errors, which I believe are related to writing into the S3 bucket.
– africabytoto
Nov 9 at 19:28
There is no way to really, properly "mount" a bucket -- there are only hacks, like s3fs and goofys. How are you doing it?
– Michael - sqlbot
Nov 10 at 1:12
@Michael-sqlbot good catch, I'm mounting using s3fs. Edited the original post to reflect that addition.
– africabytoto
Nov 10 at 3:56
IIRC, s3fs requires available temp space as large as the file you are creating, to stage the file it is uploading.
– Michael - sqlbot
Nov 10 at 4:05
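Two s3fs details line up with the ~100 GB failure point: S3 caps multipart uploads at 10,000 parts, and s3fs's default part size is 10 MB, so a default mount cannot write an object larger than roughly 10,000 × 10 MB ≈ 100 GB. A hedged sketch of a remount that raises the part size and stages uploads on the big drive (bucket name and paths are placeholders; option defaults may vary by s3fs version):
# Unmount, then remount with larger multipart parts and an explicit cache dir.
# multipart_size=64 means 64 MB parts, raising the 10,000-part ceiling to ~640 GB;
# use_cache stages uploads on the 500 GB drive; ensure_diskfree keeps 10 GB free there.
fusermount -u /mnt/s3
s3fs my-bucket /mnt/s3 -o multipart_size=64 -o use_cache=/mnt/cache/s3fs -o ensure_diskfree=10240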
edited Nov 10 at 3:55
asked Nov 9 at 15:18
africabytoto
62