Efficient string concatenation in Python 3 [duplicate]
This question already has an answer here: How slow is Python's string concatenation vs. str.join? (5 answers)
I am writing Python 3 code that needs to open about 550 files in a directory, read their contents, and append them to a single string variable, all_text, which ends up millions of lines long.
The inefficient code I have been using until now is:
all_text += str(file_content)
I then read that the join() method is more efficient, so I tried the following:
all_text = ''.join(file_content)
The problem with this code is that it discards the previously held contents of all_text and keeps only the current file's content.
How do I get around this problem?
Thanks for your help!
python
marked as duplicate by stovfl, GhostCat, Matt Raines, mrpatg, Alexei Nov 10 at 16:09
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
asked Nov 10 at 7:07 by Arun
all_text += ''.join(file_content) ? – DYZ, Nov 10 at 7:09
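The comment's one-liner still rebuilds all_text with += once per file; a fully join-based version of the loop (a minimal sketch, assuming the files sit in a single directory and that reading each one whole is acceptable) collects the pieces in a list and joins once at the end:
import os

pieces = []
directory = 'data'                          # hypothetical directory name
for name in os.listdir(directory):
    with open(os.path.join(directory, name)) as f:
        pieces.append(f.read())             # read each file as one string
all_text = ''.join(pieces)                  # a single join avoids repeated +=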
1 Answer
join() is defined as str.join(iterable), where the iterable can be a generator, a list, a set, and so on. It is therefore useful when you already have a collection of strings (for example, the contents read from your files) and want to concatenate them in a single pass.
For example:
numList = ['1', '2', '3', '4']
separator = ', '
print(separator.join(numList))    # 1, 2, 3, 4

numTuple = ('1', '2', '3', '4')
print(separator.join(numTuple))   # 1, 2, 3, 4

s1 = 'abc'
s2 = '123'
# s1 is inserted between the characters of s2: '1abc2abc3'
print('s1.join(s2):', s1.join(s2))
# s2 is inserted between the characters of s1: 'a123b123c'
print('s2.join(s1):', s2.join(s1))
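join() also accepts a generator expression, so no intermediate list has to be built; a small illustration:
digits = ''.join(str(n) for n in range(5))   # no list is materialised
print(digits)                                # 01234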
You can read the whole contents of a file as one string with ''.join(f.readlines()) or, more simply, f.read().
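For instance (assuming example.txt is an existing text file):
with open('example.txt') as f:
    text = ''.join(f.readlines())   # same result as f.read() for the whole file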
Now you can accomplish your task with join and the fileinput module:
import fileinput

files = ['package-lock.json', 'sqldump.sql', 'felony.json', 'maindata.csv']
allfiles = fileinput.input(files)     # yields the lines of every file in turn
all_text = ''.join(allfiles)
Refer to this answer for the most efficient way to concatenate files into a string.
Suggestion: as you mention there will be millions of lines, consider how much memory storing them all in a single variable will consume. It is better to do whatever processing you are planning on the fly, while reading the lines, rather than accumulating everything in one variable.
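A rough sketch of that on-the-fly approach, where process_line is a hypothetical stand-in for whatever you actually need to do with each line:
import fileinput

def process_line(line):
    pass                                       # e.g. count words, match a pattern

files = ['package-lock.json', 'sqldump.sql']   # same idea as the list above
for line in fileinput.input(files):
    process_line(line)                         # no giant string is ever built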
answered Nov 10 at 7:23, edited Nov 10 at 7:38 – Mani Kumar Reddy Kancharla
I tried your code using the fileinput module but it gives me a MemoryError. My system has 8 GB of RAM and the combined size of the 550 files in the directory is around 2.3 GB. Any ideas how to avoid this?
– Arun Nov 10 at 14:56
It could be that your machine does not have enough free RAM, or that your OS limits Python to a certain amount of memory. In any case, as I suggested, it is better to do whatever you plan to do with those files on the fly rather than adding them all up: the open files consume memory and the growing string consumes memory again, so holding everything at once is inefficient in the first place.
– Mani Kumar Reddy Kancharla Nov 12 at 9:45
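One possible workaround (an assumption on my part, not from the thread: it only helps if the combined text merely needs to end up on disk rather than in memory) is to stream each file into a single output file with shutil.copyfileobj, which copies in fixed-size chunks:
import shutil

files = ['package-lock.json', 'sqldump.sql']   # placeholder file names
with open('combined.txt', 'wb') as out:        # hypothetical output file
    for name in files:
        with open(name, 'rb') as src:
            shutil.copyfileobj(src, out)       # never loads a whole file into RAM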