Efficient string concatenation in Python 3 [duplicate]









This question already has an answer here:



  • How slow is Python's string concatenation vs. str.join?

    5 answers



I am writing Python 3 code whose task is to open about 550 files in a directory, read their contents, and append them to a string variable 'all_text', which will end up being, say, millions of lines long, held as one single string.



The inefficient code I have been using so far is:



all_text += str(file_content)
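To give context, here is a rough sketch of the loop I mean (the directory name and the way each file is read are just illustrative):

import os

dir_path = 'my_550_files'                # illustrative directory name
all_text = ''
for name in os.listdir(dir_path):
    with open(os.path.join(dir_path, name), encoding='utf-8') as f:
        file_content = f.read()
    all_text += str(file_content)        # the repeated += is what makes this slow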


But then I read that the join() method is more efficient, so I tried the following:



all_text = ''.join(file_content)


The problem with this code is that it throws away the previously held contents of 'all_text' and keeps only the current file's content!



How do I get around this problem?



Thanks for your help!























marked as duplicate by stovfl, GhostCat, Matt Raines, mrpatg, Alexei Nov 10 at 16:09


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.














  • all_text += ''.join(file_content)?
    – DYZ
    Nov 10 at 7:09














python






asked Nov 10 at 7:07 by Arun

1 Answer






























join() has the signature str.join(iterable), where the iterable can be any iterable of strings: a generator, a list, a set, and so on. So it is most helpful when you already have a collection of strings read from the files and you concatenate them all with a single join call.
For example:



numList = ['1', '2', '3', '4']
separator = ', '
print(separator.join(numList))    # 1, 2, 3, 4

numTuple = ('1', '2', '3', '4')
print(separator.join(numTuple))   # 1, 2, 3, 4

s1 = 'abc'
s2 = '123'

# s1 is inserted between each character of s2
print('s1.join(s2):', s1.join(s2))   # s1.join(s2): 1abc2abc3

# s2 is inserted between each character of s1
print('s2.join(s1):', s2.join(s1))   # s2.join(s1): a123b123c


You can get all the lines of a single file as one string with ''.join(f.readlines()) (or, more simply, f.read()).
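For a single file, that looks roughly like this (the file name is illustrative):

with open('notes.txt', encoding='utf-8') as f:
    text = ''.join(f.readlines())   # joins every line of this one file; equivalent to f.read()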



Now you can accomplish your task with join and the fileinput module as follows:



import fileinput

files = ['package-lock.json', 'sqldump.sql', 'felony.json', 'maindata.csv']
allfiles = fileinput.input(files)   # yields the lines of every file, one after another
all_text = ''.join(allfiles)        # one join over all of those lines


Refer to this answer for the most efficient way to concatenate files into a string.
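If you would rather not use fileinput, a rough sketch of the same idea with plain file handling (the directory name below is only a placeholder) is to read each file once, collect the pieces in a list, and join a single time at the end:

import os

dir_path = 'data'                        # placeholder for your directory of ~550 files
parts = []
for name in sorted(os.listdir(dir_path)):
    with open(os.path.join(dir_path, name), encoding='utf-8') as f:
        parts.append(f.read())           # one string per file
all_text = ''.join(parts)                # a single join, no repeated copying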



Suggestion: since you mentioned there will be millions of lines, have you considered how much memory it will take to store all of that in a single variable? It is better to do whatever processing you are planning on the fly, while reading the lines, instead of storing everything in a variable.
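As a rough sketch of that on-the-fly approach (the per-line work shown here is just counting lines; put your real processing in its place):

import fileinput

files = ['package-lock.json', 'sqldump.sql', 'felony.json', 'maindata.csv']
line_count = 0
with fileinput.input(files) as lines:
    for line in lines:
        # do your real per-line work here instead of storing everything in one string
        line_count += 1
print(line_count)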






answered Nov 10 at 7:23 by Mani Kumar Reddy Kancharla (edited Nov 10 at 7:38)











  • I tried your code using the fileinput module, but it gives me a MemoryError. My system has 8 GB of RAM and the combined size of the 550 files in the directory is around 2.3 GB. Any ideas how to avoid this?
    – Arun
    Nov 10 at 14:56










  • It could be that your machine does not have enough free RAM, or that your OS has limited Python to a certain amount of RAM. Nevertheless, as I suggested, you are better off doing whatever you are planning to do with those files on the fly rather than accumulating them, because the open files consume memory and the combined string consumes memory again, so you end up using a lot of memory, which is inefficient in the first place.
    – Mani Kumar Reddy Kancharla
    Nov 12 at 9:45















