Efficient string concatenation in Python 3 [duplicate]






















This question already has an answer here:

  • How slow is Python's string concatenation vs. str.join? (5 answers)


I am writing Python 3 code whose task is to open about 550 files in a directory, read their contents, and append them to a string variable 'all_text', which will end up millions of lines long, held as one single string.



The inefficient code I have been using so far is as follows:



all_text += str(file_content)


But then I read that using the 'join()' method is more efficient, so I tried the following code:



all_text = ''.join(file_content)


The problem with this code is that it discards the previously held contents of the 'all_text' variable and keeps only the current file's content!



How do I get around this problem?



Thanks for your help!












































python
















asked Nov 10 at 7:07 by Arun




marked as duplicate by stovfl, GhostCat, Matt Raines, mrpatg, Alexei Nov 10 at 16:09


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.



















  • all_text += ''.join(file_content)?
    – DYZ
    Nov 10 at 7:09




























1 Answer




































join() has the signature str.join(iterable), where the iterable can be a generator, a list, a set, and so on. So it is helpful if you already have a list of strings read from the files and concatenate them with a single join.
For example:



numList = ['1', '2', '3', '4']
separator = ', '
print(separator.join(numList))   # 1, 2, 3, 4

numTuple = ('1', '2', '3', '4')
print(separator.join(numTuple))  # 1, 2, 3, 4

s1 = 'abc'
s2 = '123'

# s1 is inserted as a separator between the characters of s2
print('s1.join(s2):', s1.join(s2))  # 1abc2abc3

# s2 is inserted as a separator between the characters of s1
print('s2.join(s1):', s2.join(s1))  # a123b123c


You can get all the lines in a file as one string using join, e.g. ''.join(f.readlines()).
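Applied to the original task, a minimal sketch of this list-then-join pattern might look like the following (the directory name and the UTF-8 encoding are assumptions for illustration):

import os

directory = 'data'  # hypothetical directory holding the ~550 files

parts = []
for name in sorted(os.listdir(directory)):
    path = os.path.join(directory, name)
    with open(path, encoding='utf-8') as f:  # assuming UTF-8 text files
        parts.append(f.read())               # accumulate each file's content in a list

all_text = ''.join(parts)  # one concatenation at the end, no repeated reallocation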



Now you can accomplish your task with join and the fileinput module as follows:



import fileinput

files = ['package-lock.json', 'sqldump.sql', 'felony.json', 'maindata.csv']
allfiles = fileinput.input(files)  # iterates over the lines of each file in turn
all_text = ''.join(allfiles)
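Note that fileinput.input() does not read everything up front; it yields lines lazily, one file after another, and it is the ''.join() call that materializes the full string in memory.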


Refer to this answer for the most efficient way to concatenate files into a string.



Suggestion: as you mention there will be millions of lines, have you considered how much memory storing them all in a single variable will consume? It is better to do whatever processing you are planning on the fly, while reading the lines, instead of storing everything in a variable first.
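For instance, here is a minimal sketch of that on-the-fly approach, assuming for illustration that the end goal is a word count over all the files (the per-line work is a stand-in for whatever you actually plan to do):

import fileinput

files = ['package-lock.json', 'sqldump.sql', 'felony.json', 'maindata.csv']

word_count = 0
for line in fileinput.input(files):
    word_count += len(line.split())  # process each line as it streams in

print(word_count)  # no multi-gigabyte string is ever built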










answered Nov 10 at 7:23 by Mani Kumar Reddy Kancharla, edited Nov 10 at 7:38











  • I tried your code using the 'fileinput' module, but it gives me a MemoryError. My system has 8 GB of RAM, and the combined size of the 550 files in the directory is around 2.3 GB. Any ideas how to avoid this?
    – Arun
    Nov 10 at 14:56










  • It could be that your machine does not have enough free RAM, or that your OS limits Python to a certain amount of memory. Nevertheless, as I suggested, you are better off processing the files on the fly rather than accumulating them: the open files consume memory and the string consumes memory again, so you are using a lot of memory, which is inefficient in the first place.
    – Mani Kumar Reddy Kancharla
    Nov 12 at 9:45















