Python fuzzy string matching as correlation style table/matrix

I have a file with x number of string names and their associated IDs. Essentially two columns of data.

What I would like is a correlation-style table of size x by x (with the strings in question on both the x-axis and the y-axis), but instead of a correlation coefficient, each cell should hold the output of the fuzzywuzzy library's fuzz.ratio(x, y), using the string names as input. Essentially, I want to run every entry against every entry.

This is sort of what I had in mind. Just to show my intent:

import pandas as pd
from fuzzywuzzy import fuzz

df = pd.read_csv('random_data_file.csv')

df = df[['ID','String']]
df['String_Dup'] = df['String'] #creating duplicate of data in question
df = df.set_index('ID')

df = df.groupby('ID')[['String','String_Dup']].apply(fuzz.ratio())

But clearly this approach is not working for me at the moment. Any help is appreciated. It doesn't have to be pandas; it is just the environment I am relatively more familiar with.

I hope my issue is clearly worded, and really, any input is appreciated.

edited Nov 12 '18 at 11:59

Brian Tompsett - 汤莱恩


asked Nov 12 '18 at 11:28

WayOutofDepth

python pandas matrix fuzzy


3 Answers

Use pandas' crosstab function, followed by a column-wise apply to compute the fuzz.
This is considerably more elegant than my first answer.

import pandas as pd
from fuzzywuzzy import fuzz

# Create sample data frame.
df = pd.DataFrame([(1, 'abracadabra'), (2, 'abc'), (3, 'cadra'), (4, 'brabra')],
                  columns=['id', 'strings'])
# Create the cartesian product between the strings column with itself.
ct = pd.crosstab(df['strings'], df['strings'])
# Note: for pandas versions <0.22, the two series must have different names.
# In case you observe a "Level XX not found" error, the following may help:
# ct = pd.crosstab(df['strings'].rename(), df['strings'].rename())

# Apply the fuzz (column-wise). Argument col has type pd.Series.
ct = ct.apply(lambda col: [fuzz.ratio(col.name, x) for x in col.index])

# This results in the following:
# strings      abc  abracadabra  brabra  cadra
# strings
# abc          100           43      44     25
# abracadabra   43          100      71     62
# brabra        44           71     100     55
# cadra         25           62      55    100

For simplicity, I omitted the groupby operation suggested in your question. If you want to apply the fuzzy string matching within groups, simply wrap the steps in a separate function:

def cross_fuzz(df):
    ct = pd.crosstab(df['strings'], df['strings'])
    ct = ct.apply(lambda col: [fuzz.ratio(col.name, x) for x in col.index])
    return ct

df.groupby('id').apply(cross_fuzz)
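As a variation on the idea above, the same table can also be built directly from a nested list comprehension, which keeps the original row order and lets the IDs serve as labels. This is only a minimal sketch: the helper ratio() is a hypothetical stand-in based on the standard library's difflib so that the snippet runs without fuzzywuzzy, and its scores may differ slightly from fuzz.ratio. Swap in fuzz.ratio if the library is installed.

```python
from difflib import SequenceMatcher

import pandas as pd


def ratio(a, b):
    """Hypothetical stand-in for fuzz.ratio: similarity as an integer 0..100."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


# Same sample data as in the answer above.
df = pd.DataFrame([(1, 'abracadabra'), (2, 'abc'), (3, 'cadra'), (4, 'brabra')],
                  columns=['id', 'strings'])

names = df['strings'].tolist()
ids = df['id'].tolist()

# Score every string against every string (n * n comparisons).
scores = [[ratio(a, b) for b in names] for a in names]

# Label rows and columns with the IDs instead of the strings themselves.
matrix = pd.DataFrame(scores, index=ids, columns=ids)
print(matrix)
```

Labelling by ID rather than by string avoids surprises when the same string appears under more than one ID, since crosstab would merge such duplicates into a single row and column.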

edited Nov 12 '18 at 21:55

answered Nov 12 '18 at 13:33

normanius


Thanks! This works as long as I do not attempt to cross-tabulate the same series, i.e. I had to create a copy of 'strings' and call it something else; otherwise it gave me a 'level strings not found' error. I am happy to accept this as the answer otherwise.

– WayOutofDepth
Nov 12 '18 at 15:43

The above code works for pandas 0.22.0. But you're right, earlier versions of pandas (e.g. 0.20.3) required the series to have different names. I edited my answer accordingly. Thanks for pointing this out.

– normanius
Nov 12 '18 at 16:11


import csv
from fuzzywuzzy import fuzz
import numpy as np

# Read the CSV row by row; each row is a dict keyed by the column headers.
string = []
with open('random_data_file.csv', newline='') as f:
    for row in csv.DictReader(f):
        string.append(row["String"])

# Now you have a list of the string values.
length = len(string)
resultMat = np.zeros((length, length))  # zero-filled 2D matrix of size X * X

# Compare every string with every string.
for i in range(length):
    for j in range(length):
        resultMat[i][j] = fuzz.ratio(string[i], string[j])

print(resultMat)

I implemented this with a NumPy 2D matrix. I am not that good with pandas, but I think what you were doing is adding another column and comparing it to the String column row by row, meaning String[i] is matched only with String_Dup[i], so all results will be 100.
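To make the difference concrete, here is a small sketch contrasting the row-by-row comparison with the all-pairs comparison. It uses only the standard library: ratio() is a hypothetical stand-in for fuzz.ratio built on difflib (an assumption made so the snippet runs without extra packages), and the sample strings are illustrative.

```python
from difflib import SequenceMatcher
from itertools import product


def ratio(a, b):
    """Hypothetical stand-in for fuzz.ratio: similarity as an integer 0..100."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


strings = ['abracadabra', 'abc', 'cadra', 'brabra']
string_dup = list(strings)  # duplicate column, as in the question

# Row-wise: element i is only ever compared with its own copy.
row_wise = [ratio(a, b) for a, b in zip(strings, string_dup)]
print(row_wise)  # [100, 100, 100, 100]

# All pairs: every string is compared with every string.
all_pairs = {(a, b): ratio(a, b) for a, b in product(strings, repeat=2)}
print(len(all_pairs))  # 16
```

The nested loops in the code above do the same job as itertools.product here: they enumerate all n * n combinations rather than n aligned rows.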

Hope it helps

answered Nov 12 '18 at 13:05

Hossam26644


In pandas, the Cartesian product of a column with itself can be created using a dummy variable and pd.merge. The fuzz operation is then applied row by row using apply. A final pivot operation produces the format you had in mind. For simplicity, I omitted the groupby operation, but of course you could apply the procedure to each group by moving the code below into a separate function.

Here is what this could look like:

import pandas as pd
from fuzzywuzzy import fuzz

# Create sample data frame.
df = pd.DataFrame([(1, 'abracadabra'), (2, 'abc'), (3, 'cadra'), (4, 'brabra')],
                  columns=['id', 'strings'])

# Cross product, using a temporary column.
df['_tmp'] = 0
mrg = pd.merge(df, df, on='_tmp', suffixes=['_1','_2'])

# Apply the function between the two strings.
mrg['fuzz'] = mrg.apply(lambda s: fuzz.ratio(s['strings_1'], s['strings_2']), axis=1)

# Reorganize data.
ret = mrg.pivot(index='strings_1', columns='strings_2', values='fuzz')
ret.index.name = None 
ret.columns.name = None

# This results in the following:
#              abc  abracadabra  brabra  cadra
# abc          100           43      44     25
# abracadabra   43          100      71     62
# brabra        44           71     100     55
# cadra         25           62      55    100
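On newer pandas versions (1.2 and later), the temporary column is not needed, because merge accepts how='cross'. The following is a sketch under that assumption; ratio() is again a hypothetical difflib-based stand-in for fuzz.ratio so that the snippet runs without fuzzywuzzy, and its scores may differ slightly from the table above.

```python
from difflib import SequenceMatcher

import pandas as pd


def ratio(a, b):
    """Hypothetical stand-in for fuzz.ratio: similarity as an integer 0..100."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


df = pd.DataFrame([(1, 'abracadabra'), (2, 'abc'), (3, 'cadra'), (4, 'brabra')],
                  columns=['id', 'strings'])

# Cartesian product of the frame with itself (requires pandas >= 1.2).
pairs = df.merge(df, how='cross', suffixes=('_1', '_2'))

# Score each pair of strings.
pairs['score'] = [ratio(a, b)
                  for a, b in zip(pairs['strings_1'], pairs['strings_2'])]

# Reshape the long list of pairs into a square table.
table = pairs.pivot(index='strings_1', columns='strings_2', values='score')
print(table)
```

Note that pivot raises an error if the same pair of labels occurs twice, so duplicated strings would need to be pivoted on the ID columns (id_1, id_2) instead.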

edited Nov 12 '18 at 13:39

answered Nov 12 '18 at 13:07

normanius
