Add string column to float matrix NumPy
I'm looking for a method to add a column of float values to a matrix of string values.
Mymatrix =
[["a","b"],
["c","d"]]
I need to have a matrix like this =
[["a","b",0.4],
["c","d",0.6]]
python string numpy matrix floating-point
edited Nov 13 '18 at 11:02 by Mehrdad Pedramfar
asked Nov 13 '18 at 10:54 by Vin B.
You cannot have that in NumPy (unless you have an array of object, which is not usually very useful). Consider using just Python (nested) lists or other advanced structures like a Pandas data frame.
– jdehesa, Nov 13 '18 at 10:59
You're right! Thank you so much
– Vin B., Nov 13 '18 at 13:03
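For illustration, here is a minimal sketch of the two options mentioned in the comment above: plain nested lists, and a NumPy array with dtype object. The variable names (rows, weights, mixed) are made up for this example and are not from the question.

```python
import numpy as np

rows = [["a", "b"], ["c", "d"]]
weights = [0.4, 0.6]

# Option 1: plain nested lists, appending one float to each row
nested = [row + [w] for row, w in zip(rows, weights)]

# Option 2: an object array, where every cell holds an arbitrary Python object
# (this works, but loses most of NumPy's speed and numeric features)
strings = np.array(rows, dtype=object)
floats = np.array(weights, dtype=object)
mixed = np.column_stack([strings, floats])

print(nested)
print(mixed.dtype)
```

Both results hold the strings and floats side by side; only the second is a NumPy array, and arithmetic on its last column goes through Python objects rather than native floats.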
3 Answers
As noted, you can't mix data types in a plain ndarray, but you can do so in a structured or record array. Both let you mix data types as defined by the dtype= argument (it defines the data types and field names). Record arrays additionally allow access to fields by attribute instead of only by index. You don't need for loops when you want to copy entire columns between arrays. See my example below (using your data):
import numpy as np

Mymatrix = np.array([["a", "b"], ["c", "d"]])
Mycol = np.array([0.4, 0.6])
# One field per column: two 1-character strings and a float
dt = np.dtype([('col0', 'U1'), ('col1', 'U1'), ('col2', float)])
new_recarr = np.empty((2,), dtype=dt)
# Assign whole columns at once; no loops needed
new_recarr['col0'] = Mymatrix[:, 0]
new_recarr['col1'] = Mymatrix[:, 1]
new_recarr['col2'] = Mycol
print(new_recarr)
Resulting output looks like this:
[('a', 'b', 0.4) ('c', 'd', 0.6)]
From there, use formatted strings to print.
You can also copy from a recarray to an ndarray if you reverse assignment order in my example.
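As a rough sketch of the attribute access and the reverse copy described above (the names structured, rec, strings and floats are chosen here for illustration, and the dtype is assumed to be the same as in the example):

```python
import numpy as np

dt = np.dtype([("col0", "U1"), ("col1", "U1"), ("col2", float)])
structured = np.array([("a", "b", 0.4), ("c", "d", 0.6)], dtype=dt)

# View the same data as a record array to get attribute access to fields
rec = structured.view(np.recarray)
print(rec.col2)  # same data as structured["col2"]

# Reverse direction: copy the string fields back into a plain 2-D ndarray
strings = np.empty((2, 2), dtype="U1")
strings[:, 0] = rec.col0
strings[:, 1] = rec.col1
floats = rec.col2.copy()
```

The view does not copy anything, so rec and structured share the same memory; the copies into strings and floats are independent arrays.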
Note: I discovered there can be a significant performance penalty when using recarrays. See the answer in this thread: "is ndarray faster than recarray access?"
answered Nov 13 '18 at 16:19 by kcw78
I would suggest using a pandas DataFrame instead:
import pandas as pd
df = pd.DataFrame([["a", "b", 0.4],
                   ["c", "d", 0.6]])
print(df)
   0  1    2
0  a  b  0.4
1  c  d  0.6
You can also specify column (Series) names:
df = pd.DataFrame([["a", "b", 0.4],
                   ["c", "d", 0.6]], columns=['A', 'B', 'C'])
print(df)
   A  B    C
0  a  b  0.4
1  c  d  0.6
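If the strings and the floats start out as separate NumPy arrays, as in the question, one possible way to combine them is to build the DataFrame from the string matrix and then add the float column. This is only a sketch; the column names A, B, C and the variable names are assumptions for the example.

```python
import numpy as np
import pandas as pd

strings = np.array([["a", "b"], ["c", "d"]])
floats = np.array([0.4, 0.6])

# Start from the string matrix, then attach the floats as a new column
df = pd.DataFrame(strings, columns=["A", "B"])
df["C"] = floats

print(df.dtypes)              # column C keeps a float dtype
rows = df.values.tolist()     # back to nested lists if needed
print(rows)
```

Each column keeps its own dtype, so the float column stays numeric and can be used in calculations directly.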
answered Nov 13 '18 at 11:12 by Alex
You need to understand why you want to do that. NumPy is efficient because data are aligned in memory, so mixing types is generally a source of bad performance. But in your case you can preserve alignment, since all your strings have the same length. Since the types are not homogeneous, you can use a structured array:
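import numpy as np

raw = [["a", "b", 0.4],
       ["c", "d", 0.6]]
dt = np.dtype([('col0', 'U1'), ('col1', 'U1'), ('col2', float)])
aligned = np.empty(len(raw), dtype=dt)
# Copy each value into the matching field of each record
for i in range(len(raw)):
    for j in range(len(dt)):
        aligned[i][j] = raw[i][j]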
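As an alternative sketch, the same structured array can be built without explicit loops by passing a list of tuples to np.array. This is a variation on the code above, not part of the original answer; it assumes the same raw data and dtype.

```python
import numpy as np

raw = [["a", "b", 0.4],
       ["c", "d", 0.6]]
dt = np.dtype([("col0", "U1"), ("col1", "U1"), ("col2", float)])

# NumPy expects each record as a tuple, so convert the inner lists first
aligned = np.array([tuple(row) for row in raw], dtype=dt)

print(aligned["col2"].mean())  # numeric fields behave like normal float arrays
print(aligned.dtype.itemsize)  # bytes per record: two U1 fields plus one float64
```

Because every record has the same fixed size, the float field can still be read as a regular strided array.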
You can also use pandas, but you often lose some performance.
answered Nov 13 '18 at 12:49 by B. M.