Indexing error on the bin size generation
























I am trying to bin data



def binned(cols, bins, labels):
    """helper method that puts proper bin values in proper rows of pandas df"""
    for col in cols:
        yield ('{}_bin'.format(col), labels[np.digitize(calProbDf[col], bins)])

year2017 year2016 year2015 year2014 year2013 year2012 year2011 year2010 year2009 year2008 year2007 year2006 year2005 year2004 year2003 year2002 year2001 year2000 year1999
0 0 2302688.000 2413693.000 2256486.000 2168558.000 2066282.000 1985939.000 1951843.000 1931046.000 1923389.000 2097383.000 2244158.000 2044425.000 2008856.000 1885044.000 1808412.000 1717079.000 1672181.000 1117481.000 1496984.000
1 1 1362965.000 1350909.000 1327282.000 1313024.000 1305272.000 1337781.000 1338887.000 1327547.000 1259782.000 948843.000 868455.000 862390.000 841620.000 768903.000 765313.000 748585.000 447990.000 326099.000 253401.000
2 2 772403.000 778213.000 751895.000 748270.000 794325.000 701705.000 713557.000 745222.000 779142.000 898040.000 900440.000 902645.000 914607.000 933216.000 946243.000 1302167.000 1043364.000 1091437.000 1098097.000
3 3 760653.000 727597.000 683742.000 641941.000 616501.000 605726.000 601817.000 605987.000 633514.000 618503.000 576404.000 544912.000 522607.000 498113.000 466032.000 426481.000 383305.000 356646.000 333690.000
4 4 704425.000 703839.000 693907.000 696804.000 706696.000 727458.000 742575.000 746332.000 729393.000 721906.000 699326.000 711959.000 671249.000 583177.000 580600.000 576899.000 568372.000 576476.000 555458.000


I use the following code to generate the bins:



import random
import string

import numpy as np

dit = '0123456789'

# Bin edges; the last edge is np.inf so that every finite value falls below it
binDefined = np.array([5, 10, 15, 20, 30, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600,
                       700, 800, 900, 1000, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 3000,
                       4000, 5000, 10000, 15000, 20000, 25000, 30000, 50000, 100000, 200000, np.inf])


def id_generator(size=6, chars=string.digits):
    """generate a nonunique bin to avoid double counting"""
    return ''.join(random.choice(chars) for _ in range(size))


# One label per bin edge, e.g. 'b_42'
binList = np.array(['b_{}'.format(id_generator(2, dit)) for i in range(len(binDefined))])

cols = ('year2017', 'year2016', 'year2015', 'year2014',
        'year2013', 'year2012', 'year2011', 'year2010', 'year2009', 'year2008',
        'year2007', 'year2006', 'year2005', 'year2004', 'year2003', 'year2002',
        'year2001', 'year2000', 'year1999')


binnedDF = calProbDf.assign(**dict(binned(cols, binDefined, binList)))


When I try to assign the bins and generate binnedDF, I get an out-of-range error:



IndexError Traceback (most recent call last)
<ipython-input-73-e18328775bee> in <module>()
20
21
---> 22 binnedDF=calProbDf.assign(**dict(binned(cols, binDefined, binList)))
23

<ipython-input-7-3cc0b93a043c> in binned(cols, bins, labels)
      1 def binned(cols, bins, labels):
      2     for col in cols:
----> 3         yield ('{}_bin'.format(col), labels[np.digitize(calProbDf[col], bins)])

IndexError: index 42 is out of bounds for axis 1 with size 42


Both binList and binDefined have length 42, and there are 19 columns.
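
One thing that may be worth checking, offered as a sketch rather than a confirmed diagnosis: np.digitize returns indices in the range 0 to len(bins) inclusive, so a label array with the same length as the bin array can be overrun by one. With np.inf as the last edge and the default right=False, that top index is only produced by values that are not strictly below np.inf, such as NaN or inf. The short bin array and the sample values below are made up for illustration; they are not taken from calProbDf.

```python
import numpy as np

# Shortened stand-ins for binDefined and binList (same length, as in the question)
bins = np.array([5, 10, 15, np.inf])
labels = np.array(['b_0', 'b_1', 'b_2', 'b_3'])

# Hypothetical values, including a NaN
values = np.array([1.0, 7.0, 12.0, 2302688.0, np.nan])

idx = np.digitize(values, bins)
print(idx)  # indices can range from 0 up to len(bins)

try:
    labels[idx]
except IndexError as exc:
    print('IndexError:', exc)

# Indexing only the in-range positions succeeds
ok = idx < len(labels)
print(labels[idx[ok]])
```

If this is what happens in the real data, checking calProbDf[list(cols)].isna().sum() would show which columns contain missing values.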



My goal is to classify each value into an individual bin based on the binList and binDefined above.































  • Try printing the shape and len of variables in the function binned. By the way, this looks convoluted and over-thought for your stated goal. Have you looked into pandas.cut? pandas.pydata.org/pandas-docs/version/0.23.4/generated/…
    – Peter Leimbigler
    Nov 11 at 16:27










  • Peter, I have tried pd.cut. How would you assign binA, binB, and so on, up to about 45 bins, using pd.cut for the above data?
    – lpt
    Nov 11 at 21:39
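
A minimal sketch of how labelled bins could be assigned with pd.cut across several columns. The small frame, the shortened edge list and the label names (bin0, bin1, ...) are assumptions for illustration only. pd.cut expects one more edge than labels, so -np.inf is prepended here, and right=False is passed to get left-closed intervals like the np.digitize default.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame standing in for calProbDf
df = pd.DataFrame({'year2017': [2302688.0, 7.0, np.nan],
                   'year2016': [2413693.0, 120.0, 3.0]})

upper_edges = [5, 10, 100, 200000, np.inf]   # shortened stand-in for binDefined
edges = [-np.inf] + upper_edges              # pd.cut needs len(labels) + 1 edges
labels = ['bin{}'.format(i) for i in range(len(upper_edges))]

cols = ['year2017', 'year2016']
binned = df.assign(**{
    '{}_bin'.format(col): pd.cut(df[col], bins=edges, labels=labels, right=False)
    for col in cols
})
print(binned)
```

Missing values stay missing in the new columns instead of raising an error, which may or may not be the behaviour wanted here.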
















pandas numpy histogram
















asked Nov 11 at 14:04 by lpt



























