How can I bootstrap the innermost array of a numpy array?

up vote
1
down vote

favorite

I have a numpy array of these dimensions

data.shape (categories, models, types, events): (10, 11, 50, 100)

Now I want to do sample with replacement in the innermost array (100) only. For a single array such as this:

data[0][0][0]

array([ 40.448624 , 39.459843 , 33.76762 , 38.944622 , 21.407362 , 35.55499 , 68.5111 , 16.512974 , 21.118315 , 18.447166 , 16.026619 , 21.596252 , 41.798622 , 63.01645 , 46.886642 , 68.874756 , 17.472408 , 53.015724 , 85.41213 , 59.388977 , 17.352108 , 61.161705 , 23.430847 , 20.203123 , 22.73194 , 77.40547 , 43.02974 , 29.745787 , 21.50163 , 13.820962 , 46.91466 , 41.43656 , 18.008326 , 13.122162 , 59.79936 , 94.555305 , 24.798452 , 30.362497 , 13.629236 , 10.792178 , 35.298515 , 20.904285 , 15.409604 , 20.567234 , 46.376335 , 13.82727 , 17.970661 , 18.408686 , 21.987917 , 21.30094 , 24.26776 , 27.399046 , 49.16879 , 21.831453 , 66.577 , 15.524615 , 18.091696 , 24.346598 , 24.709772 , 19.068447 , 24.221592 , 25.244864 , 52.865868 , 22.860783 , 23.586731 , 18.928782 , 21.960285 , 74.77856 , 15.176119 , 20.795431 , 14.3638935, 35.937237 , 29.993324 , 30.848495 , 48.145336 , 38.02541 , 101.15249 , 49.801117 , 38.123184 , 12.041505 , 18.788296 , 20.53382 , 31.20367 , 19.76104 , 92.56279 , 41.62944 , 23.53344 , 18.967432 , 14.781404 , 20.02018 , 27.736559 , 16.108913 , 44.935062 , 12.629299 , 34.65672 , 20.60169 , 21.779675 , 31.585844 , 23.768578 , 92.463196 ], dtype=float32)

I can do sample with replacement using this: np.random.choice(data[0][0][0], 100), which I will be doing thousands of times.

array([ 13.629236, 92.56279 , 21.960285, 20.567234, 21.50163 , 16.026619, 20.203123, 23.430847, 16.512974, 15.524615, 18.967432, 22.860783, 85.41213 , 21.779675, 23.586731, 24.26776 , 66.577 , 20.904285, 19.068447, 21.960285, 68.874756, 31.585844, 23.586731, 61.161705, 101.15249 , 59.79936 , 16.512974, 43.02974 , 16.108913, 24.26776 , 23.430847, 14.781404, 40.448624, 13.629236, 24.26776 , 19.068447, 16.026619, 16.512974, 16.108913, 77.40547 , 12.629299, 31.585844, 24.798452, 18.967432, 14.781404, 23.430847, 49.16879 , 18.408686, 22.73194 , 10.792178, 16.108913, 18.967432, 12.041505, 85.41213 , 41.62944 , 31.20367 , 17.970661, 29.745787, 39.459843, 10.792178, 43.02974 , 21.831453, 21.50163 , 24.798452, 30.362497, 21.50163 , 18.788296, 20.904285, 17.352108, 41.798622, 18.447166, 16.108913, 19.068447, 61.161705, 52.865868, 20.795431, 85.41213 , 49.801117, 13.82727 , 18.928782, 41.43656 , 46.886642, 92.56279 , 41.62944 , 18.091696, 20.60169 , 48.145336, 20.53382 , 40.448624, 20.60169 , 23.586731, 22.73194 , 92.56279 , 94.555305, 22.73194 , 17.352108, 46.886642, 27.399046, 18.008326, 15.176119], dtype=float32)

But since there is no axis in np.random.choice, how can I do it for all arrays (i.e. (categories, models, types))? Or is looping through it the only option?

edited Nov 11 at 2:39

merv

24.5k671108

asked Nov 10 at 4:40

maximusdooku

1,38821343

Is shuffling okay?
– coldspeed
Nov 10 at 4:44

@coldspeed It needs to be sampling with replacement. But if you have a shuffling example/answer, please do provide, as I will test both at a later stage. Thanks!
– maximusdooku
Nov 10 at 4:46

In numpy, the first dimension is outermost, last inner. Also, where possible we index with one set of , e.g. arr[0,0,0,:] should produce a (100,) shape array. Transposing to put that size 100 axis first (outermost) might be a good idea.
– hpaulj
Nov 10 at 4:48

I'd try applying choice to a 1d indexing array, e.g. idx = np.random.choice(100, size); arr[:,:,: idx].
– hpaulj
Nov 10 at 4:50

@hpaulj Thank you so much for the correction. I have updated the question.
– maximusdooku
Nov 10 at 4:52

|
show 3 more comments

up vote
1
down vote

favorite

I have a numpy array of these dimensions

data.shape (categories, models, types, events): (10, 11, 50, 100)

Now I want to do sample with replacement in the innermost array (100) only. For a single array such as this:

data[0][0][0]

I can do sample with replacement using this: np.random.choice(data[0][0][0], 100), which I will be doing thousands of times.

But since there is no axis in np.random.choice, how can I do it for all arrays (i.e. (categories, models, types))? Or is looping through it the only option?

edited Nov 11 at 2:39

merv

24.5k671108

asked Nov 10 at 4:40

maximusdooku

1,38821343

Is shuffling okay?
– coldspeed
Nov 10 at 4:44

@coldspeed It needs to be sampling with replacement. But if you have a shuffling example/answer, please do provide, as I will test both at a later stage. Thanks!
– maximusdooku
Nov 10 at 4:46

In numpy, the first dimension is outermost, last inner. Also, where possible we index with one set of , e.g. arr[0,0,0,:] should produce a (100,) shape array. Transposing to put that size 100 axis first (outermost) might be a good idea.
– hpaulj
Nov 10 at 4:48

I'd try applying choice to a 1d indexing array, e.g. idx = np.random.choice(100, size); arr[:,:,: idx].
– hpaulj
Nov 10 at 4:50

@hpaulj Thank you so much for the correction. I have updated the question.
– maximusdooku
Nov 10 at 4:52

|
show 3 more comments

up vote
1
down vote

favorite

I have a numpy array of these dimensions

data.shape (categories, models, types, events): (10, 11, 50, 100)

Now I want to do sample with replacement in the innermost array (100) only. For a single array such as this:

data[0][0][0]

I can do sample with replacement using this: np.random.choice(data[0][0][0], 100), which I will be doing thousands of times.

But since there is no axis in np.random.choice, how can I do it for all arrays (i.e. (categories, models, types))? Or is looping through it the only option?

edited Nov 11 at 2:39

merv

24.5k671108

asked Nov 10 at 4:40

maximusdooku

1,38821343

I have a numpy array of these dimensions

data.shape (categories, models, types, events): (10, 11, 50, 100)

Now I want to do sample with replacement in the innermost array (100) only. For a single array such as this:

data[0][0][0]

I can do sample with replacement using this: np.random.choice(data[0][0][0], 100), which I will be doing thousands of times.

But since there is no axis in np.random.choice, how can I do it for all arrays (i.e. (categories, models, types))? Or is looping through it the only option?

python arrays numpy numpy-ndarray statistics-bootstrap

edited Nov 11 at 2:39

merv

24.5k671108

asked Nov 10 at 4:40

maximusdooku

1,38821343

edited Nov 11 at 2:39

merv

24.5k671108

asked Nov 10 at 4:40

maximusdooku

1,38821343

edited Nov 11 at 2:39

merv

24.5k671108

edited Nov 11 at 2:39

merv

24.5k671108

edited Nov 11 at 2:39

merv

24.5k671108

asked Nov 10 at 4:40

maximusdooku

1,38821343

asked Nov 10 at 4:40

maximusdooku

1,38821343

asked Nov 10 at 4:40

maximusdooku

1,38821343

Is shuffling okay?
– coldspeed
Nov 10 at 4:44

@coldspeed It needs to be sampling with replacement. But if you have a shuffling example/answer, please do provide, as I will test both at a later stage. Thanks!
– maximusdooku
Nov 10 at 4:46

In numpy, the first dimension is outermost, last inner. Also, where possible we index with one set of , e.g. arr[0,0,0,:] should produce a (100,) shape array. Transposing to put that size 100 axis first (outermost) might be a good idea.
– hpaulj
Nov 10 at 4:48

I'd try applying choice to a 1d indexing array, e.g. idx = np.random.choice(100, size); arr[:,:,: idx].
– hpaulj
Nov 10 at 4:50

@hpaulj Thank you so much for the correction. I have updated the question.
– maximusdooku
Nov 10 at 4:52

|
show 3 more comments

Is shuffling okay?
– coldspeed
Nov 10 at 4:44

@coldspeed It needs to be sampling with replacement. But if you have a shuffling example/answer, please do provide, as I will test both at a later stage. Thanks!
– maximusdooku
Nov 10 at 4:46

In numpy, the first dimension is outermost, last inner. Also, where possible we index with one set of , e.g. arr[0,0,0,:] should produce a (100,) shape array. Transposing to put that size 100 axis first (outermost) might be a good idea.
– hpaulj
Nov 10 at 4:48

I'd try applying choice to a 1d indexing array, e.g. idx = np.random.choice(100, size); arr[:,:,: idx].
– hpaulj
Nov 10 at 4:50

@hpaulj Thank you so much for the correction. I have updated the question.
– maximusdooku
Nov 10 at 4:52

Is shuffling okay?
– coldspeed
Nov 10 at 4:44

@coldspeed It needs to be sampling with replacement. But if you have a shuffling example/answer, please do provide, as I will test both at a later stage. Thanks!
– maximusdooku
Nov 10 at 4:46

In numpy, the first dimension is outermost, last inner. Also, where possible we index with one set of , e.g. arr[0,0,0,:] should produce a (100,) shape array. Transposing to put that size 100 axis first (outermost) might be a good idea.
– hpaulj
Nov 10 at 4:48

I'd try applying choice to a 1d indexing array, e.g. idx = np.random.choice(100, size); arr[:,:,: idx].
– hpaulj
Nov 10 at 4:50

@hpaulj Thank you so much for the correction. I have updated the question.
– maximusdooku
Nov 10 at 4:52

|
show 3 more comments

3 Answers
3

active

oldest

votes

up vote
1
down vote

accepted

The fastest/simplest answer turns out to be based on indexing a flattened version of your array:

def resampFlat(arr, reps):
 n = arr.shape[-1]

 # create an array to shift random indexes as needed
 shift = np.repeat(np.arange(0, arr.size, n), n).reshape(arr.shape)

 # get a flat view of the array
 arrflat = arr.ravel()
 # sample the array by generating random ints and shifting them appropriately
 return np.array([arrflat[np.random.randint(0, n, arr.shape) + shift] 
 for i in range(reps)])

Timings confirm that this is the fastest answer.

Timings

I tested out the above resampFlat function alongside a simpler for loop based solution:

def resampFor(arr, reps):
 # store the shape for the return value
 shape = arr.shape
 # flatten all dimensions of arr except the last
 arr = arr.reshape(-1, arr.shape[-1])
 # preallocate the return value
 ret = np.empty((reps, *arr.shape), dtype=arr.dtype)
 # generate the indices of the resampled values
 idxs = np.random.randint(0, arr.shape[-1], (reps, *arr.shape))

 for rep,idx in zip(ret, idxs):
 # iterate over the resampled replicates
 for row,rowrep,i in zip(arr, rep, idx):
 # iterate over the event arrays within a replicate
 rowrep[...] = row[i]

 # give the return value the appropriate shape
 return ret.reshape((reps, *shape))

and a solution based on Paul Panzer's fancy indexing approach:

def resampFancyIdx(arr, reps):
 idx = np.random.randint(0, arr.shape[-1], (reps, *data.shape))
 _, I, J, K, _ = np.ogrid[tuple(map(slice, (0, *arr.shape[:-1], 0)))]

 return arr[I, J, K, idx]

I tested with the following data:

shape = ((10, 11, 50, 100))
data = np.arange(np.prod(shape)).reshape(shape)

Here's the results from the array flattening approach:

%%timeit
resampFlat(data, 100)

1.25 s ± 9.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

the results from the for loop approach:

%%timeit
resampFor(data, 100)

1.66 s ± 16.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

and from Paul's fancy indexing:

%%timeit
resampFancyIdx(data, 100)

1.42 s ± 16.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Contrary to my expectations, resampFancyIdx beat resampFor, and I actually had to work fairly hard to come up with something better. At this point I would really like a better explanation of how fancy indexing works at the C-level, and why it's so performant.

edited Nov 10 at 13:02

answered Nov 10 at 5:32

tel

3,5711428

Thank you. Several new concepts in your code for me. Prob will take a while to understand everything. Though the shape looks good.
– maximusdooku
Nov 10 at 6:33

@maximusdooku my first answer was a little hacky, so I came up with something better. Hopefully it'll be a little easier to understand, too
– tel
Nov 10 at 13:09

add a comment |

up vote
1
down vote

You can draw the indices of your samples and then apply fancy indexing:

>>> import numpy as np
>>> 
>>> (categories, models, types, events) = (10, 11, 50, 100)
>>> data = np.random.random((categories, models, types, events))
>>> N_samples = 1000
>>> 
>>> idx = np.random.randint(0, events, (categories, models, types, N_samples))
>>> I, J, K, _ = np.ogrid[:categories, :models, :types, :0]
>>> 
>>> resampled = data[I, J, K, idx]

A small explicit example for concreteness. The fields are labeled with "category" (A or B), "model" (a or b) and "type" (1 or 2) to make it easy to verify that sampling does preserve these.

>>> I, J, K, L = np.ix_(*(np.array(list(x), 'O') for x in ('AB', 'ab', '12', 'xyzw')))
>>> data = I+J+K+L
>>> data
array([[[['Aa1x', 'Aa1y', 'Aa1z', 'Aa1w'],
 ['Aa2x', 'Aa2y', 'Aa2z', 'Aa2w']],

 [['Ab1x', 'Ab1y', 'Ab1z', 'Ab1w'],
 ['Ab2x', 'Ab2y', 'Ab2z', 'Ab2w']]],


 [[['Ba1x', 'Ba1y', 'Ba1z', 'Ba1w'],
 ['Ba2x', 'Ba2y', 'Ba2z', 'Ba2w']],

 [['Bb1x', 'Bb1y', 'Bb1z', 'Bb1w'],
 ['Bb2x', 'Bb2y', 'Bb2z', 'Bb2w']]]], dtype=object)
>>> N_samples = 3
>>> 
>>> idx = np.random.randint(0, data.shape[-1], (N_samples, *data.shape))
>>> _, I, J, K, _ = np.ogrid[tuple(map(slice, (0, *data.shape[:-1], 0)))]
>>> 
>>> resampled = data[I, J, K, idx]
>>> res
ResourceWarning resampled 
>>> resampled
array([[[[['Aa1z', 'Aa1y', 'Aa1y', 'Aa1x'],
 ['Aa2y', 'Aa2z', 'Aa2z', 'Aa2z']],

 [['Ab1w', 'Ab1z', 'Ab1y', 'Ab1x'],
 ['Ab2y', 'Ab2w', 'Ab2y', 'Ab2w']]],


 [[['Ba1z', 'Ba1y', 'Ba1y', 'Ba1x'],
 ['Ba2x', 'Ba2x', 'Ba2z', 'Ba2x']],

 [['Bb1x', 'Bb1x', 'Bb1y', 'Bb1z'],
 ['Bb2y', 'Bb2w', 'Bb2y', 'Bb2z']]]],



 [[[['Aa1x', 'Aa1w', 'Aa1x', 'Aa1z'],
 ['Aa2y', 'Aa2y', 'Aa2x', 'Aa2z']],

 [['Ab1y', 'Ab1x', 'Ab1w', 'Ab1z'],
 ['Ab2w', 'Ab2x', 'Ab2w', 'Ab2w']]],


 [[['Ba1x', 'Ba1z', 'Ba1x', 'Ba1z'],
 ['Ba2x', 'Ba2y', 'Ba2y', 'Ba2w']],

 [['Bb1z', 'Bb1w', 'Bb1y', 'Bb1w'],
 ['Bb2w', 'Bb2x', 'Bb2w', 'Bb2z']]]],



 [[[['Aa1w', 'Aa1w', 'Aa1w', 'Aa1y'],
 ['Aa2z', 'Aa2x', 'Aa2y', 'Aa2x']],

 [['Ab1z', 'Ab1z', 'Ab1x', 'Ab1y'],
 ['Ab2w', 'Ab2x', 'Ab2x', 'Ab2y']]],


 [[['Ba1w', 'Ba1x', 'Ba1y', 'Ba1y'],
 ['Ba2z', 'Ba2x', 'Ba2x', 'Ba2x']],

 [['Bb1z', 'Bb1w', 'Bb1x', 'Bb1x'],
 ['Bb2z', 'Bb2x', 'Bb2w', 'Bb2z']]]]], dtype=object)

edited Nov 10 at 7:13

answered Nov 10 at 5:41

Paul Panzer

28.9k21240

Thank you! Could you please explain if this is different/better than what I posted and why? Specially with regards to cross-correlations. Looks neat!
– maximusdooku
Nov 10 at 5:46

I am not sure I understand this code unfortunately. Is it doing "global" sample with replacement in all events? Because the sampling needs to happen in each 100 event array...
– maximusdooku
Nov 10 at 5:49

@maximusdooku I've added a small example, so you can check whether it does what you need. The advantage of this method is that it avoids a python level loop. Let me know whether it works for you.
– Paul Panzer
Nov 10 at 6:00

I am going through every step. But the shape of data changes from (2, 2, 2, 4) to (2, 2, 2, 6). But I need to generate 6 samples of (2,2,2,4) for example...Perhaps I am not understanding the code because of fancy indexing...
– maximusdooku
Nov 10 at 6:29

@maximusdooku think about it this way: given N_samples=12, Paul's answer would produce the same result as if you took 3 samples of shape (2, 2, 2, 4) and then just stuck them together along the last axis. Either approach (one array of shape (2,2,2,N*4) vs N arrays of shape (2,2,2,4)) can produce a correct bootstrap. In any case, I think this answer is a good one. It's similar to hpaulj's, but without the flaw in the math
– tel
Nov 10 at 6:38

|
show 3 more comments

up vote
0
down vote

databoot = 
for i in range(5):
 idx = np.random.choice(100, 100)
 databoot.append(data[:,:,:,idx])

shape of databoot -> (5, 10, 11, 50, 100)

shape of data -> (10, 11, 50, 100)

answered Nov 10 at 5:12

maximusdooku

1,38821343

@hpaulj came up with a nicely efficient/parsimonious answer, but I don't think it's 100% mathematically correct. Basically, if there's any correlations between the events in different (categories, models, types), the above answer may bias your resampled array, as well as the results from your bootstrap
– tel
Nov 10 at 5:19

@tel I would love to see a more correct answer...If you know how to, please do post.
– maximusdooku
Nov 10 at 5:20

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53236040%2fhow-can-i-bootstrap-the-innermost-array-of-a-numpy-array%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

up vote
1
down vote

accepted

The fastest/simplest answer turns out to be based on indexing a flattened version of your array:

def resampFlat(arr, reps):
 n = arr.shape[-1]

 # create an array to shift random indexes as needed
 shift = np.repeat(np.arange(0, arr.size, n), n).reshape(arr.shape)

 # get a flat view of the array
 arrflat = arr.ravel()
 # sample the array by generating random ints and shifting them appropriately
 return np.array([arrflat[np.random.randint(0, n, arr.shape) + shift] 
 for i in range(reps)])

Timings confirm that this is the fastest answer.

Timings

I tested out the above resampFlat function alongside a simpler for loop based solution:

def resampFor(arr, reps):
 # store the shape for the return value
 shape = arr.shape
 # flatten all dimensions of arr except the last
 arr = arr.reshape(-1, arr.shape[-1])
 # preallocate the return value
 ret = np.empty((reps, *arr.shape), dtype=arr.dtype)
 # generate the indices of the resampled values
 idxs = np.random.randint(0, arr.shape[-1], (reps, *arr.shape))

 for rep,idx in zip(ret, idxs):
 # iterate over the resampled replicates
 for row,rowrep,i in zip(arr, rep, idx):
 # iterate over the event arrays within a replicate
 rowrep[...] = row[i]

 # give the return value the appropriate shape
 return ret.reshape((reps, *shape))

and a solution based on Paul Panzer's fancy indexing approach:

def resampFancyIdx(arr, reps):
 idx = np.random.randint(0, arr.shape[-1], (reps, *data.shape))
 _, I, J, K, _ = np.ogrid[tuple(map(slice, (0, *arr.shape[:-1], 0)))]

 return arr[I, J, K, idx]

I tested with the following data:

shape = ((10, 11, 50, 100))
data = np.arange(np.prod(shape)).reshape(shape)

Here's the results from the array flattening approach:

%%timeit
resampFlat(data, 100)

1.25 s ± 9.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

the results from the for loop approach:

%%timeit
resampFor(data, 100)

1.66 s ± 16.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

and from Paul's fancy indexing:

%%timeit
resampFancyIdx(data, 100)

1.42 s ± 16.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

edited Nov 10 at 13:02

answered Nov 10 at 5:32

tel

3,5711428

Thank you. Several new concepts in your code for me. Prob will take a while to understand everything. Though the shape looks good.
– maximusdooku
Nov 10 at 6:33

@maximusdooku my first answer was a little hacky, so I came up with something better. Hopefully it'll be a little easier to understand, too
– tel
Nov 10 at 13:09

add a comment |

up vote
1
down vote

accepted

The fastest/simplest answer turns out to be based on indexing a flattened version of your array:

def resampFlat(arr, reps):
 n = arr.shape[-1]

 # create an array to shift random indexes as needed
 shift = np.repeat(np.arange(0, arr.size, n), n).reshape(arr.shape)

 # get a flat view of the array
 arrflat = arr.ravel()
 # sample the array by generating random ints and shifting them appropriately
 return np.array([arrflat[np.random.randint(0, n, arr.shape) + shift] 
 for i in range(reps)])

Timings confirm that this is the fastest answer.

Timings

I tested out the above resampFlat function alongside a simpler for loop based solution:

def resampFor(arr, reps):
 # store the shape for the return value
 shape = arr.shape
 # flatten all dimensions of arr except the last
 arr = arr.reshape(-1, arr.shape[-1])
 # preallocate the return value
 ret = np.empty((reps, *arr.shape), dtype=arr.dtype)
 # generate the indices of the resampled values
 idxs = np.random.randint(0, arr.shape[-1], (reps, *arr.shape))

 for rep,idx in zip(ret, idxs):
 # iterate over the resampled replicates
 for row,rowrep,i in zip(arr, rep, idx):
 # iterate over the event arrays within a replicate
 rowrep[...] = row[i]

 # give the return value the appropriate shape
 return ret.reshape((reps, *shape))

and a solution based on Paul Panzer's fancy indexing approach:

def resampFancyIdx(arr, reps):
 idx = np.random.randint(0, arr.shape[-1], (reps, *data.shape))
 _, I, J, K, _ = np.ogrid[tuple(map(slice, (0, *arr.shape[:-1], 0)))]

 return arr[I, J, K, idx]

I tested with the following data:

shape = ((10, 11, 50, 100))
data = np.arange(np.prod(shape)).reshape(shape)

Here's the results from the array flattening approach:

%%timeit
resampFlat(data, 100)

1.25 s ± 9.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

the results from the for loop approach:

%%timeit
resampFor(data, 100)

1.66 s ± 16.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

and from Paul's fancy indexing:

%%timeit
resampFancyIdx(data, 100)

1.42 s ± 16.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

edited Nov 10 at 13:02

answered Nov 10 at 5:32

tel

3,5711428

Thank you. Several new concepts in your code for me. Prob will take a while to understand everything. Though the shape looks good.
– maximusdooku
Nov 10 at 6:33

@maximusdooku my first answer was a little hacky, so I came up with something better. Hopefully it'll be a little easier to understand, too
– tel
Nov 10 at 13:09

add a comment |

up vote
1
down vote

accepted

The fastest/simplest answer turns out to be based on indexing a flattened version of your array:

def resampFlat(arr, reps):
 n = arr.shape[-1]

 # create an array to shift random indexes as needed
 shift = np.repeat(np.arange(0, arr.size, n), n).reshape(arr.shape)

 # get a flat view of the array
 arrflat = arr.ravel()
 # sample the array by generating random ints and shifting them appropriately
 return np.array([arrflat[np.random.randint(0, n, arr.shape) + shift] 
 for i in range(reps)])

Timings confirm that this is the fastest answer.

Timings

I tested out the above resampFlat function alongside a simpler for loop based solution:

def resampFor(arr, reps):
 # store the shape for the return value
 shape = arr.shape
 # flatten all dimensions of arr except the last
 arr = arr.reshape(-1, arr.shape[-1])
 # preallocate the return value
 ret = np.empty((reps, *arr.shape), dtype=arr.dtype)
 # generate the indices of the resampled values
 idxs = np.random.randint(0, arr.shape[-1], (reps, *arr.shape))

 for rep,idx in zip(ret, idxs):
 # iterate over the resampled replicates
 for row,rowrep,i in zip(arr, rep, idx):
 # iterate over the event arrays within a replicate
 rowrep[...] = row[i]

 # give the return value the appropriate shape
 return ret.reshape((reps, *shape))

and a solution based on Paul Panzer's fancy indexing approach:

def resampFancyIdx(arr, reps):
 idx = np.random.randint(0, arr.shape[-1], (reps, *data.shape))
 _, I, J, K, _ = np.ogrid[tuple(map(slice, (0, *arr.shape[:-1], 0)))]

 return arr[I, J, K, idx]

I tested with the following data:

shape = ((10, 11, 50, 100))
data = np.arange(np.prod(shape)).reshape(shape)

Here's the results from the array flattening approach:

%%timeit
resampFlat(data, 100)

1.25 s ± 9.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

the results from the for loop approach:

%%timeit
resampFor(data, 100)

1.66 s ± 16.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

and from Paul's fancy indexing:

%%timeit
resampFancyIdx(data, 100)

1.42 s ± 16.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

edited Nov 10 at 13:02

answered Nov 10 at 5:32

tel

3,5711428

The fastest/simplest answer turns out to be based on indexing a flattened version of your array:

def resampFlat(arr, reps):
 n = arr.shape[-1]

 # create an array to shift random indexes as needed
 shift = np.repeat(np.arange(0, arr.size, n), n).reshape(arr.shape)

 # get a flat view of the array
 arrflat = arr.ravel()
 # sample the array by generating random ints and shifting them appropriately
 return np.array([arrflat[np.random.randint(0, n, arr.shape) + shift] 
 for i in range(reps)])

Timings confirm that this is the fastest answer.

Timings

I tested out the above resampFlat function alongside a simpler for loop based solution:

def resampFor(arr, reps):
 # store the shape for the return value
 shape = arr.shape
 # flatten all dimensions of arr except the last
 arr = arr.reshape(-1, arr.shape[-1])
 # preallocate the return value
 ret = np.empty((reps, *arr.shape), dtype=arr.dtype)
 # generate the indices of the resampled values
 idxs = np.random.randint(0, arr.shape[-1], (reps, *arr.shape))

 for rep,idx in zip(ret, idxs):
 # iterate over the resampled replicates
 for row,rowrep,i in zip(arr, rep, idx):
 # iterate over the event arrays within a replicate
 rowrep[...] = row[i]

 # give the return value the appropriate shape
 return ret.reshape((reps, *shape))

and a solution based on Paul Panzer's fancy indexing approach:

def resampFancyIdx(arr, reps):
 idx = np.random.randint(0, arr.shape[-1], (reps, *data.shape))
 _, I, J, K, _ = np.ogrid[tuple(map(slice, (0, *arr.shape[:-1], 0)))]

 return arr[I, J, K, idx]

I tested with the following data:

shape = ((10, 11, 50, 100))
data = np.arange(np.prod(shape)).reshape(shape)

Here's the results from the array flattening approach:

%%timeit
resampFlat(data, 100)

1.25 s ± 9.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

the results from the for loop approach:

%%timeit
resampFor(data, 100)

1.66 s ± 16.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

and from Paul's fancy indexing:

%%timeit
resampFancyIdx(data, 100)

1.42 s ± 16.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

edited Nov 10 at 13:02

answered Nov 10 at 5:32

tel

3,5711428

edited Nov 10 at 13:02

answered Nov 10 at 5:32

tel

3,5711428

answered Nov 10 at 5:32

tel

3,5711428

answered Nov 10 at 5:32

tel

3,5711428

Thank you. Several new concepts in your code for me. Prob will take a while to understand everything. Though the shape looks good.
– maximusdooku
Nov 10 at 6:33

@maximusdooku my first answer was a little hacky, so I came up with something better. Hopefully it'll be a little easier to understand, too
– tel
Nov 10 at 13:09

add a comment |

Thank you. Several new concepts in your code for me. Prob will take a while to understand everything. Though the shape looks good.
– maximusdooku
Nov 10 at 6:33

@maximusdooku my first answer was a little hacky, so I came up with something better. Hopefully it'll be a little easier to understand, too
– tel
Nov 10 at 13:09

Thank you. Several new concepts in your code for me. Prob will take a while to understand everything. Though the shape looks good.
– maximusdooku
Nov 10 at 6:33

@maximusdooku my first answer was a little hacky, so I came up with something better. Hopefully it'll be a little easier to understand, too
– tel
Nov 10 at 13:09

add a comment |

up vote
1
down vote

You can draw the indices of your samples and then apply fancy indexing:

>>> import numpy as np
>>> 
>>> (categories, models, types, events) = (10, 11, 50, 100)
>>> data = np.random.random((categories, models, types, events))
>>> N_samples = 1000
>>> 
>>> idx = np.random.randint(0, events, (categories, models, types, N_samples))
>>> I, J, K, _ = np.ogrid[:categories, :models, :types, :0]
>>> 
>>> resampled = data[I, J, K, idx]

A small explicit example for concreteness. The fields are labeled with "category" (A or B), "model" (a or b) and "type" (1 or 2) to make it easy to verify that sampling does preserve these.

>>> I, J, K, L = np.ix_(*(np.array(list(x), 'O') for x in ('AB', 'ab', '12', 'xyzw')))
>>> data = I+J+K+L
>>> data
array([[[['Aa1x', 'Aa1y', 'Aa1z', 'Aa1w'],
 ['Aa2x', 'Aa2y', 'Aa2z', 'Aa2w']],

 [['Ab1x', 'Ab1y', 'Ab1z', 'Ab1w'],
 ['Ab2x', 'Ab2y', 'Ab2z', 'Ab2w']]],


 [[['Ba1x', 'Ba1y', 'Ba1z', 'Ba1w'],
 ['Ba2x', 'Ba2y', 'Ba2z', 'Ba2w']],

 [['Bb1x', 'Bb1y', 'Bb1z', 'Bb1w'],
 ['Bb2x', 'Bb2y', 'Bb2z', 'Bb2w']]]], dtype=object)
>>> N_samples = 3
>>> 
>>> idx = np.random.randint(0, data.shape[-1], (N_samples, *data.shape))
>>> _, I, J, K, _ = np.ogrid[tuple(map(slice, (0, *data.shape[:-1], 0)))]
>>> 
>>> resampled = data[I, J, K, idx]
>>> res
ResourceWarning resampled 
>>> resampled
array([[[[['Aa1z', 'Aa1y', 'Aa1y', 'Aa1x'],
 ['Aa2y', 'Aa2z', 'Aa2z', 'Aa2z']],

 [['Ab1w', 'Ab1z', 'Ab1y', 'Ab1x'],
 ['Ab2y', 'Ab2w', 'Ab2y', 'Ab2w']]],


 [[['Ba1z', 'Ba1y', 'Ba1y', 'Ba1x'],
 ['Ba2x', 'Ba2x', 'Ba2z', 'Ba2x']],

 [['Bb1x', 'Bb1x', 'Bb1y', 'Bb1z'],
 ['Bb2y', 'Bb2w', 'Bb2y', 'Bb2z']]]],



 [[[['Aa1x', 'Aa1w', 'Aa1x', 'Aa1z'],
 ['Aa2y', 'Aa2y', 'Aa2x', 'Aa2z']],

 [['Ab1y', 'Ab1x', 'Ab1w', 'Ab1z'],
 ['Ab2w', 'Ab2x', 'Ab2w', 'Ab2w']]],


 [[['Ba1x', 'Ba1z', 'Ba1x', 'Ba1z'],
 ['Ba2x', 'Ba2y', 'Ba2y', 'Ba2w']],

 [['Bb1z', 'Bb1w', 'Bb1y', 'Bb1w'],
 ['Bb2w', 'Bb2x', 'Bb2w', 'Bb2z']]]],



 [[[['Aa1w', 'Aa1w', 'Aa1w', 'Aa1y'],
 ['Aa2z', 'Aa2x', 'Aa2y', 'Aa2x']],

 [['Ab1z', 'Ab1z', 'Ab1x', 'Ab1y'],
 ['Ab2w', 'Ab2x', 'Ab2x', 'Ab2y']]],


 [[['Ba1w', 'Ba1x', 'Ba1y', 'Ba1y'],
 ['Ba2z', 'Ba2x', 'Ba2x', 'Ba2x']],

 [['Bb1z', 'Bb1w', 'Bb1x', 'Bb1x'],
 ['Bb2z', 'Bb2x', 'Bb2w', 'Bb2z']]]]], dtype=object)

edited Nov 10 at 7:13

answered Nov 10 at 5:41

Paul Panzer

28.9k21240

Thank you! Could you please explain if this is different/better than what I posted and why? Specially with regards to cross-correlations. Looks neat!
– maximusdooku
Nov 10 at 5:46

I am not sure I understand this code unfortunately. Is it doing "global" sample with replacement in all events? Because the sampling needs to happen in each 100 event array...
– maximusdooku
Nov 10 at 5:49

@maximusdooku I've added a small example, so you can check whether it does what you need. The advantage of this method is that it avoids a python level loop. Let me know whether it works for you.
– Paul Panzer
Nov 10 at 6:00

I am going through every step. But the shape of data changes from (2, 2, 2, 4) to (2, 2, 2, 6). But I need to generate 6 samples of (2,2,2,4) for example...Perhaps I am not understanding the code because of fancy indexing...
– maximusdooku
Nov 10 at 6:29

@maximusdooku think about it this way: given N_samples=12, Paul's answer would produce the same result as if you took 3 samples of shape (2, 2, 2, 4) and then just stuck them together along the last axis. Either approach (one array of shape (2,2,2,N*4) vs N arrays of shape (2,2,2,4)) can produce a correct bootstrap. In any case, I think this answer is a good one. It's similar to hpaulj's, but without the flaw in the math
– tel
Nov 10 at 6:38

|
show 3 more comments

up vote
1
down vote

You can draw the indices of your samples and then apply fancy indexing:

>>> import numpy as np
>>> 
>>> (categories, models, types, events) = (10, 11, 50, 100)
>>> data = np.random.random((categories, models, types, events))
>>> N_samples = 1000
>>> 
>>> idx = np.random.randint(0, events, (categories, models, types, N_samples))
>>> I, J, K, _ = np.ogrid[:categories, :models, :types, :0]
>>> 
>>> resampled = data[I, J, K, idx]

A small explicit example for concreteness. The fields are labeled with "category" (A or B), "model" (a or b) and "type" (1 or 2) to make it easy to verify that sampling does preserve these.

>>> I, J, K, L = np.ix_(*(np.array(list(x), 'O') for x in ('AB', 'ab', '12', 'xyzw')))
>>> data = I+J+K+L
>>> data
array([[[['Aa1x', 'Aa1y', 'Aa1z', 'Aa1w'],
 ['Aa2x', 'Aa2y', 'Aa2z', 'Aa2w']],

 [['Ab1x', 'Ab1y', 'Ab1z', 'Ab1w'],
 ['Ab2x', 'Ab2y', 'Ab2z', 'Ab2w']]],


 [[['Ba1x', 'Ba1y', 'Ba1z', 'Ba1w'],
 ['Ba2x', 'Ba2y', 'Ba2z', 'Ba2w']],

 [['Bb1x', 'Bb1y', 'Bb1z', 'Bb1w'],
 ['Bb2x', 'Bb2y', 'Bb2z', 'Bb2w']]]], dtype=object)
>>> N_samples = 3
>>> 
>>> idx = np.random.randint(0, data.shape[-1], (N_samples, *data.shape))
>>> _, I, J, K, _ = np.ogrid[tuple(map(slice, (0, *data.shape[:-1], 0)))]
>>> 
>>> resampled = data[I, J, K, idx]
>>> res
ResourceWarning resampled 
>>> resampled
array([[[[['Aa1z', 'Aa1y', 'Aa1y', 'Aa1x'],
 ['Aa2y', 'Aa2z', 'Aa2z', 'Aa2z']],

 [['Ab1w', 'Ab1z', 'Ab1y', 'Ab1x'],
 ['Ab2y', 'Ab2w', 'Ab2y', 'Ab2w']]],


 [[['Ba1z', 'Ba1y', 'Ba1y', 'Ba1x'],
 ['Ba2x', 'Ba2x', 'Ba2z', 'Ba2x']],

 [['Bb1x', 'Bb1x', 'Bb1y', 'Bb1z'],
 ['Bb2y', 'Bb2w', 'Bb2y', 'Bb2z']]]],



 [[[['Aa1x', 'Aa1w', 'Aa1x', 'Aa1z'],
 ['Aa2y', 'Aa2y', 'Aa2x', 'Aa2z']],

 [['Ab1y', 'Ab1x', 'Ab1w', 'Ab1z'],
 ['Ab2w', 'Ab2x', 'Ab2w', 'Ab2w']]],


 [[['Ba1x', 'Ba1z', 'Ba1x', 'Ba1z'],
 ['Ba2x', 'Ba2y', 'Ba2y', 'Ba2w']],

 [['Bb1z', 'Bb1w', 'Bb1y', 'Bb1w'],
 ['Bb2w', 'Bb2x', 'Bb2w', 'Bb2z']]]],



 [[[['Aa1w', 'Aa1w', 'Aa1w', 'Aa1y'],
 ['Aa2z', 'Aa2x', 'Aa2y', 'Aa2x']],

 [['Ab1z', 'Ab1z', 'Ab1x', 'Ab1y'],
 ['Ab2w', 'Ab2x', 'Ab2x', 'Ab2y']]],


 [[['Ba1w', 'Ba1x', 'Ba1y', 'Ba1y'],
 ['Ba2z', 'Ba2x', 'Ba2x', 'Ba2x']],

 [['Bb1z', 'Bb1w', 'Bb1x', 'Bb1x'],
 ['Bb2z', 'Bb2x', 'Bb2w', 'Bb2z']]]]], dtype=object)

edited Nov 10 at 7:13

answered Nov 10 at 5:41

Paul Panzer

28.9k21240

Thank you! Could you please explain if this is different/better than what I posted and why? Specially with regards to cross-correlations. Looks neat!
– maximusdooku
Nov 10 at 5:46

I am not sure I understand this code unfortunately. Is it doing "global" sample with replacement in all events? Because the sampling needs to happen in each 100 event array...
– maximusdooku
Nov 10 at 5:49

@maximusdooku I've added a small example, so you can check whether it does what you need. The advantage of this method is that it avoids a python level loop. Let me know whether it works for you.
– Paul Panzer
Nov 10 at 6:00

I am going through every step. But the shape of data changes from (2, 2, 2, 4) to (2, 2, 2, 6). But I need to generate 6 samples of (2,2,2,4) for example...Perhaps I am not understanding the code because of fancy indexing...
– maximusdooku
Nov 10 at 6:29

@maximusdooku think about it this way: given N_samples=12, Paul's answer would produce the same result as if you took 3 samples of shape (2, 2, 2, 4) and then just stuck them together along the last axis. Either approach (one array of shape (2,2,2,N*4) vs N arrays of shape (2,2,2,4)) can produce a correct bootstrap. In any case, I think this answer is a good one. It's similar to hpaulj's, but without the flaw in the math
– tel
Nov 10 at 6:38

|
show 3 more comments

up vote
1
down vote

You can draw the indices of your samples and then apply fancy indexing:

>>> import numpy as np
>>> 
>>> (categories, models, types, events) = (10, 11, 50, 100)
>>> data = np.random.random((categories, models, types, events))
>>> N_samples = 1000
>>> 
>>> idx = np.random.randint(0, events, (categories, models, types, N_samples))
>>> I, J, K, _ = np.ogrid[:categories, :models, :types, :0]
>>> 
>>> resampled = data[I, J, K, idx]

A small explicit example for concreteness. The fields are labeled with "category" (A or B), "model" (a or b) and "type" (1 or 2) to make it easy to verify that sampling does preserve these.

>>> I, J, K, L = np.ix_(*(np.array(list(x), 'O') for x in ('AB', 'ab', '12', 'xyzw')))
>>> data = I+J+K+L
>>> data
array([[[['Aa1x', 'Aa1y', 'Aa1z', 'Aa1w'],
 ['Aa2x', 'Aa2y', 'Aa2z', 'Aa2w']],

 [['Ab1x', 'Ab1y', 'Ab1z', 'Ab1w'],
 ['Ab2x', 'Ab2y', 'Ab2z', 'Ab2w']]],


 [[['Ba1x', 'Ba1y', 'Ba1z', 'Ba1w'],
 ['Ba2x', 'Ba2y', 'Ba2z', 'Ba2w']],

 [['Bb1x', 'Bb1y', 'Bb1z', 'Bb1w'],
 ['Bb2x', 'Bb2y', 'Bb2z', 'Bb2w']]]], dtype=object)
>>> N_samples = 3
>>> 
>>> idx = np.random.randint(0, data.shape[-1], (N_samples, *data.shape))
>>> _, I, J, K, _ = np.ogrid[tuple(map(slice, (0, *data.shape[:-1], 0)))]
>>> 
>>> resampled = data[I, J, K, idx]
>>> res
ResourceWarning resampled 
>>> resampled
array([[[[['Aa1z', 'Aa1y', 'Aa1y', 'Aa1x'],
 ['Aa2y', 'Aa2z', 'Aa2z', 'Aa2z']],

 [['Ab1w', 'Ab1z', 'Ab1y', 'Ab1x'],
 ['Ab2y', 'Ab2w', 'Ab2y', 'Ab2w']]],


 [[['Ba1z', 'Ba1y', 'Ba1y', 'Ba1x'],
 ['Ba2x', 'Ba2x', 'Ba2z', 'Ba2x']],

 [['Bb1x', 'Bb1x', 'Bb1y', 'Bb1z'],
 ['Bb2y', 'Bb2w', 'Bb2y', 'Bb2z']]]],



 [[[['Aa1x', 'Aa1w', 'Aa1x', 'Aa1z'],
 ['Aa2y', 'Aa2y', 'Aa2x', 'Aa2z']],

 [['Ab1y', 'Ab1x', 'Ab1w', 'Ab1z'],
 ['Ab2w', 'Ab2x', 'Ab2w', 'Ab2w']]],


 [[['Ba1x', 'Ba1z', 'Ba1x', 'Ba1z'],
 ['Ba2x', 'Ba2y', 'Ba2y', 'Ba2w']],

 [['Bb1z', 'Bb1w', 'Bb1y', 'Bb1w'],
 ['Bb2w', 'Bb2x', 'Bb2w', 'Bb2z']]]],



 [[[['Aa1w', 'Aa1w', 'Aa1w', 'Aa1y'],
 ['Aa2z', 'Aa2x', 'Aa2y', 'Aa2x']],

 [['Ab1z', 'Ab1z', 'Ab1x', 'Ab1y'],
 ['Ab2w', 'Ab2x', 'Ab2x', 'Ab2y']]],


 [[['Ba1w', 'Ba1x', 'Ba1y', 'Ba1y'],
 ['Ba2z', 'Ba2x', 'Ba2x', 'Ba2x']],

 [['Bb1z', 'Bb1w', 'Bb1x', 'Bb1x'],
 ['Bb2z', 'Bb2x', 'Bb2w', 'Bb2z']]]]], dtype=object)

edited Nov 10 at 7:13

answered Nov 10 at 5:41

Paul Panzer

28.9k21240

You can draw the indices of your samples and then apply fancy indexing:

>>> import numpy as np
>>> 
>>> (categories, models, types, events) = (10, 11, 50, 100)
>>> data = np.random.random((categories, models, types, events))
>>> N_samples = 1000
>>> 
>>> idx = np.random.randint(0, events, (categories, models, types, N_samples))
>>> I, J, K, _ = np.ogrid[:categories, :models, :types, :0]
>>> 
>>> resampled = data[I, J, K, idx]

A small explicit example for concreteness. The fields are labeled with "category" (A or B), "model" (a or b) and "type" (1 or 2) to make it easy to verify that sampling does preserve these.

>>> I, J, K, L = np.ix_(*(np.array(list(x), 'O') for x in ('AB', 'ab', '12', 'xyzw')))
>>> data = I+J+K+L
>>> data
array([[[['Aa1x', 'Aa1y', 'Aa1z', 'Aa1w'],
 ['Aa2x', 'Aa2y', 'Aa2z', 'Aa2w']],

 [['Ab1x', 'Ab1y', 'Ab1z', 'Ab1w'],
 ['Ab2x', 'Ab2y', 'Ab2z', 'Ab2w']]],


 [[['Ba1x', 'Ba1y', 'Ba1z', 'Ba1w'],
 ['Ba2x', 'Ba2y', 'Ba2z', 'Ba2w']],

 [['Bb1x', 'Bb1y', 'Bb1z', 'Bb1w'],
 ['Bb2x', 'Bb2y', 'Bb2z', 'Bb2w']]]], dtype=object)
>>> N_samples = 3
>>> 
>>> idx = np.random.randint(0, data.shape[-1], (N_samples, *data.shape))
>>> _, I, J, K, _ = np.ogrid[tuple(map(slice, (0, *data.shape[:-1], 0)))]
>>> 
>>> resampled = data[I, J, K, idx]
>>> res
ResourceWarning resampled 
>>> resampled
array([[[[['Aa1z', 'Aa1y', 'Aa1y', 'Aa1x'],
 ['Aa2y', 'Aa2z', 'Aa2z', 'Aa2z']],

 [['Ab1w', 'Ab1z', 'Ab1y', 'Ab1x'],
 ['Ab2y', 'Ab2w', 'Ab2y', 'Ab2w']]],


 [[['Ba1z', 'Ba1y', 'Ba1y', 'Ba1x'],
 ['Ba2x', 'Ba2x', 'Ba2z', 'Ba2x']],

 [['Bb1x', 'Bb1x', 'Bb1y', 'Bb1z'],
 ['Bb2y', 'Bb2w', 'Bb2y', 'Bb2z']]]],



 [[[['Aa1x', 'Aa1w', 'Aa1x', 'Aa1z'],
 ['Aa2y', 'Aa2y', 'Aa2x', 'Aa2z']],

 [['Ab1y', 'Ab1x', 'Ab1w', 'Ab1z'],
 ['Ab2w', 'Ab2x', 'Ab2w', 'Ab2w']]],


 [[['Ba1x', 'Ba1z', 'Ba1x', 'Ba1z'],
 ['Ba2x', 'Ba2y', 'Ba2y', 'Ba2w']],

 [['Bb1z', 'Bb1w', 'Bb1y', 'Bb1w'],
 ['Bb2w', 'Bb2x', 'Bb2w', 'Bb2z']]]],



 [[[['Aa1w', 'Aa1w', 'Aa1w', 'Aa1y'],
 ['Aa2z', 'Aa2x', 'Aa2y', 'Aa2x']],

 [['Ab1z', 'Ab1z', 'Ab1x', 'Ab1y'],
 ['Ab2w', 'Ab2x', 'Ab2x', 'Ab2y']]],


 [[['Ba1w', 'Ba1x', 'Ba1y', 'Ba1y'],
 ['Ba2z', 'Ba2x', 'Ba2x', 'Ba2x']],

 [['Bb1z', 'Bb1w', 'Bb1x', 'Bb1x'],
 ['Bb2z', 'Bb2x', 'Bb2w', 'Bb2z']]]]], dtype=object)

edited Nov 10 at 7:13

answered Nov 10 at 5:41

Paul Panzer

28.9k21240

edited Nov 10 at 7:13

answered Nov 10 at 5:41

Paul Panzer

28.9k21240

answered Nov 10 at 5:41

Paul Panzer

28.9k21240

answered Nov 10 at 5:41

Paul Panzer

28.9k21240

Thank you! Could you please explain if this is different/better than what I posted and why? Specially with regards to cross-correlations. Looks neat!
– maximusdooku
Nov 10 at 5:46

I am not sure I understand this code unfortunately. Is it doing "global" sample with replacement in all events? Because the sampling needs to happen in each 100 event array...
– maximusdooku
Nov 10 at 5:49

@maximusdooku I've added a small example, so you can check whether it does what you need. The advantage of this method is that it avoids a python level loop. Let me know whether it works for you.
– Paul Panzer
Nov 10 at 6:00

I am going through every step. But the shape of data changes from (2, 2, 2, 4) to (2, 2, 2, 6). But I need to generate 6 samples of (2,2,2,4) for example...Perhaps I am not understanding the code because of fancy indexing...
– maximusdooku
Nov 10 at 6:29

@maximusdooku think about it this way: given N_samples=12, Paul's answer would produce the same result as if you took 3 samples of shape (2, 2, 2, 4) and then just stuck them together along the last axis. Either approach (one array of shape (2,2,2,N*4) vs N arrays of shape (2,2,2,4)) can produce a correct bootstrap. In any case, I think this answer is a good one. It's similar to hpaulj's, but without the flaw in the math
– tel
Nov 10 at 6:38

|
show 3 more comments

Thank you! Could you please explain if this is different/better than what I posted and why? Specially with regards to cross-correlations. Looks neat!
– maximusdooku
Nov 10 at 5:46

I am not sure I understand this code unfortunately. Is it doing "global" sample with replacement in all events? Because the sampling needs to happen in each 100 event array...
– maximusdooku
Nov 10 at 5:49

@maximusdooku I've added a small example, so you can check whether it does what you need. The advantage of this method is that it avoids a python level loop. Let me know whether it works for you.
– Paul Panzer
Nov 10 at 6:00

I am going through every step. But the shape of data changes from (2, 2, 2, 4) to (2, 2, 2, 6). But I need to generate 6 samples of (2,2,2,4) for example...Perhaps I am not understanding the code because of fancy indexing...
– maximusdooku
Nov 10 at 6:29

@maximusdooku think about it this way: given N_samples=12, Paul's answer would produce the same result as if you took 3 samples of shape (2, 2, 2, 4) and then just stuck them together along the last axis. Either approach (one array of shape (2,2,2,N*4) vs N arrays of shape (2,2,2,4)) can produce a correct bootstrap. In any case, I think this answer is a good one. It's similar to hpaulj's, but without the flaw in the math
– tel
Nov 10 at 6:38

Thank you! Could you please explain if this is different/better than what I posted and why? Specially with regards to cross-correlations. Looks neat!
– maximusdooku
Nov 10 at 5:46

I am not sure I understand this code unfortunately. Is it doing "global" sample with replacement in all events? Because the sampling needs to happen in each 100 event array...
– maximusdooku
Nov 10 at 5:49

@maximusdooku I've added a small example, so you can check whether it does what you need. The advantage of this method is that it avoids a python level loop. Let me know whether it works for you.
– Paul Panzer
Nov 10 at 6:00

I am going through every step. But the shape of data changes from (2, 2, 2, 4) to (2, 2, 2, 6). But I need to generate 6 samples of (2,2,2,4) for example...Perhaps I am not understanding the code because of fancy indexing...
– maximusdooku
Nov 10 at 6:29

@maximusdooku think about it this way: given N_samples=12, Paul's answer would produce the same result as if you took 3 samples of shape (2, 2, 2, 4) and then just stuck them together along the last axis. Either approach (one array of shape (2,2,2,N*4) vs N arrays of shape (2,2,2,4)) can produce a correct bootstrap. In any case, I think this answer is a good one. It's similar to hpaulj's, but without the flaw in the math
– tel
Nov 10 at 6:38

|
show 3 more comments

up vote
0
down vote

databoot = 
for i in range(5):
 idx = np.random.choice(100, 100)
 databoot.append(data[:,:,:,idx])

shape of databoot -> (5, 10, 11, 50, 100)

shape of data -> (10, 11, 50, 100)

answered Nov 10 at 5:12

maximusdooku

1,38821343

@hpaulj came up with a nicely efficient/parsimonious answer, but I don't think it's 100% mathematically correct. Basically, if there's any correlations between the events in different (categories, models, types), the above answer may bias your resampled array, as well as the results from your bootstrap
– tel
Nov 10 at 5:19

@tel I would love to see a more correct answer...If you know how to, please do post.
– maximusdooku
Nov 10 at 5:20

add a comment |

up vote
0
down vote

databoot = 
for i in range(5):
 idx = np.random.choice(100, 100)
 databoot.append(data[:,:,:,idx])

shape of databoot -> (5, 10, 11, 50, 100)

shape of data -> (10, 11, 50, 100)

answered Nov 10 at 5:12

maximusdooku

1,38821343

@hpaulj came up with a nicely efficient/parsimonious answer, but I don't think it's 100% mathematically correct. Basically, if there's any correlations between the events in different (categories, models, types), the above answer may bias your resampled array, as well as the results from your bootstrap
– tel
Nov 10 at 5:19

@tel I would love to see a more correct answer...If you know how to, please do post.
– maximusdooku
Nov 10 at 5:20

add a comment |

up vote
0
down vote

databoot = 
for i in range(5):
 idx = np.random.choice(100, 100)
 databoot.append(data[:,:,:,idx])

shape of databoot -> (5, 10, 11, 50, 100)

shape of data -> (10, 11, 50, 100)

answered Nov 10 at 5:12

maximusdooku

1,38821343

databoot = 
for i in range(5):
 idx = np.random.choice(100, 100)
 databoot.append(data[:,:,:,idx])

shape of databoot -> (5, 10, 11, 50, 100)

shape of data -> (10, 11, 50, 100)

answered Nov 10 at 5:12

maximusdooku

1,38821343

answered Nov 10 at 5:12

maximusdooku

1,38821343

answered Nov 10 at 5:12

maximusdooku

1,38821343

answered Nov 10 at 5:12

maximusdooku

1,38821343

@hpaulj came up with a nicely efficient/parsimonious answer, but I don't think it's 100% mathematically correct. Basically, if there's any correlations between the events in different (categories, models, types), the above answer may bias your resampled array, as well as the results from your bootstrap
– tel
Nov 10 at 5:19

@tel I would love to see a more correct answer...If you know how to, please do post.
– maximusdooku
Nov 10 at 5:20

add a comment |

@hpaulj came up with a nicely efficient/parsimonious answer, but I don't think it's 100% mathematically correct. Basically, if there's any correlations between the events in different (categories, models, types), the above answer may bias your resampled array, as well as the results from your bootstrap
– tel
Nov 10 at 5:19

@tel I would love to see a more correct answer...If you know how to, please do post.
– maximusdooku
Nov 10 at 5:20

@hpaulj came up with a nicely efficient/parsimonious answer, but I don't think it's 100% mathematically correct. Basically, if there's any correlations between the events in different (categories, models, types), the above answer may bias your resampled array, as well as the results from your bootstrap
– tel
Nov 10 at 5:19

@tel I would love to see a more correct answer...If you know how to, please do post.
– maximusdooku
Nov 10 at 5:20

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Pfthb