What is a distribution over functions?



I am reading the textbook Gaussian Processes for Machine Learning by C.E. Rasmussen and C.K.I. Williams and I am having some trouble understanding what a distribution over functions means. In the textbook, an example is given: one should imagine a function as a very long vector (in fact, it should be infinitely long?). So I imagine a distribution over functions to be a probability distribution drawn "above" such vector values. Would it then be the probability that a function takes a particular value? Or the probability that a function takes a value in a given range? Or is a distribution over functions a probability assigned to a whole function?



Quotes from the textbook:



Chapter 1: Introduction, page 2




A Gaussian process is a generalization of the Gaussian probability distribution. Whereas a probability distribution describes random variables which are scalars or vectors (for multivariate distributions), a stochastic process governs the properties of functions. Leaving mathematical sophistication aside, one can loosely think of a function as a very long vector, each entry in the vector specifying the function value f(x) at a particular input x. It turns out that, although this idea is a little naive, it is surprisingly close to what we need. Indeed, the question of how we deal computationally with these infinite dimensional objects has the most pleasant resolution imaginable: if you ask only for the properties of the function at a finite number of points, then inference in the Gaussian process will give you the same answer if you ignore the infinitely many other points, as if you had taken them all into account!




Chapter 2: Regression, page 7




There are several ways to interpret Gaussian process (GP) regression models. One can think of a Gaussian process as defining a distribution over functions, and inference taking place directly in the space of functions, the function-space view.
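
To make the "function as a very long vector" idea concrete, here is a minimal sketch (my own illustration, not from the book): a function evaluated on a fine grid is treated as one big Gaussian random vector. The zero mean and the squared-exponential kernel are assumed choices.

```python
import numpy as np

def sq_exp_kernel(x1, x2, lengthscale=0.3, variance=1.0):
    """Assumed squared-exponential covariance k(x1, x2) for 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)            # inputs: the "very long vector" of x's
K = sq_exp_kernel(x, x)                   # 200 x 200 covariance matrix
K += 1e-10 * np.eye(len(x))               # tiny jitter for numerical stability

# Three draws from the zero-mean GP prior: each row is one sampled "function",
# i.e. one long vector of function values on the grid.
samples = rng.multivariate_normal(np.zeros(len(x)), K, size=3)
print(samples.shape)                      # (3, 200)
```

Refining the grid only makes the vector longer; the distribution of the values on any fixed subset of grid points does not change, which is the marginalization property the Chapter 1 quote refers to.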





From the initial question:



I made this conceptual picture to try to visualize this for myself. I am not sure whether the explanation I made for myself is correct.



[image: the original conceptual picture]




After the update:



After the answer from Gijs, I updated the picture to be conceptually more like this:



[image: the updated conceptual picture]

Asked Nov 9 at 9:17 by camillejr; edited Nov 9 at 13:29. Tags: distributions, gaussian-process.

2 Answers

Answer by DeltaIV (13 votes), answered Nov 9 at 13:42, edited Nov 9 at 14:27:

          Your question has already been asked, and beautifully answered, on the Mathematics SE site:



          https://math.stackexchange.com/questions/2297424/extending-a-distribution-over-samples-to-a-distribution-over-functions



It sounds like you're not familiar with the concepts of Gaussian measures on infinite-dimensional spaces, linear functionals, pushforward measures, etc., so I'll try to keep it as simple as possible.



You already know how to define probabilities over real numbers (random variables) and over vectors (again, random variables, even if we usually call them random vectors). Now we want to introduce a probability measure over an infinite-dimensional vector space: for example, the space $L^2([0,1])$ of square-integrable functions over $I=[0,1]$. Things get complicated now, because when we defined probability on $\mathbb{R}$ or $\mathbb{R}^n$, we were helped by the fact that the Lebesgue measure is defined on both spaces. However, there exists no Lebesgue measure over $L^2$ (or any infinite-dimensional Banach space, for that matter). There are various solutions to this conundrum, most of which need a good familiarity with functional analysis.



However, there's also a simple "trick" based on the Kolmogorov extension theorem, which is basically the way stochastic processes are introduced in most probability courses that are not heavily measure-theoretic. Now I'm going to be very hand-wavy and non-rigorous, and limit myself to the case of Gaussian processes. If you want a more general definition, you can read the answer linked above or look it up on Wikipedia. The Kolmogorov extension theorem, applied to your specific use case, states more or less the following:



• Suppose that, for each finite set of points $S_n=\{t_1,\dots,t_n\}\subset I$, the vector $\mathbf{x}_n=(x(t_1),\dots,x(t_n))$ has a multivariate Gaussian distribution.

• Suppose now that, for all possible pairs $S_n, S_m$ with $S_n\subset S_m$, the corresponding probability density functions $f_{S_n}(x_1,\dots,x_n)$ and $f_{S_m}(x_1,\dots,x_n,x_{n+1},\dots,x_m)$ are consistent, i.e., if I integrate $f_{S_m}$ with respect to the variables which are in $S_m$ but not in $S_n$, then the resulting pdf is $f_{S_n}$:

$$ \int_{\mathbb{R}^{m-n}} f_{S_m}(x_1,\dots,x_m)\,\mathrm{d}x_{n+1}\dots\mathrm{d}x_m = f_{S_n}(x_1,\dots,x_n) $$

• Then there exists a stochastic process $X$, i.e., a random variable on the space of functions $L^2$, such that, for each finite set $S_n$, the probability distribution of those $n$ points is multivariate Gaussian.

The actual theorem is far more general, but I guess this is what you were looking for.
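
As a small numerical check of the consistency condition above (my own sketch, not part of the original answer): for Gaussians, marginalising out the points of $S_m$ that are not in $S_n$ amounts to dropping the corresponding rows and columns of the covariance matrix, so the displayed integral identity can be verified directly. The squared-exponential kernel and the specific points are assumed choices.

```python
import numpy as np
from scipy.stats import multivariate_normal

def k(s, t, lengthscale=0.5):
    # assumed squared-exponential covariance function
    return np.exp(-0.5 * ((s - t) / lengthscale) ** 2)

t_m = np.array([0.1, 0.4, 0.7, 0.9])                      # S_m with m = 4
K_m = k(t_m[:, None], t_m[None, :]) + 1e-9 * np.eye(4)    # covariance of (x(t_1),...,x(t_4))
K_n = K_m[:2, :2]                                         # S_n = {t_1, t_2}: drop rows/columns

x12 = np.array([0.3, -0.2])                               # arbitrary point at which to compare pdfs
f_n = multivariate_normal(np.zeros(2), K_n).pdf(x12)

# Numerically integrate f_{S_m}(x_1, x_2, x_3, x_4) over x_3 and x_4.
grid = np.linspace(-6.0, 6.0, 241)
dx = grid[1] - grid[0]
X3, X4 = np.meshgrid(grid, grid)
pts = np.column_stack([
    np.full(X3.size, x12[0]),
    np.full(X4.size, x12[1]),
    X3.ravel(),
    X4.ravel(),
])
f_m = multivariate_normal(np.zeros(4), K_m).pdf(pts)
integral = f_m.sum() * dx * dx

print(f_n, integral)                        # the two values agree up to grid error
print(np.isclose(f_n, integral, rtol=1e-3))
```

This automatic consistency of Gaussian marginals is exactly what makes the Kolmogorov construction work so smoothly for Gaussian processes.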






Answer by Gijs (10 votes), answered Nov 9 at 10:29:

The concept is a bit more abstract than a usual distribution. The problem is that we are used to the concept of a distribution over $\mathbb{R}$, typically shown as a line, and then expand it to a surface $\mathbb{R}^2$, and so on to distributions over $\mathbb{R}^n$. But the space of functions cannot be represented as a square or a line or a vector. It's not a crime to think of it that way, like you do, but the theory that works in $\mathbb{R}^n$, having to do with distance, neighborhoods and such (this is known as the topology of the space), is not the same in the space of functions. So drawing it as a square can give you wrong intuitions about that space.



You can simply think of the space of functions as a big collection of functions, perhaps a bag of things if you will. The distribution then gives you the probabilities of drawing a subset of those things. The distribution will say: the probability that your next draw (of a function) is in this subset is, for example, 10%. In the case of a Gaussian process over functions drawn in two dimensions, you might ask: given an x-coordinate and an interval of y-values (a small vertical line segment), what is the probability that a (random) function will pass through this small segment? That is going to be a positive probability. So the Gaussian process specifies a distribution (of probability) over a space of functions. In this example, the subset of the space of functions is the subset of functions that pass through the line segment.
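
To make this concrete, here is a minimal sketch (my own addition, not part of the answer) of the "passing through a small vertical segment" probability for a zero-mean GP prior; the squared-exponential kernel, the particular segment, and the Monte Carlo check are all assumed choices. The key point is that the value of the random function at a single input x0 is a one-dimensional Gaussian, so the probability of hitting the segment is a difference of two normal CDFs.

```python
import numpy as np
from scipy.stats import norm

def k(s, t, lengthscale=0.3, variance=1.0):
    # assumed squared-exponential covariance function
    return variance * np.exp(-0.5 * ((s - t) / lengthscale) ** 2)

x0, a, b = 0.5, 0.2, 0.8                   # the small vertical segment {x0} x [a, b]
sigma = np.sqrt(k(x0, x0))                 # std of f(x0) under the zero-mean prior
p_segment = norm.cdf(b, loc=0.0, scale=sigma) - norm.cdf(a, loc=0.0, scale=sigma)
print(f"P(a <= f(x0) <= b) = {p_segment:.3f}")   # a positive probability

# Monte Carlo check: sample many functions on a grid containing x0 and count
# how often they pass through the segment.
rng = np.random.default_rng(1)
xs = np.linspace(0.0, 1.0, 101)
K = k(xs[:, None], xs[None, :]) + 1e-10 * np.eye(len(xs))
draws = rng.multivariate_normal(np.zeros(len(xs)), K, size=20000)
i0 = np.argmin(np.abs(xs - x0))
print(np.mean((draws[:, i0] >= a) & (draws[:, i0] <= b)))   # close to p_segment
```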



Another confusing naming convention here is that a distribution is commonly specified by a density function, such as the bell curve of the normal distribution. There, the area under the density function tells you how probable an interval is. This doesn't work for all distributions, however, and in particular, in the case of functions (rather than $\mathbb{R}$ as with the normal distribution), it doesn't work at all. That means you won't be able to write this distribution (as specified by the Gaussian process) as a density function.






Comments:

Thanks, so to clarify: this is not a distribution over one function's values, but instead a distribution over a collection of functions, right? One more question: you've said that this would be the probability that a random function passes through a certain interval, so in the example of GPR, it would be a random function, but from a specific "family" of functions given by the covariance kernel? – camillejr, Nov 9 at 10:41

Yes, it is a distribution over a collection of functions. The example of passing through an interval applies if you have a Gaussian process. The covariance kernel will actually specify a Gaussian process. So if you know a covariance kernel, you can calculate the probability of a random function passing through a specific interval. – Gijs, Nov 9 at 10:44
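
Following up on this exchange, here is a sketch (my own illustration, not from the comments) of how a known covariance kernel, combined with observed data as in GP regression, lets you compute the probability that a function drawn from the posterior passes through a vertical segment at a new input. The squared-exponential kernel, the zero prior mean, the noise-free observations, and the toy data are all assumed choices.

```python
import numpy as np
from scipy.stats import norm

def k(s, t, lengthscale=0.3):
    # assumed squared-exponential covariance function
    return np.exp(-0.5 * ((s - t) / lengthscale) ** 2)

# Toy training data (assumed for illustration) and a test segment.
X = np.array([0.1, 0.4, 0.9])
y = np.array([0.5, -0.3, 0.8])
x_star, a, b = 0.6, -0.5, 0.5

K = k(X[:, None], X[None, :]) + 1e-8 * np.eye(len(X))   # jitter in place of observation noise
k_star = k(X, x_star)

# Standard zero-mean GP-regression posterior at x_star (cf. Chapter 2 of the textbook).
alpha = np.linalg.solve(K, y)
mu_star = k_star @ alpha
var_star = k(x_star, x_star) - k_star @ np.linalg.solve(K, k_star)
sigma_star = np.sqrt(max(var_star, 0.0))

p = norm.cdf(b, loc=mu_star, scale=sigma_star) - norm.cdf(a, loc=mu_star, scale=sigma_star)
print(f"posterior P(a <= f(x_star) <= b) = {p:.3f}")
```

Changing the kernel (for example, its lengthscale) changes the "family" of functions and therefore this probability.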










