Create float from exponent and significand
Given integers exp and 0 <= sig < 2^52, how can I create the float64 that has exp as its exponent and whose significand bits are the same as the binary representation of sig (in Go)?
      go floating-point binary
      asked Nov 13 '18 at 8:32
      Ted
          1 Answer
          Go's floating-point types, float32 and float64, follow the IEEE 754 standard for floating-point arithmetic (just like almost any other language).



          Since your significand may be up to 52 bits wide, it can only be represented using a float64 value (a float32 has only 23 fraction bits).



          The memory layout (bits) of a float64 value is described in the Wikipedia article Double-precision floating-point format.



          Bit layout of a float64 value (figure from Wikipedia): sign (1 bit, bit 63) | exponent (11 bits, bits 62-52) | fraction (52 bits, bits 51-0).
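          As a sketch of that layout, the three fields can be pulled out of an existing float64 with math.Float64bits and a few shifts and masks. The helper name fields and the mask constants are illustrative choices, not part of the answer:

```go
package main

import (
	"fmt"
	"math"
)

// Illustrative constants describing the float64 (IEEE 754 double) layout.
const (
	fracBits = 52
	expBits  = 11
	fracMask = 1<<fracBits - 1 // low 52 bits: the fraction
	expMask  = 1<<expBits - 1  // 11 bits: the exponent, after shifting right by 52
)

// fields (hypothetical helper) splits f into its sign bit,
// its stored (biased) exponent and its 52 fraction bits.
func fields(f float64) (sign, exp, frac uint64) {
	b := math.Float64bits(f)
	sign = b >> 63
	exp = (b >> fracBits) & expMask
	frac = b & fracMask
	return
}

func main() {
	// -3.0 is -1.5 * 2^1: sign 1, biased exponent 1+1023, fraction 0.5*2^52.
	s, e, m := fields(-3.0)
	fmt.Println(s, e, m) // 1 1024 2251799813685248
}
```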



          You say you have the exponent value and the significand (which is the fraction part).



          You can use simple bitwise arithmetic to assemble the 64-bit pattern of the float like this:



          bits := exp<<52 | sig


          (Note: exp and sig should be of type uint64. If not, use a type conversion.)



          Once you have the bits, use the math.Float64frombits() function to reinterpret them as a float64 value:



          f := math.Float64frombits(bits)
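          Put together, the two lines above might look like the following minimal sketch. The function name fromFields is hypothetical, exp here is the stored (biased) exponent, and the sign bit is left at 0:

```go
package main

import (
	"fmt"
	"math"
)

// fromFields (hypothetical helper) builds a positive float64 from a stored
// (biased) exponent and 52 fraction bits. The caller must ensure exp < 2048
// and sig < 1<<52; larger values would spill into neighbouring bit fields.
func fromFields(exp, sig uint64) float64 {
	bits := exp<<52 | sig // exponent in bits 62-52, fraction in bits 51-0
	return math.Float64frombits(bits)
}

func main() {
	fmt.Println(fromFields(1023, 0))     // 1   (1.0 * 2^0)
	fmt.Println(fromFields(1024, 1<<51)) // 3   (1.5 * 2^1)
}
```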


          Note that the exponent stored in the bit layout is not the actual power of two used when computing the number's value. As Wikipedia puts it:




          The double-precision binary floating-point exponent is encoded using an offset-binary representation, with the zero offset being 1023; also known as exponent bias in the IEEE 754 standard.




          So the value encoded in the double-precision format above is computed as follows (for normal numbers, i.e. a stored exponent e between 1 and 2046):




          $(-1)^{\text{sign}} \times 2^{e-1023} \times 1.\text{fraction}$
          • Would math.Ldexp help here? func Ldexp(frac float64, exp int) float64

            – aMike
            Nov 13 '18 at 13:57











          • @aMike I was considering it, but it takes the fraction as a float64 value, and it does something similar under the hood.

            – icza
            Nov 13 '18 at 13:58












          • I see, so if I want the actual exponent to be exp, I have to do bits := (exp+1023)<<52 | sig, correct?

            – Ted
            Nov 13 '18 at 14:54






          • @Ted Yes, that's right.

            – icza
            Nov 13 '18 at 14:57
          edited Nov 13 '18 at 14:58
          answered Nov 13 '18 at 11:51
          icza