Create float from exponent and significand

Given integers exp and 0<=sig<2^52, how can I create the float64 with exp as exponent and whose significand bits are the same as the binary representation of sig (in Go)?

asked Nov 13 '18 at 8:32

Ted

go floating-point binary


1 Answer

The IEEE-754 standard defines the floating-point arithmetic that Go uses for its floating-point types, float32 and float64 (as do almost all other languages).

Since your significand may be up to 52 bits wide, it can only be represented using a float64 value: a float32 has just 23 fraction bits.

The memory layout (bits) of a float64 value is described in the Wikipedia article "Double-precision floating-point format".

Here's the bit layout of a float64 value (the original picture was taken from Wikipedia):

sign: 1 bit (bit 63) | exponent: 11 bits (bits 62-52) | fraction: 52 bits (bits 51-0)

You have the exponent value and the significand (that is, the fraction part).

You may use simple bitwise arithmetic to construct the 64-bit pattern of the floating-point value, where exp is the 11-bit value stored in the exponent field:

bits := exp<<52 | sig

(Note: exp and sig must be of type uint64. If they are not, use a type conversion.)

Once you have the bits, you may use the math.Float64frombits() function to get it as a float64 value:

f := math.Float64frombits(bits)

Note that the exponent stored in the memory layout is not the "direct" exponent you use when calculating the value of the number; it is biased:

The double-precision binary floating-point exponent is encoded using an offset-binary representation, with the zero offset being 1023; also known as exponent bias in the IEEE 754 standard.

So the number encoded in the above double-precision format is calculated as follows, where e is the stored exponent field:

(-1)^sign × 2^(e-1023) × 1.fraction

edited Nov 13 '18 at 14:58

answered Nov 13 '18 at 11:51

icza


Would math.Ldexp help here? Its signature is func Ldexp(frac float64, exp int) float64.

– aMike
Nov 13 '18 at 13:57

@aMike I was considering it, but it takes the fraction as a float64 value, and it does something similar under the hood.

– icza
Nov 13 '18 at 13:58

I see, so if I want the actual exponent to be exp, I have to do bits := (exp+1023)<<52 | sig, correct?

– Ted
Nov 13 '18 at 14:54

@Ted Yes, that's right.

– icza
Nov 13 '18 at 14:57


