Redundant basic blocks in llvm IR
I was just playing around with a program and watching its IR in llvm and I noticed certain basic blocks that don't make sense to me.
My code:
void proc()
int i, j, k, m, n, l;
k = 119;
for (i = 20; i < 200; i++)
for (j = 13; j < 130; j++)
l = 80;
Corresponding IR:
store i32 119, i32* %3, align 4
store i32 20, i32* %1, align 4
br label %7
; <label>:7: ; preds = %19, %0
%8 = load i32, i32* %1, align 4
%9 = icmp slt i32 %8, 200
br i1 %9, label %10, label %22
; <label>:10: ; preds = %7
store i32 13, i32* %2, align 4
br label %11
; <label>:11: ; preds = %15, %10
%12 = load i32, i32* %2, align 4
%13 = icmp slt i32 %12, 130
br i1 %13, label %14, label %18
; <label>:14: ; preds = %11
store i32 80, i32* %6, align 4
br label %15
; <label>:15: ; preds = %14
%16 = load i32, i32* %2, align 4
%17 = add nsw i32 %16, 1
store i32 %17, i32* %2, align 4
br label %11
; <label>:18: ; preds = %11
br label %19
; <label>:19: ; preds = %18
%20 = load i32, i32* %1, align 4
%21 = add nsw i32 %20, 1
store i32 %21, i32* %1, align 4
br label %7
; <label>:22: ; preds = %7
ret void
My problem is with labels 14 and 15. It looks like label 15 has just one predecessor, 14. Given that, why can't we just merge it with label 14? I am assuming that the basic block construction is done by the algorithm mentioned here.
compiler-construction llvm
|
show 3 more comments
I was just playing around with a program and watching its IR in llvm and I noticed certain basic blocks that don't make sense to me.
My code:
void proc()
int i, j, k, m, n, l;
k = 119;
for (i = 20; i < 200; i++)
for (j = 13; j < 130; j++)
l = 80;
Corresponding IR:
store i32 119, i32* %3, align 4
store i32 20, i32* %1, align 4
br label %7
; <label>:7: ; preds = %19, %0
%8 = load i32, i32* %1, align 4
%9 = icmp slt i32 %8, 200
br i1 %9, label %10, label %22
; <label>:10: ; preds = %7
store i32 13, i32* %2, align 4
br label %11
; <label>:11: ; preds = %15, %10
%12 = load i32, i32* %2, align 4
%13 = icmp slt i32 %12, 130
br i1 %13, label %14, label %18
; <label>:14: ; preds = %11
store i32 80, i32* %6, align 4
br label %15
; <label>:15: ; preds = %14
%16 = load i32, i32* %2, align 4
%17 = add nsw i32 %16, 1
store i32 %17, i32* %2, align 4
br label %11
; <label>:18: ; preds = %11
br label %19
; <label>:19: ; preds = %18
%20 = load i32, i32* %1, align 4
%21 = add nsw i32 %20, 1
store i32 %21, i32* %1, align 4
br label %7
; <label>:22: ; preds = %7
ret void
My problem is with labels 14 and 15. It looks like label 15 has just one predecessor, 14. Given that, why can't we just merge it with label 14? I am assuming that the basic block construction is done by the algorithm mentioned here.
compiler-construction llvm
Maybe the compiler always creates a basic block for the loop increment, so it can use that as the jump target if there's acontinue
in the loop (and it does that even when there's not because it doesn't hurt anything)? Or maybe the code just worked out more conveniently that way.
– sepp2k
Nov 12 '18 at 4:08
"I am assuming that the basic block construction is done by the algorithm mentioned here." That algorithm assumes that you sort existing three-address code into basic blocks, but LLVM code is always divided into basic blocks. LLVM frontends usually go from the AST to LLVM (I'm pretty sure clang does anyway), so we're talking about input that still contains loops and if statements, not jumps.
– sepp2k
Nov 12 '18 at 4:15
@sepp2k, "I am assuming that the basic block construction is done by the algorithm mentioned here.". By this, I meant the rules mentioned in that article regarding leaders, i.e. how splitting into basic blocks is done. Or do you mean llvm code is transformed from AST to a CFG?
– abjoshi
Nov 12 '18 at 4:34
I mean C code is transformed from an AST to LLVM code, which is a CFG. There's no such thing as LLVM code without basic blocks. So it's not likeC -> blockless LLVM-> LLVM-CFG
, it's justC -> LLVM
and that's your CFG. So identifying basic blocks isn't a separate step, it's part of generating LLVM.
– sepp2k
Nov 12 '18 at 4:42
1
Exactly, it just goes right from tree form to LLVM form, which is a CFG (at least I'm pretty sure that's the case as that's how LLVM is usually used - I haven't actually looked at clang's code, but I doubt that clang would define it's own TAC that's then translated to LLVM and there's definitely no blockless TAC form of LLVM).
– sepp2k
Nov 12 '18 at 4:55
|
show 3 more comments
I was just playing around with a program and watching its IR in llvm and I noticed certain basic blocks that don't make sense to me.
My code:
void proc()
int i, j, k, m, n, l;
k = 119;
for (i = 20; i < 200; i++)
for (j = 13; j < 130; j++)
l = 80;
Corresponding IR:
store i32 119, i32* %3, align 4
store i32 20, i32* %1, align 4
br label %7
; <label>:7: ; preds = %19, %0
%8 = load i32, i32* %1, align 4
%9 = icmp slt i32 %8, 200
br i1 %9, label %10, label %22
; <label>:10: ; preds = %7
store i32 13, i32* %2, align 4
br label %11
; <label>:11: ; preds = %15, %10
%12 = load i32, i32* %2, align 4
%13 = icmp slt i32 %12, 130
br i1 %13, label %14, label %18
; <label>:14: ; preds = %11
store i32 80, i32* %6, align 4
br label %15
; <label>:15: ; preds = %14
%16 = load i32, i32* %2, align 4
%17 = add nsw i32 %16, 1
store i32 %17, i32* %2, align 4
br label %11
; <label>:18: ; preds = %11
br label %19
; <label>:19: ; preds = %18
%20 = load i32, i32* %1, align 4
%21 = add nsw i32 %20, 1
store i32 %21, i32* %1, align 4
br label %7
; <label>:22: ; preds = %7
ret void
My problem is with labels 14 and 15. It looks like label 15 has just one predecessor, 14. Given that, why can't we just merge it with label 14? I am assuming that the basic block construction is done by the algorithm mentioned here.
compiler-construction llvm
I was just playing around with a program and watching its IR in llvm and I noticed certain basic blocks that don't make sense to me.
My code:
void proc()
int i, j, k, m, n, l;
k = 119;
for (i = 20; i < 200; i++)
for (j = 13; j < 130; j++)
l = 80;
Corresponding IR:
store i32 119, i32* %3, align 4
store i32 20, i32* %1, align 4
br label %7
; <label>:7: ; preds = %19, %0
%8 = load i32, i32* %1, align 4
%9 = icmp slt i32 %8, 200
br i1 %9, label %10, label %22
; <label>:10: ; preds = %7
store i32 13, i32* %2, align 4
br label %11
; <label>:11: ; preds = %15, %10
%12 = load i32, i32* %2, align 4
%13 = icmp slt i32 %12, 130
br i1 %13, label %14, label %18
; <label>:14: ; preds = %11
store i32 80, i32* %6, align 4
br label %15
; <label>:15: ; preds = %14
%16 = load i32, i32* %2, align 4
%17 = add nsw i32 %16, 1
store i32 %17, i32* %2, align 4
br label %11
; <label>:18: ; preds = %11
br label %19
; <label>:19: ; preds = %18
%20 = load i32, i32* %1, align 4
%21 = add nsw i32 %20, 1
store i32 %21, i32* %1, align 4
br label %7
; <label>:22: ; preds = %7
ret void
My problem is with labels 14 and 15. It looks like label 15 has just one predecessor, 14. Given that, why can't we just merge it with label 14? I am assuming that the basic block construction is done by the algorithm mentioned here.
compiler-construction llvm
compiler-construction llvm
asked Nov 12 '18 at 3:55
abjoshiabjoshi
798
798
Maybe the compiler always creates a basic block for the loop increment, so it can use that as the jump target if there's acontinue
in the loop (and it does that even when there's not because it doesn't hurt anything)? Or maybe the code just worked out more conveniently that way.
– sepp2k
Nov 12 '18 at 4:08
"I am assuming that the basic block construction is done by the algorithm mentioned here." That algorithm assumes that you sort existing three-address code into basic blocks, but LLVM code is always divided into basic blocks. LLVM frontends usually go from the AST to LLVM (I'm pretty sure clang does anyway), so we're talking about input that still contains loops and if statements, not jumps.
– sepp2k
Nov 12 '18 at 4:15
@sepp2k, "I am assuming that the basic block construction is done by the algorithm mentioned here.". By this, I meant the rules mentioned in that article regarding leaders, i.e. how splitting into basic blocks is done. Or do you mean llvm code is transformed from AST to a CFG?
– abjoshi
Nov 12 '18 at 4:34
I mean C code is transformed from an AST to LLVM code, which is a CFG. There's no such thing as LLVM code without basic blocks. So it's not likeC -> blockless LLVM-> LLVM-CFG
, it's justC -> LLVM
and that's your CFG. So identifying basic blocks isn't a separate step, it's part of generating LLVM.
– sepp2k
Nov 12 '18 at 4:42
1
Exactly, it just goes right from tree form to LLVM form, which is a CFG (at least I'm pretty sure that's the case as that's how LLVM is usually used - I haven't actually looked at clang's code, but I doubt that clang would define it's own TAC that's then translated to LLVM and there's definitely no blockless TAC form of LLVM).
– sepp2k
Nov 12 '18 at 4:55
|
show 3 more comments
Maybe the compiler always creates a basic block for the loop increment, so it can use that as the jump target if there's acontinue
in the loop (and it does that even when there's not because it doesn't hurt anything)? Or maybe the code just worked out more conveniently that way.
– sepp2k
Nov 12 '18 at 4:08
"I am assuming that the basic block construction is done by the algorithm mentioned here." That algorithm assumes that you sort existing three-address code into basic blocks, but LLVM code is always divided into basic blocks. LLVM frontends usually go from the AST to LLVM (I'm pretty sure clang does anyway), so we're talking about input that still contains loops and if statements, not jumps.
– sepp2k
Nov 12 '18 at 4:15
@sepp2k, "I am assuming that the basic block construction is done by the algorithm mentioned here.". By this, I meant the rules mentioned in that article regarding leaders, i.e. how splitting into basic blocks is done. Or do you mean llvm code is transformed from AST to a CFG?
– abjoshi
Nov 12 '18 at 4:34
I mean C code is transformed from an AST to LLVM code, which is a CFG. There's no such thing as LLVM code without basic blocks. So it's not likeC -> blockless LLVM-> LLVM-CFG
, it's justC -> LLVM
and that's your CFG. So identifying basic blocks isn't a separate step, it's part of generating LLVM.
– sepp2k
Nov 12 '18 at 4:42
1
Exactly, it just goes right from tree form to LLVM form, which is a CFG (at least I'm pretty sure that's the case as that's how LLVM is usually used - I haven't actually looked at clang's code, but I doubt that clang would define it's own TAC that's then translated to LLVM and there's definitely no blockless TAC form of LLVM).
– sepp2k
Nov 12 '18 at 4:55
Maybe the compiler always creates a basic block for the loop increment, so it can use that as the jump target if there's a
continue
in the loop (and it does that even when there's not because it doesn't hurt anything)? Or maybe the code just worked out more conveniently that way.– sepp2k
Nov 12 '18 at 4:08
Maybe the compiler always creates a basic block for the loop increment, so it can use that as the jump target if there's a
continue
in the loop (and it does that even when there's not because it doesn't hurt anything)? Or maybe the code just worked out more conveniently that way.– sepp2k
Nov 12 '18 at 4:08
"I am assuming that the basic block construction is done by the algorithm mentioned here." That algorithm assumes that you sort existing three-address code into basic blocks, but LLVM code is always divided into basic blocks. LLVM frontends usually go from the AST to LLVM (I'm pretty sure clang does anyway), so we're talking about input that still contains loops and if statements, not jumps.
– sepp2k
Nov 12 '18 at 4:15
"I am assuming that the basic block construction is done by the algorithm mentioned here." That algorithm assumes that you sort existing three-address code into basic blocks, but LLVM code is always divided into basic blocks. LLVM frontends usually go from the AST to LLVM (I'm pretty sure clang does anyway), so we're talking about input that still contains loops and if statements, not jumps.
– sepp2k
Nov 12 '18 at 4:15
@sepp2k, "I am assuming that the basic block construction is done by the algorithm mentioned here.". By this, I meant the rules mentioned in that article regarding leaders, i.e. how splitting into basic blocks is done. Or do you mean llvm code is transformed from AST to a CFG?
– abjoshi
Nov 12 '18 at 4:34
@sepp2k, "I am assuming that the basic block construction is done by the algorithm mentioned here.". By this, I meant the rules mentioned in that article regarding leaders, i.e. how splitting into basic blocks is done. Or do you mean llvm code is transformed from AST to a CFG?
– abjoshi
Nov 12 '18 at 4:34
I mean C code is transformed from an AST to LLVM code, which is a CFG. There's no such thing as LLVM code without basic blocks. So it's not like
C -> blockless LLVM-> LLVM-CFG
, it's just C -> LLVM
and that's your CFG. So identifying basic blocks isn't a separate step, it's part of generating LLVM.– sepp2k
Nov 12 '18 at 4:42
I mean C code is transformed from an AST to LLVM code, which is a CFG. There's no such thing as LLVM code without basic blocks. So it's not like
C -> blockless LLVM-> LLVM-CFG
, it's just C -> LLVM
and that's your CFG. So identifying basic blocks isn't a separate step, it's part of generating LLVM.– sepp2k
Nov 12 '18 at 4:42
1
1
Exactly, it just goes right from tree form to LLVM form, which is a CFG (at least I'm pretty sure that's the case as that's how LLVM is usually used - I haven't actually looked at clang's code, but I doubt that clang would define it's own TAC that's then translated to LLVM and there's definitely no blockless TAC form of LLVM).
– sepp2k
Nov 12 '18 at 4:55
Exactly, it just goes right from tree form to LLVM form, which is a CFG (at least I'm pretty sure that's the case as that's how LLVM is usually used - I haven't actually looked at clang's code, but I doubt that clang would define it's own TAC that's then translated to LLVM and there's definitely no blockless TAC form of LLVM).
– sepp2k
Nov 12 '18 at 4:55
|
show 3 more comments
1 Answer
1
active
oldest
votes
(This answer is mostly a summary of what I've already speculated in my comments, except it's more detailed and no longer speculation because I've now actually dived into clang's source code to verify that is what's happening)
LLVM code is always structured into basic blocks and the types used to represent LLVM code in the API already form a control flow graph. There is no such thing as an unstructured form of LLVM without basic blocks and thus no process of converting unstructured LLVM to a CFG. Clang directly translates the C AST to LLVM. So it does not find basic blocks in unstructured three-address code, it finds the basic blocks while translating from C to LLVM in a single step. Therefore the algorithm from Wikipedia does not apply.
What's happening instead is summarized by the following heavily simplified pseudo code loosely based on CodeGenFunction::EmitForStmt in CodeGen/CGStmt.cpp. This is the logic for translating a statement of the form for(init; cond; incr) body
. For simplicity we assume that neither cond
nor incr
are empty and that cond
is an expression, not a declaration.
- Create the new basic blocks:
conditionBlock
,bodyBlock
,incrBlock
andexitBlock
- append code for init to the current basic block, followed by a jump to
conditionBlock
- append code for
cond
toconditionBlock
, followed bybr i1 %cond, label %bodyBlock, label %exitBlock
- Push
break: exitBlock, continue: incrBlock
onto the break/continue stack - append code for
body
tobodyBlock
, followed by a jump toconditionBlock
- Pop the break/continue stack
- Set
exitBlock
as the current basic block
To get the output that you expected, it would have to emit the code for incr
into bodyBlock
instead of having a separate block for it. But then it couldn't push incrBlock
onto the break/continue stack. Of course that wouldn't matter in your case since your code does not contain any continue
statements, but in the general case the compiler needs the break/continue stack to know where to jump to in case of break
s or continue
s.
So the compiler simply always generates these blocks and then unnecessary blocks get merged during the optimization phase.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53255759%2fredundant-basic-blocks-in-llvm-ir%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
(This answer is mostly a summary of what I've already speculated in my comments, except it's more detailed and no longer speculation because I've now actually dived into clang's source code to verify that is what's happening)
LLVM code is always structured into basic blocks and the types used to represent LLVM code in the API already form a control flow graph. There is no such thing as an unstructured form of LLVM without basic blocks and thus no process of converting unstructured LLVM to a CFG. Clang directly translates the C AST to LLVM. So it does not find basic blocks in unstructured three-address code, it finds the basic blocks while translating from C to LLVM in a single step. Therefore the algorithm from Wikipedia does not apply.
What's happening instead is summarized by the following heavily simplified pseudo code loosely based on CodeGenFunction::EmitForStmt in CodeGen/CGStmt.cpp. This is the logic for translating a statement of the form for(init; cond; incr) body
. For simplicity we assume that neither cond
nor incr
are empty and that cond
is an expression, not a declaration.
- Create the new basic blocks:
conditionBlock
,bodyBlock
,incrBlock
andexitBlock
- append code for init to the current basic block, followed by a jump to
conditionBlock
- append code for
cond
toconditionBlock
, followed bybr i1 %cond, label %bodyBlock, label %exitBlock
- Push
break: exitBlock, continue: incrBlock
onto the break/continue stack - append code for
body
tobodyBlock
, followed by a jump toconditionBlock
- Pop the break/continue stack
- Set
exitBlock
as the current basic block
To get the output that you expected, it would have to emit the code for incr
into bodyBlock
instead of having a separate block for it. But then it couldn't push incrBlock
onto the break/continue stack. Of course that wouldn't matter in your case since your code does not contain any continue
statements, but in the general case the compiler needs the break/continue stack to know where to jump to in case of break
s or continue
s.
So the compiler simply always generates these blocks and then unnecessary blocks get merged during the optimization phase.
add a comment |
(This answer is mostly a summary of what I've already speculated in my comments, except it's more detailed and no longer speculation because I've now actually dived into clang's source code to verify that is what's happening)
LLVM code is always structured into basic blocks and the types used to represent LLVM code in the API already form a control flow graph. There is no such thing as an unstructured form of LLVM without basic blocks and thus no process of converting unstructured LLVM to a CFG. Clang directly translates the C AST to LLVM. So it does not find basic blocks in unstructured three-address code, it finds the basic blocks while translating from C to LLVM in a single step. Therefore the algorithm from Wikipedia does not apply.
What's happening instead is summarized by the following heavily simplified pseudo code loosely based on CodeGenFunction::EmitForStmt in CodeGen/CGStmt.cpp. This is the logic for translating a statement of the form for(init; cond; incr) body
. For simplicity we assume that neither cond
nor incr
are empty and that cond
is an expression, not a declaration.
- Create the new basic blocks:
conditionBlock
,bodyBlock
,incrBlock
andexitBlock
- append code for init to the current basic block, followed by a jump to
conditionBlock
- append code for
cond
toconditionBlock
, followed bybr i1 %cond, label %bodyBlock, label %exitBlock
- Push
break: exitBlock, continue: incrBlock
onto the break/continue stack - append code for
body
tobodyBlock
, followed by a jump toconditionBlock
- Pop the break/continue stack
- Set
exitBlock
as the current basic block
To get the output that you expected, it would have to emit the code for incr
into bodyBlock
instead of having a separate block for it. But then it couldn't push incrBlock
onto the break/continue stack. Of course that wouldn't matter in your case since your code does not contain any continue
statements, but in the general case the compiler needs the break/continue stack to know where to jump to in case of break
s or continue
s.
So the compiler simply always generates these blocks and then unnecessary blocks get merged during the optimization phase.
add a comment |
(This answer is mostly a summary of what I've already speculated in my comments, except it's more detailed and no longer speculation because I've now actually dived into clang's source code to verify that is what's happening)
LLVM code is always structured into basic blocks and the types used to represent LLVM code in the API already form a control flow graph. There is no such thing as an unstructured form of LLVM without basic blocks and thus no process of converting unstructured LLVM to a CFG. Clang directly translates the C AST to LLVM. So it does not find basic blocks in unstructured three-address code, it finds the basic blocks while translating from C to LLVM in a single step. Therefore the algorithm from Wikipedia does not apply.
What's happening instead is summarized by the following heavily simplified pseudo code loosely based on CodeGenFunction::EmitForStmt in CodeGen/CGStmt.cpp. This is the logic for translating a statement of the form for(init; cond; incr) body
. For simplicity we assume that neither cond
nor incr
are empty and that cond
is an expression, not a declaration.
- Create the new basic blocks:
conditionBlock
,bodyBlock
,incrBlock
andexitBlock
- append code for init to the current basic block, followed by a jump to
conditionBlock
- append code for
cond
toconditionBlock
, followed bybr i1 %cond, label %bodyBlock, label %exitBlock
- Push
break: exitBlock, continue: incrBlock
onto the break/continue stack - append code for
body
tobodyBlock
, followed by a jump toconditionBlock
- Pop the break/continue stack
- Set
exitBlock
as the current basic block
To get the output that you expected, it would have to emit the code for incr
into bodyBlock
instead of having a separate block for it. But then it couldn't push incrBlock
onto the break/continue stack. Of course that wouldn't matter in your case since your code does not contain any continue
statements, but in the general case the compiler needs the break/continue stack to know where to jump to in case of break
s or continue
s.
So the compiler simply always generates these blocks and then unnecessary blocks get merged during the optimization phase.
(This answer is mostly a summary of what I've already speculated in my comments, except it's more detailed and no longer speculation because I've now actually dived into clang's source code to verify that is what's happening)
LLVM code is always structured into basic blocks and the types used to represent LLVM code in the API already form a control flow graph. There is no such thing as an unstructured form of LLVM without basic blocks and thus no process of converting unstructured LLVM to a CFG. Clang directly translates the C AST to LLVM. So it does not find basic blocks in unstructured three-address code, it finds the basic blocks while translating from C to LLVM in a single step. Therefore the algorithm from Wikipedia does not apply.
What's happening instead is summarized by the following heavily simplified pseudo code loosely based on CodeGenFunction::EmitForStmt in CodeGen/CGStmt.cpp. This is the logic for translating a statement of the form for(init; cond; incr) body
. For simplicity we assume that neither cond
nor incr
are empty and that cond
is an expression, not a declaration.
- Create the new basic blocks:
conditionBlock
,bodyBlock
,incrBlock
andexitBlock
- append code for init to the current basic block, followed by a jump to
conditionBlock
- append code for
cond
toconditionBlock
, followed bybr i1 %cond, label %bodyBlock, label %exitBlock
- Push
break: exitBlock, continue: incrBlock
onto the break/continue stack - append code for
body
tobodyBlock
, followed by a jump toconditionBlock
- Pop the break/continue stack
- Set
exitBlock
as the current basic block
To get the output that you expected, it would have to emit the code for incr
into bodyBlock
instead of having a separate block for it. But then it couldn't push incrBlock
onto the break/continue stack. Of course that wouldn't matter in your case since your code does not contain any continue
statements, but in the general case the compiler needs the break/continue stack to know where to jump to in case of break
s or continue
s.
So the compiler simply always generates these blocks and then unnecessary blocks get merged during the optimization phase.
answered Nov 13 '18 at 21:40
sepp2ksepp2k
293k38593609
293k38593609
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53255759%2fredundant-basic-blocks-in-llvm-ir%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Maybe the compiler always creates a basic block for the loop increment, so it can use that as the jump target if there's a
continue
in the loop (and it does that even when there's not because it doesn't hurt anything)? Or maybe the code just worked out more conveniently that way.– sepp2k
Nov 12 '18 at 4:08
"I am assuming that the basic block construction is done by the algorithm mentioned here." That algorithm assumes that you sort existing three-address code into basic blocks, but LLVM code is always divided into basic blocks. LLVM frontends usually go from the AST to LLVM (I'm pretty sure clang does anyway), so we're talking about input that still contains loops and if statements, not jumps.
– sepp2k
Nov 12 '18 at 4:15
@sepp2k, "I am assuming that the basic block construction is done by the algorithm mentioned here.". By this, I meant the rules mentioned in that article regarding leaders, i.e. how splitting into basic blocks is done. Or do you mean llvm code is transformed from AST to a CFG?
– abjoshi
Nov 12 '18 at 4:34
I mean C code is transformed from an AST to LLVM code, which is a CFG. There's no such thing as LLVM code without basic blocks. So it's not like
C -> blockless LLVM-> LLVM-CFG
, it's justC -> LLVM
and that's your CFG. So identifying basic blocks isn't a separate step, it's part of generating LLVM.– sepp2k
Nov 12 '18 at 4:42
1
Exactly, it just goes right from tree form to LLVM form, which is a CFG (at least I'm pretty sure that's the case as that's how LLVM is usually used - I haven't actually looked at clang's code, but I doubt that clang would define it's own TAC that's then translated to LLVM and there's definitely no blockless TAC form of LLVM).
– sepp2k
Nov 12 '18 at 4:55