Flambda2 Ep. 6: Strict Boolean Ifs
Welcome to a new episode of The Flambda2 Snippets!
In the previous article, we explored the general topic of IR semantics and their evolution: what it means for an intermediate representation to have well-defined semantics, why establishing those details matters, and the challenges that arise when evolving them in a large codebase.
Today, we put those ideas into practice. We are going to examine a concrete
case of IR evolution within the OCaml compiler: the transition from a
Generalized If to a Strict Boolean If semantics in the Lambda
intermediate representation. We will break down the technical motivations, walk
through the changes, and discuss the social dynamics of contributing such a
change to the upstream compiler.
We suggest to our readers to use the Table of Contents below to navigate the article at their leisure.
Technical Context: The Semantics of If
In the Lambda IR of the OCaml compiler, the if conditional was not required
to be a boolean. It was similar to that of C. Its semantics was that, if the
value of the conditional was 0 the false branch was to be taken, and in any
other case, the true branch was taken instead. This made it rather
straightforward to compile efficiently as it matches the usual semantics of C.
Here are a few examples of what that would look like in different representations:
(* in OCaml *)
if cond then case1 else case2
(* in Lambda IR *)
(if cond case1 case2)
/* in asm */
cmpq $1, %rax
je .LBL_ELSE
/* Not equal: case1 */
movq %rbx, %rax
ret
.LBL_ELSE:
/* Equal: case2 */
movq %rdi, %rax
ret
Since the source language uses boolean values for conditions, at some point the compiler IRs will have to switch from one semantics to the other.
We will call Strict Boolean If the version of the semantics that can be found
in the source language, where conditions can only be either true or
false, and not any other value. We will call Generalized If the version
of the semantics where only the single value 0 is considered false, and any
other value is considered true.
In older versions of OCaml, the translation was done very early. In particular,
as mentioned earlier, the Lambda IR used to follow the Generalized If semantics.
This allowed the bytecode compiler to benefit from a few optimisations
including the following one:
(* in OCaml *)
let is_non_empty list =
match list with
| [] -> false
| _ :: _ -> true
(* in Lambda IR with OCaml 4.12 *)
is_non_empty = (function list (if list 1 0))
The empty list is represented using value 0. Non-empty lists are allocated
memory blocks, and are therefore represented using a (non zero) pointer. Here the
compiler uses this fact to translate a match ... with into a single if
construct.
Strict Boolean Ifs in Flambda2
Flambda2 has always used Strict Boolean If, so the earlier versions had to
convert Lambda's Generalized If constructions conservatively using an
additional equality test. This interfered a bit with optimisations so we pushed
instead to change the semantics of Lambda to a Strict Boolean If and
perform that transformation to Generalized If later in the backend.
We chose Strict Boolean If for Flambda2's if semantic for the sake of
simplicity. One way to look at it would be to say: the singleton set {1} is
much simpler than the set of OCaml values excluding the integer 0. These
sets arise when considering the true branch of an if. Indeed, in Strict Boolean If, we know that the condition is true , which is the integer 1 in
OCaml, whereas in Generalized If it's any value except the integer 0.
This simplicity manifests itself at different levels: when processing an if
(codebase level), when reasoning (architectural considerations), and when
communicating (social level).
Here's an example of a basic optimisation:
(* example 1 *)
let cond x =
if x then true else false
This code is equivalent to let cond x = x.
We would like to perform this optimisation in Flambda2. Although people do
not usually write this kind of code, this pattern can occur after the branches
of the if have been simplified.
For more on how
Flambda2handles simultaneous optimisations, see our earlier article on Speculative Inlining.
However if we try to perform the optimisation in a language with the Integer If semantics it's much more complicated: the IR term for the cond function
is exactly the same as the IR for the is_non_empty one, but this optimisation
is illegal for the is_non_empty one. It would return a list instead of a
boolean. This is a problem.
Here's an example of an existing function (Option.is_none from the standard
library) which actually benefits from switching to Strict Boolean If.
(* Code from the standard library *)
let is_none option =
match option with
| None -> true
| Some _ -> false
Here's the Lambda code for this function shown in Scheme syntax, compiled
with Generalized If semantics:
(define (is_none option)
(if
option
0 1))
You will notice that the if cannot be removed and the code will branch at
runtime.
The same function again but compiled with Strict Boolean If semantics:
(define (is_none option)
(if
(is_int option)
0 1))
This might look less efficient with an additional primitive (is_int) however
the Strict Boolean If semantics allows us to apply the previously mentioned
optimisation and replace the if statement with its condition, thus this code
will not branch at runtime.
; If we applied this optimisation to Lambda
(define (is_none option)
(is_int option))
Now that we have covered the reasoning and inception of the support for Strict Boolean If inside Lambda, we'll talk a little bit about what we had to do in
order to make it a reality inside the upstream OCaml compiler.
Contributing to the Compiler
Almost the full story of this change can be seen by following this PR, it includes both the technical aspects (code changes) and most of the social interactions through the comments and linked issues.
Technical Aspect: what changed
If we look at what the PR changed, we can easily see that it is relatively small.
Some of the main changes:
- introducing an equality check when testing whether a value is
0or not; - some conditionals were added to differentiate between native and bytecode paths;
- a lot of inlined comments;
We don't touch the language itself, nor any of its intermediate representations.
This reveals a counter-intuitive reality: changing the semantics of a given IR requires updating the producers and consumers, but not the IR itself, that is, its type definition.
Here's the complete lists of the technical changes included in this PR:
- Disabling optimisations that were only valid when using
Generalized Ifsemantic; - Adapting an existing optimisation to take into account the new pattern for compiling conditions;
- Enabling the optimisation that we mentioned earlier and which became
possible with
Strict Boolean If. - A lot of Comments;
The lesson here is: changing semantic is merely done by notifying other devs.
Social aspect: other devs
"The social aspect" we are talking about here is in fact an umbrella term for a few different things.
It can describe different channels on which the OCaml Core Team members discuss these matters. The channels are usually either in person (within the same team, or at conferences and meetings), or through a private mailing list, but nowadays they are mostly held on the official repository through pull-request discussions, issue threads and so on. Concerning this work, most information is available online although some private discussions also occurred.
There is also a "social aspect" to the actual contribution. In addition to the changing of code, we wrote comments, documentation and a changelog entry - things that are necessary in proper open-source projects.
Additionally, since we do not have a formal specification for the Lambda IR,
comments are particularly important here because they help people familiarize
themselves with a given change, and therefore help in convincing them that they
can easily wrap their head around it.
Concrete Consequences
This change in the IR allowed for cleaning the semantics, generating better code, and helped towards upstreaming the FL2 project.
Of all these things, the first noticeable effect has been the improved optimisation. Even though it is difficult to quantify the impact, when this optimisation triggers, it's always a strict improvement for the users. Now, from a compiler dev's point of view, this change didn't increase the complexity of the codebase and also made the reason why the compiler works actually simpler. A win-win situation overall.
Unfortunately, it is not all sunshine and rainbows.
We had to change the code that relied on the Generalized Ifs semantics. We
did not actually expect any outside code to depend on the part we changed
within the compiler, because the public API of a compiler is the source
language... Which didn't change. Yet it so happened that another open-source
project used the compiler internals and did rely on the Generalized If
semantic.
This occurred to us in the form of a bug report. In the end, since the now faulty code of that project happened to replicate code we had just modified in the compiler, the fix itself came down to replicating this change. We wanted to mention this situation to illustrate the socially-oriented nature of these endeavours.
Conclusion
We used the case of changing the semantics of If within the OCaml compiler to
illustrate the social nature of IR evolution. By changing the behaviour of a
single construct, people outside the core working group can see their programs
break unexpectedly.
As we mentioned earlier, having a formal specification helps avoid ambiguities. In the case of Lambda, this work sparked conversations about creating a formal semantics—a step in the right direction.
It's worth noting that some projects have taken formal semantics seriously from the very beginning. WebAssembly, for instance, was designed with a formal specification from day one, which has helped avoid many of these issues.
Finally, we'd like to leave you with this relevant xkcd comic as a reminder that every change, no matter how well-intentioned, will inevitably break someone's workflow. The art lies in making changes that benefit the majority while minimising disruption—and communicating clearly throughout the process.
About OCamlPro:
OCamlPro is a R&D lab founded in 2011, with the mission to help industrial users benefit from experts with a state-of-the-art knowledge of programming languages theory and practice.
- We provide audit, support, custom developer tools and training for both the most modern languages, such as Rust, Wasm and OCaml, and for legacy languages, such as COBOL or even home-made domain-specific languages;
- We design, create and implement software with great added-value for our clients. High complexity is not a problem for our PhD-level experts. For example, we helped the French Income Tax Administration re-adapt and improve their internally kept M language, we designed a DSL to model and express revenue streams in the Cinema Industry, codename Niagara, and we also developed the prototype of the Tezos proof-of-stake blockchain from 2014 to 2018.
- We have a long history of creating open-source projects, such as the Opam package manager, the LearnOCaml web platform, and contributing to other ones, such as the Flambda optimizing compiler, or the GnuCOBOL compiler.
- We are also experts of Formal Methods, developing tools such as our SMT Solver Alt-Ergo (check our Alt-Ergo Users' Club) and using them to prove safety or security properties of programs.
Please reach out, we'll be delighted to discuss your challenges: contact@ocamlpro.com or book a quick discussion.
Most Recent Articles
2026
2025
2024
- opam 2.3.0 release!
- Optimisation de Geneweb, 1er logiciel français de Généalogie depuis près de 30 ans
- Alt-Ergo 2.6 is Out!
- Flambda2 Ep. 3: Speculative Inlining
- opam 2.2.0 release!
- Flambda2 Ep. 2: Loopifying Tail-Recursive Functions
- Fixing and Optimizing the GnuCOBOL Preprocessor
- OCaml Backtraces on Uncaught Exceptions
- Opam 102: Pinning Packages
- Flambda2 Ep. 1: Foundational Design Decisions
- Behind the Scenes of the OCaml Optimising Compiler Flambda2: Introduction and Roadmap