Hello everyone,
It is already January 2023, and I am only now sitting down to go through the bookmarks accumulated during November and December. Sorry for the delay - too much was going on at the end of the year.
Remembering Chris Seaton
I would like to start this issue by remembering Chris Seaton, a fellow compiler engineer and a wonderful human being who sadly passed away in early December. Please take a moment to appreciate his work on TruffleRuby, an implementation of Ruby on top of the Truffle framework. The news of Chris's passing left me devastated and heartbroken. It is difficult to accept that I will never again run into Chris at some compiler conference. Seeing his name in the acknowledgement sections of papers brings tears to my eyes. Rest in peace, Chris.
My personal plans for 2023
As some of you might know, I spend most of my working days on the Dart programming language. I joined the team as a compiler engineer in 2012, but over the years my role and responsibilities have slowly evolved and grown in scope. In 2012 I spent most of my time cranking out C++ code in the Dart VM. In 2022 I spent most of my time talking to people, aligning and unblocking various teams, and writing Google Docs. A few times I managed to carve out some quality prototyping time and dig deep into a problem. The trend will probably continue in 2023: more talking, strategising, aligning and Docs.
That being said, there are a few technical problems in Dart that I have a keen interest in. One topic consistently on my mind is the performance of our program analysis tooling. In 2023 I would like to understand the performance of these tools end-to-end and find ways to improve it considerably. These tools are currently written in Dart using a fairly classical OO design. Does this considerably slow them down? Can we improve this without rewriting them in a lower-level language?
The JavaScript tooling ecosystem is going through a process of ditching tools written in JavaScript and replacing them with tools written in Go and Rust. I would like to think that Dart has enough horsepower under the hood to avoid a rewrite in “foreign” languages - but this needs more investigation.
I have been poking around for literature which explores problems like parsing, type inference and AST traversals from the perspective of mechanical sympathy, but so far I have not found any academic research on the topic. @pervognsen and @cfbolz provided some interesting thoughts - but I would love to hear more if anybody has pointers.
Disclaimer about content
You can always submit your links for future newsletters through this form.
If you have any feedback you can reach me on Twitter (@mraleph), Mastodon (@mraleph@mastodon.social) or by mail (me@mrale.ph).
Enjoy the reading and happy hacking!
Books
Given that this is essentially an EOY issue, I have decided to include a few relevant books.
SSA-based Compiler Design
[book]
This book about static single-assignment (SSA) form was in the making for more than a decade. I remember preordering it in early 2011, when I first heard about it. It finally became available for order in December 2022 🥳
The title of the book is somewhat misleading - don’t buy it expecting an equivalent of Appel’s Modern Compiler Implementation in ML (aka “Tiger Book“). Despite its title this book is not going to walk you through a compiler end-to-end.
A much better title for this book would probably be “Fundamental SSA-based Compiler Algorithms”. Read this book if you need a comprehensive overview of what SSA form is and how a compiler can use it.
Understanding Software Dynamics
Systems Performance: Enterprise and the Cloud, 2nd Ed.
[book]
Working on a programming language with a “batteries included” runtime and complex tooling means you encounter problems which go beyond straight-line computational performance. These two books are a great source of insight for such occasions. Understanding Software Dynamics by Richard Sites teaches you a mental framework for analysing the performance of complex systems. Systems Performance by Brendan Gregg is a treasure chest filled with performance analysis tooling.
Links
blink virtual machine
[repo]
A miniature (<170kb) virtual machine for X86-64 instructions. Written in C. It is faster and smaller than qemu-x86_64 and even supports reverse debugging. 🤯 Utterly beautiful.
Why is Rosetta 2 fast?
[post]
Continuing on the topic of CPU emulation - an interesting post exploring the tricks Rosetta 2 (X86-on-ARM64) employs to achieve its remarkable performance.
deegen framework & luajit-remake
The authors of Copy and Patch Compilation, Haoran Xu et al., continue their journey to build a C++/LLVM-based alternative to language implementation frameworks like RPython and Truffle.
Building upon their initial ideas from Copy&Patch, they have created a framework which automates the creation of an efficient low-level interpreter from C++ code describing individual bytecode instructions. The Deegen framework has built-in support for inline caching.
The authors have used Deegen to implement a new interpreter for the Lua programming language and show that it compares favourably against LuaJIT’s handwritten interpreter.
The reader should be cautioned, though, that this comparison is not entirely apples-to-apples: their interpreter implements hidden-class-based optimisations (inline caching), while LuaJIT does not. It might be interesting to compare the Deegen-based implementation against Luau instead, as it actually implements inline caching.
Nevertheless, the results are impressive and I am looking forward to the next stage: derivation of a multi-tiered method JIT from the bytecode interpreter.
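To make the inline caching point concrete, here is a toy sketch of my own (in Python, not Deegen’s actual C++ machinery): a monomorphic inline cache where a call site remembers the last receiver class it saw and the method it resolved, falling back to a full lookup only when the class changes.

```python
# A toy monomorphic inline cache (illustrative only, not Deegen's API):
# the call site caches (class, method) from the last dispatch and reuses
# the cached method as long as the receiver's class stays the same.
class InlineCache:
    def __init__(self, name):
        self.name = name
        self.cached_class = None
        self.cached_method = None

    def call(self, receiver, *args):
        cls = type(receiver)
        if cls is not self.cached_class:          # cache miss: slow path lookup
            self.cached_class = cls
            self.cached_method = getattr(cls, self.name)
        return self.cached_method(receiver, *args)  # cache hit: direct call

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def norm2(self):
        return self.x * self.x + self.y * self.y

site = InlineCache("norm2")
print(site.call(Point(3, 4)))  # 25
```

A real implementation would of course bake the cached target into the dispatch sequence instead of checking it in interpreted code - that is exactly the kind of machinery Deegen generates for you.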
Fast memcpy, A System Design
Richard Sites (the same Richard Sites who wrote the “Understanding Software Dynamics” book recommended above) describes his experience optimising the memory copy operation for Google’s servers. It turns out servers do spend a lot of time copying bytes around: “When I worked at Google, fleet-wide profiling revealed that 25-35% of all CPU time was spent just moving bytes around“.
Who You Gonna Call: Analyzing the Run-time Call-Site Behavior of Ruby Applications
[paper]
This paper by Sophie Kaleba et al. explores call-site polymorphism in real-world Ruby on Rails applications.
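For readers unfamiliar with the terminology, here is a small illustration (mine, in Python rather than Ruby) of what it means for a call site to be polymorphic: a single textual call observing receivers of several different classes at run time.

```python
# One call site, two receiver classes: the `shape.area()` call below is
# polymorphic because it dispatches to Circle.area and Square.area at run time.
class Circle:
    def __init__(self, r):
        self.r = r

    def area(self):
        return 3.14159 * self.r ** 2

class Square:
    def __init__(self, s):
        self.s = s

    def area(self):
        return self.s ** 2

shapes = [Circle(1.0), Square(2.0)]
total = sum(shape.area() for shape in shapes)  # the polymorphic call site
print(round(total, 5))  # 7.14159
```

The paper measures how often such sites in Rails applications stay monomorphic (one receiver class), become polymorphic like the one above, or degrade into megamorphic sites that defeat inline caches.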
Finding JIT Optimizer Bugs using SMT Solvers and Fuzzing
Carl Friedrich Bolz-Tereick describes his experiments using the Z3 solver to find PyPy miscompilation bugs: generating random traces, optimising them with PyPy, and then asking Z3 to prove the equivalence of the unoptimised and optimised traces.
Seeing through hardware counters: a journey to threefold performance increase
An interesting post from Netflix engineers who used monitoring and low-level profiling tools to (re)discover and fix a performance bottleneck inside the JVM’s implementation of subtype testing, which manifests only in highly concurrent scenarios.
Precise Stack Scanning in C++
A research report from Chromium’s Oilpan GC team. They experimented with using LLVM’s stack map machinery to build a precise stack scanning implementation for C++ code.
Andy Wingo on Ephemerons in Guile
[post #1] [post #2] [post #3] [post #4]
Andy Wingo describes his adventures supporting ephemerons in Guile’s GC.
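To see why ephemerons are needed at all, here is a small demonstration (my own, in Python) of the leak that plain weak-key tables suffer from: if a value holds a strong reference back to its own key, the entry can never be collected, because the table keeps the value alive and the value keeps the key alive. Ephemerons break exactly this cycle.

```python
# The classic weak-table leak that ephemerons solve: in a WeakKeyDictionary,
# a value that references its own key keeps that key reachable
# (table -> value -> key), so the entry survives garbage collection.
import gc
import weakref

class Key:
    pass

class Value:
    def __init__(self, key):
        self.key = key  # value points back at its key

table = weakref.WeakKeyDictionary()
k = Key()
table[k] = Value(k)

del k          # drop the only *external* reference to the key
gc.collect()

print(len(table))  # 1 - the entry leaked
```

With ephemeron semantics the key would be considered reachable only if something *outside* the table (or outside the value chain hanging off the key) referenced it, so the entry above would be collected.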