Hello everyone,
Welcome to the first issue of the PL Pragmatics newsletter - your crowdsourced window into the beautiful and complex world of programming languages, compilers, runtime systems and other related topics.
This issue mostly consists of things which I have personally found interesting in the past couple of months, but I hope to get more crowd submissions for future issues.
Submit your links for future newsletters through this form.
If you have any feedback you can reach me on Twitter (@mraleph) or by mail (me@mrale.ph).
Enjoy the read and happy hacking!
Interpreters
RegCPython: A Register-Based Python Interpreter for Better Performance
The topic of bytecode design is not very well covered in the literature, forcing language implementers to rely on trial, error and anecdotes from their colleagues. “Is register-based bytecode really faster to interpret than a stack-based one?” is one of the most commonly asked questions in this area.
The paper tries to answer this question in the context of CPython - and the answer seems to be yes.
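To make the distinction concrete, here is a toy sketch of evaluating `a + b * c` on a stack machine versus a register machine. The opcode names and encodings are invented for illustration; they are not CPython's or RegCPython's actual instruction sets.

```python
# Toy sketch: the same expression as stack bytecode and as register bytecode.
# Opcodes and encodings are invented for illustration; they are not CPython's
# or RegCPython's actual instruction sets.

def run_stack(code, env):
    """Stack machine: operands are implicit, pushed and popped off a stack."""
    stack = []
    for op, arg in code:
        if op == "LOAD":
            stack.append(env[arg])
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return stack.pop()

def run_register(code, env, nregs):
    """Register machine: every operand is named explicitly in the instruction,
    so instructions are wider, but there is no push/pop traffic and larger
    programs typically execute fewer instructions (and fewer dispatches)."""
    regs = [None] * nregs
    for op, *args in code:
        if op == "LOAD":
            dst, name = args
            regs[dst] = env[name]
        elif op == "MUL":
            dst, a, b = args
            regs[dst] = regs[a] * regs[b]
        elif op == "ADD":
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
    return regs[0]

env = {"a": 2, "b": 3, "c": 4}

# a + b * c as stack bytecode.
stack_code = [("LOAD", "a"), ("LOAD", "b"), ("LOAD", "c"),
              ("MUL", None), ("ADD", None)]

# The same expression as register bytecode, with the result in register 0.
reg_code = [("LOAD", 1, "a"), ("LOAD", 2, "b"), ("LOAD", 3, "c"),
            ("MUL", 2, 2, 3), ("ADD", 0, 1, 2)]

assert run_stack(stack_code, env) == run_register(reg_code, env, 4) == 14
```

The usual trade-off is exactly what this toy shows: fewer instructions dispatched in the register form, at the cost of wider instructions and more operand decoding per instruction.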
If you are interested in this topic, also check out Quantifying the interpretation overhead of Python by the same authors.
CPython performance improvements in Python 3.11
[link]
Python 3.11 shipped a bunch of interesting performance optimisations in the interpreter. Some of these might seem obvious to any practitioner today, but applying them to a 30-year-old code base is no small feat.
One of the most interesting parts of 3.11 release is PEP 659: Specializing Adaptive Interpreter.
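The core trick of the specializing interpreter - observing operand types at a particular bytecode position and rewriting that instruction in place into a cheaper specialized form, guarded so it can fall back - can be sketched in a few lines. The opcode names, warm-up threshold and dispatch structure below are invented for illustration; this is not CPython's actual implementation.

```python
# Hypothetical sketch of adaptive specialization ("quickening"): a generic ADD
# instruction counts its executions and, once it keeps seeing int operands,
# rewrites itself in place into a specialized handler that guards its
# assumption and de-specializes on a miss. Opcode names, the warm-up threshold
# and the dispatch structure are invented; this is not CPython's internals.

WARMUP = 8

def interpret(code, consts, counters):
    stack, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if op == "PUSH_CONST":
            pc += 1
            stack.append(consts[code[pc]])
        elif op == "ADD":                      # generic, adaptive form
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            counters[pc] += 1
            if counters[pc] >= WARMUP and type(a) is int and type(b) is int:
                code[pc] = "ADD_INT"           # specialize in place
        elif op == "ADD_INT":                  # specialized form with a guard
            b, a = stack.pop(), stack.pop()
            if type(a) is int and type(b) is int:
                stack.append(a + b)            # fast path
            else:
                code[pc] = "ADD"               # guard failed: de-specialize
                stack.append(a + b)
        elif op == "RETURN":
            return stack.pop()
        pc += 1

code_obj = ["PUSH_CONST", 0, "PUSH_CONST", 1, "ADD", "RETURN"]
counters = [0] * len(code_obj)
for _ in range(10):
    interpret(code_obj, [2, 3], counters)
print(code_obj[4])  # ADD_INT - the bytecode rewrote itself after warming up
```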
Optimizations
Runtime and Compiler Support for HAMTs
[paper]
A Hash Array Mapped Trie (HAMT) is a data structure commonly used to implement persistent dictionaries. HAMTs were originally described in Ideal Hash Trees and later refined into the Compressed Hash-Array Mapped Prefix-tree (CHAMP) in Optimizing Hash-Array Mapped Tries for Fast and Lean Immutable JVM Collections.
This paper describes the latest implementation of HAMT in Racket based on stencil vectors. Stencil vectors are essentially CHAMP tree nodes lifted into a built-in language primitive.
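The central trick of a HAMT node - a bitmap plus a population count instead of a sparse 32-slot array - fits in a few lines of Python. This is a simplified, mutating sketch for illustration only: it is not Racket's stencil-vector code, and it elides CHAMP's node/leaf layout, path copying for persistence and full hash-collision handling.

```python
# Simplified HAMT sketch: each node keeps a 32-bit bitmap plus a dense array of
# only the occupied slots; a popcount over the bitmap turns a 5-bit hash chunk
# into an index into that array. Illustration only - not Racket's stencil
# vectors. A real persistent HAMT copies nodes on the way down instead of
# mutating, and handles full hash collisions; both are elided here.

BITS = 5
MASK = (1 << BITS) - 1

class Node:
    __slots__ = ("bitmap", "items")
    def __init__(self):
        self.bitmap = 0    # which of the 32 logical slots are occupied
        self.items = []    # dense storage: (key, value) pairs or child Nodes

def _index(bitmap, bit):
    # Number of occupied slots below `bit` == position in the dense array.
    return bin(bitmap & (bit - 1)).count("1")

def lookup(node, key, shift=0):
    bit = 1 << ((hash(key) >> shift) & MASK)
    if not (node.bitmap & bit):
        return None
    entry = node.items[_index(node.bitmap, bit)]
    if isinstance(entry, Node):
        return lookup(entry, key, shift + BITS)
    k, v = entry
    return v if k == key else None

def insert(node, key, value, shift=0):
    bit = 1 << ((hash(key) >> shift) & MASK)
    idx = _index(node.bitmap, bit)
    if not (node.bitmap & bit):                 # empty slot: claim it
        node.bitmap |= bit
        node.items.insert(idx, (key, value))
        return
    entry = node.items[idx]
    if isinstance(entry, Node):
        insert(entry, key, value, shift + BITS)
    elif entry[0] == key:                       # existing key: overwrite
        node.items[idx] = (key, value)
    else:                                       # two keys share this chunk: split
        child = Node()
        insert(child, entry[0], entry[1], shift + BITS)
        insert(child, key, value, shift + BITS)
        node.items[idx] = child

root = Node()
for i in range(1000):
    insert(root, f"k{i}", i)
assert lookup(root, "k123") == 123 and lookup(root, "missing") is None
```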
Allocation removal in the Toy Optimizer
[post]
Carl Friedrich Bolz-Tereick (@cfbolz) of PyPy continues his didactic journey through PyPy optimisations. He builds on his previous Toy Optimizer post, expanding the code to illustrate how the allocation removal optimisation can be implemented.
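The gist of the optimisation - treating an object allocated inside the trace as "virtual", forwarding its field reads to the recorded values, and only materialising it if it escapes - can be sketched roughly like this. The IR shape and operation names below are simplified stand-ins, not the post's (or PyPy's) actual code.

```python
# Rough sketch of allocation removal on a tiny trace IR, in the spirit of the
# Toy Optimizer posts (names and IR shape are simplified, not PyPy's actual IR).
# Objects allocated inside the trace stay "virtual": their field writes are
# recorded in a dict, field reads are forwarded, and the allocation is only
# materialized if the object escapes (here: if it is returned).

def remove_allocations(trace):
    virtuals = {}    # var -> {field: value} for not-yet-materialized objects
    out = []
    for op, *args in trace:
        if op == "new":
            (dst,) = args
            virtuals[dst] = {}                   # don't emit the allocation yet
        elif op == "setfield" and args[0] in virtuals:
            obj, field, value = args
            virtuals[obj][field] = value         # record instead of storing
        elif op == "getfield" and args[1] in virtuals:
            dst, obj, field = args
            out.append(("assign", dst, virtuals[obj][field]))
        elif op == "return" and args[0] in virtuals:
            obj = args[0]
            out.append(("new", obj))             # escapes: materialize now
            for field, value in virtuals[obj].items():
                out.append(("setfield", obj, field, value))
            out.append(("return", obj))
        else:
            out.append((op, *args))
    return out

trace = [
    ("new", "p"),
    ("setfield", "p", "x", "a"),
    ("getfield", "t", "p", "x"),
    ("return", "t"),
]
print(remove_allocations(trace))
# [('assign', 't', 'a'), ('return', 't')] -- the allocation of p disappears
```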
Garbage Collection & Memory Management
Deep Dive into ZGC: A Modern Garbage Collector in JDK
[paper]
ZGC is the newest garbage collector added to HotSpot JVM. The code is out there in the OpenJDK git repository for anybody to read, but jumping into a code base of such complexity might prove challenging. This paper narrows the gap by providing a high-level overview of ZGC design.
LXR: Low-Latency, High-Throughput Garbage Collection
[paper]
LXR presents a different take on the problem of low-latency GC. It builds on the reference counting Immix GC algorithm (Taking Off the Gloves with Reference Counting Immix) to optimise both responsiveness and throughput.
Optimal Heap Limits for Reducing Browser Memory Use
[paper] [via @tekknolagi]
A lot of GC research in recent years has been done in the context of server workloads. If you look at the LXR paper above, you'll see benchmarks with heaps measured in hundreds of megabytes.
It is refreshing to see a paper which evaluates GC heap sizing heuristics in a more consumer-oriented setting - focusing specifically on the V8 JavaScript engine.
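For contrast, the conventional heuristic most runtimes use looks roughly like the sketch below: after each full GC, allow the heap to grow to a fixed multiple of the live size. The growth factor and floor are made up for illustration - they are not V8's or the paper's actual formulas - but they show why a purely proportional rule grants the same relative headroom to a mostly idle browser tab as to a busy server.

```python
# Illustration of the conventional proportional heap-limit heuristic that heap
# sizing work like this paper evaluates and improves on. The growth factor and
# the floor are hypothetical values, not V8's or the paper's actual numbers.

GROWTH_FACTOR = 1.5          # hypothetical multiplier applied to live bytes
MIN_HEAP = 8 * 1024 * 1024   # hypothetical floor of 8 MiB

def next_heap_limit(live_bytes: int) -> int:
    """After a full GC, allow the heap to grow to a multiple of what survived."""
    return max(int(live_bytes * GROWTH_FACTOR), MIN_HEAP)

# A proportional rule grants the same 50% headroom regardless of how much the
# extra memory actually buys in reduced GC work.
for live_mib in (4, 64, 512):
    limit = next_heap_limit(live_mib << 20)
    print(f"live = {live_mib:>3} MiB -> heap limit = {limit >> 20} MiB")
```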
Towards the next generation of XNU memory safety: kalloc_type
Security engineers at Apple are sharing their experience hardening the XNU memory allocator using type segregation.
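The idea behind type segregation is easy to sketch: give each type its own allocation pool, so that a dangling pointer to a freed object of one type can never end up aliasing a freshly allocated object of a different type. Below is a toy Python illustration of that policy only - it is not XNU's kalloc_type implementation.

```python
# Toy illustration of type-segregated allocation: each type gets its own pool
# and freed slots are only ever reused for the same type, so a use-after-free
# of, say, a Buffer can never alias a freshly allocated Credentials object.
# Conceptual sketch in Python, not XNU's kalloc_type implementation.

from collections import defaultdict

class TypedPools:
    def __init__(self):
        self.free_slots = defaultdict(list)   # type -> list of recycled slots

    def allocate(self, cls, *args):
        pool = self.free_slots[cls]
        if pool:
            slot = pool.pop()                 # reuse only within the same type
            slot.__init__(*args)
            return slot
        return cls(*args)

    def free(self, obj):
        self.free_slots[type(obj)].append(obj)

class Buffer:
    def __init__(self, data=b""):
        self.data = data

class Credentials:
    def __init__(self, uid=0):
        self.uid = uid

pools = TypedPools()
buf = pools.allocate(Buffer, b"hello")
pools.free(buf)
creds = pools.allocate(Credentials, 0)
# With a single shared pool, creds could land in buf's old slot and a stale
# buf pointer would then alias credential data; with per-type pools it cannot.
assert creds is not buf
```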
Other topics
S6 Python JIT by DeepMind
[repo]
In September, DeepMind researchers open-sourced a Python JIT they had been experimenting with internally. S6 was an experiment to add a JIT compiler to CPython as a library. The project is no longer under active development, but compiler enthusiasts will certainly learn a thing or two from it.
Biased locking removal from HotSpot & real-world performance regressions
An interesting thread from the OpenJDK hotspot-dev mailing list.
Biased locking is an optimisation which reduces the cost of uncontended locking. The Biased Locking in HotSpot (2006) post by Dave Dice gives a good overview of the technique. The key insights are described in the original paper Quickly Reacquirable Locks, while Eliminating synchronization-related atomic operations with biased locking and bulk rebiasing covers interesting refinements of the base technique.
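In a nutshell, a biased lock remembers the first thread that acquires it and lets that thread lock and unlock with a plain check instead of an atomic compare-and-swap; only when a second thread shows up does it pay for bias revocation and fall back to an ordinary lock. The toy sketch below shows that state machine; it glosses over the hard part (safely revoking a bias that the owner thread may be relying on at that very moment) and is not HotSpot's implementation.

```python
# Conceptual sketch of biased locking: the lock is biased towards the first
# thread that acquires it, so that thread's subsequent lock/unlock operations
# avoid the atomic compare-and-swap of a normal lock. When a different thread
# wants the lock, the bias is revoked (an expensive, carefully synchronized
# operation in real JVMs) and the lock falls back to ordinary locking for good.
# Python illustration only - not HotSpot's actual implementation.

import threading

class BiasedLock:
    def __init__(self):
        self._bias_owner = None          # thread id the lock is biased towards
        self._revoked = False
        self._inner = threading.Lock()   # the "heavy" fallback lock

    def acquire(self):
        me = threading.get_ident()
        if not self._revoked:
            if self._bias_owner is None:
                self._bias_owner = me    # first acquirer: bias towards it
                return
            if self._bias_owner == me:
                return                   # fast path: no atomic operation
            self._revoke()               # another thread: revoke the bias
        self._inner.acquire()

    def release(self):
        me = threading.get_ident()
        if not self._revoked and self._bias_owner == me:
            return                       # fast path release
        self._inner.release()

    def _revoke(self):
        # Real implementations must coordinate with (or stop) the bias owner
        # before revoking; this sketch just flips the flag.
        self._revoked = True

lock = BiasedLock()
lock.acquire()
lock.release()                           # biased fast path, no CAS involved
```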
In 2019, JEP 374 was created, suggesting the deprecation and removal of this optimisation from the HotSpot JVM and citing the changed performance landscape and the high complexity of biased locking as motivation.
Magic-trace: Diagnosing tricky performance issues easily with Intel Processor Trace
[post] [via @LeafPetersen]
Intel Processor Trace is a fascinating technology. IPT data allows you to observe the precise flow of execution through your program, down to the behaviour of each individual conditional branch. Combining this with timing information - which is also available in the IPT data - theoretically allows you to create a uniquely powerful tool.
This post describes magic-trace, a tracing tool built on top of IPT by engineers at Jane Street.
That’s all for today. Thanks for reading :)