Twotm8 (p.1): Introduction
- Series TL;DR
- Writing Native applications
- Why do all this
Series TL;DR
- We are deploying a Scala Native backend (with NGINX Unit) and Scala.js frontend (with Laminar) on Fly.io
- We are using my SN binding generator
- We are using Scala 3 heavily
- Code on Github
- Deployed app
- Roach - postgres bindings and interface
- Full series
If you have seen the introduction to my previous post, you know I've confessed to being on a mission to deploy silly applications to various cloud providers, preferably using non-JVM variations of Scala.
Part of it is making things intentionally difficult for the sake of learning; part of it is exploring the feasibility of things like Scala Native and Scala.js for various areas of development.
So here's the setup for this new series of posts.
We will be developing an arguably superior version of Twitter, with the following requirements:
All messages are maximum 128 characters long
Messages are called "twots"
Twots are always uppercase. You must shout your ideas from the rooftops
Users are called "thought leaders" and this is the place for them to thought lead (think lead? think have led?)
There are no likes - users can only react to twots by saying "uwotm8", thus enforcing a negative-only reaction landscape
uwotm8 = "you what, mate?" in the land of the Queen
Number of uwotm8s affects the physical visibility of the twot on the website - the formula will be determined later. It won't be AI, but I'm sure we can get VC funding for this little bit of math
Thought leaders must be able to register, login, and follow each other
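The first few requirements (length limit, forced uppercase) can be captured in a small type. What follows is a minimal sketch in plain Scala 3, not code from the actual app: the names `model`, `Twot`, `MaxLength` and `raw` are made up for illustration, and checking the length after uppercasing is my own assumption.

```scala
object model:
  // Opaque type: outside of `model` a Twot is not interchangeable with String,
  // so the only way to get one is through the validating constructor below.
  opaque type Twot = String

  object Twot:
    val MaxLength = 128

    // Shout first, then measure: uppercasing can change the length of some strings.
    def apply(text: String): Either[String, Twot] =
      val shouted = text.toUpperCase
      if shouted.length > MaxLength then
        Left(s"Twot is longer than $MaxLength characters")
      else Right(shouted)

    // Escape hatch to get the underlying text back
    extension (t: Twot) def raw: String = t

import model.*

@main def twotDemo =
  println(Twot("you what, mate?").map(_.raw))
  println(Twot("a" * 129).map(_.raw))
```

Returning an `Either` keeps the validation failure visible in the type, so the caller has to decide what to do with an overly long twot.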
To get serious for a moment, if I were to choose a JVM backend with a few functional libraries like Cats Effect 3, Http4s, and Skunk, I would be able to knock this out in a few hours and not break a sweat - it's a truly great experience, and I love both the JVM and Scala's functional libraries.
That said, 'tis the season of pain and we will go down a path less travelled.
Main language and platform will be Scala Native
What I want to achieve is to delegate the hardest parts of the backend to existing C libraries, and use Scala as the glue language that puts our business logic together on top of them
Our database of choice will be Postgres
This is not an adventurous choice - Postgres is an extremely popular database and has proven its performance and stability many times over. I also haven't worked with RDBMS in over 10 years so I can learn and fail like in the good old days
Our web server will be NGINX Unit
Normally, if I were using JVM, I would expose a naked JVM HTTP server (such as Http4s) and let the cloud provider terminate TLS connection for me. A functional runtime would take care of scheduling and concurrency, and I know for a fact that this pattern scales to millions of requests and dozens of instances.
The issue is that Scala Native doesn't have support for multi-threading (yet).
Instead we will use process-based parallelism - a web server (Unit) will launch multiple instances of our application (which, through the glory of Scala Native, will be a single small binary with instant startup), and distribute requests among them, handling failures, failovers, and load balancing.
Our cloud provider will be Fly.io
For one, I find the idea of a minimal Docker image deployment, containing only our Scala Native binary and NGINX Unit's own setup, quite appealing.
It also provides simple creation of Postgres clusters and out-of-the-box replication.
And lastly, it promises a super simple setup with just CLI commands, remote Docker image building, and a Github Action to deploy from CI
This part is actually pretty simple - I am a big fan of Scala.js as a platform, and Laminar UI library in particular.
While I have used Laminar multiple times to add a user interface to internal developer tools, or to build projects I never had the patience to finish, I'm not very confident with it.
Nor am I confident with any sort of frontend development, since the last time I did it for money was almost 15 years ago (dang).
To make the app feel fresh, I would like to make it a Single-Page Application. Thankfully Nikita, Laminar's author, has the bases covered with the Waypoint library, which we can use to define our application's structure.
With the general strokes around backend out of the way, there are still a few hard parts we need to solve in the app itself:
Interaction with NGINX Unit will be done with SNUnit
Lorenzo Gabriele has built a minimal interface that wraps Unit's C API to handle requests, routing, and responses. We will use this library for the basic HTTP request/response models, and we'll add some helpers on top of it.
Interaction with Postgres will be done via libpq
Libpq is the official C interface to Postgres - given that we're using Scala Native we should be able to directly interact with it without things like JNI or Project Panama.
To support authentication, we will need things like SHA256 and HMAC, and OpenSSL will provide those
It's a C library with implementations of many cryptographic algorithms, and it's extensively tested. Part of the reason to use it is that Scala Native currently has no implementation of Java's MessageDigest API, and OpenSSL is quite intimidating in its size and influence, so it's worth checking it out.
To build facades for libpq and openssl, I will use my own binding generator
I know this is cronyism and unfair, but nobody cares. The interfaces we will use are not actually that extensive, but doing this project led to several improvements in the generator itself, making it more stable and usable.
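To give a feel for what such a facade boils down to, here is a hand-written sketch (not output of the binding generator) that binds a single libpq function. `PQlibVersion` is a real function exported by libpq; the object name `libpq` and the `showVersion` entry point are arbitrary choices for this example, and it assumes libpq is installed somewhere the linker can find it.

```scala
//> using platform "native"

import scala.scalanative.unsafe.*

// Minimal hand-written facade for one C function.
@link("pq") // asks the linker to add -lpq
@extern
object libpq:
  // C signature: int PQlibVersion(void);
  def PQlibVersion(): CInt = extern

@main def showVersion =
  // For libpq 10 and newer the number is major * 10000 + minor,
  // so 140005 would mean version 14.5
  println(s"libpq version: ${libpq.PQlibVersion()}")
```

There is no JNI-style glue layer here: the call compiles down to a plain C function call, and the symbol is resolved by the linker.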
Writing Native applications
If your experience (like mine) mostly revolves around the JVM, there are a few idiosyncrasies that we need to address before we start doing actual coding.
First of all, our entire binary artifact with server-side logic will be a single file, 5-10 MB of machine code. Going from regular Scala code to this single binary file requires several steps, which are orchestrated by the Scala Native plugin for your build tool (I'm using SBT, for which the plugin is officially published, but Mill works as well).
Scala compilation phase
The first step (and by far the quickest) is the Scala compilation phase.
    ┌────────────────┐    ┌──────────────────┐
    │  Dependencies  │    │  Scala sources   │
    │with *.nir files│    └──────────────────┘
    └────────────────┘             │
            │                      │
            └──────────┬───────────┘
                       ▼
        ┌──────────────────────────────┐
        │        Scala compiler        │
        │                              │
        │ ┌─────────────────────────┐  │
        │ │  Scala Native compiler  │  │
        │ │         plugin          │  │
        │ └─────────────────────────┘  │
        └──────────────────────────────┘
                       │
                    ┌──┘
                    │        Scala Native code
                    ▼            generator
         ┌────────────────────┐         ┌──────────────┐
         │Compile *.nir files │────────▶│  *.ll files  │
         └────────────────────┘         └──────────────┘
The extra files are opaque to the user - you never need to manage or look at them yourself, but it's helpful to know how this works.
*.nir files are Native Intermediate Representation - Scala Native's own format, which represents Scala's language constructs in a form more suitable for subsequent emission of LLVM IR files
*.ll files are LLVM IR. They are not special to Scala Native - this is LLVM's own representation of code, used for compilation, transformation, and analysis.
The compilation itself differs very little from regular Scala 3 compilation on the JVM, apart from the things that are unrepresentable in native code, in which case the Scala Native compiler plugin will complain at you:
    //> using platform "native"
    //> using nativeVersion "0.4.4"

    import scala.scalanative.unsafe.CFuncPtr1

    @main def hello =
      def create: Int => Int = _ + 25
      def createPtr(f: Int => Int): CFuncPtr1[Int, Int] =
        CFuncPtr1.fromScalaFunction(f)

    // [error] Function passed to method fromScalaFunction needs to be inlined
    // [error]     CFuncPtr1.fromScalaFunction(f)
    // [error]                                 ^
In this case the failure is specific to Scala Native - to create a C function pointer, the Scala function that defines it must be statically known.
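For contrast, here is a sketch of a variant that should compile under the same Scala Native 0.4.x setup: the lambda is written out as a literal at the call site and captures nothing, so the compiler plugin can see exactly which function the pointer refers to. The names `add25` and `helloPtr` are made up for this example.

```scala
//> using platform "native"
//> using nativeVersion "0.4.4"

import scala.scalanative.unsafe.CFuncPtr1

// The function is statically known: it is a literal lambda with no captured state,
// not a function value received through a parameter.
val add25: CFuncPtr1[Int, Int] =
  CFuncPtr1.fromScalaFunction((i: Int) => i + 25)

@main def helloPtr =
  // A CFuncPtr1 can be applied like a regular function
  println(add25(17))
```

The difference from the failing version is only where the function is defined: `createPtr(f)` receives an arbitrary runtime value, while here the body is visible at the point where the pointer is created.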
The next phase, which happens completely invisibly to the user, is converting .ll files, along with any C sources you would like to embed in your app, into object files on your target platform.
This is done by invoking the clang command and passing it all the source files (be it actual C sources, or LLVM IR files generated from Scala code).
In any project you can set SBT to debug mode and see those invocations:
    sbt> clean; debug; nativeLink
    ....
    [debug] Running
    [debug] /usr/bin/clang
    [debug] -O0
    [debug] -fvisibility=hidden
    [debug] -fexceptions
    [debug] -fcxx-exceptions
    [debug] -funwind-tables
    [debug] -I/usr/local/include
    [debug] -I/opt/homebrew/include
    [debug] -Qunused-arguments
    [debug] -I/opt/homebrew/opt/openssl/include
    [debug] -Wno-override-module
    [debug] -c
    [debug] <..>/app/target/scala-3.1.1/native/2.ll
    [debug] -o
    [debug] <..>app/target/scala-3.1.1/native/2.ll.o
As a result of this step, object files (*.o) are produced.
The final step is linking - this is where object files, along with any static libraries, are bundled together into a single binary.
Various optimisations happen at this phase (and previous ones), and the linker will verify that all the methods invoked in your application code can be matched up to one of the following:
1. Symbols you defined yourself
2. Symbols that are part of static libraries you are linking with your app
3. Symbols that come from dynamic libraries
4. Symbols that are part of the kernel or whatever, I don't know much
Any functions that are not used in the app will be eliminated as dead code.
Point (3) is very important - a lot of development libraries (libpq and openssl being two of them) are distributed in the form of binary artifacts (dynamic libraries: *.dylib on OS X, *.so on Linux, *.dll on Windows), and minimal C header files (*.h) to define the interface with those libraries.
    [debug] Running
    [debug] /usr/bin/clang++
    [debug] -rdynamic
    [debug] -o
    [debug] <..>/app/target/scala-3.1.1/app-out
    [debug] -Wno-override-module
    [debug] <..>/app/target/scala-3.1.1/native/native-code-classes-0/scala-native/my.c.o
    [debug] <..>/app/target/scala-3.1.1/native/native-code-classes-1/scala-native/libpq.c.o
    [debug] <..>/app/target/scala-3.1.1/native/native-code-classes-2/scala-native/libhmac.c.o
    [debug] <..>/app/target/scala-3.1.1/native/native-code-classes-2/scala-native/libcrypto.c.o
    [debug] <..>/app/target/scala-3.1.1/native/4.ll.o
    ...
    [debug] -L/usr/local/lib
    [debug] -L/opt/homebrew/lib
    [debug] -L/opt/homebrew/opt/libpq/lib
    [debug] -L/opt/homebrew/opt/openssl/lib
    [debug] -lpthread
    [debug] -ldl
    [debug] -lpq
    [debug] -lcrypto
    [debug] -lunit
In the end, this will produce a nice, almost self-contained small binary file. It's "almost" self-contained because we use dynamic linking - we're producing a binary which expects to load a particular dynamic library to use the symbols from it.
Specifically, the two dynamic libraries we will be using are OpenSSL's crypto and Postgres' libpq.
This means that the environment where our app is deployed should at least have libpq installed for our application to start up correctly.
To see the dynamic libraries your app will need, we can use the otool command on OS X (or ldd on Linux):
    > otool -L twotm8
    twotm8:
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)
        /opt/homebrew/opt/postgresql/lib/libpq.5.dylib (compatibility version 5.0.0, current version 5.14.0)
        /opt/homebrew/opt/openssl@3/lib/libcrypto.3.dylib (compatibility version 3.0.0, current version 3.0.0)
        /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1200.3.0)
So, at the very minimum, our runtime environment for the backend must have libpq installed.
OpenSSL is included by default in the Debian distribution we will be using as the base of our application Docker image.
Why do all this
My thinking process started with discovering Fly.io and its bold promise of super simple docker deploys.
To fully test the idea, I decided that if I deploy something, it has to be real - cannot be a simple hello world app. It was roughly at that time that I was getting deeper into my experimental binding generator, and to showcase it I decided to generate Libpq bindings.
When they turned out to be functional, I gained immediate false confidence in my abilities and decided to take several (ambitious) bites at once - single-page applications, NGINX Unit deployment, OpenSSL... My little wooden toy app soon started to feel like a real boy, and at that point I thought it would be easy to just do a write-up of what I'd done, given that the app was mostly ready.
It took me a month of on and off work to refactor and rewrite every single line I thought was already perfect. Lots of corners were cut to get this closer to actual public release, even though, as I go through the editing stage on all 5 parts, I can already see how clumsy some of the things are.
Instead of writing a Part 6: Conclusion, I will put my impressions here about the whole process:
Using Scala Native was pleasant enough
The compilation and linking times are not stellar, and the feedback loop you get cannot be compared to JVM.
That said, producing mostly self-contained binary artifacts is a huge win, and memory usage was excellent: a single instance running in 256MB of memory was giving me quite decent RPS without breaking the memory bank.
The codebase turned out to be mostly functional (without effects management, of course), and Scala 3's strong type system proved to be enough to guide me through implementation - I very rarely debugged the behaviour at runtime, instead relying on compilation reports 80% of the time.
Using Scala 3 is excellent [1]
I am a big fan of the braceless syntax, even though it has a lot of gotcha moments. Most of my personal projects of non-trivial size have been using this syntax, so I didn't struggle with it at all.
Anonymous givens, opaque types, contextual functions, Mirrors, inlines - I've used all those features for well-defined purposes and they worked really well
[1] My main struggles were, I suspect, down to the fact that I used only Scala Native and Scala.js, which seems to have been causing issues with Metals. Occasional stale diagnostics, and symbols disappearing from SemanticDB were confusing at times, but not really major obstacles. Keep in mind that my threshold for tooling pain is quite high, so your experience might be more frustrating.
Laminar and Scala.js are very nice
Very few drawbacks here. After thrashing for a bit with Laminar's concepts, I was able to implement most of the functionality the way I thought it had to be done, without much trial and error - and it worked.
Fly.io is a mostly positive experience
Things I enjoyed:
- Deployments are indeed that simple
- flyctl CLI offers a lot of functionality and is easy to use
- Dashboards contain enough metrics, presented in a clear way
Things that didn't go so well:
Postgres became unresponsive twice
First time I traced it back to a Fly.io outage, reported by several users. Second time I restarted the postgres instance myself (via the CLI) - took some time but there was no data loss when it came back up
I didn't add any replicas to my cluster, and I haven't worked with RDBMS in a while, so perhaps I should've been ready for this.
Remote docker builder had its disk full
This prevented automatic deployments from GHA. I killed the builder from the dashboard and re-ran the job, which worked fine.