Twotm8 (p.1): Introduction

Introduction

If you have seen the introduction to my previous post, you know I've confessed to being on a mission to deploy silly applications to various cloud providers, preferably using non-JVM variations of Scala.

Part of it is making things intentionally difficult for the sake of learning; part is exploring the feasibility of things like Scala Native and Scala.js for various areas of development.

So here's the setup for this new series of posts.

Our project

We will be developing an arguably superior version of Twitter, with the following requirements:

  • All messages are maximum 128 characters long

  • Messages are called "twots"

  • Twots are always uppercase. You must shout your ideas from the rooftops

  • Users are called "thought leaders" and this is the place for them to thought lead (think lead? think have led?)

  • There are no likes - users can only react to twots by saying "uwotm8", thus enforcing a negative-only reaction landscape

    uwotm8 = "you what, mate?" in the land of the Queen

  • Number of uwotm8s affects the physical visibility of the twot on the website - the formula will be determined later. It won't be AI, but I'm sure we can get VC funding for this little bit of math

  • Thought leaders must be able to register, login, and follow each other
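The first few rules are easy to pin down in code. Here is a minimal sketch of how a twot could be modelled in plain Scala 3; the names `Twot`, `fromString` and `MaxLength` are hypothetical and chosen just for illustration, and counting "characters" as `String.length` is an assumption on my part.

```scala
// Hypothetical model of a twot - names are illustrative only.
// The private constructor forces everyone through `fromString`.
final case class Twot private (text: String)

object Twot:
  val MaxLength = 128

  /** Shouts the text (upper-cases it) and rejects over-long messages.
    * Length is measured after upper-casing, using String.length,
    * which is an assumption about what "characters" means here.
    */
  def fromString(raw: String): Either[String, Twot] =
    val shouted = raw.toUpperCase
    if shouted.length > MaxLength then
      Left(s"A twot is at most $MaxLength characters, got ${shouted.length}")
    else Right(Twot(shouted))

@main def twotDemo(): Unit =
  println(Twot.fromString("you what, mate?").map(_.text))
  println(Twot.fromString("x" * 129))
```

Returning an `Either` keeps the validation failure as a value, so a request handler can turn it into an error response without throwing.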

Backend

To get serious for a moment: if I were to choose a JVM backend with a few functional libraries like Cats Effect 3, Http4s, and Skunk, I would be able to knock this out in a few hours and not break a sweat - it's a truly great experience, and I love both the JVM and Scala's functional libraries.

That said, 'tis the season of pain and we will go down a path less travelled.

  • Main language and platform will be Scala Native

    What I want to achieve is to delegate the hardest parts of the backend to existing C libraries, and use Scala as the glue language that puts our business logic together on top of them

  • Our database of choice will be Postgres

    This is not an adventurous choice - Postgres is an extremely popular database and has proven its performance and stability many times over. I also haven't worked with RDBMS in over 10 years so I can learn and fail like in the good old days

  • Our web server will be NGINX Unit

    Normally, if I were using the JVM, I would expose a naked JVM HTTP server (such as Http4s) and let the cloud provider terminate TLS connections for me. A functional runtime would take care of scheduling and concurrency, and I know for a fact that this pattern scales to millions of requests and dozens of instances.

    The issue is that Scala Native doesn't have support for multi-threading (yet).

    Instead we will use process-based parallelism - a web server (Unit) will launch multiple instances of our application (which, through the glory of Scala Native, will be a single small binary with instant startup), and distribute requests among them, handling failures, failovers, and load balancing.

  • Our cloud provider will be Fly.io

    For one, I find the idea of a minimal Docker image deployment, containing only our Scala Native binary and NGINX Unit's own setup, quite appealing.

    It also provides simple creation of Postgres clusters and out-of-the-box replication.

    And lastly, it promises a super simple setup with just CLI commands, remote Docker image building, and a GitHub Action to deploy from CI

Frontend

This part is actually pretty simple - I am a big fan of Scala.js as a platform, and Laminar UI library in particular.

While I used Laminar multiple times to add user interface to internal developer tools, or to build projects I never had the patience to finish, I'm not very confident with it.

Nor am I confident with any sort of frontend development, since the last time I did it for money was almost 15 years ago (dang).

To make the app feel fresh, I would like to make it a Single-Page Application. Thankfully Nikita, Laminar's author, has the bases covered with the Waypoint library, which we can use to define our application's structure.

Application

With the general strokes around backend out of the way, there are still a few hard parts we need to solve in the app itself:

  • Interaction with NGINX Unit will be done with SNUnit

    Lorenzo Gabriele has built a minimal interface that wraps Unit's C API to handle requests, routing, and responses. We will use this library for the basic HTTP request/response models, and we'll add some helpers on top of it.

  • Interaction with Postgres will be done via libpq

    Libpq is the official C interface to Postgres - given that we're using Scala Native we should be able to directly interact with it without things like JNI or Project Panama.

  • To support authentication, we will need things like SHA256 and HMAC, and OpenSSL will provide those

    It's a C library with implementations of many cryptographic algorithms, and it's extensively tested. Part of the reason to use it is that Scala Native currently has no implementation of Java's MessageDigest API, and OpenSSL is quite intimidating in its size and influence, so it's worth checking it out.

  • To build facades for libpq and openssl, I will use my own binding generator

    I know this is cronyism and unfair, but nobody cares. The interfaces we will use are not actually that extensive, but doing this project led to several improvements in the generator itself, making it more stable and usable.
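To make the SHA256 and HMAC requirement concrete, here is a sketch of what those two operations look like on the JVM, where `MessageDigest` and `Mac` are available. This is for comparison only: it is not the code used in this series, it will not work on Scala Native as described above, and the helper names (`hex`, `sha256`, `hmacSha256`) are made up for the example. These are the operations we will need OpenSSL to provide instead.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.security.MessageDigest
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

// Render bytes as lowercase hex, two digits per byte
def hex(bytes: Array[Byte]): String =
  bytes.map(b => f"${b & 0xff}%02x").mkString

// Plain SHA-256 digest of a UTF-8 string
def sha256(input: String): String =
  hex(MessageDigest.getInstance("SHA-256").digest(input.getBytes(UTF_8)))

// HMAC-SHA256: a keyed hash, so only holders of `key` can produce
// (or verify) the signature for a given message
def hmacSha256(key: String, message: String): String =
  val mac = Mac.getInstance("HmacSHA256")
  mac.init(new SecretKeySpec(key.getBytes(UTF_8), "HmacSHA256"))
  hex(mac.doFinal(message.getBytes(UTF_8)))

@main def cryptoDemo(): Unit =
  println(sha256("abc"))
  println(hmacSha256("some-secret", "some-message"))
```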

Writing Native applications

If your experience (like mine) mostly revolves around JVM, there are a few idiosyncrasies that we need to address before we start doing actual coding.

First of all, our entire binary artifact with server-side logic will be a single file, 5-10MBs of machine code. Going from regular Scala code to this single binary file requires several steps, which are orchestrated by the Scala Native plugin for your build tool (I'm using SBT, for which the plugin is officially published, but Mill works as well).

Scala compilation phase

The first step (and by far the quickest) is the Scala compilation phase.

                        ┌────────────────┐            
┌──────────────────┐    │  Dependencies  │            
│  Scala sources   │    │with *.nir files│            
└──────────────────┘    └────────────────┘            
          │                      │                    
          └──────────┬───────────┘                    
                     ▼                                
     ┌──────────────────────────────┐                 
     │        Scala compiler        │                 
     │                              │                 
     │  ┌─────────────────────────┐ │                 
     │  │  Scala Native compiler  │ │                 
     │  │         plugin          │ │                 
     │  └─────────────────────────┘ │                 
     └──────────────────────────────┘                 
                     │                                
                  ┌──┘                                
                  │      Scala Native code            
                  ▼          generator                
       ┌────────────────────┐         ┌──────────────┐
       │Compile *.nir files │────────▶│  *.ll files  │
       └────────────────────┘         └──────────────┘

The extra files are opaque to the user - you never need to manage or look at them yourself, but it's helpful to know how this works.

  • *.nir files are Native Intermediate Representation - Scala Native's own format which represents Scala's language constructs in a format more suitable for subsequent emission of LLVM IR files
  • *.ll files are LLVM IR. They are not specific to Scala Native: this is the code representation LLVM itself uses for transformation, analysis, and compilation.

The compilation itself differs very little from regular Scala 3 compilation on the JVM, apart from the things that are unrepresentable in the native code, in which case the Scala Native compiler plugin will complain at you:

//> using platform "native"
//> using nativeVersion "0.4.4"

import scala.scalanative.unsafe.CFuncPtr1

@main def hello = 
  def create: Int => Int = _ + 25

  def createPtr(f: Int => Int): CFuncPtr1[Int, Int] =
    CFuncPtr1.fromScalaFunction(f)
// [error]   Function passed to method fromScalaFunction needs to be inlined
// [error]     CFuncPtr1.fromScalaFunction(f)
// [error]                                 ^

In this case the failure is specific to Scala Native - to create a C function pointer, the Scala function that defines it must be statically known.
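For contrast, here is a sketch of a variant that should be accepted, assuming the same Scala Native 0.4.x API as above: the function literal is written directly at the call site, so the plugin can see its body. The name `add25` is just for illustration.

```scala
//> using platform "native"
//> using nativeVersion "0.4.4"

import scala.scalanative.unsafe.CFuncPtr1

// The lambda is passed inline and captures nothing from its surroundings,
// so it is statically known and can be turned into a plain C function.
def add25: CFuncPtr1[Int, Int] =
  CFuncPtr1.fromScalaFunction((x: Int) => x + 25)

@main def hello =
  // A CFuncPtr1 can be applied like a regular Scala function
  println(add25(10))
```

The restriction makes sense once you remember that a C function pointer is only an address: there is nowhere to store a closure's captured values, so the function has to be fully known at compile time.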

Clang compilation

The next phase, which happens completely invisibly to the user, is converting the generated .ll files, along with any C sources you would like to embed in your app, into object files for your target platform.

This is done by invoking the clang command and passing it all the source files (be it actual C sources, or LLVM IR files generated from Scala code).

In any project you can set SBT to debug mode and see those invocations:

sbt> clean; debug; nativeLink
....
[debug] Running
[debug] /usr/bin/clang
[debug] 	-O0
[debug] 	-fvisibility=hidden
[debug] 	-fexceptions
[debug] 	-fcxx-exceptions
[debug] 	-funwind-tables
[debug] 	-I/usr/local/include
[debug] 	-I/opt/homebrew/include
[debug] 	-Qunused-arguments
[debug] 	-I/opt/homebrew/opt/openssl/include
[debug] 	-Wno-override-module
[debug] 	-c
[debug] 	<..>/app/target/scala-3.1.1/native/2.ll
[debug] 	-o
[debug] 	<..>app/target/scala-3.1.1/native/2.ll.o

As a result of this step, object files *.o are produced.

Linking

This is the step where object files, along with any static libraries, are bundled together into a single binary.

Various optimisations happen at this phase (and previous ones), and the linker will verify that all the methods invoked in your application code can be matched up to one of the following:

  1. Symbols you defined yourself
  2. Symbols that are part of static libraries you are linking with your app
  3. Symbols that come from dynamic libraries
  4. Symbols that are part of the kernel or whatever, I don't know much

Any functions that are not used in the app will be eliminated as dead code.

Point (3) is very important - a lot of development libraries (libpq and openssl being two of them) are distributed in the form of binary artifacts (dynamic libraries, *.dylib on OSX, *.so on Linux, *.dll on Windows), and minimal C header files *.h to define the interface with those libraries.

[debug] Running
[debug] /usr/bin/clang++
[debug] 	-rdynamic
[debug] 	-o
[debug] 	<..>/app/target/scala-3.1.1/app-out
[debug] 	-Wno-override-module
[debug] 	<..>/app/target/scala-3.1.1/native/native-code-classes-0/scala-native/my.c.o
[debug] 	<..>/app/target/scala-3.1.1/native/native-code-classes-1/scala-native/libpq.c.o
[debug] 	<..>/app/target/scala-3.1.1/native/native-code-classes-2/scala-native/libhmac.c.o
[debug] 	<..>/app/target/scala-3.1.1/native/native-code-classes-2/scala-native/libcrypto.c.o
[debug] 	<..>/app/target/scala-3.1.1/native/4.ll.o
...
[debug] 	-L/usr/local/lib
[debug] 	-L/opt/homebrew/lib
[debug] 	-L/opt/homebrew/opt/libpq/lib
[debug] 	-L/opt/homebrew/opt/openssl/lib
[debug] 	-lpthread
[debug] 	-ldl
[debug] 	-lpq
[debug] 	-lcrypto
[debug] 	-lunit

In the end, this will produce a nice, almost self-contained small binary file. It's "almost" self-contained because we use dynamic linking - we're producing a binary which expects to load a particular dynamic library to use the symbols from it.

Specifically, the two dynamic libraries we will be using are OpenSSL's crypto and Postgres' libpq.

This means that the environment where our app is deployed should at least have libpq installed for our application to start up correctly.

To see the dynamic libraries your app will need, we can use the otool command on OS X (or ldd on Linux):

> otool -L twotm8
twotm8:
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)
	/opt/homebrew/opt/postgresql/lib/libpq.5.dylib (compatibility version 5.0.0, current version 5.14.0)
	/opt/homebrew/opt/openssl@3/lib/libcrypto.3.dylib (compatibility version 3.0.0, current version 3.0.0)
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1200.3.0)

So, at the very minimum our runtime environment for the backend must have libpq and openssl installed. OpenSSL is included by default in the Debian distribution we will be using as the base of our application Docker image.
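To connect the -lpq flag in the linker invocation with Scala code, here is a hand-written sketch of a tiny binding. It is an illustration only, not output of the binding generator used in this series, and it assumes libpq and its development files are installed where the linker can find them. `PQlibVersion` is a real libpq function declared in libpq-fe.h; the object name `libpq` and the `libpqVersion` entry point are arbitrary.

```scala
//> using platform "native"
//> using nativeVersion "0.4.4"

import scala.scalanative.unsafe.*

// @link("pq") asks Scala Native to pass -lpq to the linker;
// @extern says the method bodies live in native code, not in Scala.
@link("pq")
@extern
object libpq:
  // C declaration: int PQlibVersion(void);
  def PQlibVersion(): CInt = extern

@main def libpqVersion =
  // At link time this call becomes a reference to the PQlibVersion symbol,
  // which is resolved from the libpq dynamic library when the app starts.
  println(s"libpq version: ${libpq.PQlibVersion()}")
```

If libpq is missing at link time, the failure is an undefined symbol error from the linker; if it is missing at runtime, the binary refuses to start, which is exactly why the deployment image needs the library installed.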

Why do all this

My thinking process started with discovering Fly.io and its bold promise of super simple docker deploys.

To fully test the idea, I decided that if I deploy something, it has to be real - cannot be a simple hello world app. It was roughly at that time that I was getting deeper into my experimental binding generator, and to showcase it I decided to generate Libpq bindings.

When they turned out to be functional, I gained immediate false confidence in my abilities and decided to take several (ambitious) bites at once - single-page applications, NGINX Unit deployment, OpenSSL... My little wooden toy app started to feel like a real boy pretty soon, and at that point I thought it would be easy to just do a write-up of what I've done, given that the app was mostly ready.

It took me a month of on and off work to refactor and rewrite every single line I thought was already perfect. Lots of corners were cut to get this closer to actual public release, even though as I go through the editing stage on all 5 parts, I can already see how clumsy some of the things are.

General impressions

Instead of writing a Part 6: Conclusion, I will put my impressions here about the whole process:

  • Using Scala Native was pleasant enough

    The compilation and linking times are not stellar, and the feedback loop you get cannot be compared to JVM.

    That said, producing mostly self-contained binary artifacts is a huge win, and memory usage was excellent: a single instance running in 256MBs of memory was giving me quite decent RPS without breaking the memory bank.

    The codebase turned out to be mostly functional (without effects management, of course), and Scala 3's strong typesystem proved to be enough to guide me through implementation - I very rarely debugged the behaviour at runtime, instead relying on compilation reports 80% of the time.

  • Using Scala 3 is excellent¹

    I am a big fan of the braceless syntax, even though it has a lot of gotcha moments. Most of my personal projects of non-trivial size have been using this syntax, so I didn't struggle with it at all.

    Anonymous givens, opaque types, contextual functions, Mirrors, inlines - I've used all those features for well defined purposes and they worked really well

    1: My main struggles were, I suspect, down to the fact that I used only Scala Native and Scala.js, which seems to have been causing issues with Metals. Occasional stale diagnostics, and symbols disappearing from SemanticDB were confusing at times, but not really major obstacles. Keep in mind that my threshold for tooling pain is quite high, so your experience might be more frustrating.

  • Laminar and Scala.js are very nice

    Very few drawbacks here. After thrashing for a bit with Laminar's concepts, I was able to implement most of the functionality the way I thought it had to be done, without much trial and error - and it worked.

  • Fly.io is a mostly positive experience

    Things I enjoyed:

    • Deployments are indeed that simple
    • flyctl CLI offers a lot of functionality and is easy to use
    • Dashboards contain enough metrics, presented in a clear way

    Things that didn't go so well:

    • Postgres became unresponsive twice

      First time I traced it back to a Fly.io outage, reported by several users. Second time I restarted the postgres instance myself (via the CLI) - took some time but there was no data loss when it came back up

      I didn't add any replicas to my cluster, and I haven't worked with RDBMS in a while, so perhaps I should've been ready for this.

    • Remote docker builder had its disk full

      This prevented automatic deployments from GHA. I killed the builder from the dashboard and re-ran the job, which worked fine.