Open Source Scala I: versions, platforms, artifacts


Introduction

A frequent (and not entirely incorrect) take on developing and maintaining Scala libraries is that it is unnecessarily complicated.

Rather than frothing against that notion, I would like to explore how the complexity of maintaining a Scala library can grow dramatically once we extend our reach further out to

  1. platforms other than JVM
  2. major, minor, and experimental Scala versions
  3. code quality tools (in a later installment)
  4. major releases of the dependencies we build upon
  5. documentation tools (in a later installment)

Each direction brings its own complexity. As any software engineer knows, solving several relatively simple problems does not prepare you for solving a combination of them.

Who is this for?

More than anything, this is a brain dump of the things I found necessary to understand about the ecosystem of modern Scala libraries. Much of it will already be known to the reader, some of it less so. All of it was useful when I started contributing to existing open source projects, which boosted my confidence and skillset.

The tools and techniques I've learned studying the Scala open source ecosystem directly led to improvements of development processes at work.

Users of Scala libraries

Where and how will my library be used?

For Scala, the answer can get pretty complex. In its simplest form, when you want to make your libraries only available on the JVM for a particular binary Scala version, your life will be very, very easy.

TL;DR: just publishing your library for JVM-only, Scala 2.13 will guarantee that it will reach most of Scala's users who have the need for such a library.

That said, in many cases the overhead of publishing for other Scala versions and platforms can be kept quite low, and can ensure an even wider audience.

Scala versions

Currently, the most used Scala versions are 2.12 and 2.13.

Spark users have been locked to Scala 2.11 for a long time, but more recent versions have started using 2.12 exclusively, and some work is ongoing to provide support for 2.13.

Within the 2.x lineage, minor releases are binary incompatible with each other. This means that if the library is only built using 2.13.*, then projects that are only built with 2.12.* will not be able to use it. On the other hand, patch versions are fully binary compatible with each other. That means that our library can continue being built with 2.12.2 and 2.13.0 and the users can run 2.12.14 and 2.13.6 respectively and consume our library's artifacts without any issues.

While the usage of Scala 2.12 is shrinking, there's still a huge amount of actively developed code in the wild that is locked to 2.12 for a variety of reasons (missing dependencies, incompatibilities, reliance on 2.12-only language features, the list goes on).

Because of that, most libraries are cross-published for at least Scala 2.12 and 2.13. Some library maintainers go as far as removing 2.12 from their builds to ease the maintenance burden, but I personally feel that's very optimistic. That said, it may well be the lever that convinces more engineering managers to invest in upgrading to Scala 2.13.

As long as the library we're building is published for any patch version of both 2.12 and 2.13, we can cover a very large group of potential users.
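
As a minimal sketch of what cross-publishing looks like in a single-project SBT build (the patch versions are only illustrative, since any patch release of a given minor version produces compatible artifacts):

```scala
// build.sbt
// Default Scala version used by plain `sbt compile`, `sbt test`, etc.
scalaVersion := "2.13.6"

// Every Scala version this library is built and published for.
crossScalaVersions := Seq("2.12.14", "2.13.6")
```

With this in place, prefixing a task with `+` (for example `sbt +test` or `sbt +publishLocal`) runs it once for each version listed in crossScalaVersions.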

Scala 3

Scala 3 (previously called Dotty) is a new Scala compiler, written completely from scratch.

A lot of effort has been put into making sure that the vast majority of Scala 2 libraries can be built with Scala 3. A subset of Scala 3's syntax, type system, and features is compatible with Scala 2.13, which helped the transition.

The various communities within the Scala ecosystem made significant efforts to provide Scala 3 artifacts of their libraries:

  • At the time of writing, most of the functional ecosystem (Cats/Cats Effect, Monix, and many libraries built on top of them) is fully available for Scala 3 and has been for some time. Trailblazers already report Scala 3 services running in production, which is very impressive and terrifying.

  • The Play and Akka ecosystems are in the process of making their artifacts available for Scala 3. The process is usually stifled by the usage of Scala 2 macros, which are not supported in Scala 3 and will be very hard to port to its metaprogramming system.

    As of version 2.6.18, Akka publishes artifacts for Scala 3.

  • Most of the com.lihaoyi ecosystem (upickle, pprint, requests-scala, os-lib) is also published for Scala 3. Ammonite added initial Scala 3 support in 2.4.0, while Fastparse is currently Scala 2 only, as it relies heavily on Scala 2 macros.

All of this is a long way of saying that if you want to future-proof your library today, it's better to make sure it's being published for Scala 3. We will come back to this subject in part II.
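
A rough sketch of what adding Scala 3 to an existing cross-build might look like in SBT. This assumes an SBT version with built-in Scala 3 support (1.5 or later). The Scala 3 patch version and the choice of the -Xsource:3 flag are illustrative assumptions, not a recommendation from the projects mentioned above:

```scala
// build.sbt
crossScalaVersions := Seq("2.12.14", "2.13.6", "3.0.2")

// Compiler flags often differ between Scala versions, so they are
// usually selected by matching on the (major, minor) version pair.
scalacOptions ++= {
  CrossVersion.partialVersion(scalaVersion.value) match {
    // Ask the 2.13 compiler to accept some Scala 3 syntax
    case Some((2, 13)) => Seq("-Xsource:3")
    case _             => Seq.empty
  }
}
```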

The good news is that once Scala 3 pushes out most Scala 2 versions (years from now), the binary compatibility story will be much better - similar to the binary compatibility guarantees that Scala.js has been offering:

...a library that was compiled against the Scala 3.2 standard library can be safely used with Scala 3.4. There is no need for library maintainers to re-publish when a new Scala 3.x minor release becomes available

As with many things, compatibility is harder than it looks at first sight, so the Scala 3 team is working on improving forward compatibility after the experience of Scala 3.1 propagating through the ecosystem and forcing library maintainers and downstream users to upgrade.

Scala platforms

Java Virtual Machine (JVM)

The most popular platform, and the one where the experience is the most polished. That doesn't mean it's perfect, but most library maintainers tend to put the most effort into this platform, as this is where the vast majority of Scala users are.

Main pain points here are related mostly to the following:

  • Different versions of the Java platform (the good old days of assuming that nobody runs anything above JDK 1.8 are over)

    This mostly relates to incompatible versions of the bytecode produced when the library artifact was built, or the usage of APIs that are removed/deprecated/added in the version of the JDK that the users of the library will be using.

    The bytecode compatibility story is simpler: by default the Scala compiler will produce bytecode version 8, which can be read by all the versions of the Java runtime above it, as long as you're not messing with the -target flag of the compiler.

    The Scala 2 compiler maintains compatibility with Java 8 (while also adding support for newer JDKs) and possibly will retain it forever.

    In terms of features, one must be careful in two situations:

    • If you have Java sources in the project, make sure they're compiled targeting JRE 8 - by using a --release 8 flag. From SBT you can pass flags to the Java compiler using the javacOptions setting (we'll touch on that in a later post).

    • If you interact with some features of the JDK that were added in later versions, your users might not be able to run the binaries you produce. A useful site for comparing different JDK releases is Java Version Almanac.

  • JVM's execution model with its super-late linking.

    What it means in practice is that prior to running your application, the code you wrote and the code from external libraries is represented as a loose, flat collection of .class files, which reference methods, classes, and values from each other by name.

    This means that if you (or the build tool, or the external library author) get this list of .class files wrong (incompatibilities in defined methods/classes/parameters lists, etc.), you will get a nasty, non-recoverable exception only at runtime.
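
For the Java sources point above, a minimal sketch of passing the --release flag through SBT's javacOptions setting. Note that javac only understands --release from JDK 9 onwards, so this assumes the build itself runs on JDK 9 or newer:

```scala
// build.sbt
// Compile any Java sources in the project against the Java 8 API
// and emit Java 8 bytecode, regardless of the JDK running the build.
javacOptions ++= Seq("--release", "8")
```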

JavaScript (Node or browser): Scala.js

A mature, well established and constantly growing platform - compiling Scala code to JavaScript, for both Node.js and browser use.

Potential difficulties that publishing for Scala.js brings are:

  • All the joys of the JavaScript ecosystem, with things like different JS standards, module systems, and the disparity in APIs between browsers and Node.js

  • Scala.js compiler's own versioning system.

    As the compiler is evolving, it might need to make some breaking changes in the APIs, meaning the libraries built for one version of Scala.js might not be usable on another.

    For example, here's the note about what a minor release means in the Scala.js versioning system (i.e. a bump from 1.6.0 to 1.7.0):

    It is backward binary compatible with all earlier versions in the 1.x series: libraries compiled with 1.0.x through 1.6.x can be used with 1.7.0 without change.

    It is not forward binary compatible with 1.6.x: libraries compiled with 1.7.0 cannot be used with 1.6.x or earlier.

    It is not entirely backward source compatible: it is not guaranteed that a codebase will compile as is when upgrading from 1.6.x (in particular in the presence of -Xfatal-warnings).

    In practice this is usually quite simple - most projects often bump Scala.js to latest release without a second thought.
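
In an SBT build, the Scala.js version is determined by the version of the sbt-scalajs plugin, so bumping Scala.js means changing a single line. A minimal sketch, where the module name and directory are hypothetical:

```scala
// project/plugins.sbt
// The plugin version is also the Scala.js version used for compiling and linking.
addSbtPlugin("org.scala-js" % "sbt-scalajs" % "1.7.0")

// build.sbt
// Enabling the plugin turns this module into a Scala.js one:
// its artifacts will contain .sjsir files alongside .class files.
lazy val frontend = project
  .in(file("frontend"))
  .enablePlugins(ScalaJSPlugin)
```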

Native code via LLVM: Scala Native

An old experimental project that has recently been taken under the Scala Center's wing and has received numerous improvements, with added support for Scala 2.13 being the most welcome.

Experimental support for Scala 3 has been released in version 0.4.3-SNAPSHOT.

As Scala Native is more active than ever before, a lot of maintainers add Native support to their libraries.
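
Enabling Scala Native in an SBT build follows the same pattern as Scala.js. A minimal sketch, assuming the 0.4.0 release of the plugin and a module whose scalaVersion is one that this Scala Native release supports (the module name and directory are hypothetical):

```scala
// project/plugins.sbt
addSbtPlugin("org.scala-native" % "sbt-scala-native" % "0.4.0")

// build.sbt
// Enabling the plugin makes the compiler emit .nir files,
// which Scala Native later links into a native binary.
lazy val cli = project
  .in(file("cli"))
  .enablePlugins(ScalaNativePlugin)
```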

(meta) <your-dependency-major-version> platform

Another way a library's build can become more complex is if we want to target two different incompatible versions of some major library. In that case, we need to produce two distinct artifacts (different in version and/or name itself) for users of different versions of the dependency.

Examples of that can be different versions of

  • AWS SDK (v1 and v2 are completely incompatible), or

  • Cats Effect (versions 2.x and 3.x are both source and binary incompatible, and both are used in the wild extensively), or

  • Http4s which has incompatible lineages for Cats Effect 2 and Cats Effect 3

  • Many other libraries that follow different bincompat guarantees (like the Play ecosystem, which allows breaking changes in minor versions, e.g. 2.7.x to 2.8.x)

In some cases it might be impossible or unjustifiably difficult to create and maintain a codebase that caters and publishes for different versions of the same library.
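
When it is feasible, one common shape for such a build is a separate module per dependency lineage, each published under its own name. The sketch below is only an illustration: the module names, directories, and the exact Cats Effect versions are assumptions.

```scala
// build.sbt
// Published as mylib-ce2_2.13, mylib-ce2_2.12, ...
lazy val ce2 = project
  .in(file("modules/ce2"))
  .settings(
    name := "mylib-ce2",
    libraryDependencies += "org.typelevel" %% "cats-effect" % "2.5.1"
  )

// Published as mylib-ce3_2.13, mylib-ce3_2.12, ...
lazy val ce3 = project
  .in(file("modules/ce3"))
  .settings(
    name := "mylib-ce3",
    libraryDependencies += "org.typelevel" %% "cats-effect" % "3.1.1"
  )
```

Users then pick the artifact that matches the lineage they are already on, at the cost of the maintainer keeping two implementations in sync.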

As a personal note, if your dependencies maintain two binary lineages, then you can either do the same, or choose one and force the users to upgrade. With the reality of open source maintenance often being a burden, choose what is right for your mental health and the amount of time you have to dedicate to OSS.

No one is getting any younger or healthier.

What are Scala artifacts?

The general process is always the same - someone wrote the code, that code was compiled, and the resulting artifacts were packaged in some way and uploaded somewhere the user's build tool can find and download them.

The build tool, such as SBT or Mill, will be responsible for

  1. Discovery of source files depending on your module structure

  2. Interacting with the necessary compiler (Scala or Java) to produce .class files

  3. (optionally) Injecting Scala.js or Scala Native compiler plugins into the compilation pipeline to produce necessary intermediary files

    • An intermediary representation is what is necessary to lower a complicated language such as Scala into a simpler representation, such as LLVM for Scala Native or a JavaScript-compatible representation for Scala.js

  4. Packaging compiled .class files in .jar artifacts with necessary metadata

Platform-specific artifacts

The .jar format is used in an overwhelming majority of the scenarios, and it houses both regular .class files understood by the JVM, and the .sjsir and .nir intermediate files for Scala.js/Scala Native.

For example, here's the location of Cats' jar file on the Maven Central repository:

❯ curl -s -Lo cats.zip https://repo1.maven.org/maven2/org/typelevel/cats-core_2.13/2.6.1/cats-core_2.13-2.6.1.jar

❯ unzip -l cats.zip | grep class | head
     6357  2010-01-01 00:00   cats/Align$$anon$1.class
    29657  2010-01-01 00:00   cats/Align$$anon$2.class
     4737  2010-01-01 00:00   cats/Align$.class
      345  2010-01-01 00:00   cats/Align$AllOps.class
     4111  2010-01-01 00:00   cats/Align$Ops.class
     3373  2010-01-01 00:00   cats/Align$ToAlignOps$$anon$4.class
     1147  2010-01-01 00:00   cats/Align$ToAlignOps.class
     1279  2010-01-01 00:00   cats/Align$nonInheritedOps$.class
     3324  2010-01-01 00:00   cats/Align$ops$$anon$3.class
      959  2010-01-01 00:00   cats/Align$ops$.class

We can do the same trick if we use the location of the Scala Native version of this artifact:

❯ curl -s -Lo cats.zip https://repo1.maven.org/maven2/org/typelevel/cats-core_native0.4_2.13/2.6.1/cats-core_native0.4_2.13-2.6.1.jar

❯ unzip -l cats.zip | grep Align | head
     2641  2010-01-01 00:00   cats/Align$$Lambda$1.nir
     2571  2010-01-01 00:00   cats/Align$$Lambda$2.nir
     1889  2010-01-01 00:00   cats/Align$$Lambda$3.nir
     2567  2010-01-01 00:00   cats/Align$$Lambda$4.nir
     3237  2010-01-01 00:00   cats/Align$$Lambda$5.nir
     6357  2010-01-01 00:00   cats/Align$$anon$1.class
    17824  2010-01-01 00:00   cats/Align$$anon$1.nir
    29657  2010-01-01 00:00   cats/Align$$anon$2.class
    95940  2010-01-01 00:00   cats/Align$$anon$2.nir
     4737  2010-01-01 00:00   cats/Align$.class

And you can see the .nir files that Scala Native will need when linking (producing a single binary/dynamic library/static library) the application that depends on Cats. A similar picture can be seen in the Scala.js version of this artifact, but instead you'll see *.sjsir files.

With the .jar format being relatively simple, the craft lies in supplying the correct combination of compiler options, compile dependencies, Scala sources, etc., to ensure the produced artifacts can be pulled by the end user and relied on without problems.

The multitude of Scala versions and Scala platforms leads to questions about how those artifacts are named, resolved, and uniquely identified - and whether build tools need to be aware of those.

How are Scala dependencies resolved?

When it comes to dependency resolution, one of the goals of the build tool is to transform some metadata that we specify about a dependency into a physical URL of the JAR that could be located in one of the repositories specified in the build.

The formation of such a URL is very much convention based, and that convention comes from the Maven build tool and its notion of Maven coordinates.

Here's an example of defining a dependency on Cats, and the resulting URL that will be tried by the build tool.

We are using the dependency specification format used by SBT, but Mill has something similar, using : instead of % in most places.
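
As a sketch of such a declaration, assuming a project built with a Scala 2.13.x release. The version and the URL are those of the Cats jar downloaded earlier from Maven Central:

```scala
// build.sbt
libraryDependencies += "org.typelevel" %% "cats-core" % "2.6.1"

// On Maven Central, the jar for this dependency is expected at:
// https://repo1.maven.org/maven2/org/typelevel/cats-core_2.13/2.6.1/cats-core_2.13-2.6.1.jar

// The Mill equivalent of the same dependency:
// ivy"org.typelevel::cats-core:2.6.1"
```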

In this case, Maven terminology defines these named components:

  • groupId = org.typelevel

  • artifactId = cats-core_2.13

  • version = 2.6.1

You can see that two transformations occurred:

  • . in groupId (also known as organization setting in SBT) are replaced with /

  • _2.13 was appended to cats-core

The latter point is very important:

Maven does not understand Scala's binary versions or Scala's platform - at its heart it's a flat storage of uniquely identified .jar files

Therefore, to support the various incompatible Scala versions (2.12, 2.13, 3) and platforms (JVM, Scala.js, Scala Native), build tools publish and resolve artifacts using pre-defined suffixes in a particular order.

Let's consider a few examples of how the artifact name varies depending on Scala version and platform:

  • JVM platform, Scala 2.12: cats-core_2.12

  • JVM platform, Scala 2.13: cats-core_2.13

  • JVM platform, Scala 3: cats-core_3 (no minor version)

  • Scala.js platform (version 1.x), Scala 2.13: cats-core_sjs1_2.13 (note the _sjs1 suffix)

  • Scala Native platform (version 0.4.x), Scala 2.12: cats-core_native0.4_2.12 (note the _native0.4 suffix)
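
The naming rules and the URL layout described above are mechanical enough to sketch as two small functions. The object and method names here are made up for illustration; this is not an API exposed by SBT, Mill, or any other build tool.

```scala
// Hypothetical helper, written only to illustrate the convention.
object MavenCoordinates {

  /** Builds the artifactId: the optional platform suffix comes first,
    * the Scala binary version always comes last.
    */
  def artifactId(name: String, scalaBinary: String, platform: Option[String] = None): String =
    (name +: (platform.toList :+ scalaBinary)).mkString("_")

  /** Builds the jar URL: dots in the groupId become path segments,
    * and the file name repeats the artifactId and the version.
    */
  def jarUrl(repo: String, groupId: String, artifactId: String, version: String): String = {
    val groupPath = groupId.replace('.', '/')
    s"$repo/$groupPath/$artifactId/$version/$artifactId-$version.jar"
  }
}
```

For example, MavenCoordinates.artifactId("cats-core", "2.13", Some("sjs1")) yields cats-core_sjs1_2.13, and passing "org.typelevel", "cats-core_2.13" and "2.6.1" to jarUrl with https://repo1.maven.org/maven2 as the repository reproduces the URL of the Cats jar downloaded earlier.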

The exact suffixes depend on how committed the maintainers of Scala.js, Scala and Scala Native are to binary compatibility guarantees:

  • In case of Scala.js, the 1.x lineage maintains some level of binary compatibility, and therefore the artifacts don't need the full Scala.js version in the name

  • In case of Scala Native, the 0.4.x lineage is deemed stable, and therefore the suffix is _native0.4. I will speculate that once Scala Native reaches 1.x status, it will follow Scala.js' rules and practices.

  • Maintainers of the main Scala 2 compiler commit to binary compatibility up to the minor version, and this is why we have _2.12 and _2.13 suffixes (which always come last).

  • Scala 3 changes the way binary compatibility works, and all Scala 3 artifacts are published with a sole _3 suffix. This is potentially a game changer for library maintainers' sanity, as it means minor releases will no longer require maintainers to re-publish everything.

Note, however, that out of the box SBT only handles Scala versions in this artifactId transformation. Both Scala.js and Scala Native (at least their SBT plugin versions) depend on sbt-platform-deps which adds a new operator to SBT, %%%, which will produce the correct artifact depending on whether the project being built is a JVM, Scala.js, or a Scala Native one.
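
A short sketch of the %%% operator in use, assuming it sits in the settings of a project that has the Scala.js or Scala Native plugin enabled (which is what brings the operator into scope):

```scala
// build.sbt
libraryDependencies += "org.typelevel" %%% "cats-core" % "2.6.1"

// The artifactId that gets resolved depends on the project being built:
//   Scala.js 1.x,       Scala 2.13 -> cats-core_sjs1_2.13
//   Scala Native 0.4.x, Scala 2.13 -> cats-core_native0.4_2.13
//   JVM,                Scala 2.13 -> cats-core_2.13
```

Using %% instead in a Scala.js or Scala Native project would resolve the JVM artifact, which lacks the .sjsir or .nir files those platforms need for linking.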

Where are the artifacts published?

Maven Central (= Sonatype Releases)

The most popular location, most trusted (implicitly, granted), the defaultest of the defaults in any build tool.

If you want to release your library and make it easily discoverable by your users, it has to be on Maven Central. I personally recommend using the sbt-ci-release plugin, which also includes detailed and easy to follow instructions on setting up your publishing credentials on Sonatype.
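
A rough sketch of the build-side part of such a setup. The plugin version is an assumption (check the plugin's README for the current one), and every metadata value below is a placeholder:

```scala
// project/plugins.sbt
addSbtPlugin("com.github.sbt" % "sbt-ci-release" % "1.5.10")

// build.sbt
// Project metadata that Sonatype expects to find in the published POM.
inThisBuild(
  List(
    organization := "com.example",
    homepage := Some(url("https://github.com/example/mylib")),
    licenses := List("Apache-2.0" -> url("https://www.apache.org/licenses/LICENSE-2.0")),
    developers := List(
      Developer("example", "Example Author", "author@example.com", url("https://example.com"))
    )
  )
)
```

The credentials and signing keys themselves live in the CI configuration rather than in the build, which is the part the plugin's instructions walk through.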

Both SBT and Mill (and any other JVM build tool out there) have this repository enabled by default, without any user configuration.

It's managed by Sonatype OSS, and can be publicly searched.

A much better view of the same data is MVNRepository, which understands Scala artifacts very intimately, down to the different platforms and binary versions. In my experience it is indispensable when upgrading dependencies and doing general updates management.

Another aggregator of this data (maintained by the Scala Center) is called Scaladex, and it contains various platform matrices and the ability to issue graphical badges indicating the latest versions of the artifact for each major Scala version/platform.

(not to) Bintray 🪦

Bintray was considered to be a lower barrier of entry for authors publishing JVM artifacts. In particular, it was the distribution mechanism of choice for authors of SBT plugins.

On May 1st, 2021, Bintray was shut down.

The process of shutdown was gradual, where at first new uploads were rejected, and by May 1st all Bintray services (download and upload) were shut down.

If you are in the process of helping someone's library get up to speed with newer Scala versions and platforms, it is possible you will discover Bintray publishing logic, which will no longer work.

You will have to work with the maintainer of the library to set up a Sonatype account, credentials on the CI, and the new publishing logic.

Other options

  • Companies set up their own instances (sometimes public) of Maven-compatible services, using, for example, JFrog's Artifactory

  • Jitpack allows on-demand building of artifacts based solely on their GitHub coordinates - and it supports SBT
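
On the consumer side, both options boil down to adding an extra resolver to the build. A minimal sketch in SBT, where the Artifactory URL and the GitHub user, repository, and tag are placeholders:

```scala
// build.sbt
// A company-hosted, Maven-compatible repository (placeholder URL)
resolvers += "Company Artifactory" at "https://artifactory.example.com/artifactory/libs-release"

// Jitpack builds artifacts on demand from GitHub coordinates
resolvers += "jitpack" at "https://jitpack.io"

// With Jitpack, the organization is derived from the GitHub user
// and the version from a tag or commit (hypothetical values).
libraryDependencies += "com.github.someuser" % "somerepo" % "v1.0.0"
```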

Conclusions

Here are the key takeaways:

  • Scala can be used to write code which runs on the JVM, on any JavaScript runtime, or as native code
  • Dependencies and artifacts in Scala are just archives with .class files with special names, uploaded somewhere
  • Scala 2 has several main versions, and these versions are incompatible with each other
  • Scala 3 aims to make the compatibility story easier for maintainers and users