Introduction
A frequent (and not entirely incorrect) take on developing and maintaining Scala libraries is that it is unnecessarily complicated.
Rather than frothing against that notion, I would like to explore how the complexity of maintaining a Scala library can grow dramatically once we extend our reach further out to
- platforms other than JVM
- major, minor, and experimental Scala versions
- code quality tools (in a later installment)
- major releases of the dependencies we build upon
- documentation tools (in a later installment)
Each direction brings its own complexity. As any software engineer knows, solving several relatively simple problems does not prepare you for solving a combination of them.
Who is this for?
More than anything, this is a brain dump of a lot of things I found necessary to understand the ecosystem of modern Scala libraries. A lot of the things will be already known to the reader, some - less so. All of them were useful when I started contributing to existing open source projects, which boosted my confidence and skillset.
The tools and techniques I've learned studying the Scala open source ecosystem directly led to improvements in development processes at work.
Users of Scala libraries
Where and how will my library be used?
For Scala, the answer can get pretty complex. In its simplest form, when you want to make your libraries only available on the JVM for a particular binary Scala version, your life will be very, very easy.
TL;DR: just publishing your library for JVM-only, Scala 2.13 will guarantee that it will reach most of Scala's users who have the need for such a library.
That said, in many cases the overhead of publishing for other Scala versions and platforms can be kept quite low, and can ensure an even wider audience.
Scala versions
Currently, the most used Scala versions are 2.12 and 2.13.
Spark users have been locked to Scala 2.11 for a long time, but more recent versions have started using 2.12 exclusively, and some work is ongoing to provide support for 2.13.
Within the 2.x lineage, minor releases are binary incompatible with each other. This means that if the library is only built using 2.13.*, then projects that are only built with 2.12.* will not be able to use it. On the other hand, patch versions are fully binary compatible with each other. That means that our library can continue being built with 2.12.2 and 2.13.0 and the users can run 2.12.14 and 2.13.6 respectively and consume our library's artifacts without any issues.
While the usage of Scala 2.12 is shrinking, there's still a huge amount of actively developed code in the wild that is locked to 2.12 for a variety of reasons (missing dependencies, incompatibilities, reliance on 2.12-only language features, the list goes on).
Because of that, most libraries are cross-published for Scala 2.12 and 2.13 at least. Some library maintainers go as far as removing 2.12 from their builds to ease the maintenance burden, but I personally feel that's very optimistic. That said, it may well be the lever that convinces more engineering managers to invest in upgrading to Scala 2.13.
As long as the library we're building is published for any patch version of both 2.12 and 2.13, we can cover a very large group of potential users.
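A minimal sbt sketch of such a cross-published build, assuming a single-module project. The project name is hypothetical and the patch versions are only illustrative (they are the ones mentioned above); any patch release of each line should do:

```scala
// build.sbt (sketch; the name and patch versions are illustrative assumptions)
lazy val root = (project in file("."))
  .settings(
    name := "my-library",                          // hypothetical library name
    scalaVersion := "2.13.6",                      // version used by default
    crossScalaVersions := Seq("2.12.14", "2.13.6") // versions used by `+` tasks
  )
```

With this in place, prefixing a task with + (for example `sbt +test` or `sbt +publishLocal`) runs it once for every Scala version listed in crossScalaVersions.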
Scala 3
Scala 3 (previously called Dotty) is a new Scala compiler, written completely from scratch.
A lot of effort has been put into making sure that the vast majority of Scala 2 libraries can be built with Scala 3. A subset of Scala 3's syntax, type system, and features is compatible with Scala 2.13, which helped the transition.
The various communities within the Scala ecosystem made significant efforts to provide Scala 3 artifacts of their libraries:
- At the time of writing, most of the functional ecosystem (be it Cats/Cats Effect, Monix, or the many libraries built on top of them) is fully available for Scala 3 and has been for some time. Trailblazers already report Scala 3 services running in production, which is very impressive and terrifying.
- Play and Akka ecosystems are in the process of making their artifacts available for Scala 3. The process is usually stifled by the usage of Scala 2 macros, which are not supported and will be very hard to port to Scala 3's metaprogramming system. As of version 2.6.18, Akka publishes artifacts for Scala 3.
- Most of the com.lihaoyi ecosystem (upickle, pprint, requests-scala, os-lib) is also published for Scala 3. Ammonite added initial Scala 3 support in 2.4.0, while Fastparse is currently Scala 2 only as it relies heavily on Scala 2 macros.
All of this is a long way of saying that if you want to future-proof your library today, it's better to make sure it's being published for Scala 3. We will come back to this subject in part II.
The good news is that once Scala 3 pushes out most Scala 2 versions (years from now), the binary compatibility story will be much better - similar to the binary compatibility that Scala.js has been offering:
...a library that was compiled against the Scala 3.2 standard library can be safely used with Scala 3.4. There is no need for library maintainers to re-publish when a new Scala 3.x minor release becomes available
As with many things, compatibility is harder than it looks at first sight, so the Scala 3 team is working on improving forward compatibility after the experience of Scala 3.1 propagating through the ecosystem and forcing library maintainers and downstream users to upgrade.
Scala platforms
Java Virtual Machine (JVM)
The most popular platform and where the experience is the most polished. Doesn't mean that it's perfect, but most library maintainers tend to put the most effort towards this platform as this is where the vast majority of Scala users are.
Main pain points here are related mostly to the following:
- Different versions of the Java platform (the good old days of assuming that nobody runs anything above JDK 1.8 are over)
  This mostly relates to incompatible versions of the bytecode produced when the library artifact was built, or the usage of APIs that are removed/deprecated/added in the version of the JDK that the users of the library will be using.
  The bytecode compatibility story is simpler, as by default the Scala compiler will produce bytecode version 8, which can be read by all the versions of the Java runtime above it, as long as you're not messing with the -target flag of the compiler. The Scala 2 compiler maintains compatibility with Java 8 (while also adding support for newer JDKs) and possibly will retain it forever.
  In terms of features, one must be careful in two situations:
  - If you have Java sources in the project, make sure they're compiled targeting JRE 8 by using a --release 8 flag. From SBT you can pass flags to the Java compiler using the javacOptions setting (we'll touch on that in a later post)
  - If you interact with some features of the JDK that were added in later versions, your users might not be able to run the binaries you produce. A useful site for comparing different JDK releases is Java Version Almanac
- JVM's execution model with its super-late linking.
  What it means in practice is that prior to running your application, the code you wrote and the code from external libraries is represented as a loose, flat collection of .class files, which reference methods, classes, and values from each other by name.
  This means that if you (or the build tool, or the external library author) get this list of .class files wrong (incompatibilities in defined methods/classes/parameters lists, etc.), you will get a nasty, non-recoverable exception only at runtime.
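For the Java sources case mentioned above, a minimal sketch of passing the flag from SBT could look like this. It assumes the build itself runs on JDK 9 or newer, since older javac versions do not accept --release:

```scala
// build.sbt (sketch): compile Java sources against the Java 8 API and bytecode level.
// Assumption: the build runs on JDK 9+, where javac understands --release.
Compile / compile / javacOptions ++= Seq("--release", "8")
```

Scoping the setting to the compile task keeps the flag away from other tasks that reuse javacOptions. Note that this only guards the Java sources; JDK APIs called from Scala code still need the same care.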
JavaScript (Node or browser): Scala.js
A mature, well-established, and constantly growing platform - compiling Scala code to JavaScript, for both Node.js and browser use.
Potential difficulties that publishing for Scala.js brings are:
- All the joys of the JavaScript ecosystem, with things like different JS standards, module systems, and disparity in APIs between browsers and Node.js
- Scala.js compiler's own versioning system.
  As the compiler is evolving, it might need to make some breaking changes in the APIs, meaning the libraries built for one version of Scala.js might not be usable on another.
  For example, here's the note about what a minor release means in the Scala.js versioning system (i.e. a bump from 1.6.0 to 1.7.0):
It is backward binary compatible with all earlier versions in the 1.x series: libraries compiled with 1.0.x through 1.6.x can be used with 1.7.0 without change.
It is not forward binary compatible with 1.6.x: libraries compiled with 1.7.0 cannot be used with 1.6.x or earlier.
It is not entirely backward source compatible: it is not guaranteed that a codebase will compile as is when upgrading from 1.6.x (in particular in the presence of -Xfatal-warnings).
In practice this is usually quite simple - most projects bump Scala.js to the latest release without a second thought.
Native code via LLVM: Scala Native
An old experimental project that has recently been taken under the Scala Center's wing and has received numerous improvements, with added support for Scala 2.13 being the most welcome.
Experimental support for Scala 3 has been released in version 0.4.3-SNAPSHOT.
As Scala Native is more active than ever before, a lot of maintainers add Native support to their libraries.
(meta) <your-dependency-major-version> platform
Another way a library's build can become more complex is if we want to target two different incompatible versions of some major library. In that case, we need to produce two distinct artifacts (different in version and/or name itself) for users of different versions of the dependency.
Examples of that can be different versions of
- AWS SDK (v1 and v2 are completely incompatible), or
- Cats Effect (versions 2.x and 3.x are both source and binary incompatible, and both are used in the wild extensively), or
- Http4s, which has incompatible lineages for Cats Effect 2 and Cats Effect 3
- Many other libraries that are following different bincompat guarantees (like Play ecosystem, which allows breaking changes in minor versions, e.g. 2.7.x to 2.8.x)
In some cases it might be impossible or unjustifiably difficult to create and maintain a codebase that caters to and publishes for different versions of the same library.
As a personal note, if your dependencies maintain two binary lineages, then you can either do the same, or choose one and force the users to upgrade. With the reality of open source maintenance often being a burden, choose what is right for your mental health and the amount of time you have to dedicate to OSS.
No one is getting any younger or healthier.
What are Scala artifacts?
The general process is always the same - someone wrote the code, that code was compiled, and the resulting artifacts are packaged in some way and uploaded somewhere where the user's build tool can find and download those dependencies.
The build tool, such as SBT or Mill, will be responsible for
- Discovery of source files depending on your module structure
- Interacting with the necessary compiler (Scala or Java) to produce .class files
- (optionally) Injecting Scala.js or Scala Native compiler plugins into the compilation pipeline to produce necessary intermediary files
  - Intermediary representation is what is necessary to lower a complicated language such as Scala into a simpler representation, such as LLVM for Scala Native or JavaScript-compatible representation for Scala.js
- Packaging compiled .class files in .jar artifacts with necessary metadata
Platform-specific artifacts
The .jar format is used in an overwhelming majority of the scenarios, and it houses both regular .class files understood by the JVM, and the .sjsir and .nir intermediate files for Scala.js/Scala Native.
For example, here's the location of Cats' jar file on the Maven Central repository:
❯ curl -s -Lo cats.zip https://repo1.maven.org/maven2/org/typelevel/cats-core_2.13/2.6.1/cats-core_2.13-2.6.1.jar
❯ unzip -l cats.zip | grep class | head
6357 2010-01-01 00:00 cats/Align$$anon$1.class
29657 2010-01-01 00:00 cats/Align$$anon$2.class
4737 2010-01-01 00:00 cats/Align$.class
345 2010-01-01 00:00 cats/Align$AllOps.class
4111 2010-01-01 00:00 cats/Align$Ops.class
3373 2010-01-01 00:00 cats/Align$ToAlignOps$$anon$4.class
1147 2010-01-01 00:00 cats/Align$ToAlignOps.class
1279 2010-01-01 00:00 cats/Align$nonInheritedOps$.class
3324 2010-01-01 00:00 cats/Align$ops$$anon$3.class
959 2010-01-01 00:00 cats/Align$ops$.class
We can do the same trick if we use the location of the Scala Native version of this artifact:
❯ curl -s -Lo cats.zip https://repo1.maven.org/maven2/org/typelevel/cats-core_native0.4_2.13/2.6.1/cats-core_native0.4_2.13-2.6.1.jar
❯ unzip -l cats.zip | grep Align | head
2641 2010-01-01 00:00 cats/Align$$Lambda$1.nir
2571 2010-01-01 00:00 cats/Align$$Lambda$2.nir
1889 2010-01-01 00:00 cats/Align$$Lambda$3.nir
2567 2010-01-01 00:00 cats/Align$$Lambda$4.nir
3237 2010-01-01 00:00 cats/Align$$Lambda$5.nir
6357 2010-01-01 00:00 cats/Align$$anon$1.class
17824 2010-01-01 00:00 cats/Align$$anon$1.nir
29657 2010-01-01 00:00 cats/Align$$anon$2.class
95940 2010-01-01 00:00 cats/Align$$anon$2.nir
4737 2010-01-01 00:00 cats/Align$.class
And you can see the .nir files that Scala Native will need when linking (producing a single binary/dynamic library/static library) the application that depends on Cats. A similar picture can be seen in the Scala.js version of this artifact, but instead you'll see *.sjsir files.
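Since a jar is just a zip archive, the same inspection can be sketched in Scala with nothing but the JDK's java.util.zip classes. The object and method names below are hypothetical, and the in-memory archive merely stands in for a jar downloaded from Maven Central:

```scala
import java.io.{ByteArrayOutputStream, InputStream}
import java.util.zip.{ZipEntry, ZipInputStream, ZipOutputStream}

// Hypothetical helper, for illustration only.
object JarContents {

  /** Counts the files in a jar (zip) stream, grouped by file extension. */
  def countByExtension(jar: InputStream): Map[String, Int] = {
    val zip = new ZipInputStream(jar)
    try {
      Iterator
        .continually(zip.getNextEntry)
        .takeWhile(_ != null)
        .filterNot(_.isDirectory)
        .map { entry =>
          val name = entry.getName
          val dot = name.lastIndexOf('.')
          if (dot >= 0) name.substring(dot + 1) else ""
        }
        .toList // force the iterator before the stream is closed
        .groupBy(identity)
        .map { case (ext, names) => ext -> names.size }
    } finally zip.close()
  }

  /** Builds a tiny in-memory zip with empty entries, standing in for a real jar. */
  def sampleJar(entryNames: Seq[String]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ZipOutputStream(bytes)
    entryNames.foreach { entryName =>
      out.putNextEntry(new ZipEntry(entryName))
      out.closeEntry()
    }
    out.close()
    bytes.toByteArray
  }
}
```

Pointing countByExtension at a file stream of a real Scala Native artifact would be expected to report both class and nir entries, matching the unzip listing above.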
With the .jar format being relatively simple, the craft lies in supplying the correct combination of compiler options, compile dependencies, Scala sources, etc., to ensure the produced artifacts can be pulled by the end user and relied on without problems.
The multitude of Scala versions and Scala platforms leads to questions about how those artifacts are named, resolved, and uniquely identified - and whether build tools need to be aware of those.
How are Scala dependencies resolved?
When it comes to dependency resolution, one of the goals of the build tool is to transform some metadata that we specify about a dependency into a physical URL of the JAR that could be located in one of the repositories specified in the build.
The formation of such a URL is very much convention based, and that convention comes from the Maven build tool, and its notion of Maven coordinates.
Here's an example of defining a dependency on Cats, and the resulting URL that will be tried by the build tool.
We are using the dependency specification format used by SBT, but Mill has something similar, using : instead of % in most places.
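A sketch of such a declaration, reusing the Cats coordinates from the curl example earlier. It assumes the project's scalaVersion is set to some 2.13.x release and that the default Maven Central resolver is used:

```scala
// build.sbt (sketch), assuming scalaVersion is a 2.13.x release
libraryDependencies += "org.typelevel" %% "cats-core" % "2.6.1"

// With Maven Central as the repository, the jar is expected at:
// https://repo1.maven.org/maven2/org/typelevel/cats-core_2.13/2.6.1/cats-core_2.13-2.6.1.jar
```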
In this case, Maven terminology defines these named components:
- groupId = org.typelevel
- artifactId = cats-core_2.13
- version = 2.6.1
You can see that two transformations occurred:
- Each . in groupId (also known as the organization setting in SBT) is replaced with /
- _2.13 was appended to cats-core
The latter point is very important:
Maven does not understand Scala's binary versions or Scala's platform - at its heart it's a flat storage of uniquely identified .jar files.
Therefore, to support the various incompatible Scala versions (2.12, 2.13, 3) and platforms (JVM, Scala.js, Scala Native), build tools publish and resolve artifacts using pre-defined suffixes in a particular order.
Let's consider a few examples of how the artifact name varies depending on Scala version and platform:
- JVM platform, Scala 2.12: cats-core_2.12
- JVM platform, Scala 2.13: cats-core_2.13
- JVM platform, Scala 3: cats-core_3 (no minor version)
- Scala.js platform (version 1.x), Scala 2.13: cats-core_sjs1_2.13 (note the _sjs1 suffix)
- Scala Native platform (version 0.4.x), Scala 2.12: cats-core_native0.4_2.12 (note the _native0.4 suffix)
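The naming convention from the examples above can be sketched as a pair of small functions. All names here are hypothetical; this mirrors the convention as described in this post and is not the actual implementation inside SBT or Mill:

```scala
// Hypothetical model of the naming convention, for illustration only.
object MavenCoordinates {
  sealed trait Platform { def suffix: String }
  case object Jvm extends Platform { val suffix = "" }
  final case class ScalaJs(binaryVersion: String) extends Platform {
    val suffix = "_sjs" + binaryVersion
  }
  final case class ScalaNative(binaryVersion: String) extends Platform {
    val suffix = "_native" + binaryVersion
  }

  /** The platform suffix (if any) comes first, the Scala binary version always comes last. */
  def artifactId(name: String, platform: Platform, scalaBinaryVersion: String): String =
    name + platform.suffix + "_" + scalaBinaryVersion

  /** Maven layout: dots in groupId become path separators. */
  def jarUrl(repo: String, groupId: String, artifactId: String, version: String): String = {
    val groupPath = groupId.replace('.', '/')
    s"$repo/$groupPath/$artifactId/$version/$artifactId-$version.jar"
  }
}
```

For instance, artifactId("cats-core", ScalaJs("1"), "2.13") yields cats-core_sjs1_2.13, and feeding an artifactId into jarUrl with the Maven Central base URL reproduces the jar locations used in the curl examples earlier.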
The exact suffixes depend on how committed the maintainers of Scala.js, Scala, and Scala Native are to binary compatibility guarantees:
- In case of Scala.js, the 1.x lineage maintains some level of binary compatibility, and therefore the artifacts don't need the full Scala.js version in the name
- In case of Scala Native, the 0.4.x lineage is deemed stable, and therefore the suffix is _native0.4. I will speculate that once Scala Native reaches 1.x status, it will follow Scala.js' rules and practices.
- Maintainers of the main Scala 2 compiler commit to binary compatibility up to the minor version, and this is why we have _2.12 and _2.13 suffixes (which always come last).
- Scala 3 changes the way binary compatibility works, and all Scala 3 artifacts are published with a sole _3 suffix. This is potentially a game changer for library maintainers' sanity, as it means minor releases will no longer require maintainers to re-publish everything.
Note, however, that out of the box SBT only handles Scala versions in this artifactId transformation. Both Scala.js and Scala Native (at least their SBT plugin versions) depend on sbt-platform-deps, which adds a new operator to SBT, %%%, which will produce the correct artifact depending on whether the project being built is a JVM, Scala.js, or a Scala Native one.
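As a sketch, and assuming the Scala.js or Scala Native SBT plugin is already added in project/plugins.sbt and enabled for the project, the platform-aware declaration differs from the regular one only in the operator:

```scala
// build.sbt (sketch); assumes a Scala.js or Scala Native plugin is enabled,
// which brings the %%% operator into scope via sbt-platform-deps
libraryDependencies += "org.typelevel" %%% "cats-core" % "2.6.1"

// Expected artifactId for a Scala 2.13 project, following the naming rules above:
//   Scala.js 1.x project:     cats-core_sjs1_2.13
//   Scala Native 0.4 project: cats-core_native0.4_2.13
```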
Where are the artifacts published?
Maven Central (= Sonatype Releases)
The most popular location, most trusted (implicitly, granted), the defaultest of the defaults in any build tool.
If you want to release your library and make it easily discoverable by your users, it has to be on Maven Central. I personally recommend using the sbt-ci-release plugin, which also includes detailed and easy to follow instructions on setting up your publishing credentials on Sonatype.
Both SBT and Mill (and any other JVM build tool out there) have this repository enabled by default, without any user configuration.
It's managed by Sonatype OSS, and can be publicly searched.
A much better view of the same data is MVNRepository, which understands Scala artifacts very intimately, down to the different platforms and binary versions. In my experience it is indispensable when upgrading dependencies and doing general updates management.
Another aggregator of this data (maintained by the Scala Center) is called Scaladex, and it contains various platform matrices and the ability to issue graphical badges to indicate the latest versions of the artifact for each major Scala version/platform.
(not to) Bintray 🪦
Bintray was considered to be a lower barrier of entry for authors publishing JVM artifacts. In particular, it was the distribution mechanism of choice for authors of SBT plugins.
On May 1st, 2021, Bintray was shut down.
The shutdown was gradual: at first new uploads were rejected, and by May 1st all Bintray services (download and upload) were shut down.
If you are in the process of helping someone's library to get up to speed with newer Scala versions and platforms, it is possible you will discover Bintray publishing logic, which will no longer work.
You will have to work with the maintainer of the library to set up a Sonatype account, credentials on the CI, and the new publishing logic.
Other options
- Companies set up their instances (sometimes public) of Maven-compatible services, using, for example, JFrog's Artifactory
- Jitpack allows on-demand building of artifacts based solely on their GitHub coordinates - and it supports SBT
Conclusions
Here are the key takeaways:
- Scala can be used to write code which runs on the JVM, on any JavaScript runtime, or as native code
- Dependencies and artifacts in Scala are just archives with .class files with special names, uploaded somewhere
- Scala 2 has several main versions, and these versions are incompatible with each other
- Scala 3 aims to make the compatibility story easier for maintainers and users