Smithy4s full stack (p.3): Backend testing

Tags: smithy, smithy4s, scala, weaver, series:smithy4s

Series TL;DR

Originally I planned to just ignore tests completely. Not just for the all-important reason of laziness, but also because testing software is somehow a contentious topic.

Apart from the "write tests first" vs "write implementation first" conflict, the one the online Scala community has warmly adopted is whether to use mocking or not. Mocking in this case refers to runtime deep-patching of method implementations and stubbing out calls.

Not wanting to write a whole piece on that subject alone (it's been done before, and it has been debated to an unsavoury death), I will approach testing in a way I see fit, given the very comfortable circumstances this app was born in:

  1. I'm the only developer, yay! No arguing about naming, structuring, frameworks, testing depth, etc.

  2. I contribute to and maintain Weaver, which turns my admiration for it into a visibly biased obsession

  3. I'm lazy! So I get to decide what it means for the codebase to be appropriately tested, without having to satisfy a grumpy engineer waving a 30-year-old book in my face.

Another reason for covering testing here is the relative simplicity of our backend - we don't really utilise the more complex features of Smithy, and we aim for a well-defined happy path, so there isn't enough meat on the bones of this project to warrant a multi-part series.

So here's our ambitious plan for testing:

  • Three levels of testing:

    • "Unit" testing - testing of pure functions and classes only containing pure functions

    • "Stub" testing - Testing side-effectful functions (and whole services) by turning them into pure with plugging the impure boundaries with in-memory fakes

      I was going to call that "Fake" testing but didn't want to infringe on the runtime mocking terminology.

    • "Integration" testing - where we test entire services against real parts of our system where side-effects are performed

  • "Stub" and "Integration" testing must share as much code as possible, specifically the entire specs must be exactly the same

  • Single framework for everything

For Stub testing there aren't that many holes we need to plug up to make the entire implementation pure - we mostly just log things and access the database. The routes implementation can process requests without starting any actual HTTP servers - thankfully Http4s' model is ideal for that.

For Integration, we would like to stand up and tear down a real Postgres database using Testcontainers, and the exercised spec should actually invoke HTTP endpoints for each service, with requests going through the actual network stack.

The only testing framework we will use will be Weaver, and we'll opt in for the Scalacheck integration as well:

lazy val server = 
  projectMatrix
    // ...
    .settings(
      libraryDependencies ++= Seq(
        "com.disneystreaming" %% "weaver-cats"         % Versions.Weaver,
        "com.disneystreaming" %% "weaver-scalacheck"   % Versions.Weaver
      ),
      testFrameworks += new TestFramework("weaver.framework.CatsEffect")
    )

Unit testing

There are some tests you want to get absolutely right - things like correct JWT config propagation, password-hashing roundtrip, config processing, input validation, etc. But they are easy to set up - there's rarely any interaction with anything but stateless library code. Think about verifying JWT tokens - the library itself is stateless and performs no side effects, your code takes at most one extra parameter (some static JWT config), and you directly assert on the result.
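
For instance, a password-hashing roundtrip test could be as small as this (a sketch: Crypto.hashPassword appears later in this post, while verifyPassword is a hypothetical name used purely for illustration):

package jobby
package tests
package unit

import weaver.*
import jobby.spec.*

object CryptoTests extends SimpleIOSuite:
  test("password hashing roundtrip") {
    val password = UserPassword("correct horse battery staple")

    for
      hash <- Crypto.hashPassword(password)
      // verifyPassword is hypothetical - it stands in for whatever
      // the real verification entry point is called
      ok <- Crypto.verifyPassword(password, hash)
    yield expect(ok)
  }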

We won't demonstrate many of them here, but let's consider an example: writing property tests for validation logic.

To enable ScalaCheck-specific functionality on a Weaver spec, all we need to do is mix in the Checkers trait:

package jobby
package tests
package unit

import weaver.*
import jobby.spec.*

import weaver.scalacheck.*
import org.scalacheck.Gen

object ValidationPropertyTests extends SimpleIOSuite with Checkers:
  override def checkConfig: CheckConfig =
    CheckConfig.default.copy(
      minimumSuccessful = 500,
      initialSeed = Some(13378008L)
    )

What we're additionally doing here is explicitly modifying the property checking config to require slightly more examples to succeed for each test, and explicitly setting the generator's seed for reproducibility - in case a subtle bug is introduced, it's better to be able to reliably break the tests.

Our approach to property testing the validation rules can be summarised as follows:

Validation can either succeed, or some non-empty subset of distinct rules is violated

In the current state of our validation rules there's a duplication in terms of where the rules are mentioned. Take, for example, validateJobDescription:

def validateJobDescription(description: JobDescription) =
  val minLength = 100
  val maxLength = 5000

  val str = description.value.trim

  if str.length == 0 then err("Description cannot be empty")
  else if str.length < minLength || str.length > maxLength then
    err(
      s"Description cannot be shorter than $minLength or longer than $maxLength characters"
    )
  else ok
end validateJobDescription

There are two problems (that I see) with it:

  1. The min/max values are constants within the function - ideally they should be part of some configuration object, taken as (using ValidationConfig) for ergonomics

  2. The exact rules are expressed as boolean conditions, locked inside of the function and must necessarily be duplicated in our tests

As a side-project to this side-project, I would love to over-engineer this whole thing.
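
For illustration, point 1 could look something like this (a sketch - ValidationConfig is hypothetical and not part of the actual codebase):

// Hypothetical config object: the limits live in one place and can be
// shared between the implementation and the tests
case class ValidationConfig(
    jobDescriptionLength: Range = 100 to 5000
)

def validateJobDescription(description: JobDescription)(using
    config: ValidationConfig
) =
  val str   = description.value.trim
  val range = config.jobDescriptionLength

  if str.isEmpty then err("Description cannot be empty")
  else if !range.contains(str.length) then
    err(
      s"Description cannot be shorter than ${range.start} or longer than ${range.end} characters"
    )
  else ok

The property test could then summon the same ValidationConfig instead of hard-coding 100 and 5000.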

But let's see how a property test could look for this:

  test("jobs: description") {
    forall(org.scalacheck.Gen.asciiPrintableStr) { str =>
      val trimmed = str.trim
      val isValid =
        jobby.validation.validateJobDescription(JobDescription(str)).isRight

      expect(isValid) or
        expect(
          trimmed.isEmpty
            || trimmed.length < 100
            || trimmed.length > 5000
        )
    }
  }

There's those duplicated constants again!

Running 1000 of these tests takes ~400ms on my laptop, but thankfully weaver runs them in parallel and the entire spec executes in less than a second:

[info] jobby.tests.unit.ValidationPropertyTests
[info] + users: username 382ms
[info] + users: password 411ms
[info] + companies: name 381ms
[info] + companies: description 362ms
[info] + jobs: title 407ms
[info] + jobs: description 396ms
[info] + jobs: salary range 420ms
[info] Passed: Total 7, Failed 0, Errors 0, Passed 7
[success] Total time: 1 s, completed 23 Jun 2022, 20:58:44

Apart from extracting the properties and constants, can we improve the usefulness of these tests?

Even if our validation functions and tests work perfectly, so far we have not confirmed that important parts of our system actually do invoke these validation functions.

If we take a look at the register function in the UserServiceImpl:

override def register(login: UserLogin, password: UserPassword) =
  val validation = (validateUserLogin(login), validateUserPassword(password))
    .traverse(IO.fromEither)

  validation *>
    Crypto
      .hashPassword(password)
      .flatMap { hash =>
        db.option(op.CreateUser(login, hash))
          .onError(ex => logger.error("Registration failed", ex))
          .adaptErr { case _ =>
            ValidationError("Failed to register")
          }
      }
      .void
end register

With a database implementation that always succeeds for the CreateUser operation, this function can only fail if either the user login validation or the user password validation fails.

So we could express our test differently - generate random logins and passwords, and assert that register can only fail if either the login or the password doesn't match the validation rules.

I will leave it as an exercise to the reader because scaling it seems hard and I don't wanna.

Stub and Integration testing

These tests are much more high level, and can be better expressed as testing user journeys and use cases. For example:

  1. Users can register and login, receiving valid auth tokens
  2. Users can't use incorrect credentials to login
  3. Users can create companies
  4. Users can only delete companies they created

etc.

To write such test cases succinctly, we would like to have some high-level tools available inside the tests.

Probe (no, not that one 👆)

Those high level tools include:

  1. API client (Api)
  2. Data generator (Generator)
  3. Config used (to, say, compare returned values with one supposedly injected into the app) (AppConfig)
  4. Collection of common API "snippets" that we will call Fragments (e.g. createUser, createCompany, etc.)
  5. Something to inspect the logs accumulated by the app, if possible

We will group all of them under the same class called Probe:

case class Probe(
    api: Api,
    auth: HttpAuth,
    gen: Generator,
    config: AppConfig,
    getLogs: IO[Vector[scribe.LogRecord]]
):
  def fragments = Fragments(this)

API client

Let's start with the API client. One big promise of Smithy is that using exactly the same Scala interface generated for your services, you can construct an HTTP client and point it to an arbitrary URL.

Our API then is just an aggregation of the services we need:

case class Api(
    companies: CompaniesService[IO],
    jobs: JobService[IO],
    users: UserService[IO]
)

And all we need to build it is:

  1. Actual HTTP client implementation from Http4s (Client[IO])
  2. Base Uri

Using SimpleRestJsonBuilder from Smithy4s, we can construct Api like this:

object Api:
  def build(client: Client[IO], uri: Uri): IO[Api] =
    val companies = IO.fromEither(
      SimpleRestJsonBuilder(CompaniesService)
        .client(client, uri)
    )
    val jobs = IO.fromEither(
      SimpleRestJsonBuilder(JobService)
        .client(client, uri)
    )

    val users = IO.fromEither(
      SimpleRestJsonBuilder(UserService)
        .client(client, uri)
    )

    (companies, jobs, users).mapN(Api.apply)
  end build
end Api

In-memory logger

One feature of Weaver that I really like is the way it prints out the logs only if the tests have failed. One caveat - this applies only to logs written through Weaver's logger.

Regardless, I'm quite partial to the quiet, pristine view of test results - if tests at your job are not polluted by walls of SLF4J printouts, I envy you!

For these tests, I would like for all the logs sent to Scribe loggers to eventually be reported through Weaver's logger.

To do that, let's first create a log collector. Good news: if you have an instance of a Scribe logger, you can give it a LogHandler and do with each message as you please. Bad news: the interface of LogHandler looks like this:

trait LogHandler {
  def log(record: LogRecord): Unit
}

No bother! We can use the excellent Dispatcher to execute any IO actions our logger requires. Those IO actions will just be writing to a Ref[IO, Vector[LogRecord]].

Note that Dispatcher doesn't strictly guarantee any ordering - at least until Cats Effect 3.4.0 lands with its configurable dispatchers. It's not an issue for us though, as each LogRecord comes with a timestamp we can order the logs by - this will give us "good enough for tests" results.

Our in-memory logger is a pair of inter-connected things:

  1. A Scribe logger that writes to some Ref
  2. An action to read the current state of that Ref:

class InMemoryLogger private (
    val logs: IO[Vector[LogRecord]],
    val scribeLogger: Scribe[IO]
)

And here's how we build it:

object InMemoryLogger:
  def build: Resource[IO, InMemoryLogger] =
    // create a dispatcher
    Dispatcher[IO].evalMap { disp =>
      // create a ref 
      Ref.ofEffect(IO(Vector.empty[LogRecord])).map { ref =>

        // create a Scribe LogHandler, that uses the dispatcher 
        // to execute an `IO` action writing the log message into the 
        // ref
        val handler = scribe.handler.LogHandler(Level.Info) { msg =>
          disp.unsafeRunSync(ref.update(_.appended(msg)))
        }
        
        // an orphan logger with no handlers but the one we 
        // created above
        val logger =
          scribe.Logger.empty
            .orphan()
            .clearHandlers()
            .withHandler(handler)
            .f[IO]

        new InMemoryLogger(
          ref.get,
          logger
        )
      }
    }
end InMemoryLogger

Data generator

Our data space is very simple - we mostly operate on UUIDs and strings, with occasional restrictions on length.

We'll also provide some helpers to work with newtypes that Smithy4s provides. First, our Generator class starts like this:

import cats.effect.*
import cats.effect.std.*
import cats.syntax.all.*

case class Generator private (random: Random[IO], uuid: UUIDGen[IO]):
  //...

A method that generates UUID-backed newtypes is simple:

  def id(nt: Newtype[UUID]): IO[nt.Type] =
    uuid.randomUUID.map(nt.apply)

Same with a method for int-backed newtypes (like MinSalary/MaxSalary):

  def int(nt: Newtype[Int], min: Int, max: Int): IO[nt.Type] =
    random.betweenInt(min, max).map(nt.apply)

And now onto strings, where my main requirement was being able to easily identify the random strings generated for particular newtypes - so let's prefix them with the newtype's name, while preserving the length requirements.

  def str(
      nt: Newtype[String],
      lengthRange: Range = 0 to 100
  ): IO[nt.Type] =
    for
      length <- random.betweenInt(lengthRange.start, lengthRange.end)
      chars  <- random.nextAlphaNumeric.replicateA(length).map(_.mkString)
      str = nt.getClass.getSimpleName.toString + "-" + chars
    yield nt(str.take(lengthRange.end))

Why go through all this ungodly trouble if we already have ScalaCheck in dependencies? I don't know. I really don't remember why.

Instantiating our Generator is simple:

object Generator:
  def create: IO[Generator] =
    (Random.scalaUtilRandom[IO], IO(UUIDGen[IO])).mapN(Generator.apply)

We've now defined everything we need to instantiate the Probe:

object Probe:
  def build(
      client: Client[IO],
      uri: Uri,
      config: AppConfig,
      logger: InMemoryLogger
  ) =
    Resource.eval {
      for
        gen <- Generator.create
        api <- Api.build(client, uri)
        auth = HttpAuth(
          config.jwt,
          logger.scribeLogger
        )
      yield Probe(api, auth, gen, config, logger.logs)
    }
end Probe

Weaver integration

Probe will be the resource that we share across individual tests in Stub tests, and across whole specs in Integration tests.

Let's provide a base trait for our specs that will propagate Scribe logs into Weaver's logger.

The trait starts like this:

trait JobbySuite extends IOSuite:
  override type Res = Probe
  // ...

where we indicate that the shared resource is our Probe class.

We can then provide a probeTest method that delegates to one of the test methods implemented by weaver - specifically the version that takes both the shared resource and the logger as parameters:

  def probeTest(name: weaver.TestName)(f: Probe => IO[weaver.Expectations]) =
    test(name) { (probe, log) =>
    // ...

where f is the body of the test.

Let's write a sub-program that transfers the logs:

val dumpLogs = probe.getLogs.flatMap {
  _.sortBy(_.timeStamp).traverse_ { msg =>

    val msgText = msg.logOutput.plainText

    msg.level match
      case Level.Info  => log.info(msgText)
      case Level.Error => log.error(msgText)
      case Level.Warn  => log.warn(msgText)
      case _           => log.debug(msgText)
  }
}

We get the logs, sort them by timestamp, and write them into the Weaver logger.

Now all we need to do is run the test body, dump the logs, and re-raise any error, handing the result back to Weaver's default test implementation:

f(probe).attempt
  .flatTap(_ => dumpLogs)
  .flatMap(IO.fromEither)

And that's it! If any of our stub tests fail, the logs for that test will be printed out. The output can certainly be tweaked, but this will do - you only see a wall of text in case of a failure.
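
For reference, here is the whole probeTest assembled from the pieces above (a straight assembly of the snippets; the real method may differ in small details):

  def probeTest(name: weaver.TestName)(f: Probe => IO[weaver.Expectations]) =
    test(name) { (probe, log) =>
      // transfer the app's Scribe logs into Weaver's logger
      val dumpLogs = probe.getLogs.flatMap {
        _.sortBy(_.timeStamp).traverse_ { msg =>
          val msgText = msg.logOutput.plainText

          msg.level match
            case Level.Info  => log.info(msgText)
            case Level.Error => log.error(msgText)
            case Level.Warn  => log.warn(msgText)
            case _           => log.debug(msgText)
        }
      }

      // run the test body, dump the logs, re-raise the outcome
      f(probe).attempt
        .flatTap(_ => dumpLogs)
        .flatMap(IO.fromEither)
    }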

Slow TimeCop

In part 2 we created an interface called TimeCop for performing the side-effect of getting the current date and time.

This was foreshadowing - the ability to override that interface lets us avoid dealing with real time (our tests execute very fast) and the minuscule differences between real timestamps. Instead, our TimeCop will generate sequential timestamps, 1 day apart:

package jobby

import cats.effect.*
import java.time.OffsetDateTime
import java.time.ZoneOffset

object SlowTimeCop:
  def apply: IO[TimeCop] =
    IO.realTimeInstant.flatMap { inst =>
      val start = inst.atOffset(ZoneOffset.ofHours(0))

      Ref.of[IO, Int](0).map { daysRef =>
        new TimeCop:
          def nowODT = daysRef.getAndUpdate(_ + 1).map { days =>
            start.plusDays(days)
          }
      }
    }
end SlowTimeCop

In-memory database

One of the major places where side-effects happen in our app is the database. To provide a fast feedback loop, we would like an in-memory implementation that is just good enough for our tests.

We won't use something that can interpret SQL (like H2), for two reasons:

  1. Our SQL code is Postgres-centric and will remain so
  2. We use Skunk, which doesn't go through a JDBC layer, making it harder to fit a JDBC-based connector into our current model

For those reasons, our database will just be backed by Scala data structures in memory.

The state model is quite simple:

object InMemoryDB:
  import jobby.spec.*
  case class State(
      jobs: Vector[Job] = Vector.empty,
      companies: Vector[Company] = Vector.empty,
      users: Vector[(UserId, UserLogin, HashedPassword)] = Vector.empty
  )
  // ...

And the database itself will need the state, data generator (for identifiers), and an instance of TimeCop to generate timestamps:

case class InMemoryDB(
    state: Ref[IO, InMemoryDB.State],
    gen: Generator,
    timecop: TimeCop
) extends Database:
  // ...

As a reminder, the only abstract method we need to implement is this:

trait Database:
  def stream[I, O](query: SqlQuery[I, O]): fs2.Stream[IO, O]

All we need to do is pattern match on query and implement the handling of the various operations. You'll know you're done when the compiler stops complaining - the SqlQuery class is sealed after all!

Let's define a small helper method that will help us express the situation where something is not found in the state:

private def opt[T](s: InMemoryDB.State => Option[T]): fs2.Stream[IO, T] =
  fs2.Stream
    .eval(state.get)
    .map(s)
    .flatMap(fs2.Stream.fromOption(_))

With that, our first operation (get company by ID) is implemented trivially:

  def stream[I, O](query: SqlQuery[I, O]) =
    query match
      case GetCompanyById(companyId) =>
        opt(_.companies.find(c => c.id == companyId))

Finding user credentials is simple as well:

case GetCredentials(login) =>
  opt(st => st.users.find(_._2 == login))
    .map { case (id, _, password) =>
      id -> password
    }

and so is creating the user:

case CreateUser(login, hashedPassword) =>
  val insert =
    gen.id(UserId).flatMap { userId =>
      val user = (userId, login, hashedPassword)
      state
        .update(st => st.copy(users = st.users.appended(user)))
        .as(userId)
    }

  fs2.Stream.eval(insert)

Adding a job is more complicated: we need to generate both the id and the timestamp:

case CreateJob(companyId, attributes, _) =>
  val insert = gen.id(JobId).flatMap { jobId =>
    timecop.timestampNT(JobAdded).flatMap { ja =>
      val job = Job(
        id = jobId,
        companyId = companyId,
        attributes = attributes,
        added = ja
      )

      state.update(st => st.copy(jobs = st.jobs.appended(job))).as(jobId)
    }
  }

  fs2.Stream.eval(insert)

You should be able to spot a deficiency here - we're not checking that the company with that id exists! In an integration test, the database will be enforcing this constraint (well, at least your code will hope that the constraint is enforced).

Through gradual improvements to the in-memory DB you should achieve parity with your real DB constraints, and they will stay in sync because the same specs must execute successfully against both the in-memory stubs and the real DB.

The question is - is it worth it? I believe it is - implementing those constraints is significantly simpler than in the real database, it's low risk as it only affects tests, and you're getting a pretty functional in-memory DB out of it - something that can be published as part of the service's testkit, useful for other components of the system.
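
For illustration, the CreateJob handler above could enforce that constraint with a small pre-check (a sketch, assuming the State shape defined earlier - the real database surfaces this as a foreign key violation instead):

case CreateJob(companyId, attributes, _) =>
  val companyExists =
    state.get.map(_.companies.exists(_.id == companyId))

  val insert =
    // mirror the foreign key constraint the real schema enforces
    companyExists.flatMap { exists =>
      IO.raiseWhen(!exists)(
        new Exception(s"Company $companyId does not exist")
      )
    } *> gen.id(JobId).flatMap { jobId =>
      timecop.timestampNT(JobAdded).flatMap { ja =>
        val job = Job(
          id = jobId,
          companyId = companyId,
          attributes = attributes,
          added = ja
        )

        state.update(st => st.copy(jobs = st.jobs.appended(job))).as(jobId)
      }
    }

  fs2.Stream.eval(insert)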

Stub: fixture and resources

All we need to do now is to

  1. Create a method that will tie together all the stubs, fakes, and what have you, into a single Resource[IO, Probe].

  2. Fill in various configs we have lying around with bogus values - most of these values won't be asserted on anyway

Here's what the method will look like for our stub tests:

package jobby
package tests
package stub

// imports...

object Fixture:

  def resource(using natchez.Trace[IO]): Resource[IO, Probe] =
    for
      db      <- Resource.eval(InMemoryDB.create)
      timeCop <- Resource.eval(SlowTimeCop.apply)
      logger  <- InMemoryLogger.build
      // Create the app using our stubbed DB, logger and timecop
      routes <- JobbyApp(
        appConfig,
        db,
        logger.scribeLogger,
        timeCop
      ).routes

      // (1) sick!
      client = Client.fromHttpApp(routes)

      generator <- Resource.eval(Generator.create)

      // finally construct and return the probe
      probe <-
        Probe.build(
          client,
          Uri.unsafeFromString("http://localhost"),
          appConfig,
          logger
        )
    yield probe
    end for
  end resource

  1. Yes, it should say sick. We use the built-in method from Http4s that turns an HTTP app definition (HttpApp) into a Client[IO], which just invokes the desired endpoints directly on the HttpApp, without running any servers.

This Fixture.resource method is fully self-contained - we can run as many probes in parallel as we want.

And for our stub tests this might well be the ticket, because the setup doesn't require any global resources, like Postgres or a running HTTP server. This means that we don't need to resort to Weaver's global resources - we can use per-suite resources, which are much easier to set up.

We can express this as a StubSuite trait:

package jobby
package tests
package stub

import weaver.*
import cats.effect.*

import natchez.Trace.Implicits.noop

trait StubSuite extends JobbySuite:
  override def sharedResource: Resource[IO, Res] = Fixture.resource

We now have everything to write our first actual spec!

Specifications

All of our specs will be expressed as traits that are mixed into some class or object along with JobbySuite (which contains our probeTest method).

For example, for users we'd like to test that you need to use correct credentials to successfully login:

package jobby
package tests

import jobby.spec.*
import cats.effect.IO

trait UsersSuite:
  self: JobbySuite =>

  probeTest("Using wrong credentials") { probe =>
    import probe.*
    for
      // generate data
      login     <- gen.str(UserLogin, 5 to 50)
      login1    <- gen.str(UserLogin, 5 to 50)
      password  <- gen.str(UserPassword, 12 to 128)
      password1 <- gen.str(UserPassword, 12 to 128)
      // invoke API methods
      _         <- api.users.register(login, password)

      ok              <- api.users.login(login, password).attempt
      wrongLogin      <- api.users.login(login1, password).attempt
      wrongPass       <- api.users.login(login, password1).attempt
      everythingWrong <- api.users.login(login1, password1).attempt
    yield expect.all(
      ok.isRight,
      wrongLogin.isLeft,
      wrongPass.isLeft,
      everythingWrong.isLeft
    )
    end for
  }
  //...

And here's how you can test that returned access/refresh tokens can be used:

  probeTest("Registration and authentication") { probe =>
    import probe.*
    for
      login    <- gen.str(UserLogin, 5 to 50)
      password <- gen.str(UserPassword, 12 to 128)
      _        <- api.users.register(login, password)
      resp     <- api.users.login(login, password)

      // extract access and refresh token
      // from response
      refreshCookie <- IO
        .fromOption(resp.cookie)(
          new Exception("Expected a refresh cookie ")
        )
        .map(_.value)
      accessToken  = resp.access_token.value
      authHeader   = AuthHeader("Bearer " + accessToken)
      refreshToken = refreshCookie.split(";")(0).split("=", 2)(1)

      validAccess  <- auth.access(authHeader)
      validRefresh <- auth.refresh(RefreshToken(refreshToken))
    yield expect(validAccess == validRefresh)
    end for
  }

Note that we're directly invoking the methods we defined in the services - there's no JSON or HTTP serialisation happening at any point, we're just getting Scala values back and asserting on them.

These tests, by design, will not catch protocol errors - be it in the JSON protocol or the HTTP protocol. We're testing the business logic, operating purely with Scala values.

We can do the same for companies, for example verify that authenticated users can create companies, which will be associated with the user:

package jobby
package tests

import jobby.spec.*
import cats.effect.IO

trait CompaniesSuite:
  self: JobbySuite =>

  test("Creation by authenticated user") { probe =>
    import probe.*
    for
      authHeader <- fragments.authenticateUser
      userId     <- auth.access(authHeader)
      attributes <- fragments.companyAttributes

      companyId <- api.companies
        .createCompany(
          authHeader,
          attributes
        )
        .map(_.id)

      retrieved <- api.companies.getCompany(companyId)
    yield expect.all(
      attributes.name == retrieved.attributes.name,
      attributes.url == retrieved.attributes.url,
      attributes.description == retrieved.attributes.description,
      userId == retrieved.owner_id
    )
    end for
  }

Here we are referencing fragments, which haven't been properly introduced yet. Fragments are just reusable parts of our specifications. For example, here's the fragment for user authentication:

package jobby
package tests

import jobby.spec.*

class Fragments(probe: Probe):
  import probe.*
  def authenticateUser =
    for
      login    <- gen.str(UserLogin, 5 to 50)
      password <- gen.str(UserPassword, 12 to 128)
      _        <- api.users.register(login, password)
      resp     <- api.users.login(login, password)

      refreshToken = resp.cookie
      accessToken  = resp.access_token.value
      authHeader   = AuthHeader(s"Bearer $accessToken")
    yield authHeader

It uses the same structure and same Probe as the tests themselves. It should be especially useful for extracting things like the attributes generator:

  def companyAttributes =
    for
      companyName        <- gen.str(CompanyName, 3 to 100)
      companyUrl         <- gen.str(CompanyUrl)
      companyDescription <- gen.str(CompanyDescription, 100 to 500)
      attributes = CompanyAttributes(
        companyName,
        companyDescription,
        companyUrl
      )
    yield attributes

Runnable Stub tests

To make the tests discoverable, we need to make them either

  1. objects that extend weaver's IOSuite, or

  2. classes with a single GlobalRead parameter

For stub tests, there's no global resource sharing, so we can just make them objects:

package jobby
package tests
package stub

import weaver.*

object UsersTests
    extends StubSuite
    with jobby.tests.UsersSuite

object CompaniesTests
    extends StubSuite
    with jobby.tests.CompaniesSuite

object JobsTests
    extends StubSuite
    with jobby.tests.JobsSuite

And we can now run our tests in SBT using this command:

sbt:root> backend/testOnly jobby.tests.stub.*
[info] jobby.tests.stub.UsersTests
[info] + Using wrong credentials 251ms
[info] + Registration and authentication 250ms
[info] jobby.tests.stub.JobsTests
[info] + Creating jobs by authenticated company owner 87ms
[info] + Listing latest jobs 216ms
[info] jobby.tests.stub.CompaniesTests
[info] + Creation by authenticated user 20ms
[info] + Deletion by the owner 40ms
[info] Passed: Total 6, Failed 0, Errors 0, Passed 6
[success] Total time: 2 s, completed 25 Jun 2022, 13:53:08

Let's alias it in the build.sbt:

addCommandAlias("stubTests", "backend/testOnly jobby.tests.stub.*")
addCommandAlias("unitTests", "backend/testOnly jobby.tests.unit.*")
addCommandAlias(
  "fastTests",
  "backend/testOnly jobby.tests.stub.* jobby.tests.unit.*"
)

Note that in fastTests I didn't rely on the already-defined aliases because I want SBT and weaver to run all the tests interleaved and in parallel - not, say, unit tests first and then stub tests.

And I'm pleased to say that for our integration tests we won't need to touch the specs or fragments at all!

Integration: fixture and resources

What exactly do we mean by integration tests? We want to test different components that talk to the outside world (i.e. network, filesystem, any kinds of I/O) working together.

This means significant changes to how our Probe is constructed:

  1. We no longer wish to use the in-memory database - this should be a real Postgres database, with the latest schema

  2. Requests processed in memory need to be replaced with requests serialised and sent over an actual socket

To solve the database problem we'll use Testcontainers - a JVM interface for launching and managing containers for popular services, like Redis, Postgres, MySQL, etc.

There even exists a Scala wrapper for it, with a bit of extra type safety and idiomatic APIs.

Running actual Postgres will require DB schema migrations as well, so we need the same dependencies that we use for our app, but in the Test scope:

libraryDependencies ++=
  Seq(
    "com.dimafeng"   %% "testcontainers-scala-postgresql" % Versions.TestContainers,
    "org.postgresql"  % "postgresql"                      % Versions.Postgres,
    "org.flywaydb"    % "flyway-core"                     % Versions.Flyway,
    "org.http4s"     %% "http4s-blaze-client"             % Versions.http4s,
    "org.http4s"     %% "http4s-blaze-server"             % Versions.http4s
  ).map(_ % Test)

Note that we also added actual HTTP server and client implementations - we will be exercising the HTTP layer in these tests now.

We need to define a new lifecycle resource for our integration tests, which will still have the signature of Resource[IO, Probe], but it will do much more when that resource is used:

  1. Start Postgres container with TestContainers (capture the JDBC url and credentials)
  2. Point Flyway at that Postgres instance and run the migrations
  3. Parse the JDBC URL into a config our own Skunk connector can understand
  4. Connect to the database
  5. ... proceed with the rest of initialisation

Starting the container is easy:

package jobby
package tests
package integration

// ..imports..

object Fixture:
  // ...
  private def postgresContainer: Resource[IO, PostgreSQLContainer] =
    val start = IO(
      PostgreSQLContainer(dockerImageNameOverride =
        DockerImageName("postgres:14")
      )
    ).flatTap(cont => IO(cont.start()))

    Resource.make(start)(cont => IO(cont.stop()))

It could be even shorter if we didn't try to use the latest and greatest of what Postgres has to offer.

Note that we're making it a resource to make sure there are no lingering containers even if our tests have failed.

Flyway migration is equally easy, provided we have all the necessary credentials:

private def migrate(
    url: String,
    user: String,
    password: String
): IO[MigrateResult] =
  IO(Flyway.configure().dataSource(url, user, password).load()).flatMap { f =>
    IO(f.migrate())
  }

Note that I am not an expert (or even a confident user) of Flyway, so I'm not sure if there's anything else this method needs to do (the word baseline comes to mind, but I don't know what it means).

Combining these two operations and returning a workable Skunk-backed Database implementation just needs an extra method to parse the JDBC URL correctly:

private def parseJDBC(url: String) = IO(java.net.URI.create(url.substring(5)))

def skunkConnection(using
    natchez.Trace[IO]
): Resource[IO, (PgCredentials, Database)] =
  postgresContainer // start Postgres
    .evalMap(cont => parseJDBC(cont.jdbcUrl).map(cont -> _)) // read the configuration
    .evalTap { case (cont, _) =>
      // run flyway migrations
      migrate(cont.jdbcUrl, cont.username, cont.password)
    }
    .flatMap { case (cont, jdbcUrl) =>
      // parse configuration into our own config object
      val pgConfig = PgCredentials.apply(
        host = jdbcUrl.getHost,
        port = jdbcUrl.getPort,
        user = cont.username,
        password = Some(cont.password),
        database = cont.databaseName
      )
      
      // create a Skunk-backed Database instance
      SkunkDatabase.load(pgConfig, skunk).map(pgConfig -> _)
    }

Then, to get our Probe, the lifecycle is similar to what we had for stubs, except for the database initialisation:

def resource(using natchez.Trace[IO]): Resource[IO, Probe] =
  for
    res <- skunkConnection
    pgConfig  = res._1
    db        = res._2
    appConfig = AppConfig(pgConfig, skunk, http, jwt, misc)
    generator <- Resource.eval(Generator.create)
    timeCop   <- Resource.eval(SlowTimeCop.apply)
    logger    <- InMemoryLogger.build
    routes <- JobbyApp(
      appConfig,
      db,
      logger.scribeLogger,
      timeCop
    ).routes
    // ..

But now these routes need to be used to launch an actual HTTP server, to which we need to point our HTTP client:

    uri <- BlazeServerBuilder[IO]
      .withHttpApp(routes)
      .bindHttp()
      .resource
      .map(_.baseUri)
    client <- BlazeClientBuilder[IO].resource
    probe <-
      Probe.build(
        client,
        uri,
        appConfig,
        logger
      )
  yield probe
  end for
end resource

Now if you use this resource, you will receive a fully functioning Probe that will execute HTTP requests writing to an actual database with your actual schema. I like this so much, it's wild.

So how do we make the final leap from having our specs and this new probe definition, to something that Weaver can actually run?

Runnable integration tests

We have two options:

  1. Use this as a per-spec resource, meaning that if you have 3 specs (e.g. Users, Companies, Jobs) then you'll have 3 Postgres containers and 3 HTTP servers running in parallel

  2. Utilise weaver's global resource sharing to launch only 1 HTTP server and 1 Postgres container, no matter how many specs you have.

At the time of writing, I think option (1) is actually not as bad as I initially thought - it definitely consumes a lot more resources but theoretically you have more control over how the probe is initialised, if you want to make changes to, say, configuration, or routes, or both.

Option (2) is great because it doesn't require a great deal of setup, and it's a lot lighter on consumed resources, meaning the difference between 5 and 50 different specs running in parallel is not as severe.

To make it a global resource, the first thing we need to do is tell Weaver how to initialise the resource:

package jobby
package tests
package integration

import cats.effect.*
import cats.effect.std.*
import cats.syntax.all.*

import jobby.spec.*

import natchez.Trace.Implicits.noop
import weaver.*

object Resources extends GlobalResource:
  override def sharedResources(global: GlobalWrite): Resource[IO, Unit] =
    baseResources.flatMap(global.putR(_))

  def baseResources: Resource[IO, Probe] = Fixture.resource

Two things make this work:

  1. Weaver is able to reflectively find all objects that extend the special GlobalResource trait, and initialise them before any of the tests start up.

  2. When Weaver invokes Resources.sharedResources(..), the initialised resource is written (using its type as the key) into the storage maintained by the framework.

The baseResources method is not strictly necessary, but we'll use it later for convenience.

Now, our first (inconvenient) implementation of the IntegrationSuite base class can look like this:

abstract class IntegrationSuiteWrong(global: GlobalRead) extends JobbySuite:
  override def sharedResource = global.getOrFailR[Probe]()
end IntegrationSuiteWrong

Where we retrieve the resource we need by its type (Probe).

It works well when both Resources and classes implementing IntegrationSuiteWrong are in the same package and you run the entire package, e.g. sbt> testOnly jobby.tests.integration.*.

But if you run an individual spec, like testOnly jobby.tests.integration.UsersTests, then the framework cannot pick up the Resources object, because the build tool doesn't pass it along. In that scenario, our only option is to re-initialise the required shared resources within the spec itself.

So let's rewrite it as so:

abstract class IntegrationSuite(global: GlobalRead) extends JobbySuite:
  // Provides a fallback to support running individual tests via testOnly
  private def sharedResourceOrFallback(read: GlobalRead): Resource[IO, Probe] =
    read.getR[Probe]().flatMap {
      case Some(value) => Resource.eval(IO(value))
      case None        => Resources.baseResources
    }

  override def sharedResource = sharedResourceOrFallback(global)
end IntegrationSuite

Defining our actual tests is almost as easy as the stub tests:

package jobby
package tests
package integration

import weaver.*

class UsersTests(global: GlobalRead)
    extends IntegrationSuite(global)
    with jobby.tests.UsersSuite

class CompaniesTests(global: GlobalRead)
    extends IntegrationSuite(global)
    with jobby.tests.CompaniesSuite

class JobsTests(global: GlobalRead)
    extends IntegrationSuite(global)
    with jobby.tests.JobsSuite

And it sure runs!

2022.06.25 19:29:00:888 io-compute-6 INFO 🐳 [testcontainers/ryuk:0.3.3]
    Creating container for image: testcontainers/ryuk:0.3.3
    Container testcontainers/ryuk:0.3.3 is starting: 8be6f912a462fdd5f3873d27a946e92de4c538ddefb145b794af1a82641fcbb6
    Container testcontainers/ryuk:0.3.3 started in PT0.599771S
[info] jobby.tests.integration.CompaniesTests
[info] + Creation by authenticated user 794ms
[info] + Deletion by the owner 866ms
[info] jobby.tests.integration.UsersTests
[info] + Registration and authentication 46ms
[info] + Using wrong credentials 115ms
[info] jobby.tests.integration.JobsTests
[info] + Creating jobs by authenticated company owner 286ms
[info] + Listing latest jobs 1s
[info] Passed: Total 6, Failed 0, Errors 0, Passed 6
[success] Total time: 6 s, completed 25 Jun 2022, 19:29:06

I've purposefully silenced some, but not all, of the loggers, to demonstrate that the containers are indeed started. By the way, this is how you can silence loggers in Scribe:

import scribe.{Logger, Level}
val silenceOfTheLogs =
  Seq(
    "org.http4s",
    "org.flywaydb.core",
    "org.testcontainers",
    "🐳 [postgres:14]"
  )

silenceOfTheLogs.foreach { log =>
  Logger(log).withMinimumLevel(Level.Error).replace()
}

And with this, I believe our main goals are achieved - we are using the same test specifications to run in-memory tests as well as tests against running services.

As the number of specifications grows, the execution time for integration tests will grow much quicker than that of stub tests, which can be used in quick feedback loops during feature development.

In fact, to test this difference I added 1000 copies of the same tests to one of the specs:

  1. stubTests finished in 3 seconds, with each test taking 10-40ms
  2. integrationTests took 13 seconds just to report the fact that Blaze's wait queue was overfilled, failing 737 out of 1007 tests

Now, we can easily restrict the number of concurrent requests to 256 in our tests, but even successful tests took 2-3 seconds on average due to severe resource contention over a very limited physical network resource.
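
One way to do that (a sketch, not something the project actually does) is to wrap the test HTTP client in a tiny concurrency-limiting middleware:

import cats.effect.*
import cats.effect.std.Semaphore
import cats.syntax.all.*
import org.http4s.client.Client

// Wrap a client so that at most `max` requests are in flight at once;
// everything else queues up on the semaphore
def limitConcurrency(client: Client[IO], max: Long): IO[Client[IO]] =
  Semaphore[IO](max).map { sem =>
    Client[IO] { req =>
      sem.permit >> client.run(req)
    }
  }

Probe.build could then receive the wrapped client instead of the raw Blaze one.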

To extend this further, you can imagine end-to-end tests, where the fixture is instantiated without a database at all, just an HTTP client pointing at the services. The actual URL can come from the environment or a configuration file.

All still using the same test specifications, with perhaps small modifications to add retries.
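
For illustration, such an end-to-end fixture could look roughly like this (a sketch: the JOBBY_BASE_URL variable is hypothetical, but Probe.build is the same constructor we defined earlier, and the in-memory logger is only there to satisfy it - a remote app's logs obviously can't be captured this way):

// Hypothetical end-to-end fixture: nothing is started locally, we just
// point the HTTP client at an externally running deployment
def resource(config: AppConfig): Resource[IO, Probe] =
  for
    baseUri <- Resource.eval(
      IO.fromOption(sys.env.get("JOBBY_BASE_URL"))(
        new Exception("JOBBY_BASE_URL is not set")
      ).map(Uri.unsafeFromString)
    )
    logger <- InMemoryLogger.build
    client <- BlazeClientBuilder[IO].resource
    probe  <- Probe.build(client, baseUri, config, logger)
  yield probe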


Obviously, none of this is relevant to you if you are writing functional Scala - if it compiles, then running it is no longer your responsibility or concern. You didn't spend 10 years studying pure mathematics (as some believe to be a forced pre-requisite for this code) to write pesky tests or worry about impure runtimes.