Type Class Derivation in Scala 3

Ron Aharoni
Published in Riskified Tech
9 min read · Dec 15, 2021

Scala 3 added several powerful mechanisms that can be used for type class derivation. In this post, we’ll see how using several of them has enabled us to save time and effort when writing tests.

Riskified is a data company — we collect data from our customers and eCommerce merchants, and analyze it. We’re mainly working in the Scala ecosystem: Akka, Spark, and Cats.

A lot of data means a lot of data models. Many of our models, encoded by case classes and sealed traits in Scala, are generated from Avro and Protobuf schemas. These classes represent large, nested data models.

Our team also writes a lot of tests, and having to work with large data models in tests is inconvenient. We need to create the model and populate it with data, most of which is irrelevant for a specific test case. A good example of this is one of our core domain models, which has 59 fields spread across 7 (nested) classes. Each test is usually concerned with just a few of them.

One solution for this problem is to create random instances: instances in which fields are populated with random values according to their types. This way the test writer only needs to override the specific fields relevant to the test with meaningful values.
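As a rough illustration of the idea, a test could start from a randomly populated instance and override only the field it cares about with copy. The Order model and the randomOrder helper below are hypothetical names invented for this sketch, not part of our actual models:

```scala
import scala.util.Random as ScalaRandom

// Hypothetical model, standing in for a much larger generated class.
case class Order(id: Long, email: String, currency: String, total: Double)

// Hand-written random generator; the rest of the post is about deriving this automatically.
def randomOrder(): Order =
  Order(
    id = ScalaRandom.nextLong(1000000),
    email = ScalaRandom.alphanumeric.take(5).mkString,
    currency = ScalaRandom.alphanumeric.take(3).mkString,
    total = ScalaRandom.nextDouble() * 100
  )

// The test pins down only the field it actually checks.
val order: Order = randomOrder().copy(currency = "USD")
```

Everything except currency stays random, so the test reads as a statement about the one field that matters.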

How can we create such instances without writing random generation code for each data type? In some languages, this would be solved with reflection, but Scala has a powerful capability — macros — which enables us to do this at compile time. This has advantages over reflection, like type safety and performance.

Because macros are low level and complicated, two main Scala libraries, Shapeless and Magnolia, enable deriving type classes without writing macros yourself. Shapeless takes a “logic programming” approach, where constraints expressed through implicit parameters are solved by the compiler, resulting in an implementation of a type class for a certain type. Magnolia’s API is simpler, which makes code easier to read and write. We covered Magnolia in a previous post.

At Riskified, we actually started using Shapeless to solve the random instances problem, and migrated to Magnolia for a reduction in compile times. We were happy with the solution, but when Scala 3 came out we saw an opportunity to revisit the solution using several new features — generic tuples, enums, and inline — which enable type class derivation at the language level without using macros or libraries.

Deriving Random

To formalize the problem we are trying to solve, we want to derive an implementation for the following type class:

trait Random[A] {
  def generate(): A
}

For those of you unfamiliar with type class derivation, it is a method of creating an implementation of a type class, in our case the Random type class, with the aid of the compiler. The compiler will need some information from us, such as how to create a random Long or a random String, as well as rules on how to combine those into more complex types.
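To make the two kinds of information concrete, here is a minimal hand-written sketch: base instances for Long and String, plus one combination rule for pairs. The pairRandom name and the choice of a pair as the “complex type” are assumptions made for illustration only:

```scala
import scala.util.Random as ScalaRandom

trait Random[A]:
  def generate(): A

// Base instances: the information the compiler cannot invent for us.
given Random[Long] with
  def generate(): Long = ScalaRandom.nextLong(1000000)

given Random[String] with
  def generate(): String = ScalaRandom.alphanumeric.take(5).mkString

// A combination rule: a random pair is built from random components.
given pairRandom[A, B](using ra: Random[A], rb: Random[B]): Random[(A, B)] with
  def generate(): (A, B) = (ra.generate(), rb.generate())

// The compiler assembles Random[(Long, String)] from the pieces above.
val sample: (Long, String) = summon[Random[(Long, String)]].generate()
```

Derivation generalizes the pair rule so that it applies to any case class or enum, instead of one rule per shape.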

These more complex data types are product types (case classes containing multiple fields), sum types (sealed traits with well known descendants), and combinations of them. Data types composed using sum and product types are known as “algebraic data types” (ADTs).

In our examples, we’ll use a Scala 3 addition, the enum construct, which enables describing ADTs. Let’s use it to describe a site member entity, with two subtypes:

enum SiteMember:
  case RegisteredUser(id: Long, email: String, isAdmin: Boolean)
  case AnonymousUser(session: String)

The Scala 2 representation would be a SiteMember sealed trait with two inheriting case classes.
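For comparison, that Scala 2 style encoding could look roughly like the sketch below. It is still valid Scala 3, which is how the snippet is meant to be compiled (the top-level val would not be allowed in Scala 2); describe is a hypothetical helper added only to show pattern matching over the hierarchy:

```scala
// Scala 2 style encoding of the same ADT: a sealed trait plus case classes.
sealed trait SiteMember
object SiteMember {
  final case class RegisteredUser(id: Long, email: String, isAdmin: Boolean) extends SiteMember
  final case class AnonymousUser(session: String) extends SiteMember
}

// Because the trait is sealed, the compiler can check this match for exhaustiveness.
def describe(m: SiteMember): String = m match {
  case SiteMember.RegisteredUser(id, _, _) => s"registered:$id"
  case SiteMember.AnonymousUser(session)   => s"anonymous:$session"
}

val member: SiteMember = SiteMember.AnonymousUser("abc")
```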

The Algorithm

What is the logic required to create Random instances for a SiteMember?

First of all, we need to “roll a die” and decide which case of the enum to choose. Naturally, having two subtypes, we would expect to get each one 50% of the time. Let’s say we chose RegisteredUser. RegisteredUser is a case class with three fields: a long, a string, and a boolean. We need to create a random instance of each field, and instantiate a RegisteredUser with those instances. To summarize, we need three capabilities to implement our random generation algorithm:

  • Inspect enums and get their subtypes
  • Inspect case classes and get their field types
  • Instantiate a case class with a list of its parameter values
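Written by hand for SiteMember, the algorithm might look like the sketch below. randomSiteMember and randomString are names made up for this illustration, and the value ranges are arbitrary choices. This is exactly the boilerplate that derivation is meant to remove:

```scala
import scala.util.Random as ScalaRandom

enum SiteMember:
  case RegisteredUser(id: Long, email: String, isAdmin: Boolean)
  case AnonymousUser(session: String)

def randomString(): String = ScalaRandom.alphanumeric.take(5).mkString

// Step 1: pick one of the two cases with equal probability.
// Step 2: generate a random value for each field of the chosen case.
// Step 3: call the constructor with those values.
def randomSiteMember(): SiteMember =
  ScalaRandom.nextInt(2) match
    case 0 =>
      SiteMember.RegisteredUser(
        ScalaRandom.nextLong(1000000),
        randomString(),
        ScalaRandom.nextBoolean()
      )
    case _ => SiteMember.AnonymousUser(randomString())

val members: List[SiteMember] = List.fill(100)(randomSiteMember())
```

The knowledge about subtypes, field types, and constructors is hard-coded here; the rest of the post is about getting it from the compiler instead.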

Using Mirror to inspect ADTs

All of these capabilities can be implemented using a type level mechanism called Mirror. Mirror is a type class in the Scala 3 standard library that provides information about ADTs. We can ask the compiler for a mirror by adding an implicit parameter. Here’s the mirror for the SiteMember enum and for the RegisteredUser case class (some fields were omitted):

// Mirror for the SiteMember enum
new Mirror.Sum:
  type MirroredElemTypes = (RegisteredUser, AnonymousUser)
  // ...
// Mirror for the RegisteredUser case class
new Mirror.Product:
  type MirroredElemTypes = (Long, String, Boolean)
  def fromProduct(p: Product): MirroredMonoType =
    new RegisteredUser(...)
  // ...

The mirror member we’ll need is MirroredElemTypes, which is a type level tuple containing types. Those types are the types of a case class’s parameters or the subtypes of an enum. This answers our first two requirements. The fromProduct method, which enables us to instantiate a case class from a tuple of constructor parameter values, answers the third.

A mirror is generated for every case class, sealed trait, and enum in your code, so it needs to be low footprint. That’s one of the reasons it’s implemented only as a compile time mechanism. Note that MirroredElemTypes is not a val or a def, it’s a type. That presents a difficulty — how can we work with types that exist only at compile time, as opposed to (runtime) values?
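To get a feel for the split between type members and runtime methods, here is a small sketch that summons the mirror of a standalone RegisteredUser case class (declared at the top level here, rather than inside the enum, to keep the example short). The evidence value exists only to show that MirroredElemTypes can be checked by the compiler:

```scala
import scala.deriving.Mirror

case class RegisteredUser(id: Long, email: String, isAdmin: Boolean)

// Ask the compiler for the mirror of a concrete case class.
val mirror = summon[Mirror.ProductOf[RegisteredUser]]

// MirroredElemTypes is a type, so it can only be inspected at compile time:
// this line compiles only if the field types are (Long, String, Boolean).
val evidence = summon[mirror.MirroredElemTypes =:= (Long, String, Boolean)]

// fromProduct is an ordinary runtime method that builds the case class from a tuple.
val user: RegisteredUser = mirror.fromProduct((42L, "a@b.c", true))
```

Note that fromProduct is not type checked against the field types: passing a tuple of the wrong shape compiles and fails at runtime.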

Running code at compile time using inline

The problem presented above means we need to call our random generating logic at compile time. That’s where another mechanism comes into play — inline. inline is a powerful new keyword that can be used in several ways — the main thing inline enables is to perform computations at compile time. The way it does that is indirect — the compiler will inline code recursively, actually calling our logic at compile time.
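A small warm-up, unrelated to Random, may help here. The hypothetical tupleSize function below turns a tuple type into an Int. Because both the method and the match are inline, the compiler unrolls the recursion at the call site, and what remains at runtime is plain arithmetic on constants:

```scala
import scala.compiletime.erasedValue

// Counts the elements of a tuple type.
// erasedValue[T] has no runtime value; it only gives the inline match a type to inspect.
inline def tupleSize[T <: Tuple]: Int =
  inline erasedValue[T] match
    case _: EmptyTuple => 0
    case _: (h *: t)   => 1 + tupleSize[t]

val three: Int = tupleSize[(Long, String, Boolean)]
val zero: Int = tupleSize[EmptyTuple]
```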

In our case, this means taking the MirroredElemTypes type level tuple and passing it through an inline function, which converts it to a value level list of instances of the Random trait (type class). For the RegisteredUser case class, that means going from the type (Long, String, Boolean) to List(Random[Long], Random[String], Random[Boolean]).

The implementation of summonAll is dense with new Scala 3 features, so we’ll go through it step by step.

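import scala.compiletime.{erasedValue, summonInline}
// Converts a type level tuple into a value level list of Random instances
inline def summonAll[A <: Tuple]: List[Random[_]] =
  inline erasedValue[A] match
    case _: EmptyTuple => Nil
    case _: (t *: ts) => summonInline[Random[t]] :: summonAll[ts]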

Starting with the general structure and signature: summonAll is a recursive method that goes over a type level tuple, A, destructures it into head and tail, summons an instance for the head type, and collects the results in a list, which is returned.

erasedValue is a construct that enables us to match on types, instead of values. This is required since the match is on A, a type. Note that we can’t write anything other than underscores in the cases, as that would make the function not compile.

We also use new Scala 3 syntax for destructuring tuples, getting the head and tail of the tuple with the *: operator. The stopping condition for the recursion is met when we encounter the EmptyTuple type.

All of this is a structure that enables us to repeatedly apply summonInline, a built-in function that searches the implicit scope for implementations of a type class. summonInline[Random[Int]] for example, would give us a Random[Int] instance, if one can be found. Otherwise we get a compile error.
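In isolation, summonInline can be tried with a sketch like the following. The randomOf helper and the Random[Int] instance are assumptions made for this example and are not part of the derivation code:

```scala
import scala.compiletime.summonInline

trait Random[A]:
  def generate(): A

given Random[Int] with
  def generate(): Int = scala.util.Random.nextInt(100)

// summonInline delays the implicit search until the call is inlined,
// at which point the type parameter A is known.
inline def randomOf[A]: A = summonInline[Random[A]].generate()

val i: Int = randomOf[Int]
// randomOf[Double] would not compile here: no given Random[Double] is in scope.
```

A plain summon inside randomOf would not work, since the search would run when randomOf is defined, while A is still abstract.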

The summonAll function is used by the algorithm both for enums, going from (RegisteredUser, AnonymousUser) to List(Random[RegisteredUser], Random[AnonymousUser]), and for case classes, going from (Long, String, Boolean) to List(Random[Long], Random[String], Random[Boolean]).

Transforming the type level tuples found in the mirror instances to value level lists of Random instances takes us most of the way in our derivation journey.

Tying it all together

We’re now going to write the entry point to our derivation code, which will in turn call one of the two algorithms recursively: either derive for an enum or for a case class:

import scala.deriving.Mirror
object Random:
  inline given derived[A](using m: Mirror.Of[A]): Random[A] =
    lazy val instances = summonAll[m.MirroredElemTypes]
    // Branch at compile time on the kind of mirror
    inline m match
      case s: Mirror.SumOf[A] => deriveSum(s, instances)
      case p: Mirror.ProductOf[A] => deriveProduct(p, instances)

The method name derived is part of our contract with the Scala compiler. It will be used if you use a derives clause on an enum, or it can be called directly. This method is inline since it’s going to be called recursively if you’re deriving a nested structure, and it also enables inlining at call sites.

Continuing with the signature, we are requesting a Mirror for the type we’re deriving Random for, and in turn we provide an implicit (given) instance of Random for A.

First, the implementation runs the type-to-value level transformation discussed in the previous section. The same is true for enums and case classes, but the logic of what to do with the result is not, so we branch, using the Mirror to distinguish sum types from product types.

For sum types, the implementation will pick an item from the list containing the Random instances for subtypes, and call its generate method.

def deriveSum[A](
    s: Mirror.SumOf[A],
    instances: => List[Random[_]]): Random[A] =
  new Random[A]:
    def generate(): A =
      // Pick the generator of one subtype uniformly at random
      instances(scala.util.Random.nextInt(instances.size))
        .asInstanceOf[Random[A]]
        .generate()

For product types, the implementation calls generate for each item in the instances list, which corresponds to the case class parameter types. We then feed that into the Mirror’s fromProduct method to instantiate the case class:

def deriveProduct[A](
    p: Mirror.ProductOf[A],
    instances: => List[Random[_]]): Random[A] =
  new Random[A]:
    def generate(): A =
      // Generate one value per field, then build the case class from them
      p.fromProduct(
        toTuple(instances.map(_.generate()), EmptyTuple)
      )

Note that we need to convert the list of random values to a tuple to be able to call fromProduct. The implementation of toTuple is very similar to summonAll, and similarly makes use of inline and tuple destructuring. You can find it in the complete code example on GitHub.
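For readers who don’t want to leave the page, here is one possible shape for such a helper. This is a simplified, non-inline sketch written for this post’s call site, toTuple(values, EmptyTuple), and it may well differ from the version in the repository:

```scala
// Hypothetical helper: prepends each list element onto the accumulator tuple.
// The static type is just Tuple, so all type safety is lost at this point.
def toTuple(xs: List[Any], acc: Tuple): Tuple =
  xs match
    case Nil    => acc
    case h :: t => h *: toTuple(t, acc)

val asTuple: Tuple = toTuple(List(1L, "abc", true), EmptyTuple)
```

Since Tuple extends Product, the result can be passed straight to fromProduct.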

We now have the ability to derive Random instances for ADTs — enums and case classes — but that wouldn’t be useful without random instances for the data contained within those structures, such as integers, longs, strings, and other primitives.

When providing those implementations, we’ll make common-sense choices (for example, we generate strings of length 5). The users can override those when using our derivation library. Here’s an example of such implementations:

import scala.util.Random as ScalaRandom
given randString: Random[String] with
  def generate(): String =
    ScalaRandom.alphanumeric.take(5).mkString
given randLong: Random[Long] with
  def generate(): Long = ScalaRandom.nextLong(1000000)
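Overriding a default could look like the sketch below. It assumes the defaults live in the Random companion object (the post doesn’t say where they are defined); EmailTests, emailLike, and randomString are hypothetical names. A given in lexical scope is found before the compiler looks in the companion, so the local one wins:

```scala
trait Random[A]:
  def generate(): A

object Random:
  // Library default, found through the companion object of the type class.
  given defaultString: Random[String] with
    def generate(): String = scala.util.Random.alphanumeric.take(5).mkString

def randomString()(using r: Random[String]): String = r.generate()

object EmailTests:
  // A given in lexical scope takes precedence over the one in the companion.
  given emailLike: Random[String] with
    def generate(): String =
      scala.util.Random.alphanumeric.take(8).mkString + "@example.com"

  val email: String = randomString()

// Outside EmailTests, the library default is used.
val plain: String = randomString()
```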

Using the library

Let’s use the library to derive a Random instance for the SiteMember enum. summon is the Scala 3 equivalent to implicitly, used to get an instance from the implicit scope.

@main def deriveRandom(): Unit =
  println(summon[Random[SiteMember]].generate())
  println(summon[Random[SiteMember]].generate())
  println(summon[Random[SiteMember]].generate())
// output
RegisteredUser(291184,yCpq7,true)
AnonymousUser(8K51y)
RegisteredUser(110007,qQkmk,false)

Calling generate a few times, we get both subtypes of the enum and random values for the fields of each.
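For convenience, the pieces from this post can be assembled into a single file along the following lines. Treat it as a sketch rather than the repository’s code: the Random[Boolean] instance and the non-inline toTuple are our own fill-ins for parts not shown above, and the wildcard is written as ? (the newer Scala 3 spelling of _):

```scala
import scala.compiletime.{erasedValue, summonInline}
import scala.deriving.Mirror
import scala.util.Random as ScalaRandom

trait Random[A]:
  def generate(): A

object Random:
  // Base instances for the leaf types
  given Random[Long] with
    def generate(): Long = ScalaRandom.nextLong(1000000)
  given Random[String] with
    def generate(): String = ScalaRandom.alphanumeric.take(5).mkString
  given Random[Boolean] with // assumed: not shown in the post
    def generate(): Boolean = ScalaRandom.nextBoolean()

  // Type level tuple => value level list of instances
  inline def summonAll[A <: Tuple]: List[Random[?]] =
    inline erasedValue[A] match
      case _: EmptyTuple => Nil
      case _: (t *: ts)  => summonInline[Random[t]] :: summonAll[ts]

  // Simplified stand-in for the repository's toTuple
  def toTuple(xs: List[Any], acc: Tuple): Tuple = xs match
    case Nil    => acc
    case h :: t => h *: toTuple(t, acc)

  def deriveSum[A](s: Mirror.SumOf[A], instances: => List[Random[?]]): Random[A] =
    new Random[A]:
      def generate(): A =
        instances(ScalaRandom.nextInt(instances.size))
          .asInstanceOf[Random[A]]
          .generate()

  def deriveProduct[A](p: Mirror.ProductOf[A], instances: => List[Random[?]]): Random[A] =
    new Random[A]:
      def generate(): A =
        p.fromProduct(toTuple(instances.map(_.generate()), EmptyTuple))

  inline given derived[A](using m: Mirror.Of[A]): Random[A] =
    lazy val instances = summonAll[m.MirroredElemTypes]
    inline m match
      case s: Mirror.SumOf[A]     => deriveSum(s, instances)
      case p: Mirror.ProductOf[A] => deriveProduct(p, instances)

enum SiteMember:
  case RegisteredUser(id: Long, email: String, isAdmin: Boolean)
  case AnonymousUser(session: String)

val members: List[SiteMember] = List.fill(20)(summon[Random[SiteMember]].generate())
```

The asInstanceOf in deriveSum and the untyped List[Any] in deriveProduct are where type safety is given up in exchange for brevity.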

The full code can be found here. I also gave a talk about the subject at a meetup (Hebrew). The Scala 3 docs contain more information about derivation and the other features used.

Summary

The Random type class is just one example of a type class that can be derived using the new mechanisms in Scala 3.

There are rough edges and issues with type safety in the derivation code — these are solved by using libraries such as Shapeless 3, which make derivation even more streamlined. With type class derivation being more accessible in Scala 3, it will be interesting to see more examples show up.

Ron Aharoni
Riskified Tech

Software engineer, I love Scala, functional programming, and describing myself in 160 characters