Phantom Types without phantom pain

Scala Matters
Jun 27, 2023

Updated: Sep 27, 2023

Written by Jaroslav Regec

Introduction

Phantom types are an interesting feature of the Scala programming language. While extremely useful for library authors, they have a reputation for being confusing, complex and unnecessary, especially among programmers coming from weakly typed languages.

The idea behind phantom types is quite simple, and their various use cases turn out to be very powerful. In this blog post we'll take a closer look at what phantom types are and how they can be useful even in day-to-day programming on client projects. Then we'll dive into more advanced use cases found in the ZIO SQL library, and at the end we'll peek at how they power ZIO's environment type.

Phantom Types 101

A phantom type is a type that has no values. Since type arguments are erased at runtime anyway, you might be wondering what the point of such a type is. It is true that a phantom type doesn't exist at runtime: its sole purpose is to provide more type safety at compile time.

Let's imagine we are building an HTTP client library with an Endpoint case class that contains the URL and the port.

private[phantom] case class Endpoint(url: String, port: Int)

An Endpoint without a URL or a port doesn't really make sense, so we restrict its creation by making the class package-private and provide a builder for creating Endpoint instances.

final case class EndpointBuilder(url: Option[String], port: Option[Int]) {
  def withUrl(url: String) = EndpointBuilder(Some(url), port)
  def withPort(port: Int) = EndpointBuilder(url, Some(port))
  def build = Endpoint(url.get, port.get)
}

object EndpointBuilder {
  // explicit result type: this apply calls the overloaded case class apply
  def apply(): EndpointBuilder = EndpointBuilder(None, None)
}

However, there are some problems with this builder implementation. Nothing stops us from constructing an EndpointBuilder without providing a URL or a port, or even from calling the build method right away.

EndpointBuilder().withUrl("localhost").build
EndpointBuilder().withPort(8080).build
EndpointBuilder().build

Such code compiles, but it throws a NoSuchElementException at runtime, because build calls get on an empty Option. This is where phantom types come into the picture: they will make the code above fail to compile. First, let's introduce two abstract types, URL and Port, and add a type parameter P to EndpointBuilder.

type URL
type Port

final case class EndpointBuilder[P](url: Option[String], port: Option[Int]) {
  // code omitted
}

object EndpointBuilder {
  def apply(): EndpointBuilder[Any] = EndpointBuilder[Any](None, None)
}

Type parameter P in the snippet above is our phantom type. As you can see, there is no value of type P. We need to propagate this change across the whole snippet: the apply method now returns EndpointBuilder[Any], withUrl returns EndpointBuilder[P with URL] and withPort returns EndpointBuilder[P with Port].

 def withUrl(url: String) = EndpointBuilder[P with URL](Some(url), port)

 def withPort(port: Int) = EndpointBuilder[P with Port](url, Some(port))

Each of these calls constructs a new EndpointBuilder while adding information to the phantom type P. Our goal is that, by the time we call build, our phantom type is Any with URL with Port. Note that Any does not add anything to the intersection type. We can prove it as follows:

implicitly[URL with Port with Any =:= URL with Port]

Here we use =:= to ask the compiler to prove that the types on the left- and right-hand sides are equal.

The last thing to rewrite is the build method, where we use the generalized type constraint <:< as implicit evidence. We are essentially asking the compiler to provide an instance of the <:< type, which it can only do if the phantom type P is a subtype of the intersection type URL with Port.

 def build(implicit ev: P <:< (URL with Port)) = Endpoint(url.get, port.get)

The compiler can do that only when our phantom type is a subtype of URL with Port, in other words, only when we have called both withUrl and withPort before calling build.

Now when we test our builder, only the first line of the following snippet compiles, which prevents users of our API from constructing incorrect data structures at compile time.

EndpointBuilder().withUrl("localhost").withPort(8080).build
EndpointBuilder().withUrl("localhost").build
EndpointBuilder().withPort(8080).build
EndpointBuilder().build
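Putting the pieces together, here is a minimal self-contained sketch of the phantom-typed builder. It makes two assumptions that differ from the snippets above: URL and Port are declared as empty sealed traits instead of abstract type members (either serves as a phantom type), and EndpointBuilder is a plain class with a private constructor, so callers cannot choose P themselves through the case class apply or copy methods.

```scala
// Phantom "capabilities": empty sealed traits that are never instantiated.
sealed trait URL
sealed trait Port

final case class Endpoint(url: String, port: Int)

// P is the phantom type: no field has type P, it only records
// at compile time which setters have been called.
final class EndpointBuilder[P] private (url: Option[String], port: Option[Int]) {
  def withUrl(url: String): EndpointBuilder[P with URL] =
    new EndpointBuilder[P with URL](Some(url), port)

  def withPort(port: Int): EndpointBuilder[P with Port] =
    new EndpointBuilder[P with Port](url, Some(port))

  // Only compiles when P <: URL with Port, i.e. both setters were called,
  // so the .get calls below can no longer fail.
  def build(implicit ev: P <:< (URL with Port)): Endpoint =
    Endpoint(url.get, port.get)
}

object EndpointBuilder {
  // Every builder starts with the "empty" phantom type Any.
  def apply(): EndpointBuilder[Any] = new EndpointBuilder[Any](None, None)
}

// The setters can be called in any order.
val endpoint = EndpointBuilder().withPort(8080).withUrl("localhost").build
println(endpoint) // Endpoint(localhost,8080)

// Rejected at compile time (no implicit P <:< (URL with Port) available):
// EndpointBuilder().withUrl("localhost").build
// EndpointBuilder().build
```

Calling withPort before withUrl yields Any with Port with URL, which is still a subtype of URL with Port, so build accepts it.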

Phantom Types and correct SQL queries

Phantom types can form much more complex structures than simple abstract type members like URL and Port in the preceding example. One of the most interesting uses of phantom types, which I haven't found in any other codebase, is in the ZIO SQL library, so let's dive into that.

Basically, when building queries, the so-called Features phantom types are added as type parameters to the query data types. Then there are type classes that inspect the structure of these phantom types and allow certain operations only on certain kinds of queries.

However, we don't need to go deep into ZIO SQL to understand this pattern. Let's see the phantom type structure first:

object Features {
  type Aggregated[_]
  type Union[_, _]
  type Source[ColumnName, TableType]
  type Literal
  type Function0
  type Derived
}

These types appear as type parameters of many data types in ZIO SQL, such as Expr, Selection and Subselect. But again, there are no values of these types. Let's take Expr[F, TableType, A] as an example, where F is our phantom type, TableType is a unique identifier of an SQL table and A is the type of the value the expression produces. Expr[_, _, _] describes any SQL expression or computation, e.g. the literal value 21, the column age in table Person, a WHERE clause expression stating that age is greater than 21, the SQL function max, etc.

Let's look at examples of how the phantom type differs for each Expr.

// describes number 21
val lit: Expr[Literal, Any, Int] = Expr.literal(21)

// column `age` in `Person` table
val age: Expr[Source["age", Person], Person, Int] = ???

// where clause `age > 18`
val whereClause: Expr[Union[Source["age", Person], Literal], Person, Boolean] = Expr.Relational(age, 18, GreaterThan)

// aggregated SQL function `avg(quantity)`
val aggregatedFunction: Expr[Aggregated[Source["quantity", OrderDetail]], OrderDetail, Double] = Avg(quantity)

// non-aggregated function with 0 params, `now()`
val now: Expr[Function0, Any, ZonedDateTime] = PostgresFunctionDef.Now()

So these phantom types are compile-time descriptions of the kind of Expr we are dealing with. As we design the rest of the library, we can easily restrict calls to certain methods to specific kinds of Expr.

The simplest example again uses the <:< evidence that we already know. Let's say we are building support for SQL updates, which look like this:

UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;

We would model it with a case class Update that holds the query as pure data. Later on, we could transform this data into an SQL query string.

final case class Update[A](table: Table.Aux[A], set: List[Set[_, A]], whereExpr: Expr[_, A, Boolean]) {

  def set[F1, F2, V](column: Expr[F1, A, V], value: Expr[F2, A, V])(
    implicit ev1: F1 <:< Features.Source[_, _],
    ev2: F2 <:< Features.Literal
    ): Update[A] = copy(set = set :+ Set(column, value))

  def where(where: Expr[_, A, Boolean]): Update[A] = copy(whereExpr = whereExpr && where)
}

For now, our main concern is to make sure that users won't call the set method with something nonsensical. We ask the compiler for implicit evidence that phantom type F1 is a Source[_, _] and phantom type F2 is a Literal, which makes sure that our users are setting literal values into columns.
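To see the <:< restriction in action without pulling in ZIO SQL, here is a heavily simplified, hypothetical model. Features, Expr, Column, Lit, Person and the render methods are stand-ins invented for this sketch, not ZIO SQL's real definitions; only the shape of the set signature follows the snippet above.

```scala
// Hypothetical stand-ins for ZIO SQL's Features: sealed traits used purely
// as phantom types, never instantiated.
object Features {
  sealed trait Source[ColumnName, TableType]
  sealed trait Literal
}
import Features._

// F: phantom feature type, A: table type (contravariant so that a literal
// fits any table), V: type of the produced value.
sealed trait Expr[F, -A, V] { def render: String }

final case class Column[Name, A, V](name: String) extends Expr[Source[Name, A], A, V] {
  def render: String = name
}

final case class Lit[V](value: V) extends Expr[Literal, Any, V] {
  def render: String = value.toString
}

sealed trait Person // phantom table identifier

final case class Update[A](table: String, assignments: List[(Expr[_, A, _], Expr[_, A, _])]) {
  // Compiles only if the left side is a column and the right side a literal.
  def set[F1, F2, V](column: Expr[F1, A, V], value: Expr[F2, A, V])(implicit
      ev1: F1 <:< Source[_, _],
      ev2: F2 <:< Literal
  ): Update[A] = copy(assignments = assignments :+ (column -> value))

  def render: String =
    s"UPDATE $table SET " +
      assignments.map { case (c, v) => s"${c.render} = ${v.render}" }.mkString(", ")
}

val age = Column["age", Person, Int]("age")
val update = Update[Person]("person", Nil).set(age, Lit(30))
println(update.render) // UPDATE person SET age = 30

// Does not compile: a literal is not a Source, so no F1 <:< Source[_, _] exists.
// Update[Person]("person", Nil).set(Lit(30), age)
```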

We can even get much fancier than that. Let's say that we want to allow users to use aggregation functions in selections.

select(name, Sum(price))
    .from(product)

The problem is that we can't let users execute such a query as it stands, because it would fail on the database: every non-aggregated column in the selection must appear in a GROUP BY clause. Therefore we need to keep track of all non-aggregated columns in the selection and force the user to call .groupBy(name) next.

Ideally, we would like the compiler to remember the phantom types of all the non-aggregated columns after the user calls select.

 def select[F, A, Unaggregated](selection: Selection[F, A]) =
    Select[F, A, Unaggregated](selection)

The types from our example query would be the following:

type F = Union[Source["name", Product], Aggregated[Source["price", Product]]]
type A = Product
type Unaggregated = Source["name", Product]

So now we could allow direct execution only of queries whose F phantom type does not contain any Aggregated[_] types. Also, if we propagate the Unaggregated type, we know exactly which Expr[F, _, _] values our users have to pass to groupBy.

However, at the point where we call select, we know neither the Unaggregated type nor whether our selection is aggregated at all. So let's create a type class IsPartiallyAggregated, parameterized over some type A, with an abstract type member Unaggregated.

sealed trait IsPartiallyAggregated[A] {
    type Unaggregated
}

object IsPartiallyAggregated {

    type WithRemainder[F, R] = IsPartiallyAggregated[F] {
      type Unaggregated = R
    }

    implicit def AggregatedIsAggregated[A]: IsPartiallyAggregated.WithRemainder[Aggregated[A], Any] =
      new IsPartiallyAggregated[Aggregated[A]] {
        override type Unaggregated = Any
      }

    implicit def UnionIsAggregated[A, B](implicit
      inA: IsPartiallyAggregated[A],
      inB: IsPartiallyAggregated[B]
    ): IsPartiallyAggregated.WithRemainder[Union[A, B], inA.Unaggregated with inB.Unaggregated] =
      new IsPartiallyAggregated[Union[A, B]] {
        override type Unaggregated = inA.Unaggregated with inB.Unaggregated
      }

    implicit val LiteralIsAggregated: IsPartiallyAggregated.WithRemainder[Literal, Any] =
      new IsPartiallyAggregated[Literal] {
        override type Unaggregated = Any
      }

    implicit val DerivedIsAggregated: IsPartiallyAggregated.WithRemainder[Derived, Any] =
      new IsPartiallyAggregated[Derived] {
        override type Unaggregated = Any
      }

    implicit val FunctionIsAggregated: IsPartiallyAggregated.WithRemainder[Function0, Any] =
      new IsPartiallyAggregated[Function0] {
        override type Unaggregated = Any
      }

    implicit def SourceIsAggregated[ColumnName, TableType]: IsPartiallyAggregated.WithRemainder[
      Features.Source[ColumnName, TableType],
      Features.Source[ColumnName, TableType]
    ] = new IsPartiallyAggregated[Features.Source[ColumnName, TableType]] {
      override type Unaggregated = Features.Source[ColumnName, TableType]
    }
}

The code looks a little scarier here because we are using the auxiliary type alias WithRemainder[F, R], which moves the type member Unaggregated into a type parameter. Using such an auxiliary type as the return type of implicit definitions matters: otherwise the compiler tends to forget the concrete type assigned to the type member, and Unaggregated would remain abstract.

IsPartiallyAggregated is a sealed trait, so we as library authors are in full control of all the instances of this type class. We create an IsPartiallyAggregated[F] instance for each of the Features phantom types, collecting all the unaggregated types as an intersection type in the Unaggregated type member.

For Literal, Derived and Function0, the Unaggregated type can be Any, because Any does not add any information to an intersection type; it's as if we leave the Unaggregated type empty. The same applies to Aggregated[_], as we are collecting only unaggregated types. Union is interesting, because we require a type class instance for both the A and B types of the Union. Its Unaggregated type member is then inA.Unaggregated with inB.Unaggregated, where inA is the instance for A and inB the instance for B. Finally, for Features.Source[_, _] we simply collect the whole phantom type into the Unaggregated type member.
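The type-level computation can be tried out in isolation. The sketch below keeps only Aggregated, Union, Source and Literal (declared as sealed traits rather than abstract types), and the remainder helper and the ProductTable table type are hypothetical; the instances follow the structure of the ones above.

```scala
// Simplified, hypothetical Features: sealed traits used only as phantom types.
object Features {
  sealed trait Aggregated[A]
  sealed trait Union[A, B]
  sealed trait Source[ColumnName, TableType]
  sealed trait Literal
}
import Features._

sealed trait IsPartiallyAggregated[A] { type Unaggregated }

object IsPartiallyAggregated {
  // Aux-style alias that exposes the type member as a type parameter.
  type WithRemainder[F, R] = IsPartiallyAggregated[F] { type Unaggregated = R }

  // Aggregated expressions and literals contribute nothing (Any).
  implicit def aggregated[A]: WithRemainder[Aggregated[A], Any] =
    new IsPartiallyAggregated[Aggregated[A]] { type Unaggregated = Any }

  implicit val literal: WithRemainder[Literal, Any] =
    new IsPartiallyAggregated[Literal] { type Unaggregated = Any }

  // A plain column is unaggregated: keep its whole phantom type.
  implicit def source[C, T]: WithRemainder[Source[C, T], Source[C, T]] =
    new IsPartiallyAggregated[Source[C, T]] { type Unaggregated = Source[C, T] }

  // A union collects the remainders of both sides as an intersection type.
  implicit def union[A, B](implicit
      inA: IsPartiallyAggregated[A],
      inB: IsPartiallyAggregated[B]
  ): WithRemainder[Union[A, B], inA.Unaggregated with inB.Unaggregated] =
    new IsPartiallyAggregated[Union[A, B]] {
      type Unaggregated = inA.Unaggregated with inB.Unaggregated
    }
}

// Hypothetical helper: exposes the computed type without needing a value.
def remainder[F](implicit i: IsPartiallyAggregated[F]): Option[i.Unaggregated] = None

sealed trait ProductTable // phantom table identifier

type SelectionFeatures =
  Union[Source["name", ProductTable], Aggregated[Source["price", ProductTable]]]

// Type-checks because the computed remainder is
// Source["name", ProductTable] with Any.
val r: Option[Source["name", ProductTable]] = remainder[SelectionFeatures]
```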

Now if we go back, our select changes a bit:

  def select[F, A](selection: Selection[F, A])(implicit i: Features.IsPartiallyAggregated[F]
   ): Select[F, A, i.Unaggregated] =
    Select[F, A, i.Unaggregated](selection)

With the help of the IsPartiallyAggregated instance, we can pass Unaggregated to Select as the path-dependent type i.Unaggregated. We can check that this works with a simple test method.

 type F = Union[Source["name", Product], Aggregated[Source["price", Product]]]

  def test[F](implicit i: IsPartiallyAggregated[F]): i.Unaggregated = ???

  val x : Source["name", Product] = test[F]

In the snippet above, test[F] yields a value that can be typed as Source["name", Product], so we can use this information further. And indeed, let's look just one level down, at the implementation of Select.

case class Select[F, A, Unaggregated](selection: Selection[F, A]) {

   def groupBy[F1, B](expr: Expr[F1, A, B])(
        implicit ev: F1 =:= Unaggregated
      ) =  ???

   def groupBy[F1, F2, B1, B2](expr1: Expr[F1, A, B1], expr2: Expr[F2, A, B2])(
        implicit ev: F1 with F2 =:= Unaggregated
      ) =  ???

   // other groupBy arities
}

As we can see, it's pretty easy to verify that groupBy was called with the right columns (Exprs that represent columns, in this case). All we need is implicit evidence that the intersection of the F types of the passed Exprs is equal to the Unaggregated type computed by select.
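The =:= check can also be tried on its own. In this hypothetical sketch, Select, Expr and Column are simplified stand-ins, and the Unaggregated type is written out by hand instead of being computed by select through IsPartiallyAggregated.

```scala
object Features {
  sealed trait Source[ColumnName, TableType] // phantom type, never instantiated
}
import Features._

sealed trait Expr[F, A, V] { def render: String }
final case class Column[Name, A, V](name: String) extends Expr[Source[Name, A], A, V] {
  def render: String = name
}

// Hypothetical, simplified Select: Unaggregated is a phantom type parameter.
final case class Select[A, Unaggregated](selection: List[String], groupedBy: List[String] = Nil) {
  def groupBy[F1, B](expr: Expr[F1, A, B])(implicit
      ev: F1 =:= Unaggregated
  ): Select[A, Unaggregated] =
    copy(groupedBy = List(expr.render))

  def groupBy[F1, F2, B1, B2](expr1: Expr[F1, A, B1], expr2: Expr[F2, A, B2])(implicit
      ev: (F1 with F2) =:= Unaggregated
  ): Select[A, Unaggregated] =
    copy(groupedBy = List(expr1.render, expr2.render))
}

sealed trait ProductTable
val name     = Column["name", ProductTable, String]("name")
val category = Column["category", ProductTable, String]("category")
val price    = Column["price", ProductTable, Double]("price")

// Stands in for select(name, category, Sum(price)).from(product)
val query = Select[ProductTable, Source["name", ProductTable] with Source["category", ProductTable]](
  List("name", "category", "sum(price)")
)

val grouped = query.groupBy(name, category)
println(grouped.groupedBy) // List(name, category)

// Does not compile: Source["price", ProductTable] is not the Unaggregated type.
// query.groupBy(price)
```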

The real ZIO SQL code that verifies GROUP BY and HAVING is slightly more complicated than this, and in the latest version of the library most of this machinery was replaced by macros in order to provide better error messages. Nevertheless, phantom types play the key role in achieving type safety in this amazing library.

Phantom Types in ZIO

When the core ZIO contributors first introduced the third type parameter R to ZIO, it was somewhat controversial. The pushback from the community was based on the fact that ZIO already had two type parameters, E and A, describing error and success values, which seemed like enough. There was also an argument that R just bakes the Reader monad into ZIO. While comparing ZIO's R with the capabilities of the Reader monad is a reasonable intuition for the R parameter, there is a fundamental distinction: in ZIO, the R type parameter is in fact a phantom type. This allowed the ZIO authors to add some useful operators to the library, making ZIO a much more powerful data type than the Reader monad. First, let's talk about the Reader monad and how we can use it, and then we'll compare it to ZIO's R.

Reader Monad

The Reader monad is a data type usually represented as Reader[R, A]. It needs an instance of R in order to be executed; in other words, when we run a Reader with an R, we get back an A. In pure functional programming this is considered useful because Reader allows us to model pure functions: functions that don't depend on any outside context and don't perform any side effects, they just compute the return value. This is useful, e.g., when you want to test a particular function in a class that depends on some global service. Instead of mocking that service, we can just call the function, which returns a Reader, and then supply an instance of the service to the Reader.

If we push this idea of using Reader Monad to pass around global services further, we end up with something like the following:

 val customerRepo = new CustomerRepository {}

 val customer = CustomerService
    .findById("123")
    .run(customerRepo)

 object CustomerService {
   def findById(
        id: String
   ): Reader[CustomerRepository, Option[Customer]] =
      Reader { (repo: CustomerRepository) =>
        repo.findById(id)
     }
 }

We can easily test the CustomerService#findById method with a test implementation of CustomerRepository. So, as we can see, we are using the Reader monad to pass dependencies around.

Now, let's say that for each customer, we want to find some discount coupons and send them to the customer's email.

for {
  customer <- CustomerService.findById("123")
  coupon   <- AdvertisementService.findCouponFor(customer.get.id)
  _        <- EmailService.sendCoupon(customer.get.email, coupon)
} yield ()

Now here comes the problem, because the return type of the above for comprehension is

Reader[CustomerRepository with AdvertisementRepository with EmailClient, Unit]

The Reader monad composes in a for comprehension, and it correctly infers the R type to be the intersection type of the dependencies. However, how can we run such a Reader? If all three dependencies are classes, it's impossible to create an instance of such an intersection type, because in Scala a class cannot extend more than one class. It would be possible if we designed our application only with traits, so that we could instantiate a mixin of those traits, but that's a serious design commitment.

Instead, the usual practice when using a Reader is to build a wrapper type that contains all of our dependencies.

final case class DependencyWrapper(customerRepo: CustomerRepository, adsRepo: AdvertisementRepository, client: EmailClient)

Now we want the type of our main program Reader to be

Reader[DependencyWrapper, Unit]

Still, we probably don't want the CustomerService#findById method to depend on a DependencyWrapper when all it really needs is a CustomerRepository. This problem is solved by calling the local method on a Reader: given a function that extracts the specific dependency from the wrapper, local turns a Reader that needs that dependency into a Reader that runs on the whole wrapper. However, inside a for comprehension the compiler cannot infer the wrapper type, so we need to specify it ourselves.

for {
  customer <- CustomerService.findById("123")
    .local[DependencyWrapper](_.customerRepo)
  coupon <- AdvertisementService.findCouponFor(customer.get.id)
    .local[DependencyWrapper](_.adsRepo)
  _ <- EmailService.sendCoupon(customer.get.email, coupon)
    .local[DependencyWrapper](_.client)
} yield ()
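To make the Reader mechanics concrete, here is a minimal, hypothetical Reader with map, flatMap and local (Cats, for instance, ships a full-featured Reader built on Kleisli). The services below are simplified stand-ins for the ones above. R is contravariant and flatMap accepts any R1 <: R, which is what lets a for comprehension infer the intersection type of all dependencies.

```scala
// Minimal, hypothetical Reader for illustration only.
final case class Reader[-R, +A](run: R => A) {
  def map[B](f: A => B): Reader[R, B] = Reader((r: R) => f(run(r)))

  // Contravariant R plus R1 <: R lets the compiler infer intersections.
  def flatMap[R1 <: R, B](f: A => Reader[R1, B]): Reader[R1, B] =
    Reader((r: R1) => f(run(r)).run(r))

  // Run this reader inside a larger context R0 by extracting the R it needs.
  def local[R0](f: R0 => R): Reader[R0, A] = Reader((r0: R0) => run(f(r0)))
}

final case class Customer(id: String, email: String)
trait CustomerRepository { def findById(id: String): Option[Customer] }
trait EmailClient { def sendCoupon(email: String): String }
final case class DependencyWrapper(customerRepo: CustomerRepository, client: EmailClient)

def findById(id: String): Reader[CustomerRepository, Option[Customer]] =
  Reader((repo: CustomerRepository) => repo.findById(id))

def sendCoupon(email: String): Reader[EmailClient, String] =
  Reader((client: EmailClient) => client.sendCoupon(email))

// Without local, R would be inferred as CustomerRepository with EmailClient.
val program: Reader[DependencyWrapper, String] =
  for {
    customer <- findById("123").local[DependencyWrapper](_.customerRepo)
    sent     <- sendCoupon(customer.get.email).local[DependencyWrapper](_.client)
  } yield sent

val deps = DependencyWrapper(
  new CustomerRepository {
    def findById(id: String): Option[Customer] = Some(Customer(id, s"customer-$id@example.com"))
  },
  new EmailClient {
    def sendCoupon(email: String): String = s"coupon sent to $email"
  }
)

println(program.run(deps)) // coupon sent to customer-123@example.com
```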

What I really love about Scala is its type inference and low boilerplate, so if you ask me, this looks terrible, and it's probably the reason why I've rarely seen Reader used for propagating dependencies through applications. In my opinion, the Reader monad, at least in this form, is generally not an idiomatic solution for dependency injection in Scala.

Now, let's take a peek at how ZIO handles this problem and see how, with the use of phantom types, these Reader monad problems go away.

Environment type R

As already mentioned, the R parameter in ZIO[R, E, A] is a phantom type, so there is no value of type R. But first, let's rewrite the Reader monad example in ZIO. Instead of Reader[R, A], all of our functions will now return ZIO[R, Throwable, A].

If we compose all of our functions in a for comprehension, ZIO infers the environment type in the same fashion as the Reader monad did: the R type parameter is the intersection type of CustomerRepository, AdvertisementRepository and EmailClient.

val program: ZIO[
    CustomerRepository with AdvertisementRepository with EmailClient, 
    Throwable, 
    Unit] = {
  for {
    customer <- CustomerService.findCustomerByCardId("123")
    coupons  <- AdvertisementService.findCouponsFor(customer.id)
    _        <- EmailService.sendCoupons(customer.email, coupons)
  } yield ()
}

As a ZIO value is only a description of our program, to do something useful we need to run it with the ZIO runtime. However, only programs whose environment type is Any, ZIO[Any, _, _], are executable.

In order to turn ZIO[CustomerRepository with AdvertisementRepository with EmailClient, Throwable, Unit] into ZIO[Any, Throwable, Unit], we still need to provide the environment, but not directly as a single value of the whole intersection type. Instead, we build so-called layers (the ZLayer data type). A layer describes how a service is constructed; it turns the constructor into a value.

// provide 3 layers as varargs to provide method
program.provide(
   CustomerRepository.layer,
   AdvertisementRepository.layer,
   EmailClient.layer
)

// alternate way of providing one layer with intersection type
program.provideLayer(
   CustomerRepository.layer ++
   AdvertisementRepository.layer ++
   EmailClient.layer
)

Coming back to phantom types, we can see that there is no value of type CustomerRepository with AdvertisementRepository with EmailClient that users need to provide. The mechanism under the hood is more complicated than in the case of the Reader monad, but from the usability perspective, ZIO infers types beautifully and users just need to build and provide the necessary layers.

So in other words, if the type of our ZIO program is ZIO[R1 with R2 with R3, E, A], then we need to either provide ULayer[R1], ULayer[R2] and ULayer[R3], or build and provide a ULayer[R1 with R2 with R3] that describes the whole environment. As there is no actual value of type R1 with R2 with R3, it is a phantom type that ZIO uses internally to verify that we have provided all the required environment in order to run our program.
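The following toy model, which is not how ZIO is implemented, illustrates the idea of a phantom environment type. Env is a type-keyed map whose type parameter R only records which services were added, and Eff[R, A] describes a program that needs the services in R. Env, Eff, service, provideEnvironment and run are hypothetical names chosen for this sketch. As with ZIO, only a program whose environment type is Any can be run.

```scala
import scala.reflect.ClassTag

// Toy model only. R is phantom: the map itself is untyped, R merely
// tracks at compile time which services have been added.
final class Env[+R] private (services: Map[Class[_], Any]) {
  def add[S](service: S)(implicit tag: ClassTag[S]): Env[R with S] =
    new Env[R with S](services + (tag.runtimeClass -> service))
  def get[S](implicit tag: ClassTag[S]): S = services(tag.runtimeClass).asInstanceOf[S]
}
object Env {
  val empty: Env[Any] = new Env[Any](Map.empty)
}

// A description of a program that needs the services in R.
final case class Eff[-R, +A](unsafeRun: Env[R] => A) {
  def map[B](f: A => B): Eff[R, B] = Eff((env: Env[R]) => f(unsafeRun(env)))
  def flatMap[R1 <: R, B](f: A => Eff[R1, B]): Eff[R1, B] =
    Eff((env: Env[R1]) => f(unsafeRun(env)).unsafeRun(env))
  // Supplying the environment eliminates the requirement: R becomes Any.
  def provideEnvironment(env: Env[R]): Eff[Any, A] = Eff((_: Env[Any]) => unsafeRun(env))
}
object Eff {
  def service[S: ClassTag]: Eff[S, S] = Eff((env: Env[S]) => env.get[S])
  // Only programs that need nothing (R = Any) can be run.
  def run[A](program: Eff[Any, A]): A = program.unsafeRun(Env.empty)
}

trait CustomerRepository { def findEmail(id: String): String }
trait EmailClient { def send(to: String, body: String): String }

// R is inferred as the intersection of both services.
val program: Eff[CustomerRepository with EmailClient, String] =
  for {
    repo   <- Eff.service[CustomerRepository]
    client <- Eff.service[EmailClient]
  } yield client.send(repo.findEmail("123"), "your coupon")

val env = Env.empty
  .add[CustomerRepository](new CustomerRepository {
    def findEmail(id: String): String = s"customer-$id@example.com"
  })
  .add[EmailClient](new EmailClient {
    def send(to: String, body: String): String = s"sent $body to $to"
  })

println(Eff.run(program.provideEnvironment(env))) // sent your coupon to customer-123@example.com
// Eff.run(program) does not compile: its environment type is not Any.
```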

Even though the use of phantom types in ZIO's environment is not as explicit as in the builder pattern at the beginning of this blog post or in the ZIO SQL DSL, we saw that R being just a phantom type is the core building block of ZLayers.

Summary

A phantom type is a type without values. Its purpose is to bring more type safety to your Scala code. Whether in a library's DSL or in any API in your client project, phantom types can bring a lot of power and correctness to the codebase.

In this blog post we went on a journey to explore phantom types and their various use cases. We saw a very simple use case in the builder pattern, then how phantom types can ensure correct construction of SQL queries, and in the end we saw how ZIO uses the environment phantom type R to provide a far better solution than the Reader monad.

Hopefully, this blog post sparked your interest in exploring possible use cases of phantom types more deeply, and in using phantom types in your next project.