Using projections in your data-access layer

If you care about your data-access layer to make it as fast as possible, you should consider using projections for reading data. Check a simple example together. We have cities where certain people live in. This is how one can model this with JPA.

@Entity
public class Person {
    @Id
    @GeneratedValue
    private Long id;

    @Column(name = "language")
    private String language;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "city_fk", referencedColumnName = "id")
    private City city;

    // getters / setters omitted
}
@Entity
public class City {
    @Id
    private Long id;

    @OneToMany(mappedBy = "city", cascade = CascadeType.ALL)
    private Set<Person> people = new HashSet<>();

    // getters / setters omitted
}

Let’s say we want figure out the all the spoken languages of the people living in a city. One can come up with the following solution:

public Collection<String> getLanguagesOfCity(final Long cityId) {
    TypedQuery<Person> query = entityManager.createQuery("SELECT p FROM Person p WHERE p.city.id = :id", Person.class);
    query.setParameter("id", cityId);
    Collection<Person> people = query.getResultList();
    return people.stream().map(Person::getLanguage).distinct().collect(toList());
}

This code will result in the following query:

select
    person0_.id as id1_1_,
    person0_.city_fk as city_fk3_1_,
    person0_.language as language2_1_ 
from
    person person0_ 
where
    person0_.city_fk=?

If you run this with Oracle, you will get the following execution plan:

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |     1 |    33 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| PERSON |     1 |    33 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

As you can see, Hibernate will issue a query to fetch all – except lazy – attributes from the database despite the fact that you will only use the language attribute of the resulting entities. Of course this is not the most effective way as you make unnecessary mandatory work for your database.

If you fetch only the data you need, you will speed up your application as well as your database. Imagine if you only need the language, why not fetch the language attribute only? For sure, you can do this, those selections which fetch only a subset of the data are called projections. With JPA, Spring Data, you can easily create projection objects as I will show it in the next couple of minutes.

Benefits

There are 3 main benefits of using projections – if you use them properly:

  • Hibernate doesn’t have to manage entity state transitions. In case you fetch hell lot of data, you will end up with a slow application just because Hibernate has work to do. If you use projections, you can eliminate this cost.
  • Memory footprint. Let’s say the query will result in lot of data. If you fetch the entities itself, the database have to fetch all the data from the disk, has to send it back to your application, your application has to store it in the memory. It’s a waste, isn’t it?
  • Covering index. It’s not any kind of magic, basically the idea here is that you fetch the data which you have index on. In this case, the database can skip the whole table access and go with faster index access.

Defining projections

Okay, now let’s see how can we define projections within JPA with JPQL or Criteria API and with Spring Data JPA.

What you need for a JPA projection is a simple POJO with a proper constructor.

public class Language {
    private String code;

    public Language(final String code) {
        this.code = code;
    }

    // getters omitted
}

Then in JPQL, you can easily write the following query:

SELECT DISTINCT NEW com.arnoldgalovics.blog.domain.Language(p.language) FROM Person p WHERE p.city.id = :id

Important note that you have to use fully qualified class name as the JPA provider doesn’t know anything about data holder classes other than entities.

In Criteria API, you can go with the following implementation:

CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<Language> query = cb.createQuery(Language.class);
Root<Person> root = query.from(Person.class);
query.select(cb.construct(Language.class, root.get("language")));
query.where(cb.equal(root.get("city").get("id"), cityId));
query.distinct(true);
entityManager.createQuery(query).getResultList();

Both implementations will result in the following SQL queries:

select
    distinct person0_.language as col_0_0_ 
from
    person person0_ 
where
    person0_.city_fk=?

The execution plan will look like the following:

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |     1 |    20 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| PERSON |     1 |    20 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

If you compare the memory consumption to the first approach, it’s almost half in size. Of course it’s not that big of a deal now as we operate on really small tables but if you think about dozens of attributes unnecessarily fetched, it would be more visible. If you don’t believe me, check out Lukas Eder’s article.

It’s even easier to create a projection with Spring Data. You only need to create an interface which represents the projection, in this case:

public interface LanguageProjection {
    String getLanguage();
}

The Repository looks like this:

public interface PersonRepository extends JpaRepository<Person, Long> {
    Collection<LanguageProjection> findDistinctByCityId(Long cityId);
}

It’s really really easy to do this with Spring Data. If you call this repository method, it will execute the following query:

select
    distinct person0_.language as col_0_0_ 
from
    person person0_ 
left outer join
    city city1_ on person0_.city_fk=city1_.id 
where
    city1_.id=?

As you can see, Spring Data generates a little less efficient query as it joins the whole city table despite the fact it’s not necessary. I’ve filed a ticket to take a look on this, maybe I’m missing something.

Conclusion

If you want to have a fast data-access layer, there are several things you have to care about. This includes using projections as in this way you can optimize to fetch only the data you need. For reading purposes, I’d really suggest using projections instead of entities as it can boost up the performance of your application. For changing states, use entities.

The source can be found on GitHub. Feel free to reach me out in case of questions in the comments or on Twitter.

2 Replies to “Using projections in your data-access layer”

    1. Arnold Galovics says:

Leave a Reply

Your email address will not be published. Required fields are marked *