Technology | November 28, 2018

Getting started with GraphQL and Apache Cassandra®

GraphQL has become increasingly popular among developers for implementing APIs, especially among frontend developers, because it brings both simplicity and discoverability to the client side. But what about the server side? If you have already worked with Apache Cassandra®, you know that designing the proper data model is key. You also know that you need to know your queries very well in order to ensure performance. Is it possible to create an API on top of Apache Cassandra with GraphQL that allows only valid queries? Let's have a look.

GraphQL positioning

Yet another API specification

GraphQL is a query language created in 2012 by Facebook. They used it for several years in production before releasing it to the open source community in July 2015. Some web giants, such as Pinterest, adopted it right away, but the big shift happened in September 2016 when GitHub announced they were embracing the technology. Most developers are now accustomed to developing REST APIs using the OpenAPI standard (swagger-ish). What makes GraphQL so appealing? The main advantages are the following:

  1. GraphQL is strongly typed: you can validate queries before firing them against the server. This also enables tools like GraphiQL that help you interact with your API.
  2. GraphQL is less verbose: clients ask for exactly the data they need. Requests are built on the client side, where the client specifies which fields it wants. The server sends back only the expected payload, filtering out unwanted attributes and returning a hierarchical tree of objects. You can therefore limit both the amount of data returned for each entity and the number of requests, since everything arrives as a tree in a single call. These features are especially valuable for mobile development, or any use case where bandwidth or data transfer matters.
  3. Your API is discoverable: the client can ask the API to describe itself, including the entities and functions available. You don't need a separate contract; the client can discover it at runtime.

But GraphQL, like other technologies, is not a silver bullet, and some pain points remain:

  1. There is a single endpoint for all operations. This is an advantage for simplicity, but mostly a drawback because it does not allow versioning.
  2. Even though GraphQL is strongly typed (which could allow all kinds of code generation engines), the tooling is limited because the technology is young. The schema can also be quite complex to comprehend, as we will see later in this article.
  3. As each query may be unique and hierarchical, there is no caching mechanism in place on the server side.
  4. There were no asynchronous operations previously, but they are becoming possible with the latest version of the specification.

In the Advocates team we have been working with multiple formats and technologies to expose services, such as REST, gRPC, OData and GraphQL, for our reference application KillrVideo. While GraphQL looks promising, it is not a silver bullet and should only be used for suitable use cases. Our team gave a talk on the subject; you can view the slides and video. Here is our take on this topic:

  • GraphQL seems relevant when network and bandwidth matter (mobile), as there will be fewer calls and less data on the wire.
  • GraphQL seems relevant when you need a typed schema (less flexibility) and some kind of contract between you and your consumers, or when you don't know your consumers (public API).
  • GraphQL seems relevant when you have highly connected or hierarchical data. Relations and traversals can be retrieved in a single query.
  • GraphQL seems relevant when you need to aggregate multiple data sources (mashup).
  • For CRUD operations, plain old REST is still the way to go.
  • For streaming, asynchronous and action/command-oriented APIs, gRPC is our best choice.

Enough with the chit chat, let's get our hands dirty.

Sample Application

KillrVideo Application and Data Access Object (DAO)

Unsurprisingly, we will leverage our reference application KillrVideo. We will focus on the read and write operations for our Comments section, which can be found here. Here is a simplified version of the interface we will use:

public interface CommentDseDao {

    /** Provide videoid and paging information to retrieve a set of comments. */
    ResultListPage<Comment> findCommentsByVideoId(QueryCommentByVideo query);

    /** Provide userid and paging information to retrieve a set of comments. */
    ResultListPage<Comment> findCommentsByUserId(QueryCommentByUser query);

    /** Insert new comment on all expected tables from a web bean. */
    void insertComment(Comment myComment);

    // [..]
}

GraphQL Schema

The first thing to do is to create the GraphQL schema defining all entities and available operations. There are two types of operations: queries and mutations. Mutations sound familiar, right? It is indeed the same notion we find in Apache Cassandra. The key takeaway is that by default everything is forbidden, and only the operations declared in the schema are available. This is a perfect match for Cassandra, where we don't want users to execute dangerous queries like select * from myHugeTable and blow up our database with a full table scan.

# Killrvideo GraphQL API
schema {
   query: Query
   mutation: Mutation
}

# Searches and read-only operations on KillrVideo Keyspace
#-------------------------------------------------------------
type Query {

   # Search in table comments_by_video, optionally with pagination.
   getVideoComments(videoid: String!, commentid: String, pageSize: Int, pageState: String): ResultPageCommentGQL!

   # Search in table comments_by_user, optionally with pagination.
   getUserComments(userid: String!, commentid: String, pageSize: Int, pageState: String): ResultPageCommentGQL!
}

# Operations that update data in the DB
#-------------------------------------------------------------
type Mutation {
   # Add a comment for dedicated video and known user.
   commentOnVideo(commentid: String!, videoid: String!, userid: String!, text: String!): CommentGQL!
}

Next we have to declare the entities used as inputs and outputs. There are a number of types available in the schema (Scalar, Object, Interface, Union, InputObject, Enum), and you can find the details here.

# Represent a video comment in GraphQL format
type CommentGQL {
   # Unique identifier for a comment (tech id)
   commentid: ID!
   # Unique identifier for a user (required)
   userid: String!
   # Unique identifier for a video (required)
   videoid: String!
   # Text of the comment
   comment: String
   # Insertion Date
   dateOfComment: String
}

type ResultPageCommentGQL {
   listOfResults: [CommentGQL]!
   nextPage: String
}
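The pageState argument and the nextPage field are meant to work together: a client passes the nextPage value of one response as the pageState of the next request. Here is a minimal sketch of that loop in plain Java. The names CommentPager, Page and fetchPage are hypothetical and not part of KillrVideo; fetchPage stands in for whatever code actually calls getVideoComments or getUserComments, and comments are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class CommentPager {

    /** Client-side mirror of ResultPageCommentGQL (comments simplified to strings). */
    static final class Page {
        final List<String> listOfResults;
        final String nextPage; // null when there are no more pages

        Page(List<String> listOfResults, String nextPage) {
            this.listOfResults = listOfResults;
            this.nextPage = nextPage;
        }
    }

    /**
     * Calls fetchPage repeatedly, feeding each nextPage value back in as the
     * pageState, until the server no longer returns a paging state.
     * fetchPage is a hypothetical stand-in for the real API call.
     */
    static List<String> fetchAll(Function<String, Page> fetchPage) {
        List<String> all = new ArrayList<>();
        String pageState = null; // null means "start from the first page"
        do {
            Page page = fetchPage.apply(pageState);
            all.addAll(page.listOfResults);
            pageState = page.nextPage;
        } while (pageState != null);
        return all;
    }

    public static void main(String[] args) {
        // Fake two-page backend, for illustration only.
        Function<String, Page> fake = state -> state == null
                ? new Page(Arrays.asList("first", "second"), "opaque-state-1")
                : new Page(Arrays.asList("third"), null);
        System.out.println(fetchAll(fake));
    }
}
```

The paging state is treated as an opaque string: the client never inspects it, it only hands it back to the server.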

Spring Boot and GraphQL

With this GraphQL schema we are all set to start implementing. The same schema can be used with many different languages. In our sample we will use Java and the graphql-java framework. Instead of mapping everything manually, we will leverage the relevant spring-boot-starter and define the expected beans. The first thing to do is to declare the following dependencies in your pom.xml file.

<!-- GraphQL -->
<dependency>
 <groupId>com.graphql-java</groupId>
 <artifactId>graphql-java</artifactId>
</dependency>
<dependency>
 <groupId>com.graphql-java</groupId>
 <artifactId>graphql-java-tools</artifactId>
</dependency>
<!-- SpringBoot -->
<dependency>
 <groupId>com.graphql-java</groupId>
 <artifactId>graphql-spring-boot-starter</artifactId>
</dependency>
<dependency>
 <groupId>com.graphql-java</groupId>
 <artifactId>graphiql-spring-boot-starter</artifactId>  <!-- We will come back to this one in a minute -->
</dependency>

Define the associated POJOs:

import java.io.Serializable;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

public class CommentGQL implements Serializable {
   private static final long serialVersionUID = -4032110812123661790L;
   protected String userid;
   protected String videoid;
   protected String comment;
   protected String commentid;
   private Date dateOfComment;
   // constructor from Comment, getters, setters
}

public class ResultPageCommentGQL {
   private List<CommentGQL> listOfResults = new ArrayList<>();
   private String nextPage;
   // getters, setters
}

Define the two expected beans, implementing com.coxautodev.graphql.tools.GraphQLMutationResolver and com.coxautodev.graphql.tools.GraphQLQueryResolver. We use the method names declared in the schema file. From there it is simply a matter of mapping parameters and invoking the DAO. Some parts of the code are simplified; the full source code can be found here.

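import com.coxautodev.graphql.tools.GraphQLMutationResolver;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class KillrvideoMutation implements GraphQLMutationResolver {

   @Autowired
   private CommentDseDao commentDseDao;

   public CommentGQL commentOnVideo(String commentid, String videoid, String userid, String text) {
      // Omitted: parameter validation
      Comment newComment = new Comment();
      // Omitted: populating the bean from the parameters
      commentDseDao.insertComment(newComment);
      return new CommentGQL(newComment);
   }
}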

import com.coxautodev.graphql.tools.GraphQLQueryResolver;
import java.util.stream.Collectors;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class KillrvideoQuery implements GraphQLQueryResolver {

   @Autowired
   private CommentDseDao commentDseDao;

   public ResultPageCommentGQL getVideoComments(String videoid, String commentid, int pageSize, String pageState) {
      QueryCommentByVideo qcbv = new QueryCommentByVideo();
      // Mapping omitted
      ResultListPage<Comment> resultComments = commentDseDao.findCommentsByVideoId(qcbv);
      ResultPageCommentGQL result = new ResultPageCommentGQL();
      resultComments.getPagingState().ifPresent(result::setNextPage);
      result.setListOfResults(
            resultComments.getResults().stream()
                  .map(CommentGQL::new)
                  .collect(Collectors.toList()));
      return result;
   }

   public ResultPageCommentGQL getUserComments(String userid, String commentid, int pageSize, String pageState) {
      QueryCommentByUser qcbu = new QueryCommentByUser();
      // Mapping omitted
      ResultListPage<Comment> resultComments = commentDseDao.findCommentsByUserId(qcbu);
      ResultPageCommentGQL result = new ResultPageCommentGQL();
      resultComments.getPagingState().ifPresent(result::setNextPage);
      result.setListOfResults(
            resultComments.getResults().stream()
                  .map(CommentGQL::new)
                  .collect(Collectors.toList()));
      return result;
   }
}

We now have everything we need to make it work. If you look at the GitHub repository you will find some boilerplate code, such as a main class and configuration loading. To start the application, execute the standard mvn spring-boot:run. Your API is now started and available at http://localhost:8083/gql. The API can be invoked from clients, but there is still no user interface to test it.
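To give an idea of what invoking the API from a client could look like, here is a minimal sketch using the JDK's built-in java.net.http client (Java 11+). It assumes the endpoint accepts the common GraphQL-over-HTTP convention of a POST with a JSON body of the form {"query": "..."}; the class name GraphQLHttpClient and its helper methods are hypothetical and not part of KillrVideo.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class GraphQLHttpClient {

    /** Wraps a GraphQL document in a {"query": "..."} JSON envelope, escaping it as a JSON string. */
    static String toJsonBody(String graphQlDocument) {
        StringBuilder sb = new StringBuilder("{\"query\":\"");
        for (char c : graphQlDocument.toCharArray()) {
            if (c == '"') {
                sb.append("\\\"");
            } else if (c == '\\') {
                sb.append("\\\\");
            } else if (c == '\n') {
                sb.append("\\n");
            } else if (c < 0x20) {
                // Remaining control characters use the generic JSON unicode escape.
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append("\"}").toString();
    }

    /** Builds (but does not send) the POST request carrying the GraphQL document. */
    static HttpRequest buildRequest(String endpoint, String graphQlDocument) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(toJsonBody(graphQlDocument)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Requires the sample application to be running locally.
        String query = "{ getVideoComments(videoid: \"172219b0-1662-4f11-9232-f2ba5ecec16b\", pageSize: 10)"
                + " { listOfResults { comment dateOfComment } nextPage } }";
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest("http://localhost:8083/gql", query),
                        HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

In a real client you would more likely rely on a JSON library for the request body; the manual escaping here only keeps the sketch dependency-free.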

GraphiQL

GraphiQL is a graphical, interactive, in-browser GraphQL IDE. It allows you to discover existing GraphQL endpoints and fire sample requests. To enable this interface in our sample, we added the dependency graphiql-spring-boot-starter to our pom.xml. You can now access the interface at http://localhost:8083/gql/graphiql. On the right-hand side of the screen you can see the different functions we defined, the entities, and also the comments we added in the schema file.

Documentation Explorer

Testing our Application

Testing is now straightforward: build your queries and execute them using the run button at the top of the page. You will notice that there is some autocompletion in the panel on the right if you use the shortcut CTRL+SPACE.

GraphiQL

First we need a valid userid and videoid. To get them, we use DataStax Studio and query the table comments_by_video. You can see in the picture that we can use the existing videoid 172219b0-1662-4f11-9232-f2ba5ecec16b and the existing userid c98a0721-c8ce-408e-9065-45be9511771c.

Create a Comment

Let's create a mutation to insert a comment using the schema. commentid is expected to be a valid TimeUUID. To generate one you can either execute the following statement in your favourite IDE, System.out.println(UUIDs.timeBased());, or use cqlsh or DataStax Studio with the following command: SELECT now() FROM killrvideo.comments_by_video LIMIT 1

Create Insert Comment Mutation
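As a rough illustration of the mutation described above, here is a sketch that checks that the commentid is a time-based (version 1) UUID and then assembles the commentOnVideo mutation text from the schema. The class CommentMutationBuilder and its methods are hypothetical helpers, and the commentid used in main is an arbitrary version 1 UUID chosen for the example, not a value taken from KillrVideo.

```java
import java.util.UUID;

final class CommentMutationBuilder {

    /** Returns true when the value parses as a UUID whose version is 1 (time-based). */
    static boolean isTimeUuid(String value) {
        if (value == null) {
            return false;
        }
        try {
            return UUID.fromString(value).version() == 1;
        } catch (IllegalArgumentException e) {
            return false; // not a UUID at all
        }
    }

    /** Escapes backslashes, double quotes and newlines for use in a GraphQL string literal. */
    static String escape(String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    /** Builds the commentOnVideo mutation declared in the schema, asking for three fields back. */
    static String commentOnVideo(String commentid, String videoid, String userid, String text) {
        if (!isTimeUuid(commentid)) {
            throw new IllegalArgumentException("commentid must be a time-based (version 1) UUID");
        }
        return String.format(
                "mutation { commentOnVideo(commentid: \"%s\", videoid: \"%s\", userid: \"%s\", text: \"%s\")"
                        + " { commentid comment dateOfComment } }",
                escape(commentid), escape(videoid), escape(userid), escape(text));
    }

    public static void main(String[] args) {
        System.out.println(commentOnVideo(
                "e7ae5cf0-d9f8-11e8-9f8b-f2801f1b9fd1",  // arbitrary example TimeUUID
                "172219b0-1662-4f11-9232-f2ba5ecec16b",
                "c98a0721-c8ce-408e-9065-45be9511771c",
                "Great video!"));
    }
}
```

The resulting text can be pasted into GraphiQL or sent by any HTTP client. The same version check could also serve as part of the parameter validation omitted in the resolver.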

OK, now query the list of comments for our user c98a0721-c8ce-408e-9065-45be9511771c and observe the expected 3 comments. We only need 2 attributes, the date of the comment and its text, so let's filter:

Query and Filter
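In text form, such a filtered query could look like the output of the sketch below, which builds a getUserComments query that selects only the requested fields. The class UserCommentsQuery is a hypothetical helper, and the page size of 10 is an arbitrary value chosen for the example; the field names come from the CommentGQL type in the schema.

```java
final class UserCommentsQuery {

    /**
     * Builds a getUserComments query returning only the given CommentGQL fields
     * for each result, plus the nextPage paging state.
     */
    static String build(String userid, int pageSize, String... fields) {
        return "{ getUserComments(userid: \"" + userid + "\", pageSize: " + pageSize + ")"
                + " { listOfResults { " + String.join(" ", fields) + " } nextPage } }";
    }

    public static void main(String[] args) {
        // Only the date and the text of each comment are requested.
        System.out.println(build("c98a0721-c8ce-408e-9065-45be9511771c", 10,
                "dateOfComment", "comment"));
    }
}
```

Fields that are not listed, such as userid or videoid, are simply left out of the response.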

Takeaways

We have seen that GraphQL can easily be used with a Cassandra backend. You don't expose generic CRUD operations; instead, you allow only the queries you want, which enforces good performance on your Cassandra clusters.

Happy coding!
