A new architecture for APIs – The New Stack


Having been in the API space for the past decade (at Apigee, Google, and now StepZen), and having done databases for two decades prior (at IBM and during my PhD at Berkeley), I can safely say that there are two tricks that databases have done well, and that these will revolutionize the way APIs are built and managed.

Anant Jhingran

Anant is the founder and CEO of StepZen, a startup with a new approach to simplifying how developers access the data they need to power digital experiences.

  1. The first is that databases work declaratively. This means you tell the database what to do, not how to do it. You follow this principle whether you are creating data or querying it. APIs, on the other hand, have mostly been built programmatically.
  2. Second, and just as important, many databases know how to federate: if your data is scattered across two database systems, you can run one query against both as if the data came from one. The query you submit is scattered across multiple backend databases, and the results are gathered to look as if they came from a single centralized system.

In this article we will discuss the latter; we have already written articles on the former.

API Federation

I first heard of this concept when I heard how Netflix, specifically Dan Jacobsen, who is now at the New York Times, talked about its API layer. Netflix had “Domain APIs” that reflected how backends viewed and surfaced their data, and “Experience APIs” that reflected how apps wanted to access data. Each Experience API made the right calls to a set of Domain APIs (scatter) and then combined the results into one (gather).

But scatter/gather is not easy. In traditional APIs, it is hard-wired into how each API is built. If the Experience API needed more data, someone would come in and program that API to scatter to another backend. If a backend API changed its implementation, someone would come in and reprogram the Experience API. Backend errors? Program. Performance problems? Program (add a cache), and so on.

Also, why are there two levels, and do only two levels make sense? If you look at the World Wide Web, it is an interconnection of pages, grouped into sites, grouped into domains, and so on. The structure of interconnectivity (href) is the same at every level, and it allows arbitrarily complex relationships to be formed.

This must be the new world of APIs. However, for this world to form, a fundamental change must still occur. Scattering and gathering, when each backend produces arbitrary structures, is almost impossible. There must be “normalization” in one form or another. And that normalization is GraphQL.

GraphQL, which stands for Graph Query Language, has two wonderful features.

  1. It lets you assemble data: you can return customer and order data in a single query, just like a federated database query.
  2. It returns the data in exactly the form of the query. No more, no less.
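The second feature can be illustrated with a minimal Python sketch. This is not a GraphQL engine: the `project` helper, the sample `customer` record, and the nested-dictionary stand-in for a query are all assumptions made for illustration. It only shows the idea that the response mirrors the shape of the request.

```python
from typing import Any


def project(data: Any, selection: dict) -> Any:
    """Return only the fields named in `selection`, preserving its nesting."""
    if isinstance(data, list):
        return [project(item, selection) for item in data]
    result = {}
    for field, sub in selection.items():
        value = data[field]
        # An empty sub-selection marks a leaf field; otherwise recurse.
        result[field] = project(value, sub) if sub else value
    return result


# Hypothetical backend record holding more fields than the client asks for.
customer = {
    "id": 1,
    "name": "Ada",
    "email": "ada@example.com",
    "orders": [
        {"id": 10, "total": 25.0, "carrier": "UPS"},
        {"id": 11, "total": 40.0, "carrier": "DHL"},
    ],
}

# Stand-in for the GraphQL query: { name orders { total } }
selection = {"name": {}, "orders": {"total": {}}}

shaped = project(customer, selection)
```

Here `shaped` contains only `name` and, for each order, `total`; the `id`, `email`, and `carrier` fields never reach the caller.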

Now imagine a GraphQL API graph:

A GraphQL query at any level can be scattered into the subgraphs at the next level. Their responses come back in the exact form of the subqueries sent to them. Stitching them together is trivial: there are no shapes to struggle with and no logic to write. And the same process can continue further down.
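As a rough sketch of this scatter and gather step, the Python below models each subgraph as a function that answers a subquery in the subquery's own shape. The subgraph names, the sample data, and the `OWNERS` routing table are hypothetical, and a real federated GraphQL system does considerably more (schemas, keys, nested entity resolution).

```python
from typing import Any, Callable, Dict


def project(data: Any, selection: dict) -> Any:
    """Return only the fields named in `selection`, preserving its nesting."""
    if isinstance(data, list):
        return [project(item, selection) for item in data]
    return {
        field: project(data[field], sub) if sub else data[field]
        for field, sub in selection.items()
    }


def customer_subgraph(subquery: dict) -> dict:
    data = {"customer": {"id": 1, "name": "Ada", "email": "ada@example.com"}}
    return project(data, subquery)


def order_subgraph(subquery: dict) -> dict:
    data = {"orders": [{"id": 10, "total": 25.0}, {"id": 11, "total": 40.0}]}
    return project(data, subquery)


# Hypothetical routing table: which subgraph owns each top-level field.
OWNERS: Dict[str, Callable[[dict], dict]] = {
    "customer": customer_subgraph,
    "orders": order_subgraph,
}


def supergraph(query: dict) -> dict:
    # Scatter: group the top-level fields by the subgraph that owns them.
    subqueries: Dict[Callable[[dict], dict], dict] = {}
    for field, sub in query.items():
        subqueries.setdefault(OWNERS[field], {})[field] = sub
    # Gather: each response already has the shape of its subquery,
    # so stitching is a plain dictionary merge.
    result: dict = {}
    for subgraph, subquery in subqueries.items():
        result.update(subgraph(subquery))
    return result


response = supergraph({"customer": {"name": {}}, "orders": {"total": {}}})
```

Because `supergraph` has the same signature as a subgraph, it could itself be registered in another routing table one level up, which is the "graph of graphs" idea in miniature.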

As you can see, this is a completely new API architecture. It is a federated graph of APIs and can be used to create a large supergraph – or a single graph of graphs – and many smaller graphs of graphs, which can be tailored to the right structure for an organization. It’s a very clean and easy concept. And this is the future.

However, that’s not all. This architecture has huge positive implications for performance, governance, and multicloud.

Performance

By federating and sending GraphQL subqueries, you don’t ship unnecessary data into the supergraph. In database terms, this is called pushdown: you let each subsystem do as much work as it can and return only the result of its computation for the gather step. This is the difference between computing a customer’s total order amount by retrieving all the orders and shipping them to the customer subgraph (and letting it compute the total), versus computing the total in the order subgraph and shipping only the total to the customer subgraph. Good GraphQL systems understand what each subgraph can do and try to push down as much as possible. In traditional API architectures, this knowledge must be hard-wired into the APIs higher in the chain.
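The difference can be sketched in a few lines of Python. The `ORDERS` data and both functions are hypothetical stand-ins for an order subgraph; the point is only how much data crosses the boundary in each case.

```python
# Hypothetical data held inside an order subgraph.
ORDERS = [
    {"customer_id": 1, "total": 25.0},
    {"customer_id": 1, "total": 40.0},
    {"customer_id": 2, "total": 15.0},
]


def orders_for(customer_id: int) -> list:
    """No pushdown: the subgraph ships every matching order upstream."""
    return [o for o in ORDERS if o["customer_id"] == customer_id]


def order_total_for(customer_id: int) -> float:
    """Pushdown: the subgraph computes the total and ships one number."""
    return sum(o["total"] for o in ORDERS if o["customer_id"] == customer_id)


# Without pushdown, the upstream graph receives the rows and sums them itself.
shipped_rows = orders_for(1)
total_upstream = sum(o["total"] for o in shipped_rows)

# With pushdown, the upstream graph receives a single value.
total_pushed_down = order_total_for(1)
```

Both paths yield the same total, but the first moves every order for the customer across the network while the second moves one number.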

Governance

With this federation model, data does not leave the subgraph unless it has to. Imagine you have an EU subgraph and a US subgraph. The scatter phase ensures that the EU subgraph receives a request within its own domain and can decide what data it may send upstream. Queries such as “what is the total amount for a customer” can return the total amount without raising privacy concerns or disclosing specific order data. Or the subgraph may decide to obfuscate some data to preserve confidentiality.
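One way to picture this, as a sketch only: the subgraph applies its own export policy before anything goes upstream. The `EXPORTABLE` set, the field names, and the sample data below are assumptions for illustration, not part of any particular GraphQL product.

```python
# Hypothetical order data that lives inside the EU subgraph.
EU_ORDERS = [
    {"customer_id": 1, "item": "book", "total": 25.0},
    {"customer_id": 1, "item": "lamp", "total": 40.0},
]

# Hypothetical policy: the only fields the EU subgraph may send upstream.
EXPORTABLE = {"customer_total"}


def eu_subgraph(field: str, customer_id: int) -> float:
    if field not in EXPORTABLE:
        # Order-level detail never leaves the region.
        raise PermissionError(f"{field!r} may not leave the EU subgraph")
    # The aggregate is computed locally; only the number is returned.
    return sum(o["total"] for o in EU_ORDERS if o["customer_id"] == customer_id)


total = eu_subgraph("customer_total", 1)

try:
    eu_subgraph("orders", 1)
    blocked = False
except PermissionError:
    blocked = True
```

The upstream graph gets the aggregate it asked for, while the request for raw orders is refused inside the region that owns the data.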

In addition to privacy concerns, a federated model is simply better for governance. Each team decides what its subgraph looks like. A team can keep a more detailed subgraph for internal use and expose less functionality upstream. Of course, since the data returned needs to make sense, that doesn’t remove the need for lightweight governance across subgraphs, but it is far less than what would be needed if everything were a jumble of programmed logic.

Multicloud

Imagine if some of your services were on Google Cloud, some on AWS, and some on-premises. You would want to manage them for governance and for performance, separately. In this world, this federated API structure is the only way to go.

Summary

APIs are great. However, API architectures have not evolved. With GraphQL, a new way to form a graph of graphs is emerging. This architecture leads to simpler design, better performance, simpler governance, and graceful migration to the cloud. This is the way to go.

Picture by Time1337 from Pixabay
