Azure Cosmos DB Performance: Direct vs. Gateway mode
A performance comparison between the two connectivity modes that Cosmos DB offers: Direct and Gateway. We use the default settings for both modes and see how the choice affects application performance and scalability.
Background
A colleague came to me the other day with an interesting challenge. He had a customer with a Java Spring Boot application who was running a simple load test against Cosmos DB to gauge the performance and scalability of their web application. They were finding that the Gateway mode resulted in better performance and scalability than the Direct mode.
My reaction was similar to his when he first heard of it: Whaaattt??!!!
The Gateway mode introduces an additional component between the client and Cosmos DB (visualized below in green). In addition, the Gateway mode mandates the use of HTTPS from the client, whereas the Direct mode talks to the Cosmos DB replicas directly over TCP. So with the Gateway mode we have an additional network hop plus HTTPS instead of a direct TCP connection, and you'd expect a performance decrease due to the increased network latency at the very least.

The customer had a reproduction setup: a web application (Java Spring Boot) that inserted a single random number into Cosmos DB. They configured the application to use either the Gateway mode or the Direct mode and ran tests using JMeter.
The customer claimed that not only did they see higher requests per second (RPS) with the Gateway mode, but also that the response times (95th percentile) were better than with the Direct mode.
They had originally started with the Direct mode, but when they saw unexpected scalability issues, they switched to the Gateway mode and claimed to see better performance and better scalability.
That didn’t sound right.
Primer: Why use Gateway mode?
Let’s first examine why the Gateway mode is there in the first place.
The Gateway mode exists for two main reasons:
- A corporate firewall and/or security policy prevents client applications from using anything else.
- Clients are using neither .NET nor Java as a development platform, and are instead using Node.js, Python, or a REST-based client.
There is one additional consideration for choosing the Gateway mode: when the number of socket connections available on the client is limited, since the Direct mode opens considerably more TCP connections than the Gateway mode does.
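As a point of reference, in the Java SDK v4 the Gateway mode is an explicit opt-in on the client builder. A minimal sketch (the account endpoint and key below are placeholders, and the pool size is an illustrative value, not a recommendation):

```java
import com.azure.cosmos.CosmosClient;
import com.azure.cosmos.CosmosClientBuilder;
import com.azure.cosmos.GatewayConnectionConfig;

public class GatewayClient {
    public static void main(String[] args) {
        // Optional: cap the HTTPS connection pool. This is the knob that
        // matters when client-side socket connections are scarce.
        GatewayConnectionConfig gatewayConfig = new GatewayConnectionConfig()
                .setMaxConnectionPoolSize(100);

        CosmosClient client = new CosmosClientBuilder()
                .endpoint("https://<account>.documents.azure.com:443/")
                .key("<primary-key>")
                .gatewayMode(gatewayConfig)  // omit to get the Direct mode default
                .buildClient();
    }
}
```

Calling .directMode() instead selects the Direct (TCP) mode, which is also what the builder defaults to.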
Troubleshooting: Where to start?
The customer had deployed a bunch of virtual machines (with accelerated networking) to run the Spring Boot application in the same region as Cosmos DB. So the communication between the client and Cosmos DB was via the Azure backbone and not the internet.
They monitored the JVM CPU, memory and garbage collection statistics and saw nothing that indicated a bottleneck on the web server side.
From a high level, starting with the Cosmos DB service itself and working our way back to the client, we discussed the various components in the picture:
1. Cosmos DB
2. The Cosmos DB Gateway
3. The Cosmos DB client library (and its settings)
4. The client runtime stack (and networking)
Obviously, Microsoft provides items 1, 2 and 3 in the list above. While any software component can have bugs, Cosmos DB isn't exactly a brand-new service, and it seemed more likely that the source of the problem was nearer the bottom of the list than the top. That doesn't mean we eliminated 1, 2 and 3 from consideration.
Reproducing the problem = couldn’t reproduce
We started by creating a simple setup in .NET and Java (not Spring Boot). We used two identical Azure App Service instances, one for .NET and the other for Java, to host the application, and load tested it using the standard functionality in Azure DevOps.
We also deployed the application to virtual machines running IIS and ASP.NET Core, as well as VMs running the latest stable versions of Java and Tomcat, to rule out the possibility that VMs somehow behave or scale differently than Azure App Services.
The simple setup did nothing more than the customer's reproduction did: it inserted a random number into Cosmos DB. The client application was configured with out-of-the-box defaults, except for choosing the Gateway mode in one case and the Direct mode in the other.
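The Java version of our reproduction was essentially the following sketch (the endpoint, key, database, and container names are illustrative placeholders, not the customer's actual values):

```java
import com.azure.cosmos.CosmosClient;
import com.azure.cosmos.CosmosClientBuilder;
import com.azure.cosmos.CosmosContainer;
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

public class InsertRandomNumber {
    public static void main(String[] args) {
        CosmosClient client = new CosmosClientBuilder()
                .endpoint("https://<account>.documents.azure.com:443/")
                .key("<primary-key>")
                .directMode()  // swapped to .gatewayMode() for the other test case
                .buildClient();

        CosmosContainer container = client
                .getDatabase("loadtest")   // illustrative names
                .getContainer("numbers");

        // The document: an id and a random number. The SDK serializes
        // this POJO to JSON.
        Item item = new Item(UUID.randomUUID().toString(),
                ThreadLocalRandom.current().nextInt());
        container.createItem(item);
        client.close();
    }

    public static class Item {
        public String id;
        public int value;
        public Item() { }
        public Item(String id, int value) { this.id = id; this.value = value; }
    }
}
```

In the web application this logic sat behind a single HTTP endpoint, with the client created once at startup and reused across requests, as the SDK documentation recommends.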
We ran load tests with different numbers of simulated users; the results of one such test, with 500 users, are below:




Contrary to the customer's claims, the results were consistent across the board: Direct mode outperformed the Gateway mode and also scaled much better.
As one would expect.
Conclusion
So why was the customer claiming to see the reverse? We don't really know for sure, except that they had changed some settings for the embedded Tomcat that Spring Boot uses. These settings affected the number of threads available both for queueing and for servicing requests.
While thread pool settings could adversely affect the scalability of the application, that still doesn't explain why, under minimal load, the Gateway mode should have outperformed the Direct mode, as the customer originally claimed.
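For context, the knobs in question are exposed in Spring Boot through the embedded Tomcat configuration properties. The values below are the Tomcat defaults shown for illustration, not the customer's actual settings:

```properties
# application.properties: embedded Tomcat thread pool (Spring Boot 2.3+ property names)

# Maximum number of worker threads servicing requests (default: 200)
server.tomcat.threads.max=200

# Minimum number of spare worker threads kept ready (default: 10)
server.tomcat.threads.min-spare=10

# Length of the accept queue once all worker threads are busy (default: 100)
server.tomcat.accept-count=100
```

Setting threads.max too low makes requests queue behind each other, which shows up in a load test as a scalability ceiling regardless of how fast the database is.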
Every test we conducted, from a small handful of users to thousands, indicated otherwise.
The customer re-ran the tests and saw the same behavior we did: Direct mode outperforms and outscales the Gateway mode. They then tried to go back to their original configuration, the one that had produced the opposite result.
And guess what: they couldn’t reproduce their original claim either … :)