Synthetic testing is a powerful tool for monitoring the performance and availability of your application, providing a high level of confidence that it is working as expected. However, hosting synthetic tests in independently deployed processes, separate from production, doesn’t help identify issues that occur only in the environment real users actually use. The solution is to run synthetic tests in production, but how can we do this without negatively impacting the production environment?
Teams often express concerns about the potential impact on production systems, and this is a valid consideration that should be weighed carefully. For example, you may want to limit tests to a cadence of every few minutes or every hour to minimize any potential disruption. Once you’ve decided to proceed, the next question is how to run the tests effectively. Logically partitioned state machines can help here.
This pattern assumes that your system is built on state machine replication, as in Aeron Cluster or sequencer-based architectures. By partitioning the system using specific identifiers, you can host live production tests on an isolated logical partition. Let’s consider an example of a trading venue. In this venue, there are two logical partitions: one for real trading activities and one for synthetic testing. The synthetic testing partition is isolated from real users by routing rules enforced within the state machine. Importantly, the test code runs in the same production environment as the code serving real users, making it subject to the same bugs, performance issues, and other problems.
When accepting orders, only authorized users can interact with their respective logical partitions. Test users can access only test products in the test partition, while production users can access only production products in the production partition. This setup ensures that the synthetic testing partition is isolated from the production partition, eliminating concerns about cross-contamination and allowing you to run synthetic tests in production safely.
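The routing rule described above can be sketched in a few lines. This is a minimal illustration in Python rather than anything tied to Aeron Cluster or another real framework; the names `Partition`, `Order`, `TradingStateMachine`, and the example user and product identifiers are all hypothetical. It assumes the state machine holds a registry that assigns every user and every product to exactly one partition, and that it rejects any command where the two do not match.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Partition(Enum):
    PRODUCTION = "production"
    SYNTHETIC = "synthetic"


@dataclass(frozen=True)
class Order:
    user_id: str
    product_id: str
    quantity: int


@dataclass
class TradingStateMachine:
    # Hypothetical registries: each user and product belongs to one partition.
    user_partition: Dict[str, Partition]
    product_partition: Dict[str, Partition]
    # One order list per partition, so accepted state never mixes.
    books: Dict[Partition, List[Order]] = field(
        default_factory=lambda: {p: [] for p in Partition}
    )

    def accept(self, order: Order) -> bool:
        """Apply one command; reject unknown ids and cross-partition orders."""
        user_side = self.user_partition.get(order.user_id)
        product_side = self.product_partition.get(order.product_id)
        if user_side is None or product_side is None:
            return False
        if user_side is not product_side:
            return False
        self.books[user_side].append(order)
        return True


# Example wiring with made-up identifiers.
venue = TradingStateMachine(
    user_partition={
        "alice": Partition.PRODUCTION,
        "probe-1": Partition.SYNTHETIC,
    },
    product_partition={
        "EURUSD": Partition.PRODUCTION,
        "TEST-EURUSD": Partition.SYNTHETIC,
    },
)
venue.accept(Order("probe-1", "TEST-EURUSD", 1))  # accepted, synthetic book
venue.accept(Order("probe-1", "EURUSD", 1))       # rejected, crosses partitions
```

Because the check lives inside the same deterministic command handler that processes real orders, the synthetic orders exercise the production code path while their effects stay confined to the synthetic book.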
While synthetic testing in production is valuable, it can sometimes overlook issues that real users experience. For instance, if synthetic tests don’t follow the exact same network paths as real users, they may not detect the same latency or connectivity problems. Therefore, synthetic testing should be used in conjunction with real user monitoring. If real users are less active than usual, are experiencing higher latencies, or are encountering connectivity issues, treat that as a sign that something is wrong even when the synthetic tests pass.
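One way to combine the two signals is sketched below. It is an illustration under stated assumptions, not a prescribed method: the `RealUserSnapshot` fields, the `assess_health` function, and all threshold defaults are hypothetical and would need tuning for a real system. The idea is that a failed synthetic test is reported as a failure, while a passing synthetic test is downgraded to "suspect" when real-user metrics have drifted far from a baseline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RealUserSnapshot:
    active_users: int
    p99_latency_ms: float
    connection_error_rate: float  # fraction of failed connections, 0.0 to 1.0


def assess_health(
    synthetic_passed: bool,
    current: RealUserSnapshot,
    baseline: RealUserSnapshot,
    *,
    min_activity_ratio: float = 0.5,  # assumed threshold
    max_latency_ratio: float = 2.0,   # assumed threshold
    max_error_rate: float = 0.05,     # assumed threshold
) -> str:
    """Return "failing", "suspect", or "healthy"."""
    if not synthetic_passed:
        return "failing"
    # Synthetic tests passed; check whether real users tell a different story.
    low_activity = current.active_users < baseline.active_users * min_activity_ratio
    high_latency = current.p99_latency_ms > baseline.p99_latency_ms * max_latency_ratio
    many_errors = current.connection_error_rate > max_error_rate
    if low_activity or high_latency or many_errors:
        return "suspect"
    return "healthy"
```

A "suspect" result is the case the paragraph above warns about: the synthetic partition looks fine, but real users are quieter, slower, or failing to connect, so the problem probably lies on a path the synthetic tests do not cover.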
By using logical partitioning and combining synthetic testing with real user monitoring, you can confidently run synthetic tests in your production environment. This approach provides valuable insights without compromising the integrity of your production systems.