Ginbits scales SQL databases with Vitess
Author: Kostas Papanikolaou
Solving scalability challenges takes up a large share of the time and resources that development teams invest. Back in 2010, the team at YouTube faced several MySQL scalability challenges, and Vitess was created to solve them. Ginbits uses it because it allows us to:
- Scale our SQL databases by sharding them while keeping changes to our applications to a minimum
- Migrate from bare metal to a private or public cloud
- Deploy and manage a large number of SQL database instances
Vitess is written in Go and uses the language's concurrency support to create lightweight connections and map them onto a much smaller pool of MySQL connections. This is called “connection pooling” and it allows Ginbits to handle thousands of connections easily.
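To make the idea concrete, here is a minimal sketch of connection pooling in Go. It is not Vitess code: the Conn and Pool types and the RunClients helper are hypothetical names used only for illustration, and a buffered channel stands in for the real pool of MySQL connections.

```go
package main

import (
	"fmt"
	"sync"
)

// Conn stands in for a real MySQL connection (hypothetical type).
type Conn struct{ ID int }

// Pool hands out a fixed number of connections through a buffered channel.
type Pool struct{ conns chan *Conn }

// NewPool creates a pool that holds size connections.
func NewPool(size int) *Pool {
	p := &Pool{conns: make(chan *Conn, size)}
	for i := 0; i < size; i++ {
		p.conns <- &Conn{ID: i}
	}
	return p
}

// Get blocks until a connection is free.
func (p *Pool) Get() *Conn { return <-p.conns }

// Put returns a connection to the pool.
func (p *Pool) Put(c *Conn) { p.conns <- c }

// RunClients simulates n lightweight clients (goroutines) sharing the pool
// and reports how many requests each backend connection served.
func RunClients(p *Pool, n int) map[int]int {
	var mu sync.Mutex
	var wg sync.WaitGroup
	served := make(map[int]int)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c := p.Get()
			mu.Lock()
			served[c.ID]++ // pretend to run a query on this connection
			mu.Unlock()
			p.Put(c)
		}()
	}
	wg.Wait()
	return served
}

func main() {
	pool := NewPool(4)
	served := RunClients(pool, 1000)
	total := 0
	for _, n := range served {
		total += n
	}
	fmt.Println("backend connections used:", len(served), "requests served:", total)
}
```

The point of the sketch is the ratio: a thousand cheap goroutines are served by only four backend connections, because each one borrows a connection just for the duration of its request.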
Ginbits also uses Go to perfect its API Gateway. You can learn more about how we exploit what Golang has to offer on our blog:
What is Vitess?
A little history, together with some terminology, helps explain what Vitess is and what it offers. Vitess is “a database solution for deploying, scaling and managing large clusters of open-source database instances”, as per the official documentation. At the time of writing, Vitess supports MySQL, Percona, and MariaDB. It is architected to run effectively in both public and private cloud architectures, as well as on dedicated hardware. Vitess combines and extends many important SQL features with the scalability of a NoSQL database.
As mentioned above, it was created to solve the scalability challenges presented to the YouTube team. Below, we feature a short description of the events that led to Vitess being created:
- The MySQL database of YouTube reached a point when peak traffic would eventually exceed the serving capacity of the database. YouTube created a master database for write traffic and a replica database for read traffic, to temporarily alleviate the issue
- With demand at an all-time high (specifically for cat videos, no joke), read-only traffic remained high enough so that it overloaded the replica database, leading to the creation of even more replicas
- As expected, write traffic became too high for the master YouTube database to handle, requiring the company to shard the data to handle the incoming traffic
- The application layer of YouTube was finally modified so that prior to the execution of any database operation, the code would be able to identify the correct database shard to receive that query
What Vitess did was remove that logic entirely from the application's source code. It introduced a proxy between the application and the database that routes and manages database interactions. This led to a large increase in capacity to serve pages, process new videos, and more. Even more importantly, Vitess as a platform continues to scale.
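The routing idea can be sketched in a few lines of Go. This is an illustration under stated assumptions, not how Vitess is implemented: the Router and Shard types are hypothetical, FNV-1a is an arbitrary stand-in for the hash, and the shard names are just labels for key ranges. What matters is that the caller only supplies a key and never names a shard.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Shard owns the half-open range [Start, End) of the top byte of a hashed key.
type Shard struct {
	Name       string
	Start, End int
}

// Router maps a sharding key to a shard, so application code never
// hard-codes which database a row lives in.
type Router struct{ shards []Shard }

// NewRouter builds a router over the given shard ranges.
func NewRouter(shards ...Shard) *Router { return &Router{shards: shards} }

// ShardFor hashes the key (FNV-1a, an illustrative choice) and returns the
// shard whose range contains the top byte of the hash.
func (r *Router) ShardFor(key string) string {
	h := fnv.New32a()
	h.Write([]byte(key))
	b := int(h.Sum32() >> 24) // top byte, 0..255
	for _, s := range r.shards {
		if b >= s.Start && b < s.End {
			return s.Name
		}
	}
	return "" // no shard covers this key
}

func main() {
	two := NewRouter(Shard{"-80", 0x00, 0x80}, Shard{"80-", 0x80, 0x100})
	// After splitting each shard in half, only the router's table changes.
	four := NewRouter(
		Shard{"-40", 0x00, 0x40}, Shard{"40-80", 0x40, 0x80},
		Shard{"80-c0", 0x80, 0xc0}, Shard{"c0-", 0xc0, 0x100},
	)
	for _, user := range []string{"alice", "bob", "carol"} {
		fmt.Printf("%s: %s with two shards, %s with four\n",
			user, two.ShardFor(user), four.ShardFor(user))
	}
}
```

Because the shard layout lives in the router rather than in the application, going from two shards to four means swapping the table the proxy uses, while the calling code stays the same.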
What’s even more interesting is that Vitess works very well with Kubernetes, yet another technology we use at Ginbits. Find out more about what it allows us to achieve on our blog:
Fundamental Vitess Features
What better way to understand why Ginbits uses Vitess for scalability than presenting some of its features? After all, these are the fundamental reasons that it was created and is being adopted by more and more companies worldwide.
When it comes to performance-related features, Vitess offers the following:
- Connection pooling – Multiplex front-end application queries onto a pool of MySQL connections to optimize performance
- Query de-duping – Reuse results of an in-flight query for any identical requests received while the in-flight query was still executing
- Transaction manager – Limit the number of concurrent transactions and manage timeouts to optimize overall throughput
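Query de-duping is the least familiar of the three, so here is a minimal Go sketch of the idea. It is an assumption-laden illustration rather than Vitess code: Deduper, Do, and Waiting are hypothetical names, and the query text and table are made up. Callers that arrive while an identical query is still executing wait for that query's result instead of hitting the database again.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// call tracks one in-flight query and the callers waiting on it.
type call struct {
	done    chan struct{} // closed once res is set
	res     string
	waiters int
}

// Deduper shares the result of an in-flight query with identical requests.
type Deduper struct {
	mu       sync.Mutex
	inflight map[string]*call
}

// NewDeduper returns an empty Deduper.
func NewDeduper() *Deduper { return &Deduper{inflight: make(map[string]*call)} }

// Do runs exec for query unless an identical query is already in flight,
// in which case it waits and reuses that result. The second return value
// reports whether the result was shared.
func (d *Deduper) Do(query string, exec func(string) string) (string, bool) {
	d.mu.Lock()
	if c, ok := d.inflight[query]; ok {
		c.waiters++
		d.mu.Unlock()
		<-c.done
		return c.res, true
	}
	c := &call{done: make(chan struct{})}
	d.inflight[query] = c
	d.mu.Unlock()

	c.res = exec(query) // only the first caller reaches the database

	d.mu.Lock()
	delete(d.inflight, query)
	d.mu.Unlock()
	close(c.done)
	return c.res, false
}

// Waiting reports how many callers are currently waiting on query.
func (d *Deduper) Waiting(query string) int {
	d.mu.Lock()
	defer d.mu.Unlock()
	if c, ok := d.inflight[query]; ok {
		return c.waiters
	}
	return 0
}

func main() {
	d := NewDeduper()
	const q = "SELECT title FROM videos WHERE id = 42" // hypothetical query
	started, release := make(chan struct{}), make(chan struct{})
	shared := make(chan bool, 3)

	// Leader: a slow query that stays in flight until released.
	go d.Do(q, func(string) string { close(started); <-release; return "row" })
	<-started
	for i := 0; i < 3; i++ {
		go func() {
			_, s := d.Do(q, func(string) string { return "executed again" })
			shared <- s
		}()
	}
	for d.Waiting(q) < 3 {
		runtime.Gosched() // wait until all three duplicates are queued
	}
	close(release)
	for i := 0; i < 3; i++ {
		fmt.Println("duplicate reused in-flight result:", <-shared)
	}
}
```

Once the in-flight query finishes, its entry is removed, so a later identical request runs against the database again. That distinguishes de-duping from caching: results are shared only while the original query is still executing.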
When it comes to protection, Vitess offers a strong set of features. Here are the main ones:
- Query rewriting and sanitization – Adds limits and avoids non-deterministic updates
- Query blacklisting – Customizes rules to prevent potentially problematic queries from hitting the database
- Query killer – Terminates queries that take too long to return data
- Table ACLs – Specifies access control lists (ACLs) for tables based on the connected user
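The first two items can be sketched as a small guard that sits in front of the database. This is a rough Go illustration, not the Vitess implementation: Guard, NewGuard, and Check are hypothetical names, and regular expressions stand in for a proper SQL parser, so it only handles simple statements.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// hasLimit detects an existing LIMIT clause (a crude stand-in for parsing).
var hasLimit = regexp.MustCompile(`(?i)\blimit\s+\d+`)

// Guard rejects queries that match a deny rule and caps unbounded SELECTs.
type Guard struct {
	maxRows int
	denied  []*regexp.Regexp
}

// NewGuard compiles the deny patterns; it panics on an invalid pattern.
func NewGuard(maxRows int, denyPatterns ...string) *Guard {
	g := &Guard{maxRows: maxRows}
	for _, p := range denyPatterns {
		g.denied = append(g.denied, regexp.MustCompile(p))
	}
	return g
}

// Check returns the query to execute, possibly rewritten, or an error if
// the query matches a deny rule.
func (g *Guard) Check(query string) (string, error) {
	for _, re := range g.denied {
		if re.MatchString(query) {
			return "", fmt.Errorf("query rejected by rule %q", re.String())
		}
	}
	q := strings.TrimRight(strings.TrimSpace(query), ";")
	if strings.HasPrefix(strings.ToUpper(q), "SELECT") && !hasLimit.MatchString(q) {
		q = fmt.Sprintf("%s LIMIT %d", q, g.maxRows)
	}
	return q, nil
}

func main() {
	// Deny rule (illustrative): DELETE without a WHERE clause.
	g := NewGuard(1000, `(?i)^\s*delete\s+from\s+\w+\s*;?\s*$`)
	for _, q := range []string{
		"SELECT * FROM users",
		"SELECT * FROM users LIMIT 5",
		"DELETE FROM users",
	} {
		out, err := g.Check(q)
		fmt.Printf("%q -> %q (err: %v)\n", q, out, err)
	}
}
```

A real proxy would make these decisions on a parsed query rather than on raw text, but the flow is the same: reject what matches a rule, and bound what could return an unbounded result.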
Checking your database and its health is of vital importance. Vitess includes performance analysis tools that let us monitor, diagnose, and analyze our database performance.
Vitess excels at sharding as well. It offers virtually seamless dynamic re-sharding, support for both vertical and horizontal sharding, and multiple sharding schemes, with the ability to plug in custom ones.
Vitess “vs” Vanilla MySQL and NoSQL
When exploring solutions for deploying, scaling, and managing large clusters of open-source database instances, it helps to compare them. Set against vanilla MySQL and NoSQL, Vitess improves on both in several ways.
When compared to Vanilla MySQL, Vitess:
- Creates lightweight connections and, through a pooling feature built on Go's concurrency support, maps them onto a small pool of MySQL connections, so it can easily handle thousands of them
- Employs an SQL parser that uses a configurable set of rules to rewrite queries that might hurt the performance of the database
- Supports a variety of sharding schemes, and can migrate tables into different databases and scale the number of shards up or down. It does so non-intrusively, completing most data transitions with only a few seconds of read-only downtime
- Manages the lifecycle of database instances, supporting and automatically handling various scenarios, including primary failover and data backups
- Uses a topology backed by a consistent data store to keep the cluster view up to date and consistent at all times. It also provides a proxy that routes queries efficiently to the most appropriate MySQL instance
Regarding Vitess when compared to NoSQL, we see that:
- Supports complex query semantics such as WHERE clauses, JOINs, aggregation functions, and more
- Supports transactions
- Adds very little variance to MySQL, a database that most people already know how to work with
- Allows us to use all the indexing functionality that MySQL has to offer, which in turn optimizes query performance