A Simple Redis Cluster on Windows

Recently I worked on a project where we enabled Redis caching for our web applications. For that, I developed a custom reactive Spring Boot Redis cluster client library (for a Spring WebFlux project) and integrated it. The problem I faced was that the AWS Redis cluster nodes were not reachable from my office VPN (maybe SSH-ing to the bastion server and connecting to Redis from there would work; I haven’t tried it). Testing the new service locally was a nightmare, as I don’t have Docker Desktop (due to the subscription).

Hence I tried to create a small Redis cluster on Windows that I can use for local dev testing with the Redis client library (currently, I’m developing on the Windows platform). In this article, I will give an introduction to Redis, its topologies, its clustering configuration and some database theory (just for fun). You can skip straight to “Let’s build a Redis cluster”.

Let’s talk about some database theory. Here I’ll discuss database types, data fragmentation and replication. So what is a database? According to the definition, “a database is a collection of organized data which is structured and stored electronically on a computer system”. A DBMS is used to store huge amounts of data and makes it possible to search for and locate that data quickly. Database types can be categorized based on model, location, processing power and hosting.

When talking about DB types based on model, there are three types: relational, NoSQL (non-relational) and object.

Relational DB table

This is the oldest and most widely used DB type. Data is stored in tables as rows, with each attribute’s value held in a column. SQL is used to query the data. These DBs are stable, support a range of data types and are ACID compliant. They are mainly used for online transactions and data warehouses. OracleDB, MySQL and Postgres are some popular relational databases.

Non-Relational DB types

In non-relational DBs, there are no tables; data can be stored in flexible data models. Commonly used NoSQL databases are document databases, key-value stores, wide-column DBs and graph DBs. The main features are support for multiple data structures, flexibility due to unstructured or semi-structured data, and scalability. These types of DBs are mainly used in real-time systems. Popular NoSQL DBs are MongoDB, Redis and CouchDB.

There are different types of databases based on physical location:

Centralized DB (scaler.com)

All data and information are stored in a single location, which can be accessed from numerous locations. The main features are data integrity (the data is in a single location, which also reduces redundancy), simplicity and cost-effectiveness. These are mainly used on-site in schools and universities, government organizations, etc.

Distributed DB (scaler.com)

A distributed database consists of multiple interconnected DBs spread across different locations and connected by a network. As the DBs are all connected, end users see them as a single DB. These can scale horizontally by adding more nodes, which offers greater availability. A distributed database management system (DDBMS) centrally manages all the data as if it were in one place and synchronizes all data operations among the DBs. The main features are location isolation, distributed query processing and transaction management.

There are two main types of distributed databases:

  1. Homogeneous DDBS - A network of identical databases stored in multiple locations, all sharing the same OS and DB schemas. These systems are easy to design and manage.
  2. Heterogeneous DDBS - This results when different individual systems have implemented their own DBs and integrate them later. They use different schemas and DBMSs, so a translation layer is required to establish communication between the different DBMSs.

Partitioning
When a database grows, it affects overall DBMS performance. Partitioning can be used as a solution to this problem. Partitioning is a database process that splits a large database into smaller chunks by separating rows or columns. The newly formed tables are called “partitions” (or shards). With smaller tables, queries run with less latency, as there is less data to scan.

  • Vertical Partitioning
    • This is mainly used in SQL DBs to improve performance when queries retrieve all columns from a table that contains many lengthy text or BLOB columns. To reduce access time, the table can be split into smaller tables, each containing the partition key and the large BLOB/text columns.
  • Horizontal Partitioning
    • This replicates the schema of the original table and then divides the rows, based on a partition key, into separate tables/DBMS nodes, which spreads the load of the original server. The data can be split based on a range, a hashing function, etc.
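To make the two horizontal strategies concrete, here is a minimal Python sketch that routes rows to partitions by range and by hash. The split points, the partition count, the sample rows and the function names are all assumptions made for this illustration; they do not come from any particular DBMS.

```python
import bisect
import zlib
from typing import Sequence


def range_partition(key: int, split_points: Sequence[int]) -> int:
    """Pick a partition by range; split_points must be sorted ascending."""
    # With split points [100, 200]: keys < 100 -> 0, 100..199 -> 1, >= 200 -> 2
    return bisect.bisect_right(split_points, key)


def hash_partition(key: str, partition_count: int) -> int:
    """Pick a partition by hashing the key."""
    # zlib.crc32 is deterministic across runs, unlike Python's salted hash() for str
    return zlib.crc32(key.encode("utf-8")) % partition_count


# Route a few hypothetical customer rows to three partitions by id range.
rows = [{"id": 42, "name": "a"}, {"id": 150, "name": "b"}, {"id": 999, "name": "c"}]
by_range = {row["id"]: range_partition(row["id"], [100, 200]) for row in rows}
```

Range partitioning keeps neighbouring keys together, which suits range scans, while hash partitioning tends to spread keys more evenly across the partitions.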

So what’s the difference between sharding and partitioning?
Partitioning is a generic term for splitting large data into smaller chunks, stored as separate physical entities, for performance. The difference is that sharding (a type of horizontal partitioning) implies that the data is spread across different DBMS instances and regions, while partitioning in general can also mean splitting the data within the same DBMS.

Redis (REmote DIctionary Server) is an open-source, in-memory key-value data store written in C that can be used as a database, cache and message broker. These are the main data types available in Redis [1]:

Redis Data types
  1. Strings - Binary-safe, the basic data type in Redis. A single string can store up to 512 MB.
  2. Lists - Lists of strings, sorted by insertion order. Elements can be pushed to the head or the tail.
  3. Sets - Unordered collections of strings. Unions, intersections, etc. can be computed very quickly.
  4. Hashes - Maps with string keys and values. Best for representing objects.
  5. Streams - A data structure that acts like an append-only log. Useful for recording streaming events in order.

By default, Redis keeps data in memory. But it has two types of disk persistence options:

Redis Disk Persistence
  1. RDB (Redis Database) - creates point-in-time snapshots of your data set at specified intervals.
  2. AOF (Append Only File) - logs every write operation received by the server. On startup, the log is replayed to reconstruct the original data set.
Users can run Redis with RDB, AOF, no persistence or RDB+AOF configurations.

There are four main Redis topologies: Standalone, Replicated, Sentinel and Cluster.

Redis Standalone
This is the most basic and simple Redis setup, which consists of only one Redis node.

Pros:

  • Easy to deploy

Cons:

  • Can only scale vertically, by using bigger hardware

Redis Replicated
In this Redis Replication [2] configuration, there is one master and many replica nodes. The replica nodes try to keep a copy of their master’s data whenever possible; replication is asynchronous and non-blocking. There is no communication between the replica nodes, and they are read-only.
Replication
“Redis uses by default asynchronous replication, which being low latency and high performance, is the natural replication mode for the vast majority of Redis use cases.”
Pros:

  • Easy to set up
  • Data safety, there’s always a backup of your data
  • Read scalability, as reads can be distributed across the replicas

Cons:

  • Write operations are done only by the master node
  • Manual intervention is needed once the master is down (changing the master node or restarting it)
  • Data consistency is not guaranteed

Redis Sentinel
Redis Sentinel [3] is a distributed system which addresses the problem of resilience. It’s a monitoring system that runs in the background, monitors the Redis master and slave instances, alerts if any node fails, and recovers from the failure according to the Sentinel configuration. Sentinel should always be run as a group of several Sentinel processes. Sentinel has a built-in mechanism, called the quorum, to agree that a master has failed before a new master is nominated. There should be a minimum of 3 Sentinel nodes (or a larger odd number) to tolerate one failure. A typical setup has one master and two replicas, each with a Sentinel, on three separate systems (or virtual machines) that are believed to fail independently.

The Quorum
“The quorum is the number of Sentinels that need to agree about the fact the master is not reachable, in order to really mark the master as failing, and eventually start a failover procedure if possible.”
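The quorum rule can be pictured with a small Python sketch. It only models the counting logic: the function names are made up for this example, and the majority requirement for authorizing the failover is taken from the Redis Sentinel documentation rather than from the quote above.

```python
def is_master_marked_failing(sentinels_reporting_down: int, quorum: int) -> bool:
    """The master is flagged as failing once at least `quorum` Sentinels agree."""
    return sentinels_reporting_down >= quorum


def can_authorize_failover(reachable_sentinels: int, total_sentinels: int) -> bool:
    """Actually starting the failover additionally needs a majority of all Sentinels."""
    return reachable_sentinels > total_sentinels // 2


# Example: 5 Sentinels with quorum 2, but only 2 Sentinels can still talk to each other.
flagged = is_master_marked_failing(sentinels_reporting_down=2, quorum=2)
authorized = can_authorize_failover(reachable_sentinels=2, total_sentinels=5)
```

In that example the master is flagged as failing, but no failover starts, because two out of five Sentinels are not a majority.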

Pros:

  • High availability and automatic resilience
  • Built into Redis and easy to set up
  • Service discovery; clients can learn the current Redis topology

Cons:

  • Nodes need to be managed separately
  • Writes are not scalable, as all writes go to the master node
  • Data consistency is not guaranteed, as replication is asynchronous
  • Clients need to support Redis Sentinel

Redis Cluster
Redis Cluster [4] is a solution for both resilience and scaling. It has a multi-master architecture in which the masters share 16384 hash slots among them. Hence the data is automatically partitioned across the master nodes, and each master’s replicas provide data reliability. A cluster should have at least three master nodes, with one or more replicas for each master. If a master node becomes unavailable due to a failure, one of its replicas is promoted to become the new master. In practice, for write-heavy workloads, it is better to keep two replicas per master and increase the number of master nodes. For read-heavy workloads, increase the replicas per master and use smart clients to spread reads across the replicas.

Pros:

  • High availability and scalability
  • Decentralized architecture; data is distributed among the nodes
  • Automatic failure recovery

Cons:

  • Needs a minimum of 6 nodes to set up
  • Clients need to be aware of the Redis cluster
  • Data is replicated asynchronously

Redis Cluster provides an automatic way to shard data across multiple Redis nodes. It also provides “some degree” of availability during failures [4]. Redis Cluster uses a form of sharding where every key is part of a hash slot [5]. There are 16384 hash slots in a Redis cluster, and to compute the slot for a key, Redis takes the CRC16 of the key modulo 16384.

HASH_SLOT = CRC16(key) mod 16384
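The formula can be tried out with a short Python sketch. It assumes the CRC16 variant named in the Redis Cluster specification (XMODEM, polynomial 0x1021, initial value 0) and also applies the specification's hash tag rule, where only the text between the first `{` and the following `}` is hashed; hash tags are not covered in this article, and the function names are my own.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16 (XMODEM variant): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: bytes) -> int:
    """HASH_SLOT = CRC16(key) mod 16384, hashing only the hash tag if one is present."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # An empty tag "{}" is ignored and the whole key is hashed
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384


# The cluster specification gives 0x31C3 as the CRC16 of "123456789".
check_value = crc16_xmodem(b"123456789")
slot_for_foo = key_hash_slot(b"foo")
```

Keys that share a hash tag, such as `{user1000}.following` and `{user1000}.followers`, land in the same slot, which is what makes multi-key operations on them possible in a cluster.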

Using hash slots, users are able to add or remove master nodes in the cluster; all you have to do is move some hash slots from the 3 existing master nodes to the new master node, with zero downtime. Meanwhile, what will happen if one of the master nodes fails? Let’s say the Master 2 node crashed or its instance was stopped. We would lose hash slots 5461-10922 and the data stored in them. To overcome this issue, we use the Redis Cluster master-replica model.

In our example we create 3 masters and 3 slaves, with each slave acting as a replica of one master node. Ideally a master node and its replica are placed on different instances. For example, we create one instance for Master(1) with Slave(3), one for Master(2) with Slave(1) and one for Master(3) with Slave(2). If Master(2) fails, its replica, Slave(2), will be promoted as the new Master(2) and the cluster continues to work. However, if both Master(2) and Slave(2) fail at the same time, the cluster will lose all the keys in that master’s hash slots.
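The placement rule above can be checked with a few lines of Python. This is only a toy model of the example layout: the instance names and the helper function are hypothetical, and it ignores everything a real cluster does during failover.

```python
# Which nodes run on which instance, following the example layout above.
PLACEMENT = {
    "instance-1": ["M1", "S3"],
    "instance-2": ["M2", "S1"],
    "instance-3": ["M3", "S2"],
}
# Each slave replicates the master with the same number.
REPLICA_OF = {"S1": "M1", "S2": "M2", "S3": "M3"}


def lost_masters(placement, replica_of, failed_instances):
    """Return the masters whose data has no surviving copy after the given instances fail."""
    alive = {
        node
        for instance, nodes in placement.items()
        if instance not in failed_instances
        for node in nodes
    }
    lost = []
    for master in sorted(set(replica_of.values())):
        replicas = [replica for replica, owner in replica_of.items() if owner == master]
        if master not in alive and not any(replica in alive for replica in replicas):
            lost.append(master)
    return lost


single_failures = {name: lost_masters(PLACEMENT, REPLICA_OF, {name}) for name in PLACEMENT}
double_failure = lost_masters(PLACEMENT, REPLICA_OF, {"instance-2", "instance-3"})
```

With this layout no single instance failure loses data, while losing the instances that host both M2 and S2 loses that master's slot range.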

Redis Cluster (src: intro2libsys.com)

When Redis is in cluster mode, each node has two TCP ports open: the first serves the standard Redis protocol for client connections (default 6379), and the second, the first port + 10000 (default 16379), is for the cluster bus, i.e. node-to-node communication.

To create a cluster, we need a few Redis instances running in cluster mode. To set them up, these are the minimum configurations:

port 7000
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes

Setting cluster-enabled to yes enables Redis cluster mode. Each cluster node generates the file named by cluster-config-file to keep node-related data. For a minimal cluster configuration, there should be at least 3 master nodes.
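Since the same five directives are repeated for every node with only the port changing, they can be generated instead of edited by hand. The Python sketch below is one possible way to do that; the folder names, the port layout 7000-7005 and the file name redis.windows.conf are assumptions based on the setup used later in this article, and the generated file is a minimal one that replaces the stock config rather than editing it.

```python
from pathlib import Path

# Folder name -> client port; an assumed layout with one folder per node.
NODE_PORTS = {"M1": 7000, "M2": 7001, "M3": 7002, "S1": 7003, "S2": 7004, "S3": 7005}


def cluster_bus_port(port: int) -> int:
    """The cluster bus listens on the client port + 10000."""
    return port + 10000


def render_node_config(port: int) -> str:
    """Return the minimal cluster-mode configuration for one node."""
    lines = [
        f"port {port}",
        "cluster-enabled yes",
        "cluster-config-file nodes.conf",
        "cluster-node-timeout 5000",
        "appendonly yes",
    ]
    return "\n".join(lines) + "\n"


def write_node_configs(root: Path, file_name: str = "redis.windows.conf") -> list:
    """Create one folder per node under root and write its config file."""
    written = []
    for folder, port in NODE_PORTS.items():
        node_dir = root / folder
        node_dir.mkdir(parents=True, exist_ok=True)
        target = node_dir / file_name
        target.write_text(render_node_config(port), encoding="ascii")
        written.append(target)
    return written
```

Calling `write_node_configs(Path("."))` from the root folder would create M1 to S3 next to each other, each with its own port.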

In this tutorial, we are going to create a 6-node Redis cluster with 3 master and 3 slave nodes, and assign a separate slot range to each master, as shown in the figure.

Redis Cluster
Cluster details
  1. Master Node 1 (M1) - allocated hash slots 0-5460
  2. Master Node 2 (M2) - allocated hash slots 5461-10922
  3. Master Node 3 (M3) - allocated hash slots 10923-16383
  4. Slave Node 1 (S1) - replicates M1
  5. Slave Node 2 (S2) - replicates M2
  6. Slave Node 3 (S3) - replicates M3
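The slot ranges listed above are simply an even split of the 16384 slots. A minimal Python sketch of such a split follows; it reproduces the ranges used in this article, but it is my own illustration and is not claimed to be the exact algorithm redis-cli uses. The function names and the default master names are assumptions.

```python
TOTAL_SLOTS = 16384


def split_slots(master_count: int) -> list:
    """Split slots 0..16383 into contiguous (first, last) ranges, one per master."""
    slots_per_master = TOTAL_SLOTS / master_count
    ranges = []
    first = 0
    cursor = 0.0
    for index in range(master_count):
        last = round(cursor + slots_per_master - 1)
        if index == master_count - 1:
            last = TOTAL_SLOTS - 1  # the last master takes whatever remains
        ranges.append((first, last))
        first = last + 1
        cursor += slots_per_master
    return ranges


def owner_of(slot: int, ranges: list, names=("M1", "M2", "M3")) -> str:
    """Return the name of the master whose range contains the slot."""
    for name, (first, last) in zip(names, ranges):
        if first <= slot <= last:
            return name
    raise ValueError(f"slot {slot} is not covered")


ranges_for_three = split_slots(3)
```

For three masters this yields (0, 5460), (5461, 10922) and (10923, 16383), matching the list above.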

  1. First, get the latest Redis for Windows zip from here.

  2. Create 6 separate folders, one for each node (M1…S3), and paste the Redis binaries into each.

  3. Update and uncomment the following fields in redis.windows.conf in each folder, changing the port for each master and slave: M1-7000, M2-7001, M3-7002, S1-7003, S2-7004, S3-7005. (sample config)

        # change the port on each node
        port 7000
        # uncomment the following
        cluster-enabled yes
        cluster-config-file nodes.conf
        cluster-node-timeout 5000
        appendonly yes

  4. Now open CMD in each folder and run redis-server.exe redis.windows.conf to start the Redis servers. For this I have created a small script which runs all the Redis servers at once, each in a minimized cmd window. Copy it to the root folder (the one that contains all the node folders) and execute it.

  5. Now we have 6 running Redis instances. All we need to do is initiate the cluster. For that, you can run the following command:

       redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 --cluster-replicas 1

    Here we’re using --cluster-replicas 1, which means that for every master created, Redis will create one replica. redis-cli will show the proposed setup, and you can type yes to proceed and create the cluster. If everything goes correctly, you will receive this message:

[OK] All 16384 slots covered

The script will look like this:
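As a rough, hypothetical stand-in (not the original script), the start-up and cluster-creation steps could be sketched in Python as follows. It assumes one folder per node (M1 to S3) under a common root, each containing redis-server.exe and redis.windows.conf, and that redis-cli is on the PATH. It opens each server in a new console window rather than a minimized one.

```python
import subprocess
from pathlib import Path

# Assumed folder name -> port layout, matching the steps above.
NODE_PORTS = {"M1": 7000, "M2": 7001, "M3": 7002, "S1": 7003, "S2": 7004, "S3": 7005}


def server_command(node_dir: Path) -> list:
    """Command that starts one node from inside its own folder."""
    # The executable is given with its folder so it is found regardless of the caller's cwd
    return [str(node_dir / "redis-server.exe"), "redis.windows.conf"]


def create_cluster_command(ports, replicas: int = 1, host: str = "127.0.0.1") -> list:
    """Build the redis-cli --cluster create command for the given ports."""
    addresses = [f"{host}:{port}" for port in ports]
    return ["redis-cli", "--cluster", "create", *addresses, "--cluster-replicas", str(replicas)]


def start_all(root: Path) -> list:
    """Start every node in its own console window and return the process handles."""
    # CREATE_NEW_CONSOLE only exists on Windows; fall back to 0 elsewhere
    flags = getattr(subprocess, "CREATE_NEW_CONSOLE", 0)
    return [
        subprocess.Popen(server_command(root / name), cwd=root / name, creationflags=flags)
        for name in NODE_PORTS
    ]
```

After `start_all(Path("."))`, running `subprocess.run(create_cluster_command(NODE_PORTS.values()))` would show the same confirmation prompt as the manual command.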

If you want to create a cluster with specific master and slave nodes, you have to define which nodes will be masters and which will be slaves. To do that, first run the create command with only the master addresses and --cluster-replicas 0. Then take the node IDs of the masters and assign a slave to each master node. To retrieve a master’s node ID, use redis-cli -c -h 127.0.0.1 -p <PORT> cluster nodes and take the ID of the node tagged myself,master. Then assign the slave to that master node ID as follows:

redis-cli --cluster add-node 127.0.0.1:7003 127.0.0.1:7000 --cluster-slave --cluster-master-id <MASTER-NODE-ID>
I have added an automated script for this, which gets the master nodes and assigns a slave to each of them.
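The manual lookup can be sketched in Python as below. This is a hypothetical illustration, not the automated script mentioned above: it parses text in the `cluster nodes` output format (node ID, address, flags, ...) and builds one add-node command per slave. The node IDs in the sample are placeholders, and the slave-to-master port pairing is an assumption based on the layout in this article.

```python
# Assumed pairing: slave port -> master port (S1 -> M1, S2 -> M2, S3 -> M3).
REPLICA_TO_MASTER = {7003: 7000, 7004: 7001, 7005: 7002}


def master_ids_by_port(cluster_nodes_output: str) -> dict:
    """Map each master's port to its node ID, from `cluster nodes` output."""
    masters = {}
    for line in cluster_nodes_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        node_id, address, flags = fields[0], fields[1], fields[2].split(",")
        if "master" in flags:
            host_port = address.split("@")[0]  # drop the "@<bus port>" suffix if present
            masters[int(host_port.rsplit(":", 1)[1])] = node_id
    return masters


def add_replica_commands(cluster_nodes_output: str, host: str = "127.0.0.1") -> list:
    """Build one redis-cli add-node command per slave."""
    masters = master_ids_by_port(cluster_nodes_output)
    return [
        ["redis-cli", "--cluster", "add-node", f"{host}:{replica_port}", f"{host}:{master_port}",
         "--cluster-slave", "--cluster-master-id", masters[master_port]]
        for replica_port, master_port in REPLICA_TO_MASTER.items()
    ]


# Placeholder IDs; real node IDs are 40-character hex strings reported by the cluster.
SAMPLE_OUTPUT = "\n".join([
    "a" * 40 + " 127.0.0.1:7000@17000 myself,master - 0 0 1 connected 0-5460",
    "b" * 40 + " 127.0.0.1:7001@17001 master - 0 1700000000000 2 connected 5461-10922",
    "c" * 40 + " 127.0.0.1:7002@17002 master - 0 1700000000000 3 connected 10923-16383",
])
commands = add_replica_commands(SAMPLE_OUTPUT)
```

Each resulting list can be passed to `subprocess.run` once the three masters are up and the cluster has been created with --cluster-replicas 0.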

  6. To connect to a master server, you need to use redis-cli with the -c option to support redirections between cluster nodes.

            redis-cli -c -h 127.0.0.1 -p 7000

  7. Finally, you need a script to kill all running redis-server processes.
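One way to do this, sketched in Python, is to call the standard Windows taskkill tool. The function names are hypothetical, and the image name redis-server.exe is an assumption that matches the binaries used in this article. Note that it force-kills every process with that image name, not only the six cluster nodes.

```python
import subprocess


def kill_command(image_name: str = "redis-server.exe") -> list:
    """Build the taskkill command: /F forces termination, /IM selects by image name."""
    return ["taskkill", "/F", "/IM", image_name]


def kill_all_redis_servers() -> int:
    """Run taskkill (Windows only) and return its exit code."""
    return subprocess.run(kill_command(), check=False).returncode
```

Because the nodes are started with appendonly yes, restarting them afterwards should restore their data from the append-only files.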

So that’s all for this article. See you soon!!! 👋

  1. data-types - redis.io
  2. persistence - redis.io
  3. Parallel and distributed databases
  4. FUNDAMENTALS OF Database Systems PDF - chapter 2.5
  5. DBMS Tutorial
  6. Database Types Explained - phoenixnap.com
  7. What Is a Distributed Database? - phoenixnap.com
  8. Distributed Database Concepts - oracle.com
  9. What Redis deployment do you need - octo.com
  10. What Is Sharding? - hazelcast.com
  11. Database Sharding vs. Partitioning: What’s the Difference? - singlestore.com

  1. Redis Introduction

  2. Redis Replication - redis.io

  3. Redis Sentinel - redis.io

  4. Redis Clustering 101 - redis.io

  5. Redis Cluster data sharding