A Simple Redis Cluster on Windows

Namila included in category blog

2022-08-09 2022-08-09 2504 words 12 minutes


Recently I worked on a project where we enabled Redis caching for our web applications. For that, I had a chance to develop and integrate a custom reactive Spring Boot Redis cluster client library (for a Spring WebFlux project). The problem I faced was that the AWS Redis cluster nodes were not reachable from my office VPN (maybe SSH-ing to the bastion server and connecting to Redis from there would work, I haven’t tried it). Testing the new service locally was a nightmare as I don’t have Docker Desktop (due to the subscription).

Hence I tried to create a small Redis cluster on Windows so that I can use it for local dev testing with the Redis client library (currently, I’m developing on the Windows platform). In this article, I will give an introduction to Redis, its topologies, its clustering configuration and some database theory (just for fun 😁). You can skip straight to the Lets Build A Cluster section.

Some Database Theories First

Let’s talk about some database theory. Here I’m going to discuss database types, data fragmentation and replication. So what is a database? According to the definition, “A database is a collection of organized data which is structured and stored electronically on a computer system”. A DBMS (database management system) is used to store huge amounts of data and makes it possible to search for and locate that data quickly. Database types can be categorized based on model, location, processing power and hosting.

Database types based on model

When talking about DB types based on model, there are three types: relational, NoSQL (non-relational) and object. Here I cover the first two.

1. Relational Database

This is the oldest and most widely used DB type. Data is stored in tables as rows, with the value of each attribute in a column. SQL is used to query the data. These DBs are stable, support a range of data types and are ACID compliant. They are mainly used for online transaction processing and data warehouses. Oracle Database, MySQL and PostgreSQL are some popular relational databases.

2. Non-Relational Database (NoSQL)

In non-relational DBs, data is not kept in fixed relational tables. Instead it can be stored in flexible data models. Commonly used NoSQL database types are document databases, key-value stores, wide-column stores and graph DBs. The main features are support for multiple data structures, flexibility due to non-structured or semi-structured data, and scalability. These types of DBs are mainly used in real-time systems. Popular NoSQL DBs are MongoDB, Redis and CouchDB.

Database types based on location

There are different types of databases based on physical location:

1. Centralised Database

All data is stored in a single location and can be accessed from numerous locations. The main features are data integrity and reduced redundancy (as the data is in a single location), simplicity and cost-effectiveness. These are mainly used on site in schools and universities, government organizations etc.

2. Distributed Database

A distributed database consists of multiple interconnected DBs spread across different locations and connected by a network. As the DBs are all connected, end users see them as a single DB. These can scale horizontally by adding more nodes, which also offers greater availability. A distributed database management system (DDBMS) manages all the data centrally, as if it were in one place, and synchronizes data operations among the DBs. The main features are location independence, distributed query processing and distributed transaction management.

There are mainly two types of distributed databases:

Homogeneous DDBS - A network of identical databases stored in multiple locations, where all share the same OS and DB schemas. These systems are easy to design and manage.
Heterogeneous DDBS - This results when different individual systems that implemented their own DBs are integrated later. They use different schemas and DBMSs. A translation layer is required to establish communication between the different DBMSs.

Database Partitioning (Sharding)

When a database grows, it affects the overall DBMS performance. Partitioning can be used as a solution to this problem. Partitioning is a database process that splits a large table into smaller chunks by separating rows or columns. The resulting new tables are called “partitions” (or shards). With smaller tables, queries run with less latency as there is less data to scan.

Vertical Partitioning
- This is mainly used in SQL DBs to improve performance when queries retrieve all columns from a table that contains long text or BLOB columns. To reduce the access time, the table can be split into smaller tables, one of which holds the partition key and the large BLOB/text columns.
Horizontal Partitioning
- This replicates the schema of the original table and then divides the rows, based on a partition key, into separate tables or DBMS nodes, which takes load off the original server. The rows can be split based on a range, a hashing function etc.

So what’s the difference between sharding and partitioning?
Partitioning is a generic term for splitting large data into smaller chunks, stored as separate physical entities, for performance. The difference is that sharding (a type of horizontal partitioning) implies that the data is spread across different DBMS instances and regions, while plain partitioning splits the data within the same DBMS.
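To make the two horizontal partitioning strategies a bit more concrete, here is a minimal Python sketch of routing a row to a shard by range and by hash. The shard names, the range boundaries and the use of CRC32 as the hash function are all assumptions made for illustration, not something a particular DBMS prescribes.

```python
import bisect
import zlib

# Hypothetical shard names, for illustration only.
SHARDS = ["shard-0", "shard-1", "shard-2"]

# Range partitioning: sorted boundaries on a numeric partition key.
# ids below 1000 -> shard-0, 1000..1999 -> shard-1, 2000 and above -> shard-2
RANGE_BOUNDARIES = [1000, 2000]


def shard_by_range(customer_id: int) -> str:
    """Pick a shard by comparing the key against the sorted boundaries."""
    return SHARDS[bisect.bisect_right(RANGE_BOUNDARIES, customer_id)]


def shard_by_hash(key: str) -> str:
    """Pick a shard from a stable hash of the key, modulo the shard count."""
    return SHARDS[zlib.crc32(key.encode("utf-8")) % len(SHARDS)]
```

Range partitioning keeps neighbouring keys together, which suits range scans, while hash partitioning tends to spread keys more evenly. Redis Cluster, discussed later, uses a hash-based scheme.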

What is Redis?

Redis (REmote DIctionary Server) is an open-source, in-memory key-value data store written in C that can be used as a database, cache and message broker. These are the main data types available in Redis¹:

Redis Data type

Strings - Binary safe, the basic data type in Redis. A single string can store up to 512 MB.
Lists - Lists of strings, sorted by insertion order. Elements can be pushed to the head or the tail.
Sets - Unordered collections of unique strings. Unions, intersections etc. can be computed very quickly.
Hashes - Maps of string fields to string values. Best for representing objects.
Streams - A data structure that works like an append-only log. Useful for recording events in the order they occur.

By default, Redis keeps data in memory. But it has two types of disk persistence options:

Redis Disk Persistence

RDB (Redis Database) - creates point-in-time snapshots of your data set at specified intervals
AOF (Append Only File) - logs every write operation received by the server. On startup this log is replayed to reconstruct the original data set

Users can run Redis with RDB, AOF, RDB+AOF or no persistence at all.

Types of Redis Topologies

There are mainly four types of Redis topologies: Standalone, Replicated, Sentinel and Cluster.

1. Redis Standalone

This is the most basic and simple Redis setup, which consists of only one Redis node.

Pros:

Easy to deploy

Cons:

Can only scale vertically by using bigger hardware

2. Redis Replicated

In this Redis replication² configuration, there is one master and one or more replica nodes. The replicas try to keep a copy of their master’s data whenever possible, and replication is asynchronous and non-blocking. The replica nodes do not communicate with each other and are read-only.

Replication

“Redis uses by default asynchronous replication, which being low latency and high performance, is the natural replication mode for the vast majority of Redis use cases.”

Pros:

Easy to set up
Data safety, there’s always a copy of your data
Read scalability, as reads can be distributed across the replicas

Cons:

Write operations are handled only by the master node
Manual intervention is needed once the master is down (promoting a new master or restarting it)
Data consistency is not guaranteed

3. Redis Sentinel

Redis Sentinel³ is a distributed system which addresses the problem of resilience. It’s a monitoring system that runs in the background, monitors the Redis master and slave instances, alerts if any node fails and recovers from the failure as per the Sentinel configuration. Sentinel should always run as a group of several Sentinel processes. Sentinel has a built-in mechanism, called the quorum, to agree that a master has failed before a new one is nominated. There should be a minimum of 3 Sentinels (or a higher odd number) to tolerate one failure. A typical setup is one master and two replicas, each with its own Sentinel, running on three separate independent systems (or virtual machines).

The Quorum

“The quorum is the number of Sentinels that need to agree about the fact the master is not reachable, in order to really mark the master as failing, and eventually start a failover procedure if possible.”
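The quorum rule quoted above can be sketched in a few lines of Python. The first function follows the quote directly. The second one reflects the additional rule from the Sentinel documentation that a failover also has to be authorized by a majority of all Sentinels. The function names are made up for this sketch and are not part of any Redis API.

```python
def is_marked_as_failing(down_reports: int, quorum: int) -> bool:
    """The master is marked as failing once `quorum` Sentinels agree it is unreachable."""
    return down_reports >= quorum


def can_authorize_failover(votes: int, total_sentinels: int) -> bool:
    """Actually starting a failover needs votes from a majority of all Sentinels."""
    return votes > total_sentinels // 2
```

With 3 Sentinels and a quorum of 2, two Sentinels are enough both to mark the master as failing and to authorize the failover, which is why 3 is the usual minimum.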

Pros:

High availability and automatic failover
Built into Redis and easy to set up
Service discovery and awareness of the current Redis topology

Cons:

Nodes need to be managed separately
Writes do not scale, as all writes go to the master node
Data consistency is not guaranteed, as replication is asynchronous
Clients need to support Redis Sentinel

4. Redis Cluster

Redis Cluster⁴ is a solution for both resilience and scaling. It has a multi-master architecture, where the masters share 16384 hash slots among them. Hence the data is automatically partitioned across the master nodes, and their replicas provide data reliability. A cluster should have at least three master nodes, with one or more replicas for each master. If a master node becomes unavailable due to a failure, one of its replicas is promoted to become the new master. In practice, for write-heavy workloads it is better to keep two replicas per master and increase the number of master nodes. For read-heavy workloads, increase the replicas per master and use smart clients to spread the reads across the replicas.

Pros:

High availability and scalability
Decentralized architecture, data is distributed among the nodes
Automatic failure recovery

Cons:

Needs a minimum of 6 nodes to set up (3 masters and 3 replicas)
Clients need to be aware of the Redis cluster
Data is replicated asynchronously

Bit about Redis Clustering

Redis Cluster provides an automatic way to shard data across multiple Redis nodes. It also provides “some degree” of availability during failures⁴. Redis Cluster uses a form of sharding where every key belongs to a hash slot⁵. There are 16384 hash slots in a Redis cluster, and the slot for a key is computed by taking the CRC16 of the key modulo 16384.

`HASH_SLOT = CRC16(key) mod 16384`
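As a rough illustration of the formula, here is a small Python sketch. It assumes the CRC16 variant named in the Redis Cluster specification (XMODEM, polynomial 0x1021, initial value 0) and deliberately ignores hash tags (the `{...}` part of a key), which a real Redis server also takes into account. The function names are my own.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16 (XMODEM variant: polynomial 0x1021, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str) -> int:
    """HASH_SLOT = CRC16(key) mod 16384 (hash tags are not handled here)."""
    return crc16_xmodem(key.encode("utf-8")) % 16384
```

For example, `hash_slot("foo")` gives 12182, which should match what `CLUSTER KEYSLOT foo` reports on a real node.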

Using hash slots, users are able to add or remove master nodes: all you have to do is move some hash slots from the 3 existing master nodes to the new master node, with zero downtime. Meanwhile, what will happen if one of the master nodes fails? Let’s say the Master 2 node crashed or its instance got stopped. We would lose hash slots 5461-10922 and the data stored under those slots. To overcome this issue we use the Redis Cluster master-replica model.

In our example we create 3 masters and 3 slaves, where each slave acts as a replica of one master node. Ideally a master node and its replica are placed on different instances. For example, we create one instance for Master(1) with Slave(3), one for Master(2) with Slave(1) and one for Master(3) with Slave(2). If Master(2) fails, its replica, Slave(2), will be promoted as the new master and the cluster continues to work. However, if both Master(2) and Slave(2) fail at the same time, the cluster will lose all the keys in their hash slots.
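The failover behaviour described above can be modelled with a tiny Python sketch. This is only a toy model of the outcome, not how Redis actually elects a new master, and the node names M1…S3 simply follow the naming used in this article.

```python
from typing import Dict, List, Optional, Set

# One replica per master, following the article's example.
REPLICAS: Dict[str, List[str]] = {"M1": ["S1"], "M2": ["S2"], "M3": ["S3"]}


def surviving_owner(master: str, failed: Set[str]) -> Optional[str]:
    """Return the node serving the master's hash slots after the given failures."""
    if master not in failed:
        return master
    for replica in REPLICAS[master]:
        if replica not in failed:
            return replica  # this replica gets promoted to master
    return None  # master and all of its replicas are down: the slots are lost
```

So `surviving_owner("M2", {"M2"})` returns `"S2"`, while `surviving_owner("M2", {"M2", "S2"})` returns `None`.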

When Redis is in cluster mode, each node has two TCP ports open: the first is the standard Redis port for client connections (default 6379), and the second, the first port + 10000 (default 16379), is for the cluster bus, which carries node-to-node communication.

To create a cluster, we need a few Redis instances running in cluster mode. These are the minimum configurations needed to set one up:


port 7000
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes

Setting cluster-enabled to yes enables the Redis cluster mode. Each cluster node generates its cluster-config-file (nodes.conf here) to keep node-related data. A minimal cluster should have at least 3 master nodes.

Lets Build A Cluster

In this tutorial, we are going to create a 6-node Redis cluster with 3 master and 3 slave nodes, and assign a separate slot range to each master, as shown in the figure.

Cluster details

Master Node 1 (M1) - allocated hash slots 0-5460
Master Node 2 (M2) - allocated hash slots 5461-10922
Master Node 3 (M3) - allocated hash slots 10923-16383
Slave Node 1 (S1) - replicates M1
Slave Node 2 (S2) - replicates M2
Slave Node 3 (S3) - replicates M3
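To see how a hash slot maps to one of these masters, here is a short Python sketch that looks up the owner of a slot in the ranges listed above. The ranges are copied from this article. A real client would normally learn them from the cluster itself rather than hard-code them.

```python
import bisect

# (first_slot, last_slot, master) taken from the cluster details above.
SLOT_RANGES = [
    (0, 5460, "M1"),
    (5461, 10922, "M2"),
    (10923, 16383, "M3"),
]
_RANGE_STARTS = [first for first, _, _ in SLOT_RANGES]


def owner_of_slot(slot: int) -> str:
    """Return the master that owns the given hash slot."""
    if not 0 <= slot <= 16383:
        raise ValueError("hash slots are numbered 0..16383")
    index = bisect.bisect_right(_RANGE_STARTS, slot) - 1
    return SLOT_RANGES[index][2]
```

For instance, a key that hashes to slot 12182 would be stored on M3, and one that hashes to slot 866 on M1.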

First, get the latest Redis for Windows zip from here.
Create 6 separate folders, one for each node (M1…S3), and paste the Redis binaries into each of them.

Update and uncomment the following fields in redis.windows.conf in each folder, changing the port for each master and slave: M1-7000, M2-7001, M3-7002, S1-7003, S2-7004, S3-7005. (sample config)

# change the port on each node
port 7000
# uncomment the following
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
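If editing six config files by hand feels tedious, the cluster-related lines can be generated. The Python sketch below only renders these five settings per node, whereas the steps in this article edit the full redis.windows.conf. The node names are the ones used in this article and the ports 7000 to 7005 match the cluster create command used later. The helper names are my own.

```python
from typing import Dict

# Port per node: masters on 7000-7002, slaves on 7003-7005.
NODE_PORTS: Dict[str, int] = {
    "M1": 7000, "M2": 7001, "M3": 7002,
    "S1": 7003, "S2": 7004, "S3": 7005,
}


def render_node_config(port: int) -> str:
    """Render the minimal cluster settings for a node listening on `port`."""
    lines = [
        f"port {port}",
        "cluster-enabled yes",
        "cluster-config-file nodes.conf",
        "cluster-node-timeout 5000",
        "appendonly yes",
    ]
    return "\n".join(lines) + "\n"


def cluster_bus_port(port: int) -> int:
    """The cluster bus listens on the client port + 10000."""
    return port + 10000
```

Calling `render_node_config(NODE_PORTS["S1"])` returns the same five lines as above with `port 7003`, and `cluster_bus_port(7000)` is 17000, so that port has to be free on the machine as well.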
        

Now open CMD in each folder and run redis-server.exe redis.windows.conf to start the Redis servers. For this I have created a small script which starts all the Redis servers at once, each in a minimized cmd window. Copy it to the root folder (the one that contains all the node folders) and execute it.
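As a rough alternative to a batch script, the same idea can be sketched in Python. This is not the script mentioned above, just a minimal sketch that assumes the six folders are named M1…S3 and each contains redis-server.exe and redis.windows.conf. It does not minimize the windows.

```python
import subprocess
from pathlib import Path
from typing import List

# Assumed folder names, one per node, under a common root folder.
NODE_FOLDERS = ["M1", "M2", "M3", "S1", "S2", "S3"]


def server_command(node_dir: Path) -> List[str]:
    """Command line that starts the Redis server stored in `node_dir`."""
    return [str(node_dir / "redis-server.exe"), "redis.windows.conf"]


def start_all(root: Path) -> List[subprocess.Popen]:
    """Start one redis-server process per node folder and return the handles."""
    processes = []
    for name in NODE_FOLDERS:
        node_dir = (root / name).resolve()
        # Run inside the node folder so nodes.conf and the AOF file stay there.
        processes.append(
            subprocess.Popen(server_command(node_dir), cwd=str(node_dir))
        )
    return processes


def stop_all(processes: List[subprocess.Popen]) -> None:
    """Terminate every server started by start_all."""
    for process in processes:
        process.terminate()
```

Running `start_all(Path("."))` from the root folder starts all six servers, and passing the returned list to `stop_all` shuts them down again.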
Now we have 6 running Redis instances. All we need to do is initiate the cluster. For that you can run the following command:

redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 --cluster-replicas 1

Here we’re using --cluster-replicas 1, which means that Redis will create one replica for every master. redis-cli will show the proposed setup, and you can type yes to proceed and create the cluster. If everything goes correctly you will receive this message:

`[OK] All 16384 slots covered`


If you want to create a cluster with specific master and slave nodes, you have to define which nodes will be masters and which will be slaves. For that, first run the create command with only the master addresses and --cluster-replicas 0. Then take the node ID of each master and assign a slave to it. To retrieve a master’s node ID, use redis-cli -c -h 127.0.0.1 -p <PORT> cluster nodes and take the ID from the line tagged myself,master. Then assign the slave to that master node ID as follows:

redis-cli --cluster add-node 127.0.0.1:7003 127.0.0.1:7000 --cluster-slave --cluster-master-id <MASTER-NODE-ID>
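Picking the master node IDs out of the cluster nodes output by hand gets tedious, so here is a small Python sketch that parses that output. It relies on the documented line layout (node ID, address, flags, and so on, separated by spaces). The sample output and its shortened node IDs are made up for illustration, since real IDs are 40 characters long.

```python
from typing import Dict

# Hypothetical, shortened sample of `cluster nodes` output.
SAMPLE_OUTPUT = (
    "aaaa1111 127.0.0.1:7000@17000 myself,master - 0 0 1 connected 0-5460\n"
    "bbbb2222 127.0.0.1:7001@17001 master - 0 1660000000000 2 connected 5461-10922\n"
    "cccc3333 127.0.0.1:7003@17003 slave aaaa1111 0 1660000000000 1 connected\n"
)


def master_ids(cluster_nodes_output: str) -> Dict[str, str]:
    """Map each master's ip:port to its node ID."""
    masters = {}
    for line in cluster_nodes_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        node_id, address, flags = fields[0], fields[1], fields[2].split(",")
        if "master" in flags:
            # Newer servers print ip:port@bus-port, so keep only ip:port.
            masters[address.split("@")[0]] = node_id
    return masters
```

The resulting IDs can then be substituted for `<MASTER-NODE-ID>` in the add-node command above.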

I have added an automated script for this, which gets the master node IDs and assigns a slave to each master.

To connect to a master server, you need to use redis-cli with the -c option to support redirections between cluster nodes.

redis-cli -c -h 127.0.0.1 -p 7000

Finally, you need a script to kill all running redis-server processes.

So that’s all for this article. See you soon!!!👋
