Dawaai is an online pharmacy working towards becoming a one-stop platform that makes it easier for B2C and B2B customers to buy medicines, along with offering tele-consulting. Dawaai consists of a customer-facing platform and an internal operations portal comprising multiple web and mobile applications.
Dawaai’s operations portal is a monolith with a number of cron jobs performing scheduled tasks. As our system grew and new features came in, it became difficult for us to manage and scale such a large monolith. So, we decided to move from this monolith to a microservice-based architecture that could be maintained, scaled, and deployed more easily.
The first thing we do is evaluate each new feature request and decide: should it be broken out into an individual service, or remain part of the monolith? What impact would it have? What is the ideal technology to implement it with? Once we are clear on these questions, it’s time to rock.
One of the use cases we had was integrating an ERP with our system, in which we had to continuously send data from one system to the other.
A perfect candidate for our first event-driven microservice!
We decided to use the CDC (change data capture) pattern to capture changes from our MySQL database and send them to the ERP.
CDC and Debezium
Change Data Capture is a software design pattern which records inserts, updates and deletes applied to database tables, and makes a record available of what changed, where, and when.
Debezium is an open-source project that provides a low-latency data streaming platform for change data capture (CDC).
It has a Kafka source connector that streams changes from the database to Kafka topics. It supports MySQL, MongoDB, PostgreSQL and Oracle.
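As an illustration, registering a Debezium MySQL connector with Kafka Connect might look roughly like the following. The connector name, hostnames, credentials, and table list are placeholders, and property names follow the Debezium 1.x MySQL connector:

```json
{
  "name": "erp-sync-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "********",
    "database.server.id": "184054",
    "database.server.name": "dawaai",
    "table.include.list": "dawaai.orders",
    "snapshot.mode": "initial",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.dawaai"
  }
}
```

This JSON would typically be POSTed to the Kafka Connect REST API to start the connector.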
The flow starts when relevant data is updated on our system: Debezium reads that change and produces data on the appropriate Kafka topics. Those topics are then consumed by a Kafka consumer service that posts the data to the ERP, as shown in figure 1-A.
Similarly, ERP webhooks produce data on Kafka topics via the Kafka producer service, which is then consumed by the Kafka consumer service and sent to our system, as shown in figure 1-B.
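To make the flow concrete, here is a minimal sketch of how the consumer service might unpack a Debezium change event before posting it to the ERP. The envelope fields (`before`, `after`, `source`, `op`, `ts_ms`) follow Debezium’s event format; the `orders` table and its columns are hypothetical:

```python
import json

# A simplified Debezium MySQL change event (hypothetical "orders" table).
# Real events also carry a schema section and richer source metadata.
event = json.loads("""
{
  "payload": {
    "before": null,
    "after": {"id": 101, "status": "confirmed"},
    "source": {"db": "dawaai", "table": "orders"},
    "op": "c",
    "ts_ms": 1620000000000
  }
}
""")

# Debezium encodes the operation type as a single character.
OP_NAMES = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot read"}

def describe(event):
    """Summarise a change event: what changed, where, and with which row state."""
    payload = event["payload"]
    source = payload["source"]
    row = payload["after"] if payload["after"] is not None else payload["before"]
    return (OP_NAMES.get(payload["op"], payload["op"]),
            f'{source["db"]}.{source["table"]}',
            row)

print(describe(event))  # ('insert', 'dawaai.orders', {'id': 101, 'status': 'confirmed'})
```

A real consumer would read such events from the Kafka topic, map the row to the ERP’s payload format, and post it over HTTP.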
Challenges and How We Tackled Them
One of the challenges we faced was making Debezium tolerant to machine failures. If Debezium fails, the Kafka Connect framework fails the connector over to another machine in the cluster. When that failover occurs, we need to avoid losing the changes that happened before the connector was restarted.
The following were the key steps we took to avoid data loss:
- Set the snapshot.mode property of the Debezium connector to initial:
It ensures the connector runs a snapshot only when no offsets have been recorded for the logical server name; otherwise it resumes reading change events from the binlog.
- Set the binlog retention hours in MySQL:
By default the binlog retention hours property is null, which can be harmful: if the binlog is flushed immediately and cannot be found by the connector after a restart, all changes made while the connector was down are lost. So we set it to a value large enough that the connector, if down, can be restarted and still find the last binlog position it remembers.
- Cron job:
A cron job is set up which, at a fixed interval, checks the connector’s health and restarts it if it has stopped.
- A retry strategy for events that were consumed but did not complete their job.
- Exploring better ways to avoid Debezium connector failures.
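The retry strategy above could be sketched as a simple exponential-backoff loop in the consumer service. `post_to_erp` is a hypothetical callable standing in for the HTTP call to the ERP; the dead-letter handling mentioned in the docstring is one common option, not necessarily what our service does:

```python
import time

def process_with_retry(event, post_to_erp, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry posting a consumed event to the ERP with exponential backoff.

    Returns True if the post eventually succeeded, False if all attempts
    failed (at which point the event could be parked on a dead-letter
    topic for manual inspection instead of being silently dropped).
    """
    for attempt in range(max_attempts):
        try:
            post_to_erp(event)
            return True
        except Exception:
            if attempt == max_attempts - 1:
                return False
            # Back off exponentially: base_delay, 2x, 4x, ...
            sleep(base_delay * (2 ** attempt))
    return False
```

The `sleep` parameter is injected only so the backoff can be stubbed out in tests; in production the default `time.sleep` applies.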
As we move towards microservices and more event-driven systems, CDC will play an important role in this transition. We have been using Debezium with Kafka for some time now and have implemented a number of use cases that are running fine. However, we are still monitoring it closely and are looking to build more around it.
We plan to post more about our use cases of CDC. Please stay tuned.