Introduction:
In today’s data-driven world, maintaining synchronization between different databases and search engines is crucial for seamless operations and efficient data retrieval. At our organization, we have implemented a robust system to keep MySQL, Manticore Search, and our application in sync. In this blog post, we will delve into the details of our synchronization process and explain how we guarantee consistent and up-to-date data across these platforms.
Initiating the Synchronization Process:
To kick off the synchronization process, we built a job queue system around a Golang application running as a Kubernetes job. This job handles the initial data synchronization from MySQL and Cassandra to Manticore Search. Once it completes, we perform an additional sync from the NATS stream to capture any events missed during the initial synchronization window. Together, these steps ensure that Manticore Search is fully equipped to serve search requests promptly and accurately.
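As a rough illustration, here is a minimal sketch of what such an initial-sync job can look like in Go. The `documents` table, its columns, the `documents` index, and the connection details are all placeholders, not our real schema; the sketch pages through MySQL by primary key and posts batches to Manticore’s bulk HTTP endpoint:

```go
package main

import (
	"bytes"
	"database/sql"
	"encoding/json"
	"log"
	"net/http"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	db, err := sql.Open("mysql", "user:pass@tcp(mysql:3306)/app")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	const batchSize = 1000
	lastID := int64(0)
	for {
		// Page through rows by primary key so memory stays bounded.
		rows, err := db.Query(
			"SELECT id, title, body FROM documents WHERE id > ? ORDER BY id LIMIT ?",
			lastID, batchSize)
		if err != nil {
			log.Fatal(err)
		}
		var buf bytes.Buffer
		n := 0
		for rows.Next() {
			var id int64
			var title, body string
			if err := rows.Scan(&id, &title, &body); err != nil {
				log.Fatal(err)
			}
			// Manticore's bulk HTTP endpoint accepts newline-delimited JSON commands.
			line, _ := json.Marshal(map[string]any{
				"replace": map[string]any{
					"index": "documents",
					"id":    id,
					"doc":   map[string]any{"title": title, "body": body},
				},
			})
			buf.Write(line)
			buf.WriteByte('\n')
			lastID, n = id, n+1
		}
		rows.Close()
		if n == 0 {
			break // all rows copied
		}
		resp, err := http.Post("http://manticore:9308/bulk", "application/x-ndjson", &buf)
		if err != nil {
			log.Fatal(err)
		}
		resp.Body.Close()
	}
}
```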
Ongoing Updates and Event Handling:
For continuous updates, inserts, and deletes, we rely on NATS JetStream, a persistent pub/sub messaging system. Using a durable consumer that tracks its position in the stream (and can resume from a sequence or timestamp), we pull events from the JetStream stream. Whenever our application performs a mutation, such as an update or insert, a corresponding event is published to the stream. To enhance performance, we batch these events and sync them to Manticore Search at regular intervals, optimizing both speed and efficiency.
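Concretely, the consumption loop can be sketched with the nats.go client as follows; the `mutations.>` subject, the `manticore-sync` durable name, and the `syncBatchToManticore` helper are illustrative assumptions rather than our actual names:

```go
package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect("nats://nats:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// A durable pull consumer survives restarts and resumes from its last
	// acknowledged position in the stream.
	sub, err := js.PullSubscribe("mutations.>", "manticore-sync")
	if err != nil {
		log.Fatal(err)
	}

	for {
		// Fetch up to 500 events, waiting at most two seconds, so writes to
		// Manticore happen in batches rather than one message at a time.
		msgs, err := sub.Fetch(500, nats.MaxWait(2*time.Second))
		if err != nil && err != nats.ErrTimeout {
			log.Fatal(err)
		}
		if len(msgs) == 0 {
			continue
		}
		if err := syncBatchToManticore(msgs); err != nil {
			log.Printf("sync failed, batch will be redelivered: %v", err)
			continue // unacked messages are redelivered by JetStream
		}
		for _, m := range msgs {
			m.Ack() // acknowledge only after a successful write
		}
	}
}

// syncBatchToManticore is a placeholder for the batched Manticore write.
func syncBatchToManticore(msgs []*nats.Msg) error { return nil }
```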
Ensuring Data Integrity and Idempotency:
One of the key aspects of our synchronization process is the preservation of data integrity. To achieve this, we employ a NATS durable consumer, ensuring that only a single consumer instance receives events from the stream. Because JetStream delivery is at-least-once, an event can still be redelivered after a failure, so we also incorporate snowflake IDs for idempotency. These unique, distributed, roughly time-ordered IDs let us track exactly which events have been processed: regardless of how many times an event is received, it is handled only once, maintaining data integrity across the system.
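To make the idempotency idea concrete, here is a minimal sketch in Go. A real deployment would record processed IDs in a persistent store so that deduplication survives restarts; the in-memory map here is purely illustrative:

```go
package dedup

import "sync"

// Deduper remembers the snowflake IDs of events that have already been
// handled. Snowflake IDs are unique, so a redelivered event presents the
// same ID and can be skipped safely.
type Deduper struct {
	mu   sync.Mutex
	seen map[int64]struct{}
}

func NewDeduper() *Deduper {
	return &Deduper{seen: make(map[int64]struct{})}
}

// MarkProcessed reports whether the event is new. It returns false when the
// snowflake ID has already been handled, so the caller can drop the event
// instead of applying it twice.
func (d *Deduper) MarkProcessed(snowflakeID int64) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if _, dup := d.seen[snowflakeID]; dup {
		return false
	}
	d.seen[snowflakeID] = struct{}{}
	return true
}
```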
Processing Events and Schema Compatibility:
Once events are received by our Golang application, which is connected to the JetStream stream, they are processed accordingly. The application writes the updated data to Manticore Search, keeping Manticore in step with the changes happening in our application. To ensure compatibility and consistency, we preprocess the data within the Golang application, converting it to the desired Manticore schema format. This step is particularly important during updates or changes to the schema, guaranteeing that the data remains synchronized and compatible with the Manticore Search engine.
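A hedged sketch of that preprocessing step might look like the following, where the `MutationEvent` fields and the `documents` index are stand-ins for our actual types. Manticore’s `/replace` HTTP endpoint acts as an upsert, which also keeps reprocessed events harmless:

```go
package indexer

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// MutationEvent is a stand-in for the decoded Protobuf event emitted by the
// application; the real message has different fields.
type MutationEvent struct {
	ID        int64
	Title     string
	Body      string
	Tags      []string
	UpdatedAt int64
}

// toManticoreDoc reshapes the application event into the flat field layout
// of the Manticore index, e.g. joining tags into a single full-text field.
func toManticoreDoc(ev MutationEvent) map[string]any {
	return map[string]any{
		"title":      ev.Title,
		"body":       ev.Body,
		"tags":       strings.Join(ev.Tags, " "),
		"updated_at": ev.UpdatedAt,
	}
}

func writeToManticore(ev MutationEvent) error {
	payload, err := json.Marshal(map[string]any{
		"index": "documents",
		"id":    ev.ID,
		"doc":   toManticoreDoc(ev),
	})
	if err != nil {
		return err
	}
	// "replace" performs an upsert, so replaying the same event is safe.
	resp, err := http.Post("http://manticore:9308/replace", "application/json",
		bytes.NewReader(payload))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("manticore replace failed: %s", resp.Status)
	}
	return nil
}
```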
Efficient Communication and Data Transfer:
To facilitate efficient communication and data transfer, we encode events as Protobuf messages within the NATS messaging system, giving us a standardized, compact data exchange format between the components of our system. Furthermore, events are retained in the stream for 7 days, providing a time window in which various applications, including Manticore, vector databases, and machine learning pipelines, can consume and analyze them according to their specific requirements.
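For illustration, a seven-day retention window can be declared when the stream is created with the nats.go client; the stream and subject names below are assumptions:

```go
package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect("nats://nats:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// MaxAge bounds retention: any consumer (the Manticore sync, vector
	// database loaders, ML pipelines) can replay events published within
	// the last seven days.
	_, err = js.AddStream(&nats.StreamConfig{
		Name:     "MUTATIONS",
		Subjects: []string{"mutations.>"},
		Storage:  nats.FileStorage,
		MaxAge:   7 * 24 * time.Hour,
	})
	if err != nil {
		log.Fatal(err)
	}
}
```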
Conclusion:
In today’s fast-paced and data-centric environment, seamless data synchronization is paramount for the efficient operation of applications. Our synchronization process between MySQL, Manticore Search, and our application ensures consistent and up-to-date data across these platforms. Through a combination of a job queue system, NATS JetStream, snowflake IDs, and batched event processing, we guarantee data integrity, idempotency, and schema compatibility. By adopting this approach, we empower different applications within our system to consume data efficiently and make informed decisions. This synchronization process forms the backbone of our data infrastructure, enabling us to deliver exceptional performance and reliability to our users.
If you have any further questions or would like to explore our synchronization process in more detail, please feel free to reach out. We’re here to help.