How to use AWS DMS to migrate Data & capture CDC from source RDS to S3
What is AWS Data Migration Service?
AWS Database Migration Service helps you migrate databases to AWS cloud quickly and securely. While migration the source database remains fully operational, minimizing downtime to applications that rely on the database.
AWS Database Migration Service supports homogeneous migrations such as Oracle to Oracle, as well as heterogeneous migrations between different database platforms, such as Oracle or Microsoft SQL Server to Amazon Aurora. With AWS Database Migration Service, you can continuously replicate your source data with high availability to the destination using CDC (Change Data Capture) feature.
Migration Flow Overview
Source : AWS RDS instance which is configured with MySQL database engine
Destination : AWS S3 bucket where the data will be extracted and which can then feed into a number of destination databases or warehouses on AWS or on On-Premise
Migration Agent : Here the migration agent is an AWS DMS replication instance which is launched in a private subnet which will extract the data from source and will save it to the destination
Steps to perform migration using DMS?
- Create a migration subnet group within the VPC
- Create a DMS replication instance in the Private Subnet
- Edit the Security group of the RDS Source instance
- Create a source endpoint and Test the connection
- Create a target endpoint and Test the connection
- Create a task to perform the one time migration of the data
- Create another task to perform CDC migration from the source
Step 1 : Create a migration subnet group within the VPC
- Open the DMS console in your AWS account and click on the Subnet groups on the left hand side panel and then click on Create subnet group button
- In the Create Replication Subnet group form give a valid name and a description
- Select the VPC in which you want to create the migration subnet group
- Then add subnets from the above selected VPC
- If required you can also add tags while creating the migration subnet group
- Then click on Create Subnet
Step 2 : Create a DMS replication instance in the Private Subnet
- Once the DMS subnet group is created then click on the replication instance on the left panel and click on create replication instance
- Please give a valid name to the replication instance and add a description
- Then select an instance type for the replication instance as per the volume of data which is to be migrated. The speed of migration will depend on the size of the migration instance
- Then select the VPC in which you want to launch the DMS replication instance
- Make sure that un check the Publicly Accessible option for DMS instance as the Source is within the VPC itself
- Then open Advanced security and network configuration and select the replication subnet group which we created in the previous step
- Select the Security group for your replication instance and then select a KMS key for encryption (Optional) of the data
- You can also define a maintenance window for the Replication instance and can also tags to the instance (Optional)
Step 3 : Edit the Security group of the RDS Source instance
Once the Migration replication instance is available you can check the details of the replication instance. As the replication instance is launched in a private subnet it will just have a private IP
As the migration instance needs to connect to the RDS instance to extract the data, we need to add the private IP of the DMS migration instance in the security group of RDS instance with the specific database port
Step 4 : Create a source endpoint and Test the connection
Click on the Endpoints option on the left panel of the console and then click on Create endpoint
- In the Create endpoint form select the option of Source Endpoint as we are creating endpoint for the source
- Check the box of Select RDS DB instance as our source is an RDS instance (Note : If the source is not an RDS instance but a MySQL instance on on-premise or on another cloud GCP or Azure, then you can provide the endpoint information of the DB server manually, without checking this option)
- Provide a Endpoint name and an ARN name for the source endpoint
- Select the Database Engine for your source database
- Then chose the Provide access information manually radio button to enter the details of your source database
- As we have selected the RDS instance above, so majority of the fields will be auto populated (Note : If the source is not a RDS instance then you will have to manually enter all the details of the source DB)
- You can select the KMS key for the encryption for you source endpoint (Optional)
- You can also add tags if necessary (Optional)
- Then open the Test endpoint connection and select VPC in which you want you have created your DMS replication instance and also select the DMS replication instance created in the earlier step
- Then click on Run Test to check whether the source endpoint is able to connect the DMS Migration instance with Source RDS instance
Important Note : If your Run Test Failed, then this means that the Source Endpoint is not able to connect the DMS replication instance to the RDS Database. For that check the Step 3 as Security Group configuration is an important step to make sure that Firewalls are configured properly for the connection
Step 5: Create a target endpoint and Test the connection
Once the source endpoint is created you see the Active status on the Endpoint page of the DMS console. Now to create a target endpoint again click on the Create endpoint button as shown below
- Firstly, select the Target Endpoint option over here as we are now creating an endpoint for the target or the destination which is S3 in our case
- Provide a valid name and an ARN description to the target endpoint
- Select the target engine as S3 (as per our example)
- Then add the IAM role ARN of the IAM role which has access to write on the S3 bucket (Note : This IAM role should be properly created as this will ensure that whether the target endpoint is able to connect to the destination or not)
- Provide the target S3 bucket name where the data is to be migrated by the DMS replication instance
- Provide the exact folders and sub folders in which the data is to written on the target S3 bucket
- You can also add tags if necessary (Optional)
- Then open the Test endpoint connection and select VPC in which you want you have created your DMS replication instance and also select the DMS replication instance created in the earlier step
- Then click on Run Test to check whether the target endpoint is able to connect the DMS Migration instance with Target S3 bucket
Important Note : If your Run Test Failed, then this means that the Target Endpoint is not able to connect the DMS replication instance to the target S3 Bucket. For that check the IAM policies attached to the IAM role whose ARN have been used in the above step. Make sure the the IAM role includes the policy which gives write access to the S3 bucket which we are using as target bucket
Lets have a Recap of what we have done till now…
- We have created a DMS subnet group along with a DMS replication instance which is going to perform the actual data migration
- We have created a Source and Target endpoints with RDS MySQL database as source and an S3 bucket as a target.
- Both the Endpoints have also been tested and both are successful in connecting to their respective targets
Step 6 : Create a task to perform the one time migration of the data
Now lets create a Database Migration Task which will perform a one time migration of the entire data to S3 bucket, for that click on the Create Task in the Database Migration task section on the DMS console
- Provide a valid name to the migration task and the also add a description for it
- Select the DMS replication instance which we created in the Step 2
- Select the Source Endpoint and the Target Endpoint which we created Step 4 and Step 5
- In the Migration type select the option of Migrate Existing Data
- You can keep all the Task settings default and if required then you can enable the Cloudwatch Logs just to debug in case of any failures
- In Table Mapping click on the Add new selection rule button and then select Enter a Schema and add the schema name which you want to migrate (Note : If you want to migrate all the schema then just add modulas (%) symbol)
- Keep all the other configurations default and add tags if required and then click on Create Task
Once click on Create Task it will create the task and once its created it will automatically start the migration task and you can see it in the console
It would take some time to complete Migration task and once it does, you can see the status and the progress on the console as shown below
Once the Migration task is completed the entire data from the MySQL database will be obtained on the target S3 bucket as shown below
DMS automatically creates a folder with the schema name and all the folders inside it with the respective table names. Each table name have some CSV files which contains the data from the MySQL database
Hurray! we have completed a full one time Migration from MySQL to S3. Lets see how we can achieve a CDC in DMS where only the latest data is pushed to S3 whenever there are changes in the source database
Step 7 : Create another task to perform CDC migration from the source
- All the steps remains exactly same which we did in our last step, just one thing which we will have to select in the Migration Type drop down as Replication data changes only which is nothing but CDC
Important Note : When the data source is a MySQL database then for DMS replication instance to capture the changes in the database you need to make changes to the source database configurations as suggested above in the red box
Now as soon as you insert new data into the source database, you will see that the CDC replication migration task shows progress on the DMS console, as shown below
If you check the S3 bucket then you can see that there will be new files which will uploaded to S3 with date and Unix timestamp appended to the file names. Just attaching the screenshot below for reference
Well Done!! Guys. We have completed our entire Migration plan as per the architecture defined at the beginning.
Conclusion
Thus we have successfully completed the migration of the RDS MySQL database to S3 bucket. This would make your life much more easier while working on a Migration use case. In our example we just saw migration from RDS, but practically your source can be any database server hosted in any cloud or on premise network. For this you will just have to check whether DMS supports the database engine for that source. As discussed, AWS DMS can be leveraged for a homogeneous migration as well as for an heterogeneous migration. In case of heterogeneous migration you will have to use AWS SCT (Schema Conversion Tool) which will act as an mediator to handle the schema differences between the source database engine and the target database engine.
I hope this was helpful for Data Engineers and Cloud Engineers working on various migration project on AWS. I am also planning to create multiple blogs in continuation with this where we further leverage AWS Glue to process this data and load it into a target Database/Warehouse. Please stay tuned! Also in case of any questions/doubts about AWS DMS, please feel free to reach out to me.
References
- Supported Sources : https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.html
- Supported Targets : https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.html
- AWS DMS with SCT : https://aws.amazon.com/blogs/database/aws-database-migration-service-and-aws-schema-conversion-tool-now-support-ibm-db2-as-a-source/
- AWS DMS with CDC details : https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Task.CDC.html
- Best Practices During Migration while working with AWS DMS : https://docs.aws.amazon.com/dms/latest/userguide/CHAP_BestPractices.html