How to use AWS DMS to migrate Data & capture CDC from source RDS to S3

Rajas Walavalkar
10 min readMay 20, 2021

--

What is AWS Data Migration Service?

AWS Database Migration Service helps you migrate databases to AWS cloud quickly and securely. While migration the source database remains fully operational, minimizing downtime to applications that rely on the database.

AWS Database Migration Service supports homogeneous migrations such as Oracle to Oracle, as well as heterogeneous migrations between different database platforms, such as Oracle or Microsoft SQL Server to Amazon Aurora. With AWS Database Migration Service, you can continuously replicate your source data with high availability to the destination using CDC (Change Data Capture) feature.

Migration Flow Overview

AWS DMS Migration Architecture using Source RDS and Destination S3 Bucket

Source : AWS RDS instance which is configured with MySQL database engine

Destination : AWS S3 bucket where the data will be extracted and which can then feed into a number of destination databases or warehouses on AWS or on On-Premise

Migration Agent : Here the migration agent is an AWS DMS replication instance which is launched in a private subnet which will extract the data from source and will save it to the destination

Steps to perform migration using DMS?

  • Create a migration subnet group within the VPC
  • Create a DMS replication instance in the Private Subnet
  • Edit the Security group of the RDS Source instance
  • Create a source endpoint and Test the connection
  • Create a target endpoint and Test the connection
  • Create a task to perform the one time migration of the data
  • Create another task to perform CDC migration from the source

Step 1 : Create a migration subnet group within the VPC

  • Open the DMS console in your AWS account and click on the Subnet groups on the left hand side panel and then click on Create subnet group button
  • In the Create Replication Subnet group form give a valid name and a description
  • Select the VPC in which you want to create the migration subnet group
  • Then add subnets from the above selected VPC
  • If required you can also add tags while creating the migration subnet group
  • Then click on Create Subnet
Create Migration Subnet Group

Step 2 : Create a DMS replication instance in the Private Subnet

  • Once the DMS subnet group is created then click on the replication instance on the left panel and click on create replication instance
  • Please give a valid name to the replication instance and add a description
  • Then select an instance type for the replication instance as per the volume of data which is to be migrated. The speed of migration will depend on the size of the migration instance
  • Then select the VPC in which you want to launch the DMS replication instance
  • Make sure that un check the Publicly Accessible option for DMS instance as the Source is within the VPC itself
  • Then open Advanced security and network configuration and select the replication subnet group which we created in the previous step
  • Select the Security group for your replication instance and then select a KMS key for encryption (Optional) of the data
  • You can also define a maintenance window for the Replication instance and can also tags to the instance (Optional)
Create DMS Replication Instance - Image 1
Create DMS Replication Instance - Image 2

Step 3 : Edit the Security group of the RDS Source instance

Once the Migration replication instance is available you can check the details of the replication instance. As the replication instance is launched in a private subnet it will just have a private IP

As the migration instance needs to connect to the RDS instance to extract the data, we need to add the private IP of the DMS migration instance in the security group of RDS instance with the specific database port

Step 4 : Create a source endpoint and Test the connection

Click on the Endpoints option on the left panel of the console and then click on Create endpoint

  • In the Create endpoint form select the option of Source Endpoint as we are creating endpoint for the source
  • Check the box of Select RDS DB instance as our source is an RDS instance (Note : If the source is not an RDS instance but a MySQL instance on on-premise or on another cloud GCP or Azure, then you can provide the endpoint information of the DB server manually, without checking this option)
  • Provide a Endpoint name and an ARN name for the source endpoint
  • Select the Database Engine for your source database
  • Then chose the Provide access information manually radio button to enter the details of your source database
  • As we have selected the RDS instance above, so majority of the fields will be auto populated (Note : If the source is not a RDS instance then you will have to manually enter all the details of the source DB)
  • You can select the KMS key for the encryption for you source endpoint (Optional)
  • You can also add tags if necessary (Optional)
  • Then open the Test endpoint connection and select VPC in which you want you have created your DMS replication instance and also select the DMS replication instance created in the earlier step
  • Then click on Run Test to check whether the source endpoint is able to connect the DMS Migration instance with Source RDS instance
Important Note : If your Run Test Failed, then this means that the Source Endpoint is not able to connect the DMS replication instance to the RDS Database. For that check the Step 3 as Security Group configuration is an important step to make sure that Firewalls are configured properly for the connection
Create DMS Source Endpoint - Image 1
Create DMS Source Endpoint - Image 2

Step 5: Create a target endpoint and Test the connection

Once the source endpoint is created you see the Active status on the Endpoint page of the DMS console. Now to create a target endpoint again click on the Create endpoint button as shown below

  • Firstly, select the Target Endpoint option over here as we are now creating an endpoint for the target or the destination which is S3 in our case
  • Provide a valid name and an ARN description to the target endpoint
  • Select the target engine as S3 (as per our example)
  • Then add the IAM role ARN of the IAM role which has access to write on the S3 bucket (Note : This IAM role should be properly created as this will ensure that whether the target endpoint is able to connect to the destination or not)
IAM Role ARN which provides access to write to the destination S3 bucket
  • Provide the target S3 bucket name where the data is to be migrated by the DMS replication instance
  • Provide the exact folders and sub folders in which the data is to written on the target S3 bucket
  • You can also add tags if necessary (Optional)
  • Then open the Test endpoint connection and select VPC in which you want you have created your DMS replication instance and also select the DMS replication instance created in the earlier step
  • Then click on Run Test to check whether the target endpoint is able to connect the DMS Migration instance with Target S3 bucket
Important Note : If your Run Test Failed, then this means that the Target Endpoint is not able to connect the DMS replication instance to the target S3 Bucket. For that check the IAM policies attached to the IAM role whose ARN have been used in the above step. Make sure the the IAM role includes the policy which gives write access to the S3 bucket which we are using as target bucket
Create DMS Target Endpoint - Image 1
Create DMS Target Endpoint - Image 2
Create DMS Target Endpoint - Image 3

Lets have a Recap of what we have done till now…

  • We have created a DMS subnet group along with a DMS replication instance which is going to perform the actual data migration
  • We have created a Source and Target endpoints with RDS MySQL database as source and an S3 bucket as a target.
  • Both the Endpoints have also been tested and both are successful in connecting to their respective targets

Step 6 : Create a task to perform the one time migration of the data

Now lets create a Database Migration Task which will perform a one time migration of the entire data to S3 bucket, for that click on the Create Task in the Database Migration task section on the DMS console

  • Provide a valid name to the migration task and the also add a description for it
  • Select the DMS replication instance which we created in the Step 2
  • Select the Source Endpoint and the Target Endpoint which we created Step 4 and Step 5
  • In the Migration type select the option of Migrate Existing Data
  • You can keep all the Task settings default and if required then you can enable the Cloudwatch Logs just to debug in case of any failures
  • In Table Mapping click on the Add new selection rule button and then select Enter a Schema and add the schema name which you want to migrate (Note : If you want to migrate all the schema then just add modulas (%) symbol)
  • Keep all the other configurations default and add tags if required and then click on Create Task
Create DMS Migration Task - Image 1
Create DMS Migration Task - Image 2
Create DMS Migration Task - Image 3
Create DMS Migration Task - Image 4

Once click on Create Task it will create the task and once its created it will automatically start the migration task and you can see it in the console

It would take some time to complete Migration task and once it does, you can see the status and the progress on the console as shown below

Once the Migration task is completed the entire data from the MySQL database will be obtained on the target S3 bucket as shown below

DMS automatically creates a folder with the schema name and all the folders inside it with the respective table names. Each table name have some CSV files which contains the data from the MySQL database

Hurray! we have completed a full one time Migration from MySQL to S3. Lets see how we can achieve a CDC in DMS where only the latest data is pushed to S3 whenever there are changes in the source database

Step 7 : Create another task to perform CDC migration from the source

  • All the steps remains exactly same which we did in our last step, just one thing which we will have to select in the Migration Type drop down as Replication data changes only which is nothing but CDC
Create DMS CDC Migration Task - Image 1
Important Note : When the data source is a MySQL database then for DMS replication instance to capture the changes in the database you need to make changes to the source database configurations as suggested above in the red box

Now as soon as you insert new data into the source database, you will see that the CDC replication migration task shows progress on the DMS console, as shown below

If you check the S3 bucket then you can see that there will be new files which will uploaded to S3 with date and Unix timestamp appended to the file names. Just attaching the screenshot below for reference

Well Done!! Guys. We have completed our entire Migration plan as per the architecture defined at the beginning.

Conclusion

Thus we have successfully completed the migration of the RDS MySQL database to S3 bucket. This would make your life much more easier while working on a Migration use case. In our example we just saw migration from RDS, but practically your source can be any database server hosted in any cloud or on premise network. For this you will just have to check whether DMS supports the database engine for that source. As discussed, AWS DMS can be leveraged for a homogeneous migration as well as for an heterogeneous migration. In case of heterogeneous migration you will have to use AWS SCT (Schema Conversion Tool) which will act as an mediator to handle the schema differences between the source database engine and the target database engine.

I hope this was helpful for Data Engineers and Cloud Engineers working on various migration project on AWS. I am also planning to create multiple blogs in continuation with this where we further leverage AWS Glue to process this data and load it into a target Database/Warehouse. Please stay tuned! Also in case of any questions/doubts about AWS DMS, please feel free to reach out to me.

By Rajas Walavalkar

--

--

Rajas Walavalkar

Technical Architect - Data & AWS Ambassador at Quantiphi Analytics. Worked on ETL, Data Warehouses, Big Data (AWS Glue, Spark), BI & Dashboarding (D&A).