Federated Query in Amazon Redshift for Postgres & MySQL databases

Federated Query Architecture — Redshift

Overview

Working of Federated Query

How to achieve Federated Query with MySQL and PostgreSQL databases?

  1. Set up the connectivity from your Amazon Redshift cluster to your Amazon RDS or Aurora PostgreSQL DB/MySQL instance. For this , you need to make sure that your RDS or Aurora DB instance can accept connections from your Amazon Redshift cluster. For that, your Amazon Redshift cluster and Amazon RDS or Aurora instance should be in the same VPC and subnet group. This way, you can add the security group for the Amazon Redshift cluster to the inbound rules of the security group for your RDS or Aurora PostgreSQL DB instance. Please refer to the architecture diagram shown above to understand it in details.
  2. Next, you can set up secrets in AWS Secrets Manager for your RDS and Aurora databases. Then reference the secrets in IAM access policies and roles, so that the Redshift cluster has access to get the database credentials from the secret manager
  3. Attach the IAM role that you previously created to the Amazon Redshift cluster.
  4. Connect to your RDS and Aurora databases (PostgreSQL /MySQL) by using an external schema. You need to create an external schema using the query CREATE EXTERNAL SCHEMA and point to the actual database and table from PostgreSQL or MySQL
  5. Run your SQL queries referencing the external schema that references your RDS/Aurora PostgreSQL/MySQL databases.

How to create a Secret Manager for storing DB credentials?

  1. Sign in to the Secrets Manager console with the account that owns your RDS or Aurora instance.
  2. Choose Store a new secret.
  3. Choose the Credentials for RDS database tile. For User name and Password, enter values for your instance. Confirm or choose a value for Encryption key. Then choose the RDS database that your secret will access.
  4. Enter a name for the secret, continue with the creation steps with the default choices, and then choose Store

Next you need to create IAM policy to access this Secret Manager

{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AccessSecret",
"Effect": "Allow",
"Action": [
"secretsmanager:GetResourcePolicy",
"secretsmanager:GetSecretValue",
"secretsmanager:DescribeSecret",
"secretsmanager:ListSecretVersionIds"
],
"Resource": "arn:aws:secretsmanager:us-east-1:123456788012:secret:my-rds-secret-test"
},
{
"Sid": "VisualEditor1",
"Effect": "Allow",
"Action": [
"secretsmanager:GetRandomPassword",
"secretsmanager:ListSecrets"
],
"Resource": "*"
}
]
}
  1. Give the IAM policy a valid name and finish creating it.
  2. Create an IAM role for Redshift — Customizable.
  3. Either attach the IAM policy you just created to an existing IAM role, or create a new IAM role and attach the policy.
  4. On the Trust relationships tab of your IAM role, confirm that the role contains the trust entity redshift.amazonaws.com.
  5. Note the Role ARN you created. This ARN has access to the secret.
  6. Attach this created IAM role to the Redshift cluster so that the cluster can access the Secret manager

Example of Redshift Federated Query

CREATE EXTERNAL SCHEMA test_schema
FROM POSTGRES
DATABASE 'database-1' SCHEMA 'myschema'
URI 'endpoint to aurora hostname'
IAM_ROLE 'arn:aws:iam::123456789111:role/Redshift-SecretsManager-RO'
SECRET_ARN 'arn:aws:secretsmanager:us-east-1:123456789111:secret:federation/test/dataplane-apg-creds-YbVKQw';
CREATE EXTERNAL SCHEMA s3 
FROM DATA CATALOG
DATABASE 'default' REGION 'us-east-1'
IAM_ROLE 'arn:aws:iam::123456789111:role/Redshift-S3';
GRANT USAGE ON SCHEMA s3 TO public;
CREATE VIEW lineitem_all AS
SELECT l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag,l_linestatus,
l_shipdate::date,l_commitdate::date,l_receiptdate::date, l_shipinstruct ,l_shipmode,l_comment
FROM s3.lineitem_1t_part
UNION ALL SELECT * FROM public.lineitem
UNION ALL SELECT * FROM test_schema.lineitem
with no schema binding;

Limitations of Federated Queries

  • Federated queries support read access to external data sources. You can’t write or create database objects in the external data source.
  • In some cases, you might access an Amazon RDS or Aurora database in a different AWS Region than Amazon Redshift. In these cases, you typically incur network latency and billing charges for transferring data across AWS Regions.
  • Federated queries don’t enable access to Amazon Redshift from RDS or Aurora.
  • Federated queries are only available in AWS Regions where both Amazon Redshift and Amazon RDS or Aurora are available.
  • Federated queries to RDS MySQL or Aurora MySQL support transaction isolation at the READ COMMITTED level.

References

  1. Federated Query — https://docs.aws.amazon.com/redshift/latest/dg/federated-overview.html
  2. Federated Query with PostgreSQL — https://docs.aws.amazon.com/redshift/latest/dg/getting-started-federated.html
  3. Federated Query using CFT — https://docs.aws.amazon.com/redshift/latest/dg/getting-started-federated-CF.html
  4. Federated Query with MySQL — https://docs.aws.amazon.com/redshift/latest/dg/getting-started-federated-mysql.html
  5. Data Type Differences between Redshift & PostgreSQL/MySQL DB — https://docs.aws.amazon.com/redshift/latest/dg/federated-data-types.html

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Rajas Walavalkar

Associate Technical Architect at Quantiphi Analytics. AWS & GCP Certified. Worked on ETL, Data Warehouses, Big Data (AWS Glue, Spark), BI & Dashboarding (D&A)