By: Rajendra Gupta | Updated: 2023-10-06 | Comments (1) | Related: > Amazon AWS
Problem
In the previous tips, An Introduction to Amazon Athena and Working with the Amazon Athena, we explored the use cases of Amazon Athena for querying data stored in the S3 bucket using standard SQL statements. SQL Server enables accessing external data (Oracle, MySQL, etc.) via linked servers. Can we use Linked Servers for Amazon Athena databases as well?
Let's explore leveraging Linked Servers to access Amazon Athena tables.
Solution
Data is spread across multiple locations and formats; integrating data is essential for any organization. Usually, we import data into database tables before querying it from SQL Server. The data import can use the SQL Server integration package (SSIS), Import\Export wizard, and BCP. You can directly query data stored in S3 bucket storage using Amazon Athena. This data can be in the following formats: Apache Parquet (.parquet), Apache ORC (.orc), CSV (Comma-Separated values), and JSON (JavaScript object notation).
Now, think of a solution: you receive the data from various sources regularly in different formats. These files are stored in the S3 bucket. Do you need to download files from the S3 bucket, import them into SQL Server tables, and query them? No, you can use Athena to query data from the files stored in the S3 bucket. Additionally, SQL Server can also connect to Athena using Linked Servers. You can directly use a SQL query for your data stored in the S3 bucket from the SQL Server using the Amazon Athena Linked Server.
ODBC Driver for Amazon Athena
To query Athena tables from SQL Server Linked Server, download the Amazon Athena ODBC Driver. It provides a standard ODBC driver interface to query the Amazon Athena interactive query services. It supports data aggregations and complex joins with compressive ANSI SQL-92 standard. There are two Amazon Athena ODBC drivers available for Windows, Linux, and macOS:
- ODBC 2.x: supports only Windows.
- ODBC 1.x: supports Windows, Linux, and macOS.
For this tip, we will use the ODBC 2.x driver. Note the following requirements before using the ODBC 2.x driver:
- Athena uses port 444 for streaming query results. Therefore, you must keep this port open for outbound traffic.
- You can use the Athena:GetQueryResultsStream policy in the IAM user to support the ODBC driver. This role supports the ODBC driver for streaming results.
Click the Windows download link (Windows 64-bit ODBC driver 2.0.1.0).
Launch the Amazon Athena ODBC driver setup wizard, as shown below.
Click Next to review the license agreement and accept it.
Choose the installation directory. Usually, it is good to install the ODBC driver on the OS drive.
Click Install to begin the installation.
The Amazon Athena ODBC driver is installed successfully, as shown below.
ODBC Configuration
Launch ODBC 64-bit from the start menu and click System DSN -> Add.
It opens a window to create a new data source. Choose Amazon Athena ODBC (x64) version 2.00.01.00, as shown below.
Click Finish and fill out the following information in the Amazon Athena ODBC configuration:
- Data Source Name: Enter a data source name of your choice.
- Region: Choose the AWS region for your AWS Athena database. You must use the correct region, or it won't connect to the Athena database. You can verify the region from the top-right corner of the web page, as shown below.
- Catalog: Specify the Amazon Athena Data source name you created in the previous tip. As shown below, my catalog name is AwsDataCatalog.
- Database: If you have a single database in Athena, you can use the default option or specify the database for which you create the ODBC.
- S3 Output Locations: It requires a S3 directory to store the query results. You can specify the same S3 bucket that we used in the previous tip. The directory format is s3://[bucket]/folder.
- Authentication Options: Choose IAM credentials as Auth schema from the drop-down menu. It asks you to enter the IAM user's AWS Access and Secret key. The IAM credentials must have access to the S3 bucket and Athena. Get the keys from the AWS IAM service and put them into the ODBC configuration page.
Click Test Connection to see if the ODBC connection to Amazon Athena is successful. It gives the message "Successfully connected to Athena Engine version 3" for a successful connection.
Create a SQL Server Linked Server to Amazon Athena
Once we have the ODBC connection setup from the SQL Server machine to the Amazon Athena database, we can use the ODBC to create the Linked Server in SQL Server.
To create the Linked Server, we can run the following T-SQL using the stored procedures sp_addlinkedserver and sp_addlinkedserverlogin.
In the below query, specify the ODBC data source name in the @server, @datasrc, and @rmtsvrname parameters. You need to put the AWS access key in the @rmtuser and the secret key in the @rmtpassword parameter of the sp_addlinkedsvrlogin stored procedure.
EXEC master.dbo.sp_addlinkedserver @server =N'AthenaQuery', @srvproduct=N'', @provider=N'MSDASQL', @datasrc=N'AthenaQuery' GO EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname=N'AthenaQuery', @useself=N'False', @locallogin=NULL, @rmtuser=N'AKIASEPWAFS6UVVVXMP7', @rmtpassword='ON+T6J4/Zo92uCvZ/q4mZ1x6PUG9yVGxanFHwWQ9' GO
Executing the query creates a Linked Server from the local SQL Server to Amazon Athena. You can verify the Linked Server by connecting to SQL Server in SSMS and expanding Server Objects -> Linked Servers.
Right-click on the Linked Server and Test Connection to verify the Linked Server can connect to the Amazon Athena database.
Expand the Linked Server Catalogs to view all tables in your Amazon Athena setup. You can query these tables as a regular Linked Server table.
Next Steps
- Read AWS Athena documentation on the AWS blog
- Read more AWS tips on MSSQLTips.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2023-10-06