By: John Miner | Updated: 2023-09-08
Problem
Most organizations have a variety of legacy databases that reside on-premises. Many companies are moving these applications and databases into the cloud from their aging data centers. Currently, there are three leading cloud vendors: Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure Cloud (Azure).
Our company has selected Google as our service provider. How can we deploy and secure a SQL Server database instance as a managed service in GCP?
Solution
GCP has a managed service called Cloud SQL. When deploying this service, we can choose one of the database engines shown in the image below for our project. In this case, we want a Microsoft SQL Server database. We will use a virtual private cloud (VPC) to create a network and deploy both the Cloud SQL instance and a Compute Engine virtual machine into that network. The details behind these tasks will be shared in this article. Since it is a managed service, not all database features are available. For details, check out Google Cloud SQL for SQL Server.
Business Problem
Our manager has asked us to investigate how to deploy a Cloud SQL Database for SQL Server into a Virtual Private Cloud. This adventure will include designing a network and deploying the managed service into that network. Since the service is secure by default, we must deploy a jump server into the network to communicate with the SQL Server instance. Additionally, we will compare timings between GCP and Azure. Finally, we will repeat the test with a public-facing SQL Server and see if any hackers are out there.
Here is a list of tasks that we need to investigate and solve:
Task Id | Description |
---|---|
1 | Create Virtual Private Cloud (VPC) |
2 | Deploy Cloud Compute Engine |
3 | Deploy Cloud SQL Database |
4 | Investigate Cloud SQL Database |
5 | Configure Virtual Machine |
6 | Execute Test Case for Timings |
7 | Non-Private Managed Service |
At the end of the research, we will understand how to manage Cloud SQL Database on GCP.
Task 1: Create Virtual Private Cloud (VPC)
The Google command line interface (CLI), or gcloud, lets cloud engineers deploy objects repeatedly by executing scripts that produce the same results. The first step is to create a Virtual Private Cloud (VPC). Log into the Google Cloud Platform. The image below shows the default VPC that is part of all subscriptions. The Cloud Shell icon (a terminal prompt) in the top right corner is how you launch the cloud shell, allowing an engineer to enter CLI commands.
Knowing that the cloud shell is a Linux terminal session, we can use the "cat" command to determine the exact version of the operating system. The image below shows that the cloud shell uses Debian Linux version 11.
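For those following along, the command below is one way to display the release details. The /etc/os-release file is a standard Linux convention, not specific to Google:

cat /etc/os-release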
The gcloud script below creates a virtual private cloud network named tipsgcp4vpc, which uses auto subnet mode. Please see the gcloud compute networks documentation for more information.
#
# 1 - create network
#

gcloud compute networks create tipsgcp4vpc --subnet-mode=auto --bgp-routing-mode=regional --mtu=1460
We can see the output of the command by refreshing the VPC networks.
Auto subnet mode automatically creates one subnet in each region supported by Google. Below, the us-central1 region has a non-routable IP range of 10.128.0.0/20.
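If you want to verify the subnets that auto mode created, the gcloud command below lists them for a given network. The network name comes from our earlier script:

#
# list subnets created by auto mode (verification step)
#

gcloud compute networks subnets list --network=tipsgcp4vpc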
The following command lists the networks in our project (jminer-sandbox):
gcloud compute networks list
We can see a default network and the custom one we just deployed.
Lastly, we want to add a firewall rule that allows all Internet Protocol (IP) addresses to access port 3389 for the Remote Desktop Protocol (RDP). We can use the gcloud compute firewall-rules command to accomplish this.
#
# 2 - add firewall rules
#

gcloud compute firewall-rules create tipsgcp-allow-rdp --project=jminer-sandbox --direction=INGRESS --priority=65534 --network=tipsgcp4vpc --action=ALLOW --rules=tcp:3389
The image below shows that the new firewall rule, tipsgcp-allow-rdp, has been added to the tipsgcp4vpc virtual network.
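As a side note, opening RDP to the whole internet is risky outside of a short-lived sandbox. The command below is a sketch of narrowing the rule to a single trusted range; the 203.0.113.0/24 range is just a documentation placeholder, so substitute your own office or home IP range:

#
# optional - restrict the rdp rule to a trusted source range
#

gcloud compute firewall-rules update tipsgcp-allow-rdp --source-ranges=203.0.113.0/24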
Deploying a network using the command line interface is easy. Our next step is to deploy a virtual machine, a jump server, for the private network.
Task 2: Deploy Cloud Compute Engine
The Google command line interface (CLI) contains commands to deploy computing resources. Please refer to the gcloud compute instances set of commands for details.
#
# 3 - create virtual machine
#

gcloud compute instances create tipsgcp4wvm --network tipsgcp4vpc --zone us-central1-a --image-project windows-cloud --image-family windows-2022 --boot-disk-size 128GB --boot-disk-type pd-ssd
The above script creates a Windows Server 2022 Datacenter virtual machine in the us-central1 region with a 128 GB solid-state boot disk. The virtual machine is named tipsgcp4wvm. The following command lists the virtual machines in our project (jminer-sandbox):
gcloud compute instances list
The output shows both the internal and external IP addresses associated with our virtual machine.
Let us look at the properties of the Windows virtual machine. The instances page under Compute Engine lists all deployed virtual machines. Google supports both Windows and Linux operating systems. Click the hyperlink of the computer name to find out the details.
The details page is a great place to start. We can create a local user and password using the "SET WINDOWS PASSWORD" button. Additionally, we can download the RDP connection file to our laptop. Just remember that we did not reserve a public IP for this service. Thus, the address will change every time we shut down and turn on the virtual machine.
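If you prefer scripting over the portal button, the gcloud equivalent below resets the Windows password for a named local user; the jminer user name is just our example account:

#
# create or reset a local windows user from the cli
#

gcloud compute reset-windows-password tipsgcp4wvm --zone=us-central1-a --user=jminer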
One thing not completed in this process was adding labels (tags) to the deployed virtual machine. This is very important for large organizations in which operational costs are charged back to the business line by the IT organization.
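For completeness, a minimal sketch of adding labels after the fact is shown below; the cost-center and environment keys are made-up examples of chargeback metadata:

#
# optional - label the virtual machine for chargeback reporting
#

gcloud compute instances add-labels tipsgcp4wvm --zone=us-central1-a --labels=cost-center=finance,environment=dev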
We can look at the machine, network, and firewall configurations if we keep scrolling down in Details.
The following sections go over the network interfaces and boot/local storage. Again, the Internet Protocol (IP) information for both private and public addresses was available in the summary list. It is important to note that we are using the private network we created in Task 1.
The last section covers Shielded VM, SSH keys, and API/Identity management. If you use a Linux Virtual Machine, the SSH keys are a great way to set up a terminal session. Again, this information is only necessary if it applies to your deployment.
The most important part of any deployment is testing. While we know the virtual machine is up and running, we have not tested connectivity. We will circle back to this topic when configuring the VM with some tools.
Task 3: Deploy Cloud SQL Database
The command line interface code below will deploy a Cloud SQL instance. In our case, we want to create a SQL Server 2019 Standard database engine. Parameters such as project, zone, network, CPU, memory, and no public IP assignment are important. Please refer to the gcloud sql instances documentation for more information.
#
# 4 - create sql server instance
#

gcloud sql instances create tipsgcp4sqlsvr --project=jminer-sandbox --network=projects/jminer-sandbox/global/networks/tipsgcp4vpc --no-assign-ip --database-version=SQLSERVER_2019_STANDARD --cpu=4 --memory=26GB --zone=us-central1-a --root-password=<your password here>
After deployment, we can even list the system databases from the cloud shell using the code below.
gcloud sql databases list --instance=tipsgcp4sqlsvr
The above image shows the master, model, msdb, and tempdb system databases. I deployed a user database named dbs_tips2023. There are many more options when deploying your SQL Server instance. The following four images review settings available when manually deploying from the Google Portal.
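Before looking at those images, note that a user database like dbs_tips2023 can also be created from the cloud shell. The command below is the CLI route to the same result:

#
# create a user database on the instance
#

gcloud sql databases create dbs_tips2023 --instance=tipsgcp4sqlsvr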
If your company is replicating Active Directory (AD) Domain Controller settings to the cloud, you can point the deployment drop-down box to a local domain existing in Google. This will allow you to manage permissions using AD groups and users.
Backups are always great to have when something goes wrong. The point-in-time recovery setting is only available for the Enterprise SKU. Both the Standard and Express SKUs are available for simpler workloads.
While end users think the database service is up 24/7, the reality is that maintenance does happen. You can specify the day and time that Google will patch your server. You can also deny Google from touching your service if something really important is scheduled. For instance, an online store that does 90 percent of its business in December might not want any maintenance that month.
Lastly, you can specify server settings such as database flags, default collation, and time zone. Many companies have audit requirements. This auditing feature can be enabled during deployment.
Now that we have gone over manual deployment settings, you can research the gcloud sql instances commands to add additional features to your deployment.
Task 4: Investigate Cloud SQL Database
Let us look at the pages (panels) in the Google Console that manage and configure your database. The image below shows the Overview page, in which the private IP for our SQL Server instance is set to 172.17.144.3. Unlike Azure SQL Database, there is no public CNAME on the internet to connect to. That is great since we want a secure, local database accessible by our VPC called tipsgcp4vpc.
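If you prefer the command line over the console, the describe command below dumps the instance details; the format flag is a standard gcloud option that trims the output down to just the IP addresses:

#
# show instance details, trimmed to the ip addresses
#

gcloud sql instances describe tipsgcp4sqlsvr --format="value(ipAddresses)"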
The connections page lets us look at networking, security, and connectivity. We can see from the image below that public connectivity for the service is currently disabled. Also, enforcing SSL connections is not enabled. I suggest that any database that has confidential information should require an SSL connection.
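If you agree with that suggestion, the patch below is a minimal sketch of enforcing SSL from the CLI on our instance:

#
# enforce ssl connections for the instance
#

gcloud sql instances patch tipsgcp4sqlsvr --require-ssl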
The users page allows you to create new users and reset an existing user's password. The sqlserver account is an elevated account when using standard security. This account can create databases and users and assign database roles. Please see the documentation for more details.
After deploying the Cloud SQL Service, one of the first tasks is to give this account a password.
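The same task can be scripted. The command below resets the sqlserver account password; the prompt flag avoids leaving the password in your shell history:

#
# set the password for the elevated sqlserver account
#

gcloud sql users set-password sqlserver --instance=tipsgcp4sqlsvr --prompt-for-password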
The databases page allows the cloud administrator to create a new (empty) database or delete an existing database. In short, this is not too exciting since these actions can be performed by many different tools.
The backups page allows you to set up automated backups or create a backup right now.
The same manual configuration settings can be added after deployment. Just edit the automated backups.
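For reference, both backup styles can be driven from the CLI. The sketch below takes an on-demand backup and then enables automated backups; the 2 AM UTC start time and the description text are just example values:

#
# take an on-demand backup
#

gcloud sql backups create --instance=tipsgcp4sqlsvr --description="manual backup before testing"

#
# enable automated backups with an example start time
#

gcloud sql instances patch tipsgcp4sqlsvr --backup-start-time=02:00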
The Operations page is just a list of major operations that have taken place with the database server. The creation of the instance will be listed as the first log entry.
Now, we will go back to the Overview page and focus on the command buttons at the top.
The import action allows you to either restore a database from a backup file or import data from a sequence of SQL commands in a text file. The first action requires the database not to exist, and the second action is dependent upon an existing database. Both actions require a file stored in a Google Storage Container. Exporting the database to SQL commands is not supported by the Express edition. I will leave testing exporting to the SQL format as an exercise for the reader.
The export action allows you to either take a backup of the database or export the data as a set of SQL statements. Again, I leave testing of the SQL export to the reader. The BAK export creates a normal backup that can be restored on any SQL Server database that has a compatible version.
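Both actions map to CLI verbs as well. The sketch below assumes a hypothetical bucket named gs://tips-backup-bucket; any Google Storage Container your service account can reach will do:

#
# export the user database to a bak file in cloud storage
#

gcloud sql export bak tipsgcp4sqlsvr gs://tips-backup-bucket/dbs_tips2023.bak --database=dbs_tips2023

#
# import the bak file into the instance
#

gcloud sql import bak tipsgcp4sqlsvr gs://tips-backup-bucket/dbs_tips2023.bak --database=dbs_tips2023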
I think of Google Cloud SQL as a service that separates storage and compute. Thus, we can restart and stop the instance. The image below shows a stop database action.
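Under the hood, stopping and starting map to the instance activation policy; restart has its own verb. A minimal sketch:

#
# stop the instance (activation policy never)
#

gcloud sql instances patch tipsgcp4sqlsvr --activation-policy=NEVER

#
# start the instance again (activation policy always)
#

gcloud sql instances patch tipsgcp4sqlsvr --activation-policy=ALWAYS

#
# restart the instance
#

gcloud sql instances restart tipsgcp4sqlsvr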
There is always a need to remove unwanted databases. The delete action does just that.
What if we want to separate our testing effort into two teams? Team A has the prime numbers database schema. Team B needs a copy of the database. The clone action allows the cloud engineer to create a copy of an existing database.
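From the CLI, note that the clone verb copies the entire instance, not a single database. The target name tipsgcp4sqlsvr2 below is a hypothetical name for Team B's copy:

#
# clone the instance for team b
#

gcloud sql instances clone tipsgcp4sqlsvr tipsgcp4sqlsvr2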
If I were developing a modern data platform on GCP with SQL Server as my relational database, I would review what is and is not supported by the service. See the features documentation. For instance, extended events are not supported by the service. If this is a deal breaker, manage the database yourself using Infrastructure as a Service.
Task 5: Configure Virtual Machine
Create a local user and download the RDP file. The image below shows an RDP session with the public IP address (35.222.248.43) and username (jminer). Make sure you fully qualify the username by prefixing it with the machine name (tipsgcp4wvm).
Many developers use Azure Data Studio; however, I am an old-school database administrator who likes to use SQL Server Management Studio. Please download your favorite tool and configure it for the Cloud SQL database private IP.
The performance test script executes a batch file with 20 separate command shells using the sqlcmd utility. Please download and install this utility for our testing.
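Before running the full test, a quick smoke test of connectivity from the jump server is wise. The one-liner below assumes the private IP from Task 4 and the sqlserver account:

sqlcmd -S 172.17.144.3 -U sqlserver -P <your password here> -Q "SELECT @@VERSION;"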
The image below shows a connection to our Cloud SQL database using SSMS.
Spoiler alert: I will compare and contrast Azure Data Factory and Google Data Fusion in a future article. I will be creating a stocks database shown in the image above. Google loves open-source projects, and Linux is one of the favorite operating systems out there. It is not surprising that the Cloud SQL database offering is on a Linux OS. I arrived at this conclusion by looking at the file path for data and log files in the above image. They are in a Linux file structure format.
Task 6: Execute Test Case for Timings
I have used the prime number database for calculating primes using a brute-force algorithm. This is a fantastic way to compare CPU performance between platforms since only small amounts of data are written to disk. The image below shows the main table named TBL_PRIMES. The table named TBL_CONTROL_CARD assigns work to 20 different command shells. Each shell searches the next block of 250,000 numbers using the SP_STORE_PRIMES stored procedure. If we do the math, upon completion, the command shells will have found all the prime numbers from 2 to 5,000,000.
The three files are part of the testing suite. The run command file calls the calculate-prime-numbers SQL script using the sqlcmd utility. The calc-processing-time SQL script is used to calculate the total processing time using the results table.
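The exact run command file ships with the article artifacts, but a minimal Windows batch sketch of the idea is shown below. The calculate-prime-numbers.sql file name and the 20-shell fan-out come from the description above, while the connection details are assumptions:

@echo off
rem launch 20 parallel command shells, each running the prime number script
for /L %%i in (1,1,20) do (
    start "worker %%i" cmd /c sqlcmd -S 172.17.144.3 -U sqlserver -P <your password here> -d dbs_tips2023 -i calculate-prime-numbers.sql
)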
We can see that the process found 348,513 prime numbers in 478 seconds.
I have some timings from my laptop and various Azure articles I did in the past. I will compare all data I have using four cores as the baseline since this test is CPU intensive.
Test No | Platform | Description | Configuration | Compute | Cores | Time (sec) |
---|---|---|---|---|---|---|
1 | Cloud SQL | SQL 2019 - STD | ? | | 4 | 478 |
2 | Azure | Provisioned Business Critical | SQL DB | GEN5 | 4 | 462 |
3 | Azure | Provisioned General Purpose | SQL DB | GEN5 | 4 | 426 |
4 | Azure | Serverless Compute | SQL DB | GEN5 | 4 | 437 |
5 | Azure | Provisioned Managed Instance | SQL DB | GEN5 | 4 | 387 |
6 | Hardware | Local Laptop | SQL 2016 - DEV | GEN6 | 4 | 206 |
I did not include any memory-optimized table timings since I could not determine if the feature is available in the Google service. Those historical performance timings for memory-optimized tables are blindingly fast, at less than 10 seconds. I am not surprised that the timing from my old laptop using SQL Server 2016 ranked first at 206 seconds. I was surprised that Cloud SQL ranked last at 478 seconds.
Task 7: Non-Private Managed Service
Returning to our hypothetical business case, our company has a new staff member. Their task was to give the Cloud SQL instance named sql4jfm2023 a public IP and add my local laptop as an authorized network. However, they opened the firewall during testing to allow all IPs (0.0.0.0/0) to access the service. How can we fix this mistake?
If you decide to use a public IP in the future, add your external IP to the authorized networks list. You can even mix and match cloud providers. The IP listed below is from a VM running in the Azure Cloud. This patch command reduces the number of allowed IPs to one address.
#
# 5 - patch sql server instance (overwrites current setting)
#

gcloud sql instances patch sql4jfm2023 --authorized-networks=172.176.206.91
This mistake was in production for six hours. Let us see if any hackers were trying to access the database. The Logs Explorer in Google Cloud allows us to filter by service. We can see that our Cloud SQL server had 22,000 login attempts. We can see both the source address of the attempted login and the username that the hacker tried to log in as.
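For readers who want to reproduce the search, a Logs Explorer filter along the lines of the sketch below narrows the view to one Cloud SQL instance; the exact payload text for failed logins may differ in your logs:

resource.type="cloudsql_database"
resource.labels.database_id="jminer-sandbox:sql4jfm2023"
textPayload:"Login failed"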
We can take one of the addresses in the log entry and do a reverse IP lookup. We can see that the attack came from a data center in Amsterdam, Netherlands. In short, do not add all IP ranges (0.0.0.0/0) to the authorized networks list.
Did the hacker get into my database of historical stock data downloaded from Yahoo Finance? The answer is no. I used a strong password. Many more attempts would have been needed to crack the password.
Summary
The Google Cloud Platform has a managed service that allows companies to migrate databases created with MySQL, PostgreSQL, and SQL Server to the cloud. The Cloud SQL database service can be deployed with both a private and public IP. The private IP requires services to be deployed in the same Virtual Private Cloud if you do not want to play around with networking. In our example, we deployed a Virtual Machine as a Jump Server so that we could manage the database instance. If a public IP is used, authorized network addresses are required for access.
Today, we focused on the Microsoft SQL Server offering of Cloud SQL from Google. The service separates storage from compute. This means we can use CLI scripting to start and stop the service. Like any Platform as a Service (PaaS) offering, the Google service manages software patching and automated backups. The best way to migrate from on-premises to the GCP cloud is to take a backup on-premises and restore the backup in the cloud. Features such as high availability were not explored in this article but do exist in the service. In short, the Cloud SQL database is a great managed service for legacy applications moving to the GCP cloud.
What the Cloud SQL service does not support is the scheduling of T-SQL for index maintenance. There is no SQL Agent available with the cloud service. However, there are Cloud Functions that support the Python language and a set of community libraries. A developer can use a library to connect to Cloud SQL and execute ad-hoc T-SQL statements. The HTTP endpoint exposed by the Cloud Function can be called by the Cloud Scheduler on a predefined schedule. It is not surprising to find out that the scheduling service is just an enterprise-grade CRON job service.
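As a sketch of that last mile, the command below creates a Cloud Scheduler job that posts to an HTTP endpoint nightly at 2 AM; the job name, function name, and URL are hypothetical placeholders for your own Cloud Function:

#
# call a hypothetical index maintenance function every night at 2 am
#

gcloud scheduler jobs create http nightly-index-maintenance --schedule="0 2 * * *" --uri=https://us-central1-jminer-sandbox.cloudfunctions.net/run-index-maintenance --http-method=POST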
To create a modern data platform application, one must leverage components such as storage containers, secret vaults, extract-transform-load (ETL) services, and cloud SQL databases. These cloud components might use a Virtual Private Cloud not connected to your Google project. Therefore, network peering and Cloud SQL proxy software ensure one network can talk to another. Google offers the following native services to create ETL processes: Cloud Dataflow (Apache Beam), Cloud Dataproc (Apache Spark), and Cloud Data Fusion (open-source CDAP). These services will be explored in future articles.
Next Steps
- Enclosed are the artifacts to start your journey with Google Cloud Platform and Cloud SQL Database: CLI snippets and TSQL code.
- Take some time to compare and contrast Azure Data Factory and Google Data Fusion
- Take some time to compare and contrast Azure Synapse Spark and Google Dataproc
About the author
This author pledges the content of this article is based on professional experience and not AI generated.