By: Ian Fogelman | Updated: 2020-12-09 | Comments | Related: More > Snowflake
Problem
You have just implemented a mass exodus from your on-premises data warehouse to Snowflake. Now that the schema and data portions of your migration are completed, you have one final migration type and this is the workflows that operated in your previous environment (think SQL Server Agent jobs in SQL Server terms). These workflows are largely batch processing loads from different data sources on a particular schedule. The problem we will be looking at today is how to implement these workflows in the new Snowflake environment.
Solution
For the solution to this problem, we will be covering Snowflake tasks. Snowflake tasks are schedulable scripts that are run inside of your Snowflake environment. This tip will cover:
- Introduction to Snowflake
- High level concepts of a task
- Building a simple task
- Turning a task on and off
- Adding descending tasks onto an existing task
Requirements:
- A free Snowflake instance.
Introduction to Snowflake
Snowflake is the first cloud first data warehouse solution, the service is built on top of the customer's choice of cloud provider (AWS, Azure or GCP) infrastructure.
Snowflake SQL is ANSI compliant and supports common analytic and windowing functions. You will definitely notice some unique syntax in Snowflake, but you will also notice some similarities coming from a heavy SQL Server background the transition has been reasonable for me.
In terms of what the IDE is for Snowflake it is entirely web based. To interact with your instance, you will navigate to your unique URL for example XXXXXXXX.us-east-1.snowflakecomputing.com. Once logged in you will be in the main web GUI, this is the main IDE and can start interacting with your data assets. Also, for reference each query tab in the Snowflake console is referred to as a "worksheet". These "worksheets" are automatically saved and can be reviewed at any time, which is equivalent to a tab history feature.
What is a Snowflake task?
The Snowflake documentation defines tasks as having the ability to execute single SQL statements including calling a stored procedure. Tasks in Snowflake may only be run by a schedule. DDL statements relevant to Snowflake tasks are as follows:
Name | Description |
---|---|
CREATE TASK | Create a new task, also takes parameter for scheduling |
ALTER TASK | Alter an existing task |
DROP TASK | Delete an existing task |
DESCRIBE TASK | Returns task meta data. Must be owner of the task to view this data |
SHOW TASKS | Provides a list of tasks available to the current role |
Scheduling a Snowflake task
The Snowflake task engine has a CRON and NONCRON variant scheduling mechanisms. The CRON variant should look familiar syntactically if you are an avid Linux user. CRON or CRONTAB is the Linux version of Windows task schedule. It is extremely simplified in regards to how it runs a job. It supports a few parameters and points to a SQL statement as part of the task definition. The parameters control the frequency of the job being run, days of week and time.
Here is a table with a few configurations and descriptions of CRON job timings to give you an idea of how it works.
Building a simple Snowflake task
In this example, the task we will be inserting a random letter as well as a timestamp into our TASKDEBUG table. The timestamp column will allow us to see the schedule of the job executing.
Start by creating a database and setting the database context in your Snowflake worksheet.
CREATE DATABASE MSSQLTIPS; USE MSSQLTIPS;
Here is an example script creating a task using NON-CRON notation:
CREATE OR REPLACE TASK TASK_DEBUG WAREHOUSE = COMPUTE_WH SCHEDULE = '1 MINUTE' TIMESTAMP_INPUT_FORMAT = 'YYYY-MM-DD HH24' AS insert into taskdebug with arr as (select array_construct('A','B','C','D','E','F') arr) select arr[ABS(MOD(RANDOM(), array_size(arr)))],CURRENT_TIMESTAMP() from arr;
Here is the same script using CRON notation.
CREATE OR REPLACE TASK TASK_DEBUG WAREHOUSE = COMPUTE_WH SCHEDULE = 'USING CRON 1 * * * * America/New_York' TIMESTAMP_INPUT_FORMAT = 'YYYY-MM-DD HH24' AS insert into taskdebug with arr as (select array_construct('A','B','C','D','E','F') arr) select arr[ABS(MOD(RANDOM(), array_size(arr)))],CURRENT_TIMESTAMP() from arr;
Turning the task on and off
An important note is that even though the task is created in Snowflake, which we can verify by running the show tasks command.
--VIEW TASK WITH SHOW TASKS COMMAND SHOW TASKS;
We can see the state is "suspended" which means the task will not be triggering. To turn our task on, we issue an alter task command.
--AFTER TASK IS CREATED, ITS INITAL STATE IS SUSPENDED, ENABLE TASK ALTER TASK TASK_DEBUG RESUME;
Now that the task is on, we just have to wait 1 minute for the task to trigger and query the TASKDEBUG table for proof that our task is running as expected.
--WAIT 1 MINUTE SELECT * FROM TASKDEBUG;
Adding descending tasks onto an existing task
Lastly, let's suppose that we would like another task to trigger after our initial task TASK_DEBUG runs. Snowflake offers descending task functionality; we will define a similar insert statement as previously shown. In this task we will hard code the first value as 'x' so that we can be sure the 2nd task is triggering.
Before we can stack any tasks on our initial task, we must suspend the initial task.
--TURN TASK OFF ALTER TASK TASK_DEBUG SUSPEND;
Now that the task is in a suspended state, we can create the second task to run after the first.
--CREATE A 2ND TASKS WITH AFTER CLAUSE CREATE OR REPLACE TASK TASK_DEBUG_2 WAREHOUSE = COMPUTE_WH AFTER TASK_DEBUG --SPECIFY WHICH TASK IT OPERATES AFTER AS insert into taskdebug select 'X',CURRENT_TIMESTAMP() from arr;
We can verify the task dependence by using the following query:
--CHECK DEPENTANT TASKS select * from table(information_schema.task_dependents(task_name => 'MSSQLTIPS.PUBLIC.TASK_DEBUG', recursive => false));
Here we can verify two things, first that TASK_DEBUG_2 has a Predecessor task of TASK_DEBUG. And also, that TASK_DEBUG_2 does not have a schedule of its own.
Now turn the original task back on and also enabled the second task. Its worth noting here that TASK_DEBUG_2 must be enabled first and then the root task.
--TURN ORIGINAL TASK BACK ON ALTER TASK TASK_DEBUG_2 RESUME; ALTER TASK TASK_DEBUG RESUME;
Wait 1 minute for the task to run and check the TASKDEBUG table.
--WAIT 1 MINUTE, VERYIFY THAT 'X' AND OTHER LETTERS ARE IN THE TASKDEBUG TABLES SELECT * FROM TASKDEBUG;
Here we can see an 'X' value and a random letter which verifies that both the tasks were operated.
To save on compute charges suspend both of the tasks.
--TURN BOTH TASKS OFF ALTER TASK TASK_DEBUG SUSPEND; ALTER TASK TASK_DEBUG_2 SUSPEND;
In summary tasks are very handy in Snowflake, they can be combined with streams, snowpipe and other techniques to make them extremely powerful.
Next Steps
- Check out this Snowflake overview
- For a comparison of SQL Server Agent Jobs compared to Snowflake tasks check out this article
- Look at Snowflake tasks documentation details
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2020-12-09