By: Maria Zakourdaev | Updated: 2019-08-15 | Comments (2) | Related: > TSQL
Problem
A few days ago, one of the developers asked me if that was possible to generate test data by performing multiple nested INSERT statements, each of them involving inserting new rows into several parent tables and in the same statement reusing the autogenerated primary keys for the foreign key columns in the child table. The developer was working with PostgreSql and so I tried to find a solution for both PostgreSql and SQL Server to learn the differences in the ANSI features implementation in these different database systems.
Solution
Disclaimers
- The fastest data generation solution is as follows:
- Insert the required number of rows into each parent table
- Get the ids according to the data generation logic and use them to add rows to the child table
In this tip I will not be using the technique above, but try to do this all with just one statement.
On my laptop, I generated 100,000 rows using the below technique. In SQL Server it took 86 seconds compared to the 3 statements logic (like below) which took approximately 5 minutes.
- insert into 1st parent table + store output into the variable
- insert into 2st parent table + store output into the variable
- insert into a child table
ANSI Solution
There is a great feature in the ANSI standards that can be used exactly for this challenge, using a common table expression or CTE. A CTE is a temporary named result set, created from a simple query defined within an execution scope of SELECT, INSERT, UPDATE or DELETE statement.
In PostgreSql, CTE implementation includes data modification query scope. But in SQL Server, the CTEs query definition must meet a view’s requirements which means we cannot modify data inside a CTE.
In this tip I will show you a single statement solution I came up with in PostgreSql and SQL Server and I would love to hear your comments on how you would solve this challenge.
PostgreSql Approach to Load Data into Parent and Child Tables at the Same Time
Before we get started, here is the syntax for creating the three tables.
create database nested_inserts; CREATE TABLE public.product ( productid serial NOT NULL, product_name varchar(256) NOT NULL, color varchar(30) NULL, listprice money NOT null, CONSTRAINT pk_product_productid PRIMARY KEY (productid) ); CREATE TABLE public.salesperson ( salespersonid serial NOT NULL, territoryid int4 NULL, salesquota money NULL, bonus money NOT null, CONSTRAINT pk_salespersonid PRIMARY KEY (salespersonid) ); CREATE TABLE public.salesorderheader ( salesorderid serial not null, salespersonid int4 NOT NULL, productid int4 NOT NULL, orderdate timestamp NOT NULL, shipdate timestamp NULL, status int2 NOT null, CONSTRAINT pk_salesorderheader_salesorderid PRIMARY KEY (salesorderid), CONSTRAINT fk_salesorderheader__salesperson_salespersonid FOREIGN KEY (salespersonid) REFERENCES public.salesperson(salespersonid), CONSTRAINT fk_salesorderheader__product_productid FOREIGN KEY (productid) REFERENCES public.product(productid) );
In PostgreSql, the solution to this challenge is quite simple, we will update two tables in the CTE and use the generated ids as foreign key ids in the third table.
with prod as( INSERT INTO public.product(product_name,color,listprice) VALUES('3D printer','green',560) RETURNING productid ), pers as ( INSERT INTO public.salesperson(territoryid,salesquota,bonus) VALUES(56,5000,100) RETURNING salespersonid ) INSERT INTO public.salesorderheader (salespersonid,productid,orderdate,shipdate,status) SELECT prod.productid, pers.salespersonid, now() as orderdate, now() - INTERVAL '7 day' as shipdate, 1 as status FROM prod,pers select * from public.salesorderheader
SQL Server Approach to Load Data into Parent and Child Tables at the Same Time
Before we get started, here is the syntax for creating the three tables.
create database nested_inserts; use nested_inserts GO
CREATE TABLE dbo.product ( productid int IDENTITY(1,1) NOT NULL, product_name varchar(256) NOT NULL, color varchar(30) NULL, listprice money NOT null, CONSTRAINT pk_product_productid PRIMARY KEY (productid) ); CREATE TABLE dbo.salesperson ( salespersonid int IDENTITY(1,1) NOT NULL, territoryid int NULL, salesquota money NULL, bonus money NOT null, CONSTRAINT pk_salespersonid PRIMARY KEY (salespersonid) ); CREATE TABLE dbo.salesorderheader ( salesorderid int IDENTITY(1,1) not null, salespersonid int NOT NULL, productid int NOT NULL, orderdate datetime NOT NULL, shipdate datetime NULL, status smallint NOT null, CONSTRAINT pk_salesorderheader_salesorderid PRIMARY KEY (salesorderid), CONSTRAINT fk_salesorderheader__salesperson_salespersonid FOREIGN KEY (salespersonid) REFERENCES dbo.salesperson(salespersonid), CONSTRAINT fk_salesorderheader__product_productid FOREIGN KEY (productid) REFERENCES dbo.product(productid) );
As I mentioned earlier, in SQL Server a CTEs query definition must meet a view’s requirements which means we cannot modify data inside the CTE.
We can use INSERT...OUTPUT construction, but another limitation in SQL Server for capturing the results of an OUTPUT clause in a nested INSERT, UPDATE, DELETE or MERGE statement, a target table cannot participate on either side of a FOREIGN KEY constraint.
INSERT INTO dbo.product(product_name,color,listprice) OUTPUT 1 AS salespersonid, -- at first I am trying to do it with only one parent table inserted.productid, getdate() as orderdate, getdate()+7 as shipdate, 1 as status INTO dbo.salesorderheader (salespersonid,productid,orderdate,shipdate,status) VALUES('3D printer','green',560)
Since we are generating the test data and this is not a production system, we can temporary disable foreign keys as follows:
ALTER TABLE dbo.salesorderheader NOCHECK CONSTRAINT fk_salesorderheader__salesperson_salespersonid; ALTER TABLE dbo.salesorderheader NOCHECK CONSTRAINT fk_salesorderheader__product_productid;
We still cannot have two layers of nested INSERTs, because it is not possible to have two OUTPUT INTO clauses in the same statement:
INSERT INTO dbo.salesperson (territoryid,salesquota,bonus,lastsoldproduct) OUTPUT inserted.salespersonid,inserted.lastsoldproduct,getdate(),getdate() +7,1 INTO dbo.salesorderheader (salespersonid,productid,orderdate,shipdate,status) SELECT 56,5000,100, prod.productid FROM ( INSERT INTO dbo.product(product_name,color,listprice) OUTPUT inserted.productid VALUES('3D printer','green',560) ) prod
To overcome the issue, I have used INSERT EXEC construction, but in order to use both autogenerated keys in the OUTPUT statement, I need to have both of them in the inserted table. I have added a new column to the table salesperson for storing the id generated during product creation.
ALTER TABLE dbo.salesperson ADD lastsoldproduct int;
And here is my final statement:
INSERT INTO dbo.salesorderheader (salespersonid,productid,orderdate,shipdate,status) EXEC (' INSERT INTO dbo.salesperson (territoryid,salesquota,bonus,lastsoldproduct) OUTPUT inserted.salespersonid,inserted.lastsoldproduct,getdate(),getdate() +7,1 SELECT 56,5000,100,prod.productid FROM ( INSERT INTO dbo.product(product_name,color,listprice) OUTPUT inserted.productid VALUES(''3D printer'',''green'',560) ) prod ') select * from dbo.salesorderheader
We have succeeded to insert two separate rows into two tables, generate 2 ids and used them in the third insert all in one statement. Take into consideration that this solution required disabling referential integrity keys which is not suggested for production environments.
If you come up with another way to implement the above query, without disabling the keys and without an additional column – I would love to see it. Please enter feedback in the comments section below.
Next Steps
- Common table expressions SQL Server
- Common table expressions PostgreSql
- SQL Server OUTPUT clause
- SQL Server INSERT ...EXEC construct
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2019-08-15