By: Pablo Echeverria | Updated: 2022-03-23
Problem
There are multiple tools and methods for loading data into a database: GUI tools such as SQL Server Management Studio and SQL Developer, scripts in T-SQL and PL/SQL, third-party software, and more. Each platform also ships a dedicated command-line loader: Bulk Copy Program (BCP) for SQL Server and SQL*Loader (SQLLDR) for Oracle. We will take a look at both of these to see how you can load data into SQL Server and Oracle.
Solution
In the next section you will see the parameters available for both BCP and SQLLDR, and we will load data into a database using both.
As sample data I’m going to fetch some tweets (to perform a sentiment analysis on later), using the PowerShell script below:
$APIKey = '<YourKeyHere>'
$SearchWord = 'sql'
$OutFile = 'C:\Temp\input.txt'
$APIKeySecret = [System.Convert]::ToBase64String([System.Text.Encoding]::ASCII.GetBytes($APIKey))
$result = Invoke-RestMethod -Method Post -Uri https://api.twitter.com/oauth2/token -ContentType "application/x-www-form-urlencoded;charset=UTF-8" -Headers @{'Authorization'='Basic '+$APIKeySecret} -Body 'grant_type=client_credentials'
$OAuthToken = $result | select -ExpandProperty access_token
$result = Invoke-WebRequest -Uri ('https://api.twitter.com/1.1/search/tweets.json?q='+$SearchWord+' exclude:retweets&lang=en&count=100') -Headers @{'Authorization'='Bearer '+$OAuthToken}
Clear-Content $OutFile
Add-Content $OutFile (($result | select -ExpandProperty Content | ConvertFrom-Json | select -ExpandProperty statuses).text -replace "`n|`r" | select-string -pattern $SearchWord)
- Line 1 sets my Twitter API key, which you can obtain from Twitter.
- Line 2 specifies the word I’m going to search for; this is what I will perform the sentiment analysis on later.
- Line 3 sets my output file.
- Line 4 converts my Twitter key into base 64.
- Lines 5 and 6 obtain the OAuth token from Twitter, to be used when performing searches.
- Line 7 performs the search, excluding retweets, restricted to English, and requesting 100 tweets (otherwise only 15 are returned).
- Line 8 clears the output file.
- Line 9 saves only the text of each tweet (the API returns JSON with all kinds of metadata), stripping line breaks and keeping only the tweets that actually contain the search word, which filters out ads and unrelated results.
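Before loading anything, it’s worth a quick sanity check that the file contains what we expect. A minimal check, using the output path from the script above (note the load commands later in this tip read the file from C:\setup, so copy it there if you follow along exactly):

# Peek at the first few tweets and count the records
Get-Content 'C:\Temp\input.txt' -TotalCount 3
(Get-Content 'C:\Temp\input.txt').Count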
With this file in place, we can load the records into both databases to test BCP and SQLLDR.
Oracle SQLLDR
The default tool for data loading in Oracle is SQL*Loader (SQLLDR), which, according to the official documentation, has a powerful data parsing engine that puts little limitation on the format of the data in the data file.
You can use SQL*Loader to do the following:
- Load data across a network if your data files are on a different system than the database.
- Load data from multiple data files during the same load session.
- Load data into multiple tables during the same load session.
- Specify the character set of the data.
- Selectively load data (you can load records based on the records' values).
- Manipulate the data before loading it, using SQL functions (a control file sketch follows this list).
- Generate unique sequential key values in specified columns.
- Use the operating system's file system to access the data files.
- Load data from disk, tape, or named pipe.
- Generate sophisticated error reports, which greatly aid troubleshooting.
- Load arbitrarily complex object-relational data.
- Use secondary data files for loading LOBs and collections.
- Use conventional, direct path, or external table loads.
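Several of the capabilities above, such as selective loading and manipulating data with SQL functions, are driven by a control file rather than command-line flags. The following is a minimal sketch, not part of this tip’s load: it writes a hypothetical control file mytbl.ctl targeting the MYTBL table created later in this tip, skipping records that begin with 'RT' and upper-casing the text on the way in (the WHEN condition and UPPER call are purely illustrative):

# Write a hypothetical control file that filters and transforms records
@"
LOAD DATA
INFILE 'c:\setup\input.txt'
APPEND INTO TABLE MYTBL
WHEN (1:2) <> 'RT'
FIELDS TERMINATED BY ','
(TEXT CHAR(4000) "UPPER(:TEXT)")
"@ | Set-Content mytbl.ctl
# Load using the control file instead of Express Mode
sqlldr MYDB/MyPwd CONTROL=mytbl.ctl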
If you run it on the command line with no parameters, it shows the usage as below:
PS C:\> sqlldr
SQL*Loader: Release 19.0.0.0.0 - Production on Mon Feb 21 16:58:37 2022
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Usage: SQLLDR keyword=value [,keyword=value,...]

Valid Keywords:

    userid -- ORACLE username/password
   control -- control file name
       log -- log file name
       bad -- bad file name
      data -- data file name
   discard -- discard file name
discardmax -- number of discards to allow          (Default all)
      skip -- number of logical records to skip    (Default 0)
      load -- number of logical records to load    (Default all)
    errors -- number of errors to allow            (Default 50)
      rows -- number of rows in conventional path bind array or between direct path data saves
              (Default: Conventional path 250, Direct path all)
  bindsize -- size of conventional path bind array in bytes  (Default 1048576)
    silent -- suppress messages during run (header,feedback,errors,discards,partitions)
    direct -- use direct path                      (Default FALSE)
   parfile -- parameter file: name of file that contains parameter specifications
  parallel -- do parallel load                     (Default FALSE)
      file -- file to allocate extents from
skip_unusable_indexes -- disallow/allow unusable indexes or index partitions  (Default FALSE)
skip_index_maintenance -- do not maintain indexes, mark affected indexes as unusable  (Default FALSE)
commit_discontinued -- commit loaded rows when load is discontinued  (Default FALSE)
  readsize -- size of read buffer                  (Default 1048576)
external_table -- use external table for load; NOT_USED, GENERATE_ONLY, EXECUTE
columnarrayrows -- number of rows for direct path column array  (Default 5000)
streamsize -- size of direct path stream buffer in bytes  (Default 256000)
multithreading -- use multithreading in direct path
 resumable -- enable or disable resumable for current session  (Default FALSE)
resumable_name -- text string to help identify resumable statement
resumable_timeout -- wait time (in seconds) for RESUMABLE  (Default 7200)
date_cache -- size (in entries) of date conversion cache  (Default 1000)
no_index_errors -- abort load on any index errors  (Default FALSE)
partition_memory -- direct path partition memory limit to start spilling (kb)  (Default 0)
     table -- Table for express mode load
date_format -- Date format for express mode load
timestamp_format -- Timestamp format for express mode load
terminated_by -- terminated by character for express mode load
enclosed_by -- enclosed by character for express mode load
optionally_enclosed_by -- optionally enclosed by character for express mode load
characterset -- characterset for express mode load
degree_of_parallelism -- degree of parallelism for express mode load and external table load
      trim -- trim type for express mode load and external table load
       csv -- csv format data files for express mode load
    nullif -- table level nullif clause for express mode load
field_names -- field names setting for first record of data files for express mode load
dnfs_enable -- option for enabling or disabling Direct NFS (dNFS) for input data files  (Default FALSE)
dnfs_readbuffers -- the number of Direct NFS (dNFS) read buffers  (Default 4)
sdf_prefix -- prefix to append to start of every LOB File and Secondary Data File
      help -- display help messages  (Default FALSE)
empty_lobs_are_null -- set empty LOBs to null  (Default FALSE)
  defaults -- direct path default value loading; EVALUATE_ONCE, EVALUATE_EVERY_ROW, IGNORE, IGNORE_UNSUPPORTED_EVALUATE_ONCE, IGNORE_UNSUPPORTED_EVALUATE_EVERY_ROW
direct_path_lock_wait -- wait for access to table when currently locked  (Default FALSE)

PLEASE NOTE:
Command-line parameters may be specified either by position or by keywords. An example of the former case is 'sqlldr scott/tiger foo'; an example of the latter is 'sqlldr control=foo userid=scott/tiger'. One may specify parameters by position before but not after parameters specified by keywords. For example, 'sqlldr scott/tiger control=foo logfile=log' is allowed, but 'sqlldr scott/tiger control=foo log' is not, even though the position of the parameter 'log' is correct.
In the script below I prepare my environment and load the data; the output of each command is displayed below it.
# Create the user, drop if exists
$Script=@"
BEGIN
  FOR i IN (SELECT 1 FROM DBA_USERS WHERE USERNAME='MYDB') LOOP
    EXECUTE IMMEDIATE 'DROP USER MYDB CASCADE';
  END LOOP;
  EXECUTE IMMEDIATE 'CREATE USER MYDB IDENTIFIED BY MyPwd';
  EXECUTE IMMEDIATE 'GRANT CONNECT, CREATE TABLE TO MYDB';
  EXECUTE IMMEDIATE 'ALTER USER MYDB QUOTA 1G ON SYSTEM';
END;
/
"@
$Script | sqlplus -s / as sysdba
PL/SQL procedure successfully completed.
# Create the table
"CREATE TABLE MYTBL(TEXT VARCHAR2(4000));" | sqlplus -s MYDB/MyPwd
Table created.
# Get file size
Write-Host((Get-Item C:\setup\input.txt).length/1KB)
# Import data
sqlldr MYDB/MyPwd TABLE=MYTBL DATA='c:\setup\input.txt' DIRECT=TRUE MULTITHREADING=TRUE PARALLEL=TRUE
SQL*Loader: Release 19.0.0.0.0 - Production on Wed Feb 23 11:32:14 2022
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Express Mode Load, Table: MYTBL
Path used:      Direct

Load completed - logical record count 72.

Table MYTBL:
  72 Rows successfully loaded.

Check the log file:
  MYTBL.log
for more information about the load.
# Show contents of log file
Get-Content MYTBL.log
SQL*Loader: Release 19.0.0.0.0 - Production on Wed Feb 23 11:32:14 2022
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Express Mode Load, Table: MYTBL
Data File:      c:\setup\input.txt
  Bad File:     input.bad
  Discard File:  none specified
 (Allow all discards)

Number to load: ALL
Number to skip: 0
Errors allowed: 50
Continuation:    none specified
Path used:      Direct - with parallel option.

Table MYTBL, loaded from every logical record.
Insert option in effect for this table: APPEND

   Column Name                  Position   Len  Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
TEXT                                FIRST     *   ,       CHARACTER

Generated control file for possible reuse:
OPTIONS(DIRECT=TRUE, PARALLEL=TRUE)
LOAD DATA
INFILE 'c:\setup\input.txt'
APPEND
INTO TABLE MYTBL
FIELDS TERMINATED BY ","
(
  TEXT CHAR(4000)
)
End of generated control file for possible reuse.

Table MYTBL:
  72 Rows successfully loaded.
  0 Rows not loaded due to data errors.
  0 Rows not loaded because all WHEN clauses were failed.
  0 Rows not loaded because all fields were null.

Bind array size not used in direct path.
Column array  rows :    5000
Stream buffer bytes:  256000
Read   buffer bytes: 1048576

Total logical records skipped:          0
Total logical records read:            72
Total logical records rejected:         0
Total logical records discarded:        0
Total stream buffers loaded by SQL*Loader main thread:        1
Total stream buffers loaded by SQL*Loader load thread:        0

Run began on Wed Feb 23 11:32:14 2022
Run ended on Wed Feb 23 11:32:15 2022

Elapsed time was:     00:00:00.37
CPU time was:         00:00:00.11
# Review results
"SELECT COUNT(1) FROM MYTBL;" | sqlplus -s MYDB/MyPwd | Write-Host
  COUNT(1)
----------
        72
Note the following:
- The first script drops the user and its objects (if it exists) and recreates the user with permissions to connect and create tables. It then connects as this new user to create the table.
- When loading data, if you don’t specify a control file (as in my case), SQL*Loader runs in Express Mode, which has fewer options but is more straightforward.
- If you don’t specify DIRECT, it will try to load using an EXTERNAL TABLE, which requires the user to have the CREATE ANY DIRECTORY privilege; it is not recommended that normal users have this privilege. If you grant it and load without MULTITHREADING or PARALLEL, it takes around 6 seconds to load the data.
- If you don’t specify MULTITHREADING, it will try loading using a single CPU. If you specify it in DIRECT mode without PARALLEL, it takes around 4 seconds to load the data.
- If you don’t specify PARALLEL, it will try loading in a single session. If you specify it alongside DIRECT and MULTITHREADING, it takes 0.37 seconds to load the data, as seen in the log file MYTBL.log (a timing sketch follows this list).
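If you want to measure these combinations yourself instead of reading the elapsed time from the log, PowerShell’s Measure-Command can wrap the loader call. A minimal sketch, reusing the user, table, and data file from above (truncating the table between runs so each load starts clean is left to you):

# Time one sqlldr invocation end to end (includes client startup overhead)
Measure-Command {
    sqlldr MYDB/MyPwd TABLE=MYTBL DATA='c:\setup\input.txt' DIRECT=TRUE MULTITHREADING=TRUE PARALLEL=TRUE | Out-Null
} | Select-Object TotalSeconds

Note this measures the whole client process, so it will report slightly more than the elapsed time recorded in MYTBL.log.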
SQL Server BCP
The default tool for data loading in SQL Server is the Bulk Copy Program (BCP), which, according to the official documentation, bulk copies data between an instance of Microsoft SQL Server and a data file in a user-specified format. The bcp utility can be used to import large numbers of new rows into SQL Server tables or to export data out of tables into data files. Except when used with the queryout option, the utility requires no knowledge of Transact-SQL. To import data into a table, you must either use a format file created for that table or understand the structure of the table and the types of data that are valid for its columns.
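You don’t have to write that format file by hand; bcp itself can generate one from an existing table. A minimal sketch (the file name MyTbl.xml is arbitrary), assuming the MyDb.dbo.MyTbl table created later in this section already exists:

# Generate an XML format file describing the table's columns (-x requires format ... -f)
bcp MyDb.dbo.MyTbl format nul -c -x -f MyTbl.xml -T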
If you run it with the /v flag, it shows the version:
PS C:\> bcp /v
BCP - Bulk Copy Program for Microsoft SQL Server.
Copyright (C) Microsoft Corporation. All Rights Reserved.
Version: 15.0.2000.5
If you run it with a double dash, it shows the usage:
PS C:\> bcp --
usage: C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn\bcp.exe {dbtable | query} {in | out | queryout | format} datafile
  [-m maxerrors]            [-f formatfile]          [-e errfile]
  [-F firstrow]             [-L lastrow]             [-b batchsize]
  [-n native type]          [-c character type]      [-w wide character type]
  [-N keep non-text native] [-V file format version] [-q quoted identifier]
  [-C code page specifier]  [-t field terminator]    [-r row terminator]
  [-i inputfile]            [-o outfile]             [-a packetsize]
  [-S server name]          [-U username]            [-P password]
  [-T trusted connection]   [-v version]             [-R regional enable]
  [-k keep null values]     [-E keep identity values][-G Azure Active Directory Authentication]
  [-h "load hints"]         [-x generate xml format file]
  [-d database name]        [-K application intent]  [-l login timeout]
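Only the in mode is used in this tip; queryout is worth a mention because it exports the result of an arbitrary query rather than a whole table. A minimal sketch (the output path is illustrative), assuming the table loaded below:

# Export the result of a query to a character-format file over a trusted connection
bcp "SELECT [Text] FROM MyDb.dbo.MyTbl" queryout C:\setup\output.txt -T -c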
In the script below I prepare my environment and load the data; the output of each command, when there is any, is displayed below it:
# Create the database, drop if exists
SQLCMD -Q "DROP DATABASE IF EXISTS MyDb; CREATE DATABASE MyDb;"
# Create the table
SQLCMD -Q "CREATE TABLE MyDb.dbo.MyTbl([Text] VARCHAR(MAX));"
# Get file size
Write-Host((Get-Item C:\setup\input.txt).length/1KB)
8.5732421875
# Import data
bcp MyDb.dbo.MyTbl IN C:\setup\input.txt -T -c
Starting copy...

72 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total     : 47     Average : (1531.91 rows per sec.)
# Review results
SQLCMD -Q "SELECT COUNT(1) FROM MyDb.dbo.MyTbl;"
-----------
         72

(1 rows affected)
Note the following:
- The first command drops the database if it exists and creates a new one.
- The second command creates the table.
- The third command imports the data; the -T flag establishes a trusted connection and -c imports character data.
- You can see it took 47 ms to import the data, for a calculated theoretical throughput of about 1,532 rows per second (a tuning sketch follows this list).
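For a file this small the defaults are fine, but on larger loads the batch size and the load hints shown in the usage output are the usual first knobs to try. A hedged sketch (the values are illustrative, not recommendations):

# Commit every 10,000 rows and take a table lock, which can allow a minimally logged bulk load
bcp MyDb.dbo.MyTbl IN C:\setup\input.txt -T -c -b 10000 -h "TABLOCK"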
Conclusion
As you can see, with a default instance, default database, and default load options, SQL Server loaded this file roughly 10 times faster than Oracle (47 ms vs. 0.37 s). Of course, there are optimizations in both Oracle and SQL Server, especially for imports, that can reduce that difference. It is up to you to experiment with the type of load you will perform, and to adjust the databases, tables, and loader options to your scenario, to make a fairer comparison and decide which one is better for you. As you have noticed, each tool also offers loading options that the other does not.
Next Steps
- Here is a previous tip on how to install both databases in Docker.
- Here is a previous tip on how to create a database in SQL Server and Oracle.
- Here is a tip on how to extend BCP functionality with PowerShell.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.