By: Pablo Echeverria | Updated: 2023-08-10 | Comments | Related: > SQL Server and Oracle Comparison
Problem
When you're a programmer, sooner or later, you'll need to watch for files to arrive at a specific location; either you need to wait for an asynchronous task to complete unzipping/moving/downloading a file, the file needs to be manipulated by several programs at different stages, you need to transfer several files to another location, etc. You can implement this on your own by subscribing to the host operating system notifications using the C# FileSystemWatcher, the Java WatchService, or similar in another language. In the end, you'll find out you don't get notified promptly every time for every single file. You'll need to add a time-based module to make it failsafe. However, the program will have grown too big at this point, and you have invested so much time tuning it. How can this be simplified, leveraging the database's built-in capabilities in either Oracle or SQL Server?
Solution
SQL Server and Oracle provide out-of-the-box mechanisms to watch for files to arrive at a specific location, so you don't need to watch, tune or debug it. You can subscribe to event notifications from the database, being notified when a file arrives, with the benefit of the file metadata available in the database using a tested and reliable process, and only needs manipulating as you see fit. The only caveat is that neither option works in Docker containers; you need a full Windows installation.
Oracle file watcher process
The Oracle functionality comes from the DBMS_SCHEDULER package, specifically the method CREATE_FILE_WATCHER. But first, you must enable JVM as described in Oracle Semantic Search, steps 1-5. Then you must create an OS credential (see below) by replacing "oracle" with your OS username and replacing "password" with the user password:
BEGIN DBMS_CREDENTIAL.CREATE_CREDENTIAL('FILEWATCHERCRED', 'oracle', 'password'); END;
Once that is done, you can create the file watcher in the following way:
BEGIN DBMS_SCHEDULER.CREATE_FILE_WATCHER( FILE_WATCHER_NAME => 'FILEWATCHER', DIRECTORY_PATH => 'C:\setup\DocumentStore\', FILE_NAME => '*.txt', CREDENTIAL_NAME => 'FILEWATCHERCRED', MIN_FILE_SIZE => 1, STEADY_STATE_DURATION => INTERVAL '60' SECOND, ENABLED => FALSE); END;
Note the following parameters:
- DIRECTORY_PATH: Location where new files will arrive.
- FILE_NAME: You can use wildcards to filter for specific filenames.
- CREDENTIAL_NAME: The credential created earlier.
- MIN_FILE_SIZE: Only after the file size is greater than this number of bytes will it be processed by the watcher.
- STEADY_STATE_DURATION: If the file hasn't been modified in this amount of time, it will be processed by the watcher.
The file watcher has been created in a disabled state because we haven't defined its processing method. Now, let's create the destination table:
CREATE TABLE FILEWATCHERTBL ( NAME VARCHAR2(30), PATH VARCHAR2(30), DATETIME TIMESTAMP);
And define the procedure that will populate it:
CREATE OR REPLACE PROCEDURE FILEWATCHERPROC (FWR IN SYS.SCHEDULER_FILEWATCHER_RESULT) AS BEGIN INSERT INTO FILEWATCHERTBL(PATH, NAME, DATETIME) VALUES (FWR.DIRECTORY_PATH, FWR.ACTUAL_FILE_NAME, FWR.FILE_TIMESTAMP); COMMIT; END;
Note: The type SCHEDULER_FILEWATCHER_RESULT has many more fields that you can use when inserting the file in the database:
- DESTINATION: The host name or IP address where the file was found because you can watch for files on another computer.
- DIRECTORY_PATH: Absolute path of the file.
- ACTUAL_FILE_NAME: Name of the file.
- FILE_SIZE: Size of the file in bytes.
- FILE_TIMESTAMP: Timestamp when the file was found after considering the MIN_FILE_SIZE and STEADY_STATE_DURATION.
Now we need to create a program that will consume the procedure:
BEGIN DBMS_SCHEDULER.CREATE_PROGRAM( PROGRAM_NAME => 'FILEWATCHERPRG', PROGRAM_TYPE => 'STORED_PROCEDURE', PROGRAM_ACTION => 'FILEWATCHERPROC', NUMBER_OF_ARGUMENTS => 1, ENABLED => FALSE); DBMS_SCHEDULER.DEFINE_METADATA_ARGUMENT( PROGRAM_NAME => 'FILEWATCHERPRG', METADATA_ATTRIBUTE => 'EVENT_MESSAGE', ARGUMENT_POSITION => 1); DBMS_SCHEDULER.ENABLE('FILEWATCHERPRG'); END;
And we need to create a scheduled job that is the link between the file watcher and the program:
BEGIN DBMS_SCHEDULER.CREATE_JOB( JOB_NAME => 'FILEWATCHERJOB', PROGRAM_NAME => 'FILEWATCHERPRG', QUEUE_SPEC => 'FILEWATCHER', AUTO_DROP => FALSE, ENABLED => FALSE); END;
When everything is in place, we need to enable the job and the file watcher:
BEGIN DBMS_SCHEDULER.ENABLE('FILEWATCHERJOB, FILEWATCHER'); END;
When a file is placed in the specified directory, its metadata will be inserted in the database:
PATH NAME DATETIME ------------------------ --------- ----------------------------- C:\setup\DocumentStore\ new1.txt 19-JUN-23 05.09.31.000000 AM
There is one additional configuration for the failsafe method. In this case, we set it to check every 60 seconds:
BEGIN DBMS_SCHEDULER.SET_ATTRIBUTE('FILE_WATCHER_SCHEDULE', 'REPEAT_INTERVAL', 'FREQ=SECONDLY; INTERVAL=60'); END;
There is also one additional configuration to launch multiple procedures when multiple files are found:
BEGIN DBMS_SCHEDULER.SET_ATTRIBUTE('FILEWATCHERJOB', 'PARALLEL_INSTANCES', TRUE); END;
If there are issues and the file metadata is not getting inserted, you can first search for the failed reason in the database:
SELECT ERROR#, ADDITIONAL_INFO, DESTINATION FROM SYS.SCHEDULER$_JOB_RUN_DETAILS WHERE LOG_ID=(SELECT MAX(LOG_ID) FROM SYS.SCHEDULER$_JOB_RUN_DETAILS);
You can also search for errors in the trace folder with the following PowerShell command:
Get-ChildItem -Path "C:\app\diag\rdbms\orcl\orcl\trace" -File | Where-Object {$_.Name -like "*_j*.trc"} | Sort-Object LastWriteTime -Descending| Select-Object -First 20 | Select Length, Name, LastWriteTime, CreationTime
If you want to view the existing file watchers, you can use the queries below:
SELECT OBJ#, DIRECTORY_PATH FROM SCHEDULER$_FILE_WATCHER; SELECT OWNER, FILE_WATCHER_NAME, ENABLED FROM DBA_SCHEDULER_FILE_WATCHERS;
SQL Server file watcher process
The SQL Server functionality requires FILESTREAM to be enabled in the SQL Server Configuration Manager console. If you want to use PowerShell, you can use the code below specifying your instance name (see this link and this link):
$instance = "MSSQLSERVER" $wmi = Get-WmiObject -Namespace "ROOT\Microsoft\SqlServer\ComputerManagement15" -Class FilestreamSettings | where {$_.InstanceName -eq $instance} $wmi.EnableFilestream(3, $instance) Get-Service -Name $instance | Restart-Service
Note: This command will create a shared folder named "mssqlserver," but the name can be changed.
Note: The ReturnValue must be 0.
Later, when you put files in the shared folder, it will look like this, where "Win-qsulb167kkl" is the computer name:
Once done, you need to enable it at the instance level using the code below:
EXEC sp_configure filestream_access_level, 2 RECONFIGURE
Now, when you create a database, you need to specify the CONTAINS FILESTREAM option in a filegroup shown in the code below:
CREATE DATABASE Archive ON PRIMARY ( NAME = Arch1, FILENAME = 'C:\setup\archdat1.mdf'), FILEGROUP FileStreamGroup1 CONTAINS FILESTREAM ( NAME = Arch3, FILENAME = 'C:\setup\filestream1') LOG ON ( NAME = Archlog1, FILENAME = 'C:\setup\archlog1.ldf');
Later, when you put a file into the shared folder, it will internally look like this:
Now you need to enable FILETABLE using the code below:
ALTER DATABASE Archive SET FILESTREAM ( NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'filetable1' )
You can create the FILETABLE with the code below:
CREATE TABLE DocumentStore AS FileTable WITH ( FileTable_Directory = 'DocumentStore', FileTable_Collate_Filename = database_default );
When you put a file in the shared folder, it will automatically be inserted in this table in the following way:
There are several things you need to be aware of:
- SQL Server automatically creates the computer share; you don't do it manually.
- The permissions on the computer share can only be managed by SQL Server; if a user wants to read or write to it, the login and user must be created in the server instance and database and grant permissions on the FileTable.
- A system administrator can't manage the permissions on the computer share or the internal directories; it is only managed by SQL Server and can't be overridden.
- The FileTable has computed columns, so you can't use Query Notifications directly; you need to use a trigger to a second table or manually notify the client application.
Conclusion
The Oracle FileWatcher provides more configuration options, can be fine-tuned to your needs, and you can use Query Notifications with it. The SQL Server FileTable involves more steps and is inflexible to filter files but inserts the file metadata immediately when a file arrives; however, you can't use Query Notifications immediately.
Next Steps
- Here is the official Oracle documentation:
- Here is the official Microsoft documentation:
- Enable and configure filestream
- Create a filestream enabled database
- Create a table for storing filestream data
- Access filestream data with transact sql
- Avoid conflicts with database operations in filestream applications
- Create client applications for filestream data
- Filestream sql server
- Filetables sql server
- Enable the prerequisites for filetable
- Create alter and drop filetables
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2023-08-10