Inserting JSON Data from Azure Blob Storage into Azure SQL Database using Dynamic SQL

Reading JSON into Local SQL Variable

In this article, we’ll explore how to read a large number of JSON files from Azure Blob Storage and insert them into an Azure SQL Database table as a single NVARCHAR(max) entry. This process involves using dynamic SQL to execute the INSERT statement.

Prerequisites

Before diving into the code, make sure you have:

  • An Azure SQL Database instance
  • A storage account with an Azure Blob Storage container containing your JSON files
  • The necessary permissions and credentials to access both the database and blob storage

Understanding the Problem

The problem is that we need to read each JSON file as a single string, which becomes a single NVARCHAR(max) entry in the table. The code provided initially doesn’t work due to incorrect syntax.

Step 1: Creating the Table and Constraints

To start solving this problem, you’ll need to create the table and constraints necessary for storing the data.

CREATE TABLE dbo.testJSON (
    rowID smallint NOT NULL PRIMARY KEY,
    jsonData nvarchar(max)
)

ALTER TABLE dbo.testJSON
    ADD CONSTRAINT [Check for JSON] CHECK (ISJSON(jsonData)=1)

This code creates a new table dbo.testJSON with two columns: rowID and jsonData. The rowID column serves as the primary key, while the jsonData column stores the JSON data.

Step 2: Defining Dynamic SQL

To insert the JSON data into the table, we need to use dynamic SQL. This allows us to dynamically construct the INSERT statement based on the current iteration of our loop.

DECLARE @i INT = 1
DECLARE @numObservations INT = 1000
DECLARE @json NVARCHAR(MAX)
DECLARE @file VARCHAR(4000)
DECLARE @sql VARCHAR(4000)

WHILE @i <= @numObservations
    BEGIN
        SET @file = 'testJSON' + CAST(@i AS VARCHAR(5)) + '.json'
        
        -- Construct the INSERT statement using dynamic SQL
        SET @sql = N'INSERT INTO dbo.testJSON (rowID, jsonData)
        SELECT ''' + CAST(@i AS VARCHAR(5)) + ''', * FROM OPENROWSET(BULK ''' + @file + ''', DATA_SOURCE = ''AzureBlobStorage'', SINGLE_CLOB) as x'
        
        -- Execute the dynamic SQL
        EXEC (@sql)
        
        SET @i = @i + 1
    END
GO

How Dynamic SQL Works

Let’s break down how this code works:

  • We start by declaring variables for the current iteration @i, the total number of observations @numObservations, and other auxiliary variables.
  • Inside our loop, we construct a dynamic SQL statement. The key part here is 'INSERT INTO dbo.testJSON (rowID, jsonData) SELECT ''' + CAST(@i AS VARCHAR(5)) + ''', * FROM OPENROWSET(BULK ''' + @file + ''', DATA_SOURCE = ''AzureBlobStorage'', SINGLE_CLOB) as x'
  • We use the SELECT ' + CAST(@i AS VARCHAR(5)) +'' part to dynamically include the current iteration in our row ID column.
  • The 'FROM OPENROWSET(BULK ''' + @file + ''', DATA_SOURCE = ''AzureBlobStorage'', SINGLE_CLOB) as x' part uses the OPENROWSET function to retrieve the JSON data from Azure Blob Storage. We use BULK ' + @file + '' to specify the path of our JSON file, and DATA_SOURCE = ''AzureBlobStorage'' to indicate that we’re retrieving data from this storage source.
  • Finally, we execute the dynamic SQL statement using EXEC (@sql).
  • This process is repeated for each iteration of our loop.

Using ISJSON

We added a CHECK constraint on our table to validate whether the JSON data stored in the jsonData column actually contains valid JSON. The syntax for this validation can be found here: https://docs.microsoft.com/en-us/sql/t-sql/functions/isjson-transact-sql?view=sql-server-ver15

Best Practices

To make this process more efficient and easier to manage:

  • Use a batch size: When inserting data into a database, it’s often a good idea to insert batches of rows at once rather than one row per loop iteration. This can help improve performance.
  • Avoid using SET @sql = N'...': Instead, use string concatenation or the string_format() function to build your dynamic SQL statement.
  • Error handling is essential: Make sure you have a plan in place for error handling and cleanup when working with dynamic SQL.

Example Use Case

Here’s an example of how this code could be used in practice:

-- First, create the table
CREATE TABLE dbo.testJSON (
    rowID smallint NOT NULL PRIMARY KEY,
    jsonData nvarchar(max)
)

ALTER TABLE dbo.testJSON
    ADD CONSTRAINT [Check for JSON] CHECK (ISJSON(jsonData)=1)

-- Define our variables and loop over 1000 observations
DECLARE @i INT = 1
DECLARE @numObservations INT = 1000

WHILE @i <= @numObservations
BEGIN
    SET @file = 'testJSON' + CAST(@i AS VARCHAR(5)) + '.json'
    
    -- Insert the data into our table using dynamic SQL
    INSERT INTO dbo.testJSON (rowID, jsonData)
    SELECT ''' + CAST(@i AS VARCHAR(5)) + ''', * 
    FROM OPENROWSET(BULK ''' + @file + ''', DATA_SOURCE = 'AzureBlobStorage'', SINGLE_CLOB) as x

    SET @i = @i + 1
END

This code can be used to insert data from Azure Blob Storage into an Azure SQL Database table, where the JSON data is stored in a single column. The OPENROWSET function is used to retrieve this data, and dynamic SQL is employed to construct the INSERT statement.

Conclusion

Inserting JSON data from Azure Blob Storage into an Azure SQL Database table using dynamic SQL can be a powerful tool for integrating data from various sources. By understanding how to use dynamic SQL and leveraging features like OPENROWSET and the ISJSON() function, you can efficiently retrieve and store large amounts of data in your database.

Additional Resources

For more information on Azure Blob Storage, Azure SQL Database, and working with JSON data in SQL Server:


Last modified on 2024-04-24