By: Koen Verbeeck | Updated: 2023-05-08
Problem
I'm trying to learn more about Azure Cosmos DB. As usual, you learn the most by actually working with the product. I would like to load some data into a container, write some SQL queries on that container, and see how the integration works with other Azure services. However, since each item in the container is represented as JSON, creating your own sample data can be quite cumbersome. Is there a ready-to-use data set I can utilize?
Solution
Azure Cosmos DB is a globally distributed, multi-model database service offered by Microsoft Azure. It provides a highly scalable and available platform for storing and querying large amounts of data using various data models, including NoSQL (also sometimes referred to as the "document" or SQL API), Column-Family (Cassandra), Graph (Gremlin), and Key-Value (Table) APIs. Cosmos DB also offers an API for MongoDB, and Microsoft has recently added support for PostgreSQL. For more background information, check out the tip Introduction to Azure Cosmos DB database and the SQL API.
In this tip, we're exclusively using the Cosmos NoSQL API. There are multiple methods to get data into Azure Cosmos DB. If you already have some sample data in a relational database, you can try one of these methods:
- Use the Azure Cosmos DB Data Migration tool to import the data into your Cosmos DB container. You can find instructions on how to do this in the tip Migrating SQL Data into Azure Cosmos DB.
- You can create your own import using pipelines in Azure Data Factory. You might have to do a two-step process if your data is more complex than just one table with no nested values. This blog post explains the process: How to Store Normalized SQL Server Data into Azure Cosmos DB.
Another method is to use a sample database – called cosmicworks – that is already provided by Microsoft.
Prerequisites
If you don't have an Azure Cosmos DB account in your tenant, follow the prerequisite steps in the tip named Analyze Azure Cosmos DB data with Synapse Serverless SQL Pools to set up an account. If you don't have an Azure subscription, try Cosmos DB using the emulator. This tool allows you to develop and test Cosmos DB locally on your computer and is available to download: cosmosdb-emulator.
After you've run the installation wizard, the emulator will launch in the browser:
With the emulator, you can test basic functionality. It's only possible to have a database with provisioned throughput; the serverless option is unavailable locally.
Installing Sample Data Using the Portal
You can install a sample database with a container holding some data with a couple of clicks. In the Data Explorer of your Azure Cosmos DB Account, you will see the following home screen:
When you click the quick start option, a dialog opens that lets you create a sample container with the associated database:
The sample container will contain 295 JSON documents holding product information.
In the emulator, you have a similar wizard:
However, this wizard creates a small Persons database.
This container only has four JSON documents with a very simple structure.
Installing Sample Data with the Command Line
If you want the full cosmicworks sample database in the emulator, or want to install the sample database programmatically instead of manually through the portal, you will need the cosmicworks NuGet package. If you have the .NET SDK installed on your machine, you can run the dotnet command from the prompt. Run the following command to install the cosmicworks package:
dotnet tool install --global cosmicworks
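If you're not sure the .NET SDK is available on your machine, a small shell function can check for the dotnet command before attempting the installation. The snippet below is a minimal sketch for a bash-style shell (for example Git Bash or WSL on Windows), not part of the official instructions; the function name install_cosmicworks is made up for illustration.

```shell
# Sketch: install the cosmicworks tool only if the .NET SDK is on the PATH.
# install_cosmicworks is a hypothetical helper name, not part of any SDK.
install_cosmicworks() {
  if ! command -v dotnet >/dev/null 2>&1; then
    echo "dotnet was not found on the PATH; install the .NET SDK first." >&2
    return 1
  fi
  # The install command from this tip, with two regular hyphens before "global".
  dotnet tool install --global cosmicworks
}

# Call the function explicitly when you're ready:
# install_cosmicworks
```

Afterwards, running dotnet tool list --global should show cosmicworks among the installed global tools.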
Once the tool is installed, you can deploy a copy of the cosmicworks database to a Cosmos DB account. You will need the endpoint URI of the account and the account key (called the primary key in the emulator). For the emulator, both can be found in the same place on the quickstart overview page:
For a regular Azure Cosmos DB account, you can find the URI on the overview page of the account:
On the Keys page, you can find the primary and secondary keys (you only need the first one) and the URI as well:
When you have found the necessary information, you can run the following command from the prompt:
cosmicworks --endpoint myendpoint --key myprimarykey --datasets product
This will load all the product sample documents to a database called cosmicworks in a container called products.
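Because the account key is a secret, you may prefer not to type it directly into the command. The sketch below assumes a bash-style shell and reads both values from environment variables; COSMOS_ENDPOINT, COSMOS_KEY, COSMICWORKS_CMD, and the load_products function are names invented for this example. Only the --endpoint, --key, and --datasets options from the command above are used.

```shell
# Sketch: run the cosmicworks command with values taken from the environment.
# COSMOS_ENDPOINT, COSMOS_KEY and COSMICWORKS_CMD are hypothetical variable names.
load_products() {
  if [ -z "${COSMOS_ENDPOINT:-}" ] || [ -z "${COSMOS_KEY:-}" ]; then
    echo "Set COSMOS_ENDPOINT and COSMOS_KEY first." >&2
    return 1
  fi
  # COSMICWORKS_CMD defaults to the real tool; set it to "echo" for a dry run.
  ${COSMICWORKS_CMD:-cosmicworks} --endpoint "$COSMOS_ENDPOINT" --key "$COSMOS_KEY" --datasets product
}

# Example (placeholders, not real credentials):
# export COSMOS_ENDPOINT="myendpoint"
# export COSMOS_KEY="myprimarykey"
# load_products
```

For the emulator, the endpoint is normally https://localhost:8081. Setting COSMICWORKS_CMD to echo prints the arguments instead of running the tool, which is handy for checking the command first; keep in mind that this prints the key as well.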
Products is just one of several sample datasets. You can find more information on them in the GitHub project repo.
By specifying the name of another dataset, you can install additional sample containers in the cosmicworks database (the data itself is modeled after the AdventureWorks sample database for SQL Server). As you can see, with the command line, you have more options for sample data than through the portal. The other datasets are also much larger. For example, the customers dataset contains over 50,000 documents:
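Loading several datasets one after the other can be scripted with a simple loop. This is a sketch under a few assumptions: a bash-style shell, a hypothetical load_datasets helper, and dataset names passed in as arguments. Only product appears in this tip's command; the name customer in the commented example is an assumption, so check the GitHub repo for the exact dataset names before running it.

```shell
# Sketch: call cosmicworks once per dataset name.
# load_datasets and COSMICWORKS_CMD are hypothetical names for this example.
# Usage: load_datasets <endpoint> <key> <dataset> [<dataset> ...]
load_datasets() {
  endpoint=$1
  key=$2
  shift 2
  for dataset in "$@"; do
    # Progress goes to stderr so stdout only holds the tool's own output.
    echo "Loading dataset: $dataset" >&2
    # Stop at the first failure instead of continuing with the next dataset.
    ${COSMICWORKS_CMD:-cosmicworks} --endpoint "$endpoint" --key "$key" --datasets "$dataset" || return 1
  done
}

# Example ("customer" is an assumed dataset name; verify it in the repo):
# load_datasets "myendpoint" "myprimarykey" product customer
```

Running one dataset per call keeps the output easy to follow and makes it clear which dataset failed if the load stops halfway.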
Installing Sample Data with Visual Studio
Instead of loading each dataset separately through the command prompt, you can load them all at once through a Visual Studio project. The CosmicWorks GitHub repo contains a Visual Studio project that allows you to run a program that will upload everything for you. This repo is a demo environment to showcase the capabilities of Azure Cosmos DB and how you can model normalized data from a relational database into Cosmos DB. Everything is licensed under the MIT license, so you can download the source code from the repo and run it to load the sample data.
In the Visual Studio solution, you need to fill in the URI and the primary key in the appSettings.json files.
Then you must configure the modeling_demos project as the startup project.
When you run the project (press F5), you will be presented with a menu. Press "k" to create the databases and the containers.
This will only work on the emulator or on an Azure Cosmos DB account with provisioned throughput. It will error out on a serverless account. Once the objects are created, press "l" to load all the sample data.
This might take a while. Running the program will create multiple databases with multiple containers.
Each database represents a different iteration in the modeling process explained in a presentation. Check out the readme file of the repo for more information.
You now have multiple databases with sample data you can use to familiarize yourself with Azure Cosmos DB.
Next Steps
- You can find all Azure Cosmos DB tips in this overview.
- There are plenty of Azure tips on this website.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.