By: Daniel Calbimonte | Updated: 2017-08-14
Problem
I would like to have more information about the Microsoft 70-775 exam "Perform Data Engineering on Microsoft Azure HDInsight". Do you have books, links, videos or courses about this exam?
Solution
The Microsoft 70-775 exam is focused on Big Data for Azure. We will cover some questions about this exam and recommend books, links and courses to help you prepare.
FAQ
Who should take this exam?
This exam is aimed at DBAs, Data Scientists, Data Architects, Data Analysts, Data Developers and other professionals who want to learn about, or be certified in, Big Data in Azure, and HDInsight in particular.
What is HDInsight?
Azure HDInsight is Microsoft's managed Hadoop service on Azure. It provides big data services including Apache Hive, HBase, Spark, Kafka and others.
Do I need to have an Azure subscription to study for this exam?
Yes. However, there are free subscription options; they require a credit card to register but are free to use.
What Microsoft Certifications are related to this exam?
This exam counts toward the MCSE (Microsoft Certified Solutions Expert) in Data Management and Analytics. You can also earn an MCP (Microsoft Certified Professional) with this exam.
Is the exam difficult?
If you do not have previous experience with Big Data, Azure and PowerShell, it will be hard to pass the exam. If you have already worked with HDInsight and the technologies covered by the exam, it will not be so difficult.
Which books would you recommend for this exam?
The following books may be useful:
- Big Data Analytics with Microsoft HDInsight in 24 Hours, Sams Teach Yourself
- HDInsight Essentials - Second Edition
- Processing Big Data with Azure HDInsight: Building Real-World Big Data Systems on Azure HDInsight Using the Hadoop Ecosystem
- Microsoft Big Data Solutions
- HDInsight: Microsoft’s Cloud Hadoop
- Mastering Azure Analytics: Architecting in the Cloud with Azure Data Lake, HDInsight, and Spark
- Pro Microsoft HDInsight: Hadoop on Windows
- HDInsight For Beginners
- HDInsight Jump Start
- Dive In HDInsight
- Learning Spark: Lightning-Fast Big Data Analysis
- Advanced Analytics with Spark: Patterns for Learning from Data at Scale
- High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark
- Storm Applied: Strategies for real-time event processing, by Sean T. Allen, Matthew Jankowski and Peter Pathirana
- Big Data: Principles and best practices of scalable realtime data systems
- Getting Started with Storm: Continuous Streaming Computation with Twitter's Cluster Technology
- Streaming Architecture: New Designs Using Apache Kafka and MapR Streams
- Kafka: The Definitive Guide: Real-Time Data and Stream Processing at Scale
- Learning Apache Kafka, Second Edition
- Apache HBase Primer
- HBase in Action
Are there any courses for this exam?
Yes, the following courses will be useful:
- Big Data Analytics with HDInsight: Hadoop on Azure
- Microsoft Azure HDInsight Big Data Analyst
- Learn fundamental big data methods in six straightforward courses
- Big Data Hadoop and Spark Developer Certification Training
Can you provide some links to study for this exam?
Yes, here are some useful links:
Administer and Provision HDInsight Clusters
- Deploy HDInsight clusters
- Virtual Network (VNET) support for HDInsight is now generally available
- Extend Azure HDInsight using an Azure Virtual Network
- Set up clusters in HDInsight with Hadoop, Spark, Kafka, and more
- Create Hadoop clusters in HDInsight by using Resource Manager templates
- Configure Domain-joined HDInsight clusters
- How To Choose The Right Azure Hdinsight Cluster
- Customize Linux-based HDInsight clusters using Script Action
- Create HDInsight clusters using the Azure CLI
- Migrating to Azure Resource Manager-based development tools for HDInsight clusters
- Create Linux-based clusters in HDInsight using Azure PowerShell
- Apache Kafka on HDInsight with Azure Managed Disks
- Virtual network peering
- Deploy and secure multi-user HDInsight clusters
- An introduction to Hadoop security with domain-joined HDInsight clusters (Preview)
- Manage Domain-joined HDInsight clusters (Preview)
- Configure Domain-joined HDInsight clusters (Preview)
- Apache Ambari Reference
- Secure your Enterprise Hadoop environments on Azure
- Manage Hadoop clusters in HDInsight by using Azure PowerShell
- Securing Azure HDInsight with Apache Ranger & Azure Active Directory Domain-joined Clustering
- Use SSH Tunneling to access Ambari web UI, JobHistory, NameNode, Oozie, and other web UIs
- Connect to HDInsight (Hadoop) using SSH
- Securing Azure HDInsight
- Ingest data for batch and interactive processing
- Collecting and loading data into HDInsight
- Upload data for Hadoop jobs in HDInsight
- Overview of Azure Data Lake Store
- Using Azure Data Lake Store for big data requirements
- Use Azure storage with Azure HDInsight clusters
- Azure CLI 2.0
- Using Sqoop to Move Data into Hive
- Use Apache Sqoop to import and export data between Hadoop on HDInsight and SQL Database
- Getting started with Sqoop in HDInsight
- Sqoop on Spark for Data Ingestion
- ADF Tutorial - part 1 of 4
- Transfer data with the AzCopy on Windows
- Copy data from Azure Storage Blobs to Data Lake Store
- Configure HDInsight clusters
- Hive Metastore in HDInsight –Tips, Tricks & Best Practices
- Manage HDInsight clusters by using the Ambari Web UI
- Using Host Config Group
- Modify configurations
- Accessing Hadoop Logs in HDInsight
- Customize HDInsight clusters using Bootstrap
- Analyze HDInsight logs
- Set up clusters in HDInsight with Hadoop, Spark, Kafka, and more
- Manage Hadoop clusters in HDInsight by using .NET SDK
- Manage Hadoop clusters in HDInsight by using Azure PowerShell
- Manage HDInsight clusters by using the Ambari REST API
- Monitor Hadoop clusters in HDInsight using the Ambari API
- Manage and debug HDInsight jobs
- Access YARN application logs on Linux-based HDInsight
- How to Find and Kill a running Yarn Application Master in HDInsight with and without SSH access
- Hadoop Architecture Overview
- Use Apache Spark REST API to submit remote jobs to an HDInsight Spark cluster
- Submit Hadoop jobs in HDInsight
- Debug Apache Spark jobs running on Azure HDInsight
- What is Operations Management Suite (OMS)?
- Microsoft monitoring product comparison
- Managing alerts with Microsoft monitoring
Implement Big Data Batch Processing Solutions
- Implement batch solutions with Hive and Apache Pig
- What is Apache Hive and HiveQL on Azure HDInsight?
- Create Hive tables and load data from Azure Blob Storage
- Optimize Hive queries in Azure HDInsight
- Partitions & Buckets in #Hive
- Hive and XML File Processing
- Process and analyze JSON documents using Hive in HDInsight
- HDInsight (Azure Hadoop) JSON Hive files – Environment setup
- Optimizing Joins running on HDInsight Hive on Azure at GFS
- Hive Join Strategies
- Use a Java UDF with Hive in HDInsight
- Hadoop Hive UDF Tutorial - Extending Hive with Custom Functions
- Use Python User Defined Functions (UDF) with Hive and Pig in HDInsight
- Transform data using Hive Activity in Azure Data Factory
- How Parquet.Net from Elastacloud Will Empower your Big Data Applications
- CREATE EXTERNAL FILE FORMAT (Transact-SQL)
- Design batch ETL solutions for big data with Spark
- Manage resources for Apache Spark cluster on Azure HDInsight
- Spark troubleshooting
- Improving Spark Performance With Partitioning
- Partitions and Partitioning
- Spark Data Sources
- Introducing Apache Spark Datasets
- Pyspark.sql module
- Apache Spark 2.0 Performance Improvements Investigated With Flame Graphs
- Operationalize Hadoop and Spark
- Create on-demand Hadoop clusters in HDInsight using Azure Data Factory
- Transform data in Azure Data Factory
- Why Oozie?
- Integrating Your Central Apache Hive Metastore with Apache Spark on Databricks
- Tutorial: Build your first pipeline to transform data using Hadoop cluster
- Comparing Azure Data Lake Store and Azure Blob Storage
- Understanding WASB and Hadoop Storage in Azure
- Why use Blob Storage with HDInsight on Azure
Implement Big Data Interactive Processing Solutions
- Implement interactive queries for big data with Spark SQL
- Introduction to Spark on HDInsight
- Running Hive Queries Using Spark SQL
- Run interactive queries on an HDInsight Spark cluster
- RDD Caching and Persistence
- Using DataFrames iteratively leads to slow query planning
- Reading Parquet Files
- Apache Spark BI using data visualization tools with Azure HDInsight
- What is JOIN in Apache Spark
- Broadcast Join with Spark
- Optimizing Apache Spark SQL Joins
- How to: Run Queries on Spark SQL using JDBC via Thrift Server
- Manage resources for Apache Spark cluster on Azure HDInsight
- Perform exploratory data analysis by using Spark SQL
- Jupyter Notebooks in Azure Machine Learning Studio the perfect tool for Academics and Students
- Use Zeppelin notebooks with Apache Spark cluster on Azure HDInsight
- Join Two DataFrames without a Duplicated Column
- Use Apache Spark REST API to submit remote jobs to an HDInsight Spark cluster
- Running an Interactive Session With the Livy API
- Implement interactive queries for big data with Interactive Hive
- Perform exploratory data analysis by using Hive
- Perform interactive processing by using Apache Phoenix on HBase
- Use Apache Phoenix with Linux-based HBase clusters in HDInsight
- Grammar
- Transactions (beta)
- User-defined functions(UDFs)
- Secondary Indexing
- Performance
- Tuning Guide
- Apache Phoenix vs Hive-Spark
- How is Apache Phoenix different from Hive-Hbase integration?
Implement Big Data Real-Time Processing Solutions
- Create Spark streaming applications using DStream API
- Apache Spark streaming: Process data from Azure Event Hubs with Spark cluster on HDInsight
- Spark Streaming Programming Guide
- Transformations on DStreams
- Data Storage Options (Building Real-World Cloud Apps with Azure)
- Chapter 1. Enterprise Analytics Fundamentals
- Introduction to Microsoft Azure Storage
- Apache Spark streaming (DStream) example with Kafka (preview) on HDInsight
- Real-time streaming in Power BI
- Visualize big data with Power BI and Spark on Azure HDInsight
- Structured Streaming Programming Guide
- Create Spark structured streaming applications
- Spark SQL, DataFrames and Datasets Guide
- Window Operations on Event Time
- Stateful Transformations with Windowing in Spark Streaming
- Introducing Window Functions in Spark SQL
- Get started with Azure Data Lake Store using the Azure Portal
- Choosing between Azure Event Hub and Kafka: What you need to know
- Visualize big data with Power BI and Spark on Azure HDInsight
- Develop big data real-time processing solutions with Apache Storm
- Understanding the Parallelism of a Storm Topology
- What is Apache Storm on Azure HDInsight?
- Example Storm topologies and components for Apache Storm on HDInsight
- Real-time Big Data Processing with Storm
- Joining Streams in Storm Core
- Local Mode
- Debugging an Apache Storm topology
- Concepts
- hdinsight-storm-examples
- Develop C# topologies for Apache Storm by using the Data Lake tools for Visual Studio
- Build solutions that use Kafka
- Set up clusters in HDInsight with Hadoop, Spark, Kafka, and more
- Configuring Kafka for Performance and Resource Management
- Apache Kafka on HDInsight with Azure Managed Disks
- Use MirrorMaker to replicate Apache Kafka topics with Kafka on HDInsight (preview)
- Use Apache Kafka (preview) with Storm on HDInsight
- Build solutions that use HBase
- HBase Architecture, Use cases & Best practices in HDInsight
- What is HBase in HDInsight: A NoSQL database that provides BigTable-like capabilities for Hadoop
- HBase - Shell
- Get started with an Apache HBase example in HDInsight
- HDInsight HBase: 9 things you must do to get great HBase performance
- Configure HBase cluster replication within virtual networks
Next Steps
HDInsight spans several technologies, including Hadoop, Storm, Data Lake, HBase and more, so it is hard to cover every topic. Azure HDInsight aims to be simple, but studying all of its features takes a lot of time.
For more information about this exam, refer to these links:
- Exam 70-775
- Perform Data Engineering on Microsoft Azure HDInsight Community Guide
- 70-775 Perform Data Engineering on Microsoft Azure HDInsight Certification Exam
About the author
This author pledges the content of this article is based on professional experience and not AI generated.