Our Data & Analytics team explores key features, business benefits, industry applications, and technical capabilities of Microsoft Cortana Intelligence Suite.
Blog nine of a series.
Information management is the first step in taking advantage of the comprehensive big data and advanced analytics processing capabilities that the Microsoft Cortana Intelligence Suite offers.
This data ingestion step is important because it's how you will manage and store various source data in the Cortana Suite. Like other components of the Cortana Suite, Microsoft has provided a set of Azure cloud-based services that include the necessary infrastructure, platforms, software, networking facilities and more. They are all built on scalable, accessible architectures with pre-built components that make it easy for developers to focus on business solutions. Given the size and complexity of big data and the associated analytics, Cortana Suite's Information Management capabilities help users effectively automate data processing tasks while reducing ownership costs.
Cortana Suite's Information Management includes three Azure services:
- Azure Data Catalog
- Azure Data Factory
- Azure Event Hubs
Learn more about each Azure service below:
Azure Data Catalog
Azure Data Catalog is a fully managed cloud service that enables users to discover the data sources they need, internal or external, and to understand the data sources they find. It even includes a crowdsourcing model for extended enterprise MDM solutions. Data Catalog provides capabilities for every user, from data stewards and analysts to data scientists and developers. Users can discover, register, manage, search, consume and document all data sources as needed, regardless of whether the data sets are in the cloud or on-premises. By automating all of those data functions, Data Catalog facilitates self-service business intelligence (BI) for business users. In addition, Data Catalog extends the common use cases of traditional on-premises data sets to include those in the cloud. Here's what it provides:
- A documentation portal for centralized data sources such as transactional data from lines of business and analytical data from data warehouses, data marts, and data lakes
- Self-service BI and MDM, such as managing the enterprise data dictionary, persistent data sets, data quality assurance, data publication, and reporting
- Data discovery and provisioning, such as managing the search, filtering, and registration of various source data, data ownership, data access, and data visibility
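Assets are usually registered through the Data Catalog portal or its registration tool, but registration can also be automated against the Data Catalog REST API. Below is a minimal Python sketch of registering a SQL Server table as a catalog asset; the catalog name, API version, server details, and payload fields are illustrative assumptions, so check the Data Catalog REST API reference for the exact contract.

```python
# Sketch: register a SQL Server table as a Data Catalog asset via the REST API.
# Catalog name, api-version, and payload shape are assumptions for illustration.
import requests

CATALOG = "DefaultCatalog"                 # assumed catalog name
URL = f"https://api.azuredatacatalog.com/catalogs/{CATALOG}/views/tables"
AAD_TOKEN = "<azure-ad-bearer-token>"      # acquire via Azure AD before calling

asset = {
    "properties": {
        "fromSourceSystem": False,
        "name": "Customers",
        "dataSource": {"sourceType": "SQL Server", "objectType": "Table"},
        "dsl": {
            "protocol": "tds",
            "authentication": "windows",
            "address": {
                "server": "sqlsrv01.contoso.com",   # hypothetical server
                "database": "Sales",
                "schema": "dbo",
                "object": "Customers",
            },
        },
        "lastRegisteredBy": {"upn": "analyst@contoso.com"},
    },
    "annotations": {
        "descriptions": [
            {"properties": {"description": "Customer master data"}}
        ]
    },
}

resp = requests.post(
    URL,
    params={"api-version": "2016-03-30"},
    headers={"Authorization": f"Bearer {AAD_TOKEN}"},
    json=asset,
)
resp.raise_for_status()
print("Registered asset:", resp.headers.get("Location"))
```

Once registered this way, the asset shows up in the catalog's search and documentation experience alongside assets registered interactively.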
Azure Data Factory
Another component of Cortana Suite's Information Management is Azure Data Factory, a cloud-based data integration service that orchestrates and automates the movement and transformation of data in the cloud or on-premises. Using the Data Factory service, users can create automated data integration solutions that ingest data from various data stores, transform and process the data, and publish the results to target data stores. Data Factory lets users create data pipelines that move and transform data at large scale and then run those pipelines on a specified schedule (hourly, daily, weekly, etc.) to operationalize analytical solutions. It also provides rich visualizations of the lineage and dependencies between data pipelines, so users can monitor all of their pipelines from a single unified view, debug issues, and set up monitoring alerts. Azure Data Factory supports the following transformation activities: HDInsight Hive, Pig, MapReduce, and Hadoop Streaming, as well as Azure Machine Learning, Stored Procedure, and Data Lake Analytics, all of which run in the Azure cloud. Common use cases of Azure Data Factory include:
- Big data processing based on HDInsight and the big data stores of Azure
- Operationalized advanced analytics based on Azure scheduled operations
- Source data staging
- Source data integration
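To make the orchestration concrete, here is a minimal sketch of publishing and running a copy pipeline with the azure-mgmt-datafactory Python SDK. The resource group, factory, and dataset names are hypothetical, the referenced datasets and linked services are assumed to already exist, and class and parameter names can differ between SDK versions.

```python
# Sketch: define a copy pipeline in Data Factory and trigger a run.
# Names below (resource group, factory, datasets) are hypothetical.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    BlobSink,
    BlobSource,
    CopyActivity,
    DatasetReference,
    PipelineResource,
)

adf_client = DataFactoryManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

# Copy staged source data from one blob dataset to another.
copy_activity = CopyActivity(
    name="CopyStagedData",
    inputs=[DatasetReference(reference_name="SourceBlobDataset")],
    outputs=[DatasetReference(reference_name="SinkBlobDataset")],
    source=BlobSource(),
    sink=BlobSink(),
)

pipeline = PipelineResource(activities=[copy_activity], parameters={})
adf_client.pipelines.create_or_update(
    "my-resource-group", "my-data-factory", "StageSourceData", pipeline
)

# Kick off a one-off run; in production the pipeline would instead run on
# its specified schedule to operationalize the workflow.
run = adf_client.pipelines.create_run(
    "my-resource-group", "my-data-factory", "StageSourceData", parameters={}
)
print("Pipeline run id:", run.run_id)
```

In a production setup, the same pipeline would run on its configured schedule rather than through the one-off run shown here.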
Azure Event Hubs
Cortana Suite also includes Azure Event Hubs, a data ingestion service in the Azure cloud that handles large-scale, complex data streams from a variety of data sources. Event Hubs acts as the "front door" for an event pipeline. Once data is collected into an event hub, it can be transformed and stored using any real-time analytics provider or batching/storage adapters. Event Hubs decouples the production of a stream of events from the consumption of those events, so event consumers can access the events on their own schedule. In other words, Event Hubs is an event processing service that provides event and telemetry ingress to the cloud at massive scale, with low latency and high reliability. Used with other downstream services, it is particularly useful in application instrumentation, user experience or workflow processing, and Internet of Things (IoT) scenarios. Unlike traditional enterprise messaging, which typically provides more sophisticated capabilities such as sequencing, dead-lettering, transaction support, and strong delivery assurances, Event Hubs is oriented toward high throughput and processing flexibility for event streams.
Common use cases of Azure Event Hubs include:
- Live Feeds of Machine Telemetry
- Live Feeds of Traffic Conditions
- Live Feeds of Web Logs
- Live Feeds of Social Media Sentiment
- Real-time Feeds of Identity Protection
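As an illustration of the "front door" role, here is a minimal sketch of a producer publishing telemetry events with the azure-eventhub Python SDK (v5); the connection string, hub name, and payload fields are placeholders.

```python
# Sketch: publish a small batch of telemetry events to an event hub.
# Connection string, hub name, and payload fields are placeholders.
import json
from azure.eventhub import EventData, EventHubProducerClient

producer = EventHubProducerClient.from_connection_string(
    conn_str="<event-hubs-namespace-connection-string>",
    eventhub_name="machine-telemetry",   # hypothetical hub name
)

with producer:
    batch = producer.create_batch()
    for reading in (
        {"deviceId": "sensor-01", "temperatureC": 21.7},
        {"deviceId": "sensor-02", "temperatureC": 19.3},
    ):
        batch.add(EventData(json.dumps(reading)))
    # Once sent, the events are available to every consumer group independently.
    producer.send_batch(batch)
```

Consumers read these events independently of the producer, on their own schedule, through consumer groups (for example with EventHubConsumerClient), which is the decoupling of production and consumption described above.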