Overview

Our client wanted to treat Azure as an extension of their existing on-premise network; anything deployed on Azure had to follow the same rules as on-premise resources. They did not want to allow any traffic coming in or going out to the internet from the Azure virtual machines without going through their on-premise proxy server/firewall.

To address this, a new dedicated virtual network was created on Azure. The virtual network was connected to the on-premise network using a site-to-site VPN tunnel (ExpressRoute was scheduled for later). All traffic was force-tunneled through the client's existing network, so every internet request from Azure went through the on-premise proxy server/firewall.

Due to this network setup, we ran into a few major issues while installing Cloudera Director on a RHEL virtual machine in the virtual network.

YUM repositories (RHEL) are not accessible from non-Azure IPs

Azure has an unusual requirement: it does not allow non-Azure IPs to access its YUM repositories. Only Azure-provisioned IP ranges are whitelisted to access the YUM repositories hosted on Microsoft servers.

Since the client wanted us to use a virtual network/subnet that force-tunneled every internet request to their on-premise proxy server, we could not access the YUM repository from the Azure virtual machines. Azure simply rejected the YUM requests coming from the proxy server, because the proxy server's IP was not in the whitelisted Azure IP range.

To address this, we attached a routing table to the corresponding subnet of the virtual network and added a routing rule so that YUM traffic from Azure does not go through the client's proxy and instead goes directly to the Microsoft YUM repositories.
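The effect of such a routing rule can be illustrated with a small sketch. This is not the configuration we deployed; it only models how the most specific matching route decides where traffic goes. The route list, the `next_hop` helper, and the address 203.0.113.10 (a documentation-range placeholder, not a real Microsoft YUM address) are all assumptions made for illustration.

```python
import ipaddress

# Hypothetical route table: (address prefix, next hop type).
# 203.0.113.10 is a documentation-range placeholder, NOT a real
# Microsoft YUM repository address.
ROUTES = [
    ("0.0.0.0/0", "VirtualNetworkGateway"),  # forced tunnel to on-premise
    ("203.0.113.10/32", "Internet"),         # bypass rule for the YUM repository
]


def next_hop(destination, routes=ROUTES):
    """Return the next hop of the most specific route matching destination."""
    addr = ipaddress.ip_address(destination)
    matches = []
    for prefix, hop in routes:
        network = ipaddress.ip_network(prefix)
        if addr in network:
            matches.append((network, hop))
    if not matches:
        raise LookupError(f"no route for {destination}")
    # Longest prefix match: the route with the largest prefix length wins.
    _network, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop
```

With these placeholder routes, `next_hop("203.0.113.10")` selects the more specific bypass rule, while any other destination falls back to the forced tunnel.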

Cloudera Director was unable to authenticate Azure credentials

During the Cloudera Director installation we faced a similar issue: this time, Cloudera Director was unable to authenticate the Azure service principal/client credentials provided to it.

We found that the root cause was, once again, Azure whitelisting only Azure-provided IPs for access to its identity management URL/API. We added the IP addresses of the Azure management URL to the routing table, so that authentication requests from Cloudera Director followed the routing table and reached the Azure management URL directly, bypassing the proxy server.

For both steps above, our configuration and bootstrap scripts had to turn off the proxy for the routing table to take effect. Once the task was complete, we turned the proxy back on for other tasks that required internet connectivity.
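The "turn off, do the work, turn back on" pattern can be sketched as follows. This is a minimal illustration rather than our actual bootstrap script, and it assumes the proxy is configured through the usual proxy environment variables on the machine; the `proxy_disabled` helper and the `PROXY_VARS` list are hypothetical names introduced here.

```python
import os
from contextlib import contextmanager

# Assumption: the proxy is configured through these environment variables.
PROXY_VARS = ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY")


@contextmanager
def proxy_disabled(environ=os.environ):
    """Temporarily remove proxy settings, restoring them afterwards."""
    # Remember and remove whichever proxy variables are currently set.
    saved = {name: environ.pop(name) for name in PROXY_VARS if name in environ}
    try:
        yield
    finally:
        # Restore the proxy even if the wrapped task failed.
        environ.update(saved)
```

A task that must follow the routing table would run inside `with proxy_disabled():`, and the proxy settings return as soon as the block exits, even on error.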

Correct Azure solution

While we used routing tables to work around this Azure requirement, the correct way to address it is to send all traffic through a firewall deployed on Azure, such as the ones from Palo Alto Networks.

Using a firewall on Azure avoids maintenance-heavy routing tables and is more efficient. Azure has hooks in its firewalls to automatically route traffic bound for Azure IPs, while non-Azure network traffic follows the rules defined in the firewall.

The following are some of the prerequisites for installing Cloudera Enterprise on Azure.

Azure Service Principal

Cloudera requires a service principal account to be created in Azure before the Director node and clusters can be deployed on Azure. As of this writing, there is no easy way to create an Azure service principal other than using the Azure CLI or Windows PowerShell.

We used Windows PowerShell for the CLI-related Azure tasks. PowerShell had to be downloaded from the Microsoft site, and the Azure libraries had to be downloaded and installed before we could connect to the Azure cloud.

DNS Server capable of Forward and Reverse DNS lookups

By default, Azure automatically provides a DNS server for its virtual networks. However, as of this writing, the Azure-provided DNS servers are not capable of forward and reverse DNS lookups.

There are two choices: host DNS on an Azure VM, or use the client's DNS server, which usually supports forward and reverse lookups.

If you go with the first choice and host DNS on an Azure VM, make sure you run the appropriate bootstrap scripts from the Cloudera/Azure GitHub repository for the O/S version of the VM, available at the link below.

https://github.com/cloudera/director-scripts/tree/master/azure-dns-scripts
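Whichever DNS option is chosen, it helps to confirm that forward and reverse lookups agree before deploying. The following is a minimal sketch of such a check using Python's standard `socket` module, not part of the Cloudera scripts; the `check_dns` helper and the host names in the usage note are hypothetical. The resolver functions are passed as parameters only so the check can be exercised without a live DNS server.

```python
import socket


def check_dns(fqdn, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return True if fqdn resolves to an IP that resolves back to fqdn."""
    try:
        ip = forward(fqdn)                            # forward (A record) lookup
        name, _aliases, _addresses = reverse(ip)      # reverse (PTR) lookup
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return False
    # Compare case-insensitively and ignore a trailing dot.
    return name.rstrip(".").lower() == fqdn.rstrip(".").lower()
```

On a VM, calling `check_dns` with that machine's fully qualified name (for example a hypothetical `node1.example.internal`) should return True when both lookup directions are configured correctly.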

Quota limit for the Active Directory Service Principal for spinning up virtual machines and adding them to domain

This is an often overlooked aspect of cloud-based installations. Since Cloudera Director will be spinning up and tearing down VMs at will, it is important to increase the quota limit for the AD service principal used for the Cloudera installation.

Summary

While cloud computing offers indisputable benefits, it comes with its own challenges. One such challenge was security: our customer had to change many of their security policies and be very flexible to accommodate cloud computing requirements.

Despite several setbacks, we succeeded in deploying a Hadoop cluster on Azure. With Cloudera Director in the cloud, our customer gained tremendous flexibility to spin up and tear down clusters on demand using different configurations. We had very good support from both Cloudera and Microsoft throughout the deployment.