Cloudera QuickStart VM Setup

Introduction

Cloudera is a leading data management company. It offers a range of products built on Apache Hadoop, as well as consulting services. For training purposes, Cloudera provides a QuickStart VM that lets people quickly set up an environment and start writing code.

Downloads

  1. Cloudera QuickStarts for CDH

NOTE: This is a 4.8 GB download, so plan accordingly! :)

http://www.cloudera.com/downloads.html

This blog post will use CDH 5.8, which is Cloudera’s open source software distribution.  We will also use VMWare player on a Windows 10 workstation.

Under “Get Started Now”, select the following options:
Version:  QuickStarts for CDH 5.8
Platform:  VMWare

There will be a quick information form to fill out, then the download will begin.

  2. VMWare Workstation Player

To run the Cloudera QuickStart VM, you will need VMWare Workstation Player. Download and install it from the following link.

https://my.vmware.com/web/vmware/free#desktop_end_user_computing/vmware_workstation_player/12_0

  3. PuTTY (Optional, but highly recommended)

PuTTY is a nice terminal window for SSH and Telnet.

http://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html

  4. WinSCP (Optional, but highly recommended)

WinSCP is a nice GUI for file transfers between Windows and various other hosts.

https://winscp.net/eng/download.php

Running Cloudera QuickStart VM

The Cloudera QuickStart VM zip file (cloudera-quickstart-vm-5.8.0-0-vmware.zip) can be moved to a directory on a local drive or on a USB drive. Extract the zip file. In the directory where the files were extracted, double-click the cloudera-quickstart-vm-5.8.0-0-vmware.vmx virtual machine configuration file.

Notes:

  1. The first time VMWare Workstation Player is run, you may be prompted to install updates. You can install the updates, or click the “Remind Me Later” button to defer them.
  2. The Cloudera QuickStart VM will take a few minutes to start up – be patient.

Connect to Cloudera QuickStart VM

Once the Cloudera QuickStart VM is running, start a terminal session using the Applications | System Tools | Terminal menu option.

At the Linux command prompt, type ifconfig and press the Enter key. The “inet addr:” value is the IP address of the VM, which PuTTY and WinSCP can use to connect. Use cloudera for both the username and the password.
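
If the ifconfig output is long, the address can also be pulled out with a small filter. The following is a minimal sketch, not part of the QuickStart VM: extract_vm_ip is a hypothetical helper name, and it assumes the usual ifconfig layouts ("inet addr:x.x.x.x" on older systems, "inet x.x.x.x" on newer ones). It prints the first IPv4 address that is not a loopback address.

```shell
# Hypothetical helper: read ifconfig-style output on stdin and print
# the first IPv4 address that is not a loopback (127.x.x.x) address.
extract_vm_ip() {
  awk '
    /inet / {
      for (i = 1; i < NF; i++) {
        if ($i == "inet") {
          addr = $(i + 1)
          sub(/^addr:/, "", addr)      # strip the older "addr:" prefix if present
          if (addr !~ /^127\./) { print addr; exit }
        }
      }
    }
  '
}

# Usage inside the VM (uncomment to run):
#   ifconfig | extract_vm_ip
```

The printed address is the value to enter as the host name in PuTTY and WinSCP.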

Hadoop System Monitoring

You can look at HDFS system statistics within the Cloudera QuickStart VM by opening a web browser in the VM and going to the following URL (the NameNode web UI):

http://localhost:50070
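
The same page can be checked from the terminal. This is a rough sketch that assumes curl is available in the VM; check_namenode_ui is a hypothetical helper name, and the optional host argument is only useful if the VM's network settings allow connections from outside the VM.

```shell
# Hypothetical helper: report whether the web UI on port 50070 answers.
check_namenode_ui() {
  local host="${1:-localhost}"
  local code
  # -s: quiet, -L: follow redirects, -o /dev/null: discard the page body,
  # -w: print only the final HTTP status code ("000" if no connection)
  code="$(curl -s -L -o /dev/null -w '%{http_code}' "http://${host}:50070/")" || true
  if [ "$code" = "200" ]; then
    echo "NameNode web UI is up on ${host}"
  else
    echo "NameNode web UI not reachable on ${host} (HTTP ${code})" >&2
    return 1
  fi
}

# Usage inside the VM (uncomment to run):
#   check_namenode_ui
```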

Quick System Validation / Hadoop Commands

Type the following commands at the Cloudera QuickStart VM command prompt:

hadoop fs -ls
hadoop fs -mkdir input
hadoop fs -mkdir output
hadoop fs -ls
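
Running hadoop fs -mkdir a second time fails because the directory already exists. If you expect to repeat this step, a small wrapper such as the sketch below may help. ensure_hdfs_dir is a hypothetical helper name; it relies only on the standard -test -d and -mkdir -p options of hadoop fs. Relative paths such as input resolve to the current user's HDFS home directory, which is /user/cloudera when logged in as cloudera.

```shell
# Hypothetical helper: create an HDFS directory only if it is missing,
# so the validation step can be re-run without errors.
ensure_hdfs_dir() {
  local dir="$1"
  # -test -d returns 0 when the path exists and is a directory
  if hadoop fs -test -d "$dir"; then
    echo "exists: $dir"
  else
    # -p also creates any missing parent directories
    hadoop fs -mkdir -p "$dir" && echo "created: $dir"
  fi
}

# Usage inside the VM (uncomment to run):
#   ensure_hdfs_dir input
#   ensure_hdfs_dir output
#   hadoop fs -ls
```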

Put a Text File in HDFS

  1. Download the file nfl_2016_games_allseason.txt from the following GitHub site.

https://github.com/mndatascienceexamples/datascienceexamples

  2. Use WinSCP to copy the file from the Windows workstation to the Cloudera QuickStart VM.
  3. Use the hadoop command to put the nfl_2016_games_allseason.txt file in the input directory.

hadoop fs -ls input
hadoop fs -put nfl_2016_games_allseason.txt input
hadoop fs -ls input
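
The upload and the follow-up check can also be combined. The sketch below is one possible approach, with put_and_verify as a hypothetical helper name; it uses the standard -put -f (overwrite) and -test -e (exists) options of hadoop fs and assumes the file was copied to the current directory on the VM.

```shell
# Hypothetical helper: upload a local file to an HDFS directory,
# then confirm that the file is present there.
put_and_verify() {
  local src="$1" dest_dir="$2"
  local name
  name="$(basename "$src")"
  # Stop early if the local file was not copied to the VM
  [ -f "$src" ] || { echo "local file not found: $src" >&2; return 1; }
  # -f overwrites an existing copy, so the command can be repeated
  hadoop fs -put -f "$src" "$dest_dir" || return 1
  # -test -e returns 0 when the path exists in HDFS
  hadoop fs -test -e "$dest_dir/$name" && echo "uploaded: $dest_dir/$name"
}

# Usage inside the VM (uncomment to run):
#   put_and_verify nfl_2016_games_allseason.txt input
```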

Shutdown Cloudera QuickStart VM

At the command prompt, type the following command:

sudo halt

Additional Hadoop Tutorials

Cloudera’s site for people who are new to the Hadoop ecosystem:

http://www.cloudera.com/get-started/developers.html
