Cloudera is a data management company that offers a range of products built on Apache Hadoop, along with consulting services. For training purposes, Cloudera provides a QuickStart VM that lets you quickly set up an environment and start writing code.
- Cloudera QuickStarts for CDH
NOTE: This is a 4.8 GB download, so plan accordingly!
This blog post uses CDH 5.8, Cloudera’s open source software distribution, running in VMware Workstation Player on a Windows 10 workstation.
Under “Get Started Now”, select the following options:
Version: QuickStarts for CDH 5.8
There will be a quick information form to fill out, then the download will begin.
- VMware Workstation Player
To run Cloudera’s CDH QuickStart VM, you will need VMware Workstation Player. Download and install it before continuing.
- PuTTY (Optional, but highly recommended)
PuTTY is a terminal emulator for Windows that supports SSH and Telnet.
- WinSCP (Optional, but highly recommended)
WinSCP is a graphical client for transferring files between Windows and remote hosts.
Running Cloudera QuickStart VM
The Cloudera QuickStart VM zip file (cloudera-quickstart-vm-5.8.0-0-vmware.zip) can be placed in a directory on a local drive or on a USB drive. Extract the zip file, then, in the directory where the files were extracted, double-click the cloudera-quickstart-vm-5.8.0-0-vmware.vmx virtual machine configuration file.
- The first time VMware Workstation Player is run, you may be prompted to install updates. You can install the updates, or click the “Remind Me Later” button to defer them.
- The Cloudera QuickStart VM will take a few minutes to start up – be patient.
Connect to Cloudera QuickStart VM
Once the Cloudera QuickStart VM is running, start a terminal session from the Applications | System Tools | Terminal menu option.
At the Linux command prompt, type ifconfig and press Enter. The value after “inet addr:” is the IP address of the VM, which PuTTY and WinSCP can use to connect. Use cloudera as both the username and the password.
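If you would rather not pick the address out of the ifconfig output by eye, a small helper can do it. The sketch below is only an illustration: extract_vm_ip is a hypothetical name, and it assumes the older net-tools output layout, where each address appears as “inet addr:x.x.x.x”. It skips the loopback address (127.x.x.x) and prints the first remaining one.

```shell
# Hypothetical helper: print the first non-loopback IPv4 address.
# Reads ifconfig-style text on stdin and assumes the "inet addr:" layout.
extract_vm_ip() {
    sed -n 's/.*inet addr:\([0-9.]*\).*/\1/p' | grep -v '^127\.' | head -n 1
}

# Usage inside the VM:
#   ifconfig | extract_vm_ip
```

The printed address is the host name to enter in PuTTY or WinSCP. If your ifconfig prints addresses in a different layout, the pattern will need adjusting.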
Hadoop System Monitoring
You can look at system statistics within the Cloudera QuickStart VM by opening a web browser in the VM and going to the following URL:
Quick System Validation / Hadoop Commands
Type the following commands at the Cloudera QuickStart VM command prompt:
hadoop fs -ls
hadoop fs -mkdir input
hadoop fs -mkdir output
hadoop fs -ls
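Running hadoop fs -mkdir a second time on the same directory reports an error because the directory already exists. A minimal sketch of a repeatable version follows. ensure_hdfs_dir is a hypothetical helper name, not part of Hadoop; it relies on the standard hadoop fs -test -d check and on the -p option of hadoop fs -mkdir.

```shell
# Hypothetical helper: create an HDFS directory only if it is not there yet.
# "hadoop fs -test -d" exits 0 when the path is an existing directory.
ensure_hdfs_dir() {
    dir="$1"
    if hadoop fs -test -d "$dir"; then
        echo "exists: $dir"
    else
        hadoop fs -mkdir -p "$dir" && echo "created: $dir"
    fi
}

# Usage inside the VM:
#   ensure_hdfs_dir input
#   ensure_hdfs_dir output
#   hadoop fs -ls
```

Relative paths such as input resolve under the current user's HDFS home directory, the same as in the commands above.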
Put a Text File in HDFS
- Download the file nfl_2016_games_allseason.txt from the following GitHub site.
- Use WinSCP to copy the file from the Windows workstation to the Cloudera QuickStart VM.
- Use the hadoop command to put the nfl_2016_games_allseason.txt file in the input directory.
hadoop fs -ls input
hadoop fs -put nfl_2016_games_allseason.txt input
hadoop fs -ls input
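By default, hadoop fs -put refuses to overwrite a file that is already in HDFS, so repeating the upload produces an error. The sketch below wraps the upload in a check. put_once is a hypothetical helper name; it assumes the local file was copied into the current directory on the VM and that the target HDFS directory already exists.

```shell
# Hypothetical helper: upload a local file to an HDFS directory, skipping
# the copy when a file with the same name is already present there.
put_once() {
    src="$1"
    dest_dir="$2"
    name="$(basename "$src")"
    if [ ! -f "$src" ]; then
        echo "missing local file: $src" >&2
        return 1
    fi
    if hadoop fs -test -e "$dest_dir/$name"; then
        echo "already in HDFS: $dest_dir/$name"
    else
        hadoop fs -put "$src" "$dest_dir" && echo "uploaded: $dest_dir/$name"
    fi
}

# Usage inside the VM:
#   put_once nfl_2016_games_allseason.txt input
#   hadoop fs -ls input
```

To replace an existing copy instead of skipping it, hadoop fs -put -f overwrites the destination.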
Shut Down Cloudera QuickStart VM
At the command prompt, type the following command:
Additional Hadoop Tutorials
Cloudera’s site for people who are new to the Hadoop ecosystem: