Introduction to Hadoop:
- RDBMS vs Hadoop
- Ecosystem tour (9 products)
- Vendor comparison (Cloudera, Hortonworks, MapR, Amazon EMR)
- Hardware Recommendations
HDFS: File System details
- NameNode and DataNode architecture
- Write pipeline
- Read pipeline
- Heartbeats
- Rack awareness
- Block scanner
MapReduce:
- JobTracker/TaskTracker architecture
- Shuffle: Sort + Partitioning
- Speculative Execution
- Input/output formats
- Distributed cache
Pig:
- Pig philosophy and architecture
- Grunt shell
- Loading data
- Exploring Pig Latin commands
Hive:
- Hive architecture
- Hive vs RDBMS
- HiveQL and the shell
- Managing tables (external vs managed)
- Data types and schemas
- Partitions and buckets
HBase:
- Architecture and schema design
- HBase vs. RDBMS
- HMaster and Region Servers
- Column Families and Regions
- Write pipeline
- Read pipeline
Next-Gen Hadoop:
- Intro to the high-level concepts coming in Hadoop 2.0:
  - HDFS HA
  - HDFS Federation
  - MapReduce 2.0
- Flume
- Sqoop
For further information about the demo, or for any other queries, please email admin@cloudera.training or call us on 615 266 6667.