Teaching big data with a virtual cluster
Eckroth J.
2016
SIGCSE 2016 - Proceedings of the 47th ACM Technical Symposium on Computing Science Education
12
10.1145/2839509.2844651
Both industry and academia are confronting the challenge of big data, i.e., data processing that involves data so voluminous or arriving at such high velocity that no single commodity machine is capable of storing or processing them all. A common approach to handling big data is to divide and distribute the processing job to a cluster of machines. Ideally, a course that teaches students how to work with big data would provide students access to a cluster for hands-on practice. However, a cluster of physical, on-premise machines may be prohibitively expensive, particularly at smaller institutions with smaller budgets. In this report, we summarize our experiences developing and using a virtual cluster in a big data mining and analytics course at a small private liberal arts college. A single moderately-sized server hosts a cluster of virtual machines, which run the popular Apache Hadoop system. The virtual cluster gives students hands-on experience and costs less than an equal number of physical machines. It is also easily constructed and reconfigured. We describe our implementation, analyze its performance characteristics, and compare costs with physical clusters and the Amazon Elastic MapReduce cloud service. We summarize our use of the virtual cluster in the classroom and show student feedback.
Big data; Cloud computing; Curriculum; Virtual machines
Barielle S., Calculating TCO for energy, IBM Systems Magazine: Power, pp. 38-40, (2011); Brown R., Shoop E., Teaching undergraduates using local virtual clusters, IEEE International Conference on Cluster Computing (CLUSTER), pp. 1-8, (2013); Brown R.A., Hadoop at home: Large-scale computing at a small college, ACM SIGCSE Bulletin, 41, pp. 106-110, (2009); Johnson E., Garrity P., Yates T., Brown R., et al., Performance of a virtual cluster in a general-purpose teaching laboratory, IEEE International Conference on Cluster Computing (CLUSTER), pp. 600-604, (2011); Ngo L.B., Duffy E.B., Apon A.W., Teaching HDFS/MapReduce systems concepts to undergraduates, Parallel & Distributed Processing Symposium, Workshops (IPDPSW), 2014 IEEE International, pp. 1114-1121, (2014); Rabkin A.S., Reiss C., Katz R., Patterson D., Experiences teaching MapReduce in the cloud, Proceedings of the 43rd ACM Technical Symposium on Computer Science Education, pp. 601-606, (2012); Shvachko K., Kuang H., Radia S., Chansler R., The Hadoop distributed file system, IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1-10, (2010)
Association for Computing Machinery, Inc
Conference paper
All Open Access; Bronze Open Access
Scopus