Provide Docker containers for all the key HPCC Systems components, and provide a Kubernetes configuration to manage the popular HPCC Systems topologies.
The key HPCC Systems components include the HPCC Platform with a combination of plugins, Ganglia and Nagios monitoring, HPCC Clienttools, etc. Students should automate the steps for building these Docker images. The Linux distributions currently planned for support are Ubuntu 14.04 amd64 and CentOS 7 x86_64.
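As one possible starting point for automating the image builds, the sketch below generates a `docker build` command for every combination of component and distribution. The component names, the registry prefix `hpccsystems`, and the `<component>/<distro>/Dockerfile` layout are assumptions made for illustration only; the project does not prescribe them.

```python
import itertools

# Hypothetical names: the real component list and repository layout
# are up to the student and are not defined by the project description.
DISTROS = ["ubuntu-14.04-amd64", "centos-7-x86_64"]
COMPONENTS = ["platform", "clienttools"]


def build_commands(distros, components, registry="hpccsystems"):
    """Return one `docker build` argument list per (distro, component) pair."""
    commands = []
    for distro, component in itertools.product(distros, components):
        tag = f"{registry}/{component}:{distro}"
        # Assumed layout: one Dockerfile per component and distribution.
        dockerfile = f"{component}/{distro}/Dockerfile"
        commands.append(["docker", "build", "-t", tag, "-f", dockerfile, "."])
    return commands


if __name__ == "__main__":
    for command in build_commands(DISTROS, COMPONENTS):
        print(" ".join(command))
```

Each argument list could be passed to `subprocess.run` to perform the build; printing the commands first makes it easy to review the build matrix before running anything.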
For Kubernetes, students will be asked to implement a cluster configuration that includes HPCC nodes (support, Roxie, and Thor slaves) and Cassandra nodes. The configuration files will be in YAML format and will define pods, services, controllers, etc. To configure the HPCC cluster, students will need to write scripts (Python or bash) that find the IP address of each node and generate an environment.xml based on the defined topology. HPCC Systems has a Cassandra plugin, which also needs some configuration in order to work with the Cassandra server nodes.
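The IP discovery and configuration generation step might look roughly like the following sketch. It assumes the pod list is available as JSON (for example, the output of `kubectl get pods -o json`) and that each pod carries a label, here hypothetically named `role`, identifying it as a support, Roxie, or Thor node. The XML it writes is a deliberately simplified illustration: the `NodeGroup` and `Instance` elements are invented for this example, and a real environment.xml follows a much larger HPCC Systems schema.

```python
import json
import xml.etree.ElementTree as ET


def group_pod_ips(pods_json, label_key="role"):
    """Group pod IPs by the value of a label (label name is an assumption)."""
    topology = {}
    for pod in json.loads(pods_json).get("items", []):
        role = pod.get("metadata", {}).get("labels", {}).get(label_key)
        ip = pod.get("status", {}).get("podIP")
        if role and ip:  # skip unlabeled pods and pods without an IP yet
            topology.setdefault(role, []).append(ip)
    return topology


def build_environment(topology):
    """Render a simplified, illustrative environment document as a string."""
    root = ET.Element("Environment")
    hardware = ET.SubElement(root, "Hardware")
    software = ET.SubElement(root, "Software")
    names = {}  # IP address -> generated computer name
    for role, ips in sorted(topology.items()):
        # "NodeGroup" and "Instance" are hypothetical element names.
        group = ET.SubElement(software, "NodeGroup", role=role)
        for ip in ips:
            if ip not in names:
                names[ip] = "node%03d" % (len(names) + 1)
                ET.SubElement(hardware, "Computer",
                              name=names[ip], netAddress=ip)
            ET.SubElement(group, "Instance",
                          computer=names[ip], netAddress=ip)
    return ET.tostring(root, encoding="unicode")
```

A script could read the pod list from standard input, call `group_pod_ips`, and write the result of `build_environment` to a file. Keeping discovery and generation in separate functions lets the generator be tested with a hand-written topology, without a running cluster.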
For the development platform, students can start with Docker on Linux, or Docker Machine on Windows or OS X, and Kubernetes on a local Linux machine. To test the Kubernetes cluster, students will work on one of the following: AWS, Azure, or Google Compute Engine.
The challenge of the project is the learning curve: students must become familiar with HPCC Systems, Docker, and Kubernetes, then deploy a complex big data platform in a cloud environment and make sure it works as expected.