Compiling Spark from Source

Thanks for the fast replies and encouragement.

To clarify, I think the big-data-basic bundle no longer includes Spark.

  • When following this tutorial, a prerequisite step includes installing big-data-basic. However, the current documentation for all available bundles does not list Spark anywhere except under buildreq-spark.

  • Even with a fresh install followed by the tutorial above, it appears that an update or install of big-data-basic will not populate the Spark examples under the /usr/share/defaults/spark path. This thread discusses suggested practices for keeping things organized, partly in response to a general question about documentation; Spark is one example.
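For anyone who wants to reproduce the check, here is a minimal sketch. The swupd invocations only exist on Clear Linux, so they are shown commented out; the directory path is the one from the tutorial above:

```shell
# Illustrative check on Clear Linux; the swupd commands are commented
# out here because they only exist on Clear Linux:
#   sudo swupd bundle-list      # is big-data-basic installed?
#   sudo swupd search spark     # which bundle (if any) mentions spark?
# Confirm whether the Spark example configs were actually populated:
if [ -d /usr/share/defaults/spark ]; then
    echo "spark defaults present"
else
    echo "spark defaults missing"
fi
```

On my machine the final check prints "spark defaults missing" even after installing the bundle, which is what prompted the bullet above.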

doct0rHu requests clarification about the above on this thread.

With the announcements above, I made working assumptions that

  • a) Spark is no longer included or supported through swupd and
  • b) I could either try using the stacks from Docker or
  • c) compile from source, then move certain things over manually to where they “used to be” when swupd would install Spark.
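For anyone following the same path, option c) roughly looked like the sketch below. The release tag, the distribution name, and the /opt/spark prefix are my assumptions, not the paths swupd used to manage; dev/make-distribution.sh is Spark’s own packaging script:

```shell
# Build Spark from source and install it by hand (a sketch; the tag,
# the --name label, and the /opt/spark prefix are assumptions):
git clone https://github.com/apache/spark.git
cd spark
git checkout v3.0.0                   # pick whichever release you need

# Spark's own packaging script produces a runnable distribution tarball:
./dev/make-distribution.sh --name diy --tgz

# Unpack to a prefix of your choosing and wire up the environment:
sudo mkdir -p /opt/spark
sudo tar -xzf spark-*-bin-diy.tgz -C /opt/spark --strip-components=1
export SPARK_HOME=/opt/spark
export PATH="$SPARK_HOME/bin:$PATH"
```

The build pulls a lot of Maven dependencies on the first run, so expect it to take a while.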

Option b) proved a bit rough because Docker Swarm did not play nicely with the image/stack Clear provides on Docker Hub. I learned a lot and even tried Kubernetes. I had fits of success, but ultimately, since my use case is as described in the initial post on this thread, I moved back to “just Spark” on bare metal.

Option c) works now, with the following hiccups:

  1. A reboot of the Spark master means I have to re-run the start scripts for the master and for [script name omitted for sensitivity to the pejorative shade of meaning of the same word] to reconnect the workers.
  2. RStudio (desktop) throws an error and won’t connect to Spark, so I spun up rstudio-server instead; I haven’t been able to troubleshoot the error yet because of 1.
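On hiccup 1, one common way to survive reboots is a systemd unit that runs the start script for you. This is a sketch under assumptions: SPARK_HOME is /opt/spark, a dedicated spark user exists, and the stock sbin scripts daemonize into the background (hence Type=forking):

```ini
# /etc/systemd/system/spark-master.service -- a sketch; the paths and
# the "spark" user are assumptions, adjust to your own layout.
[Unit]
Description=Apache Spark standalone master
After=network-online.target
Wants=network-online.target

[Service]
Type=forking
User=spark
Environment=SPARK_HOME=/opt/spark
ExecStart=/opt/spark/sbin/start-master.sh
ExecStop=/opt/spark/sbin/stop-master.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Enable it with `sudo systemctl enable --now spark-master`. A matching unit on each worker box would call `start-worker.sh spark://<master-host>:7077` (that script was renamed in Spark 3.1; earlier releases use the old name the hiccup above alludes to), so the cluster reassembles itself after a reboot.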

Also, any tips for tracking the above and keeping a tidy record of it would be welcome. I wasn’t overstating how much I appreciate the community and the existing documentation; it’s a ton of thinking and expertise provided at no cost. To supplement, I keep these desk references, listed as title (author) and my rating of their utility:

  1. Spark: The Definitive Guide (Bill Chambers) 8 of 10
  2. Hadoop: The Definitive Guide (Tom White) 8 of 10
  3. The Kubernetes Book (Nigel Poulton) 7 of 10
  4. Mastering Spark with R (Javier Luraschi) 4 of 10
  5. Docker Deep Dive (Nigel Poulton) 7 of 10
  6. Ubuntu Unleashed (Matthew Helmke) 10 of 10
  7. Kubernetes: A Step-by-Step Guide for Beginners (Sheldon Miles) 3 of 10

So, I look up whatever I can, but I finally got stuck enough to ask. My Docker/Kubernetes journey pointed me back to “just do Spark on bare metal”.

Giving back/engaging more on my end:

  • Gently, I’d rate the Clear Linux tutorials 5 of 10, the community posts 10 of 10, and the documentation overall 2 of 10.
  • I’m happy to help share/describe more lessons learned, but don’t want to add noise or distraction.
  • None of this is a complaint; I only endeavor to figure out my problem(s) while helping others either avoid or leverage my hard-headed DIY mistakes.
  • I used to use these quite a bit, but I wanted Spark without paying through the nose for AWS. So, I think I can handle the RStudio issues once I get back to them. It would be great to offer something like this to the community, and maybe I could help?
  • Creating mixes and helping incubate a community-based third-party repo for bundles is something that has generated chatter across the forum, and I’d be excited to be part of that effort. I just don’t have real-world experience, but I’d eagerly do what I could, and I think I could help with maintenance or just writing documentation.