Thanks for the fast replies and encouragement.
To clarify, I think the big-data-basic bundle no longer includes Spark.
When following this tutorial, one prerequisite step is installing big-data-basic. However, the current documentation listing all available bundles does not mention Spark anywhere except buildreq-spark.
Even with a fresh install, and then following the tutorial above, it appears that neither a clr-update nor an install of big-data-basic populates the Spark examples in the /usr/share/defaults/spark path. This thread discusses suggested practices for keeping things organized, which partly answers the general question about documentation; Spark is just one example.
doct0rHu asked for clarification about the above in this thread.
Based on the announcements above, I made the working assumptions that:
- a) Spark is no longer included in or supported through swupd, and
- b) I could either try using the stacks from Docker, or
- c) compile from source, then manually move certain files to where they “used to be” when swupd installed Spark.
Option b) proved a bit rough because Docker Swarm did not play nicely with the image/stack that Clear Linux provides on Docker Hub. I did learn a lot, and even tried Kubernetes. I had fits of success, but ultimately, since my use case is the one described in the initial post of this thread, I moved back to “just Spark” on bare metal.
Option c) works now, with the following hiccups:
- After a reboot of the Spark master, I have to run the .start master and .start [script name omitted for sensitivity to pejorative different shade of meaning for same word] scripts again to connect the workers.
- RStudio (desktop) throws an error and won’t connect to Spark, so I spun up rstudio-server instead. It is running, but I haven’t been able to troubleshoot the error there because of the reboot issue above.
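For the reboot hiccup, one option is a small helper that reruns the start scripts in order. This is a minimal sketch, not a tested setup: it assumes Spark was compiled into a hypothetical SPARK_HOME of /opt/spark, a Spark 3.1+ layout where the worker script is named sbin/start-worker.sh (older releases use a different name), and a hypothetical master URL on the standalone master's default port 7077. It defaults to a dry run that only prints the commands.

```python
import os
import subprocess
from pathlib import Path
from typing import List

# Hypothetical install location for a Spark build compiled from source.
DEFAULT_SPARK_HOME = "/opt/spark"


def spark_start_commands(spark_home: str, master_url: str) -> List[List[str]]:
    """Return the commands that start a standalone master and one local worker.

    Assumes a Spark 3.1+ layout, where the worker script is sbin/start-worker.sh.
    """
    sbin = Path(spark_home) / "sbin"
    return [
        [str(sbin / "start-master.sh")],                  # start the master first
        [str(sbin / "start-worker.sh"), master_url],      # then attach a worker to it
    ]


def start_spark(spark_home: str, master_url: str, dry_run: bool = True) -> List[List[str]]:
    """Print each command and, unless dry_run is True, execute it in order."""
    commands = spark_start_commands(spark_home, master_url)
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True stops the sequence if the master fails to start.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    spark_home = os.environ.get("SPARK_HOME", DEFAULT_SPARK_HOME)
    # Hypothetical master URL; replace localhost with the master's hostname.
    master = os.environ.get("SPARK_MASTER_URL", "spark://localhost:7077")
    start_spark(spark_home, master, dry_run=True)
```

Once the dry-run output looks right, calling it with dry_run=False from something that runs at boot (for example a systemd unit or a cron @reboot entry) would avoid the manual restart, though each worker machine would need its own worker command.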
Also, I’d welcome any tips for tracking the above and keeping a tidy record of it. I wasn’t overstating how much I appreciate the community and the existing documentation. It’s a ton of thinking and expertise provided at no cost. To supplement it, I use the following desk references, listed as title (author) followed by my rating of their usefulness:
- Spark: The Definitive Guide (Bill Chambers) 8 of 10
- Hadoop: The Definitive Guide (Tom White) 8 of 10
- The Kubernetes Book (Nigel Poulton) 7 of 10
- Mastering Spark with R (Javier Luraschi) 4 of 10
- Docker Deep Dive (Nigel Poulton) 7 of 10
- Ubuntu Unleashed (Matthew Helmke) 10 of 10
- Kubernetes: A Step-by-Step Guide for Beginners (Sheldon Miles) 3 of 10
So, I look up whatever I can, but I finally got stuck enough to ask. My Docker/Kubernetes journey pointed me back to “just do Spark on bare metal”.
Giving back/engaging more on my end:
- Gently, I’d rate the Clear Linux tutorials as 5 of 10, the community posts 10 of 10, and documentation overall 2 of 10.
- I’m happy to help share/describe more lessons learned, but don’t want to add noise or distraction.
- None of this is a complaint; I’m only trying to figure out my problem(s) while helping others either avoid or leverage my hard-headed DIY mistakes.
- I used to use these quite a bit, but I wanted Spark without paying through the nose for AWS. So I think I can handle the RStudio issues once I get back to them. It would be great to offer something like this to the community, and maybe I could help?
- Creating mixes and helping incubate a community-based third-party repo for bundles is something people have been discussing across the forum, and I’d be excited to be part of that effort. I just don’t have real-world experience, but I’d eagerly do what I could, and I think I could help with maintenance or just with writing documentation.