Figured I’d stop lurking and first say thank you to the Clear Linux team and everyone who posts here.
To cut to the chase: is it a terrible move to have compiled Spark from source and unpacked/installed it under my user home directory?
I have a home lab that’s meant to run a Spark cluster. The cluster will be a one-trick pony of sorts – I want to use RStudio and the R API to Spark for ML/modeling. Since my own limitations (skills, patience) meant I could only keep Docker Swarm going, never K8s, I switched back to running Spark on bare metal. So, here’s my “dumb noob” move – I compiled Spark from source after installing the jdk13 bundle via swupd. I set JAVA_HOME per other posts on the topic, but unpacked and installed everything from my home directory. It all worked surprisingly well – loads less heartache than having Swarm go down or trying to find the right setup for using the stacks/images.
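For anyone curious, the environment setup amounts to a couple of shell exports. This is only a hedged sketch – the Spark path is a placeholder for wherever the build was unpacked, and JAVA_HOME is derived from whichever `java` the swupd bundle put on the PATH rather than hard-coding a JVM path:

```shell
# Derive JAVA_HOME from the java binary the swupd jdk bundle installed
# (resolves the symlink, then strips the trailing /bin/java).
export JAVA_HOME=$(dirname "$(dirname "$(readlink -f "$(command -v java)")")")

# SPARK_HOME points at the home-directory install; adjust to your actual
# unpack location (this exact path is an assumption, not from the post).
export SPARK_HOME="$HOME/spark"
export PATH="$SPARK_HOME/bin:$PATH"
```

Dropping these lines into `~/.bashrc` keeps them across reboots, and since everything lives under `$HOME`, swupd updates and `swupd repair` never touch the install – which is arguably in the spirit of Clear Linux keeping user state out of the OS-managed paths.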
Should I stop and re-do everything?
I understand the stateless approach of CL on a basic level, but this is just a home lab and it’s DIY, learn-by-doing. I’m never going to production or developing a product. I simply want to apply existing R skills/knowledge and play with ML through Spark and Synthea data (https://synthetichealth.github.io/synthea/). It’s also pretty crazy how much faster my ten Braswell NUCs are on CL versus Ubuntu – or is that just placebo?
Originally, I picked Clear because of the stacks, and I was stuck inside like everyone else, in need of an indoor-only weekend hobby. By the time I had put enough repurposed weekend/commute time into getting my feet wet with Docker and the rest, I realized there would be no more direct support for Spark through swupd. I get that move, too, and would bet it has at least some small part to do with licensing (https://www.scotusblog.com/case-files/cases/google-llc-v-oracle-america-inc/)?