Spark

This is an extension of my 2025 Learning Log.

Reviewing Spark (PySpark) through the course Taming Big Data With Apache Spark.

Setting up

Apache Spark 3.x is compatible only with Java 8, Java 11, or Java 17, while Apache Spark 4 requires Java 17 or later.

As of this writing, Spark 3.x is not compatible with Python 3.12 or newer.

So I needed to install older versions of Java and Python alongside my system defaults.

Install Java 11

```shell
brew install openjdk@11
```

Make sure it is the default Java on the system:

```shell
# Disable the currently installed default JDK (jdk-23 on my machine)
cd /Library/Java/JavaVirtualMachines/jdk-23.jdk/Contents
sudo mv Info.plist Info.plist.disabled

# Link the Homebrew OpenJDK 11 so macOS can find it
sudo ln -sfn /opt/homebrew/opt/openjdk@11/libexec/openjdk.jdk /Library/Java/JavaVirtualMachines/openjdk.jdk
export JAVA_HOME=$(/usr/libexec/java_home -v 11)
java -version
```
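The `export JAVA_HOME` line only lasts for the current shell session. To make it persistent, the same line can be appended to the shell profile (a sketch assuming zsh, the macOS default shell):

```shell
# Persist JAVA_HOME across shell sessions (zsh assumed; use ~/.bashrc for bash)
echo 'export JAVA_HOME=$(/usr/libexec/java_home -v 11)' >> ~/.zshrc
source ~/.zshrc
```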

Downgrade Python

```shell
brew install python@3.10
```

Create a virtual environment

```shell
python3.10 -m pip install virtualenv
python3.10 -m virtualenv .venv
source .venv/bin/activate

# test
pyspark
```

To test, run a script with `spark-submit`:

```shell
spark-submit test.py
```
This post is licensed under CC BY 4.0 by the author.