
PySpark - Fixing "TypeError: 'JavaPackage' object is not callable" on WSL

Spark installed in WSL.

Java installed in WSL.

JupyterLab started in WSL:

jupyter-lab --no-browser

 

Code run from the notebook (opened in a Windows browser):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master('spark://ZZHPC.localdomain:7077')
         .appName('Read Data from PostgreSQL')
         .getOrCreate())

It fails with:

25/11/22 20:06:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
TypeError: 'JavaPackage' object is not callable
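A quick sanity check first (a minimal sketch: 'local[1]' runs Spark inside the notebook's own JVM, with no master involved). If this works, Java and PySpark are wired up correctly and the problem is the master URL or the cluster; if even this raises the 'JavaPackage' error, the Python-to-Java bridge itself is broken.

from pyspark.sql import SparkSession

# Local mode bypasses the standalone master entirely.
spark = (SparkSession.builder
         .master('local[1]')
         .appName('sanity-check')
         .getOrCreate())
print(spark.version)
spark.stop()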

 

The remaining possible root causes:

1️⃣ Spark master unreachable (most common)

You're using:

 
.master("spark://ZZHPC.localdomain:7077")

WSL hostname resolution is often wrong.

Run in WSL:

 
ping ZZHPC.localdomain

If it resolves to 127.0.1.1, hostname resolution is broken.

Then try:

 
ping ZZHPC
hostname -I

If the Spark master is actually listening on 10.255.255.254,
you must connect with:

 
.master("spark://10.255.255.254:7077")

Hostname mismatch → Java gateway fails → JavaPackage error.
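To take the guesswork out of this, here is a small sketch (the hostname ZZHPC.localdomain and port 7077 are taken from the example above) that resolves the name from Python and builds the session against whatever IP it maps to:

import socket
from pyspark.sql import SparkSession

# If this prints 127.0.1.1, the hostname is useless as a master URL;
# use the address from `hostname -I` instead.
ip = socket.gethostbyname('ZZHPC.localdomain')
print('resolves to:', ip)

spark = (SparkSession.builder
         .master(f'spark://{ip}:7077')
         .appName('Read Data from PostgreSQL')
         .getOrCreate())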


2️⃣ No worker registered

Open Spark master UI:

 
http://ZZHPC.localdomain:8080

(or use the IP version)

Check:

  • Workers = 0 → cluster can't run → JavaPackage errors inside SparkContext

Start a worker:

 
$SPARK_HOME/sbin/start-worker.sh spark://ZZHPC.localdomain:7077

Or using IP:

 
$SPARK_HOME/sbin/start-worker.sh spark://10.255.255.254:7077
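You can also verify registration without the browser: standalone masters expose their status as JSON at /json on the web-UI port (assuming your Spark version still serves it). From the notebook:

import json, urllib.request

# Same host/port as the master web UI, plus /json.
with urllib.request.urlopen('http://10.255.255.254:8080/json', timeout=5) as r:
    status = json.load(r)

workers = status.get('workers', [])
print('workers registered:', len(workers))
for w in workers:
    print(w.get('host'), w.get('state'))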

3️⃣ Spark version mismatch (common in WSL setups)

Run:

 
pyspark --version

and

 
$SPARK_HOME/bin/spark-shell --version

If versions differ → classloader breaks → JavaPackage errors.
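The same comparison can be done from inside the notebook. A sketch, assuming SPARK_HOME points at a Spark tarball install (those ship a RELEASE file at the top level):

import os
import pyspark

print('pip pyspark:', pyspark.__version__)

spark_home = os.environ.get('SPARK_HOME')
release = os.path.join(spark_home, 'RELEASE') if spark_home else None
if release and os.path.exists(release):
    with open(release) as f:
        print('SPARK_HOME :', f.readline().strip())
else:
    print('SPARK_HOME not set or RELEASE file not found')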


4️⃣ Python kernel in Jupyter is NOT your WSL Python

This happens if you selected a Windows Python kernel accidentally.

Inside a Jupyter cell, run:

 
import platform, sys
platform.system(), sys.executable

You should see:

 
('Linux', '/usr/bin/python3')

If you see:

 
('Windows', 'C:\\...python.exe')

→ you are using a Windows kernel (wrong!) even though Jupyter was started in WSL.

This will DEFINITELY produce the 'JavaPackage' error.

Fix:
Select the correct kernel:

  • In Jupyter: Kernel → Change Kernel → Python (WSL)

  • Or ensure you launched Jupyter from the virtual environment in WSL (a quick way to list the registered kernels is sketched below).
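To see which kernels this Jupyter installation actually knows about, here is a sketch using jupyter_client (installed alongside JupyterLab). A WSL kernel's interpreter path should start with /usr, /home, or your venv, never /mnt/c:

from jupyter_client.kernelspec import KernelSpecManager

# argv[0] of each spec is the interpreter that kernel launches.
for name, spec in KernelSpecManager().get_all_specs().items():
    print(f"{name}: {spec['spec']['argv'][0]}")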


🎯 Let’s run 3 quick diagnostics to identify the real cause

Please run these inside your notebook:

Test 1: Check OS of the Python kernel

 
import platform, sys
platform.system(), sys.executable

Test 2: Check if PySpark can find the Java gateway

 
import pyspark
print(pyspark.__version__)
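For a sharper probe than the version print, this sketch calls pyspark.java_gateway.launch_gateway directly (an internal API, so treat it purely as a diagnostic): if it raises, the JVM side cannot start at all, independent of any master URL.

from pyspark.java_gateway import launch_gateway

# Starts a throwaway JVM gateway the same way SparkContext would.
gateway = launch_gateway()
print('Java gateway OK:', gateway.jvm is not None)
gateway.shutdown()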

Test 3: Check Spark master connectivity

In WSL terminal:

 
curl http://ZZHPC.localdomain:8080

Then try:

 
curl http://10.255.255.254:8080
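If curl succeeds in the WSL terminal but the notebook still fails, repeat the check from inside the kernel, since that is the environment the Spark driver actually runs in (hosts and ports taken from the tests above):

import urllib.request

for url in ('http://ZZHPC.localdomain:8080', 'http://10.255.255.254:8080'):
    try:
        urllib.request.urlopen(url, timeout=3)
        print('OK  ', url)
    except Exception as e:
        print('FAIL', url, '-', e)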

📌 Summary

Since the Jupyter server is running in WSL, you are very close to a working setup. The remaining likely causes are:

  1. Wrong Python kernel (Windows kernel selected)

  2. Wrong master URL (hostname resolves badly)

  3. Worker not running

  4. Spark version mismatch

Send me the results of the 3 tests, and I'll pinpoint the exact cause in one message.