User Guide

In-depth guides for writing, configuring, scheduling, monitoring, and operating Spark applications with the Kubeflow Spark Operator.

Working with SparkApplications

Using SparkApplications

Create, list, check the status of, and delete SparkApplication objects

Writing a SparkApplication

The full anatomy of a SparkApplication spec: application types, dependencies, pods, volumes, and more

Working with SparkApplications

Restart policies, failure handling, and managing running applications

Running on a Schedule

Use ScheduledSparkApplication to run Spark jobs on a cron schedule

Customizing Spark Operator

Tune operator behavior, flags, and Helm chart values

Enabling Leader Election

Run the operator in high-availability mode with leader election

Running Multiple Instances

Deploy several operator instances scoped to different namespaces

Resource Quota Enforcement

Enforce Kubernetes resource quotas on Spark workloads

Monitoring with Prometheus & JMX

Export Spark metrics to Prometheus using the JMX exporter

Volcano Integration

Batch scheduling and gang scheduling with Volcano

YuniKorn Integration

Resource-aware batch scheduling with Apache YuniKorn

Google Cloud Storage & BigQuery

Read and write data with GCS and BigQuery on GKE

Kubeflow Notebooks

Run PySpark jobs from Kubeflow Notebooks