User Guide

In-depth guides for writing, configuring, scheduling, monitoring, and operating Spark applications with the Kubeflow Spark Operator.


Working with SparkApplications

Using SparkApplications

Create, list, check the status of, and delete SparkApplication objects

Writing a SparkApplication

The full anatomy of a SparkApplication spec: types, dependencies, pods, volumes, and more

Working with SparkApplications

Restart policies, failure handling, and managing running applications

Running on a Schedule

Use ScheduledSparkApplication to run Spark jobs on a cron schedule

Operating the Operator

Customizing Spark Operator

Tune operator behavior, flags, and Helm chart values

Enabling Leader Election

Run the operator in high-availability mode with leader election

Running Multiple Instances

Deploy several operator instances scoped to different namespaces

Resource Quota Enforcement

Enforce Kubernetes resource quotas on Spark workloads

Monitoring & Scheduling

Monitoring with Prometheus & JMX

Export Spark metrics to Prometheus using the JMX exporter

Volcano Integration

Batch scheduling and gang scheduling with Volcano

YuniKorn Integration

Resource-aware batch scheduling with Apache YuniKorn

Integrations

Google Cloud Storage & BigQuery

Read and write data with GCS and BigQuery on GKE

Kubeflow Notebooks

Run PySpark jobs from Kubeflow Notebooks
