Managing cluster instances
Spark serverless requires no complex management or maintenance of the Spark cluster. Upgrading, scaling out, optimization, and other complex tasks are handled automatically, giving you a zero-maintenance serverless experience.
All you need to do are simple tasks, such as starting or stopping cluster instances when you need them.
Cluster instance management operations can be done either programmatically, using the Python client library, or with a few mouse clicks in the Instance management menu.
Create a new Spark cluster instance
You can create multiple Spark serverless cluster instances in one or more Kubernetes clusters (SKE). See the Create Kubernetes cluster section to create an SKE.
You can create a cluster instance by creating a Spark session from your Python environment.
- Create a Spark session with the default configuration

```python
import ods

ods.init(ske="my-ske")
spark = ods.spark("my-cluster").session()
```
- Create a Spark session with 3 initial worker nodes

```python
import ods

ods.init(ske="my-ske")
spark = ods.spark("my-cluster", worker_num=3).session()
```
- Create a Spark session with Delta Lake support

```python
import ods

ods.init(ske="my-ske")
spark = ods.spark("my-cluster", delta=True).session()
```
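These options are ordinary keyword arguments, so they should be combinable in a single call. A minimal sketch, assuming `worker_num` and `delta` can be passed together:

```python
import ods

ods.init(ske="my-ske")

# Assumption: worker_num and delta can be combined in one call
spark = ods.spark("my-cluster", worker_num=3, delta=True).session()
```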
Note

Run `pip install ods` to install the ods library. Python versions 3.6, 3.7, and 3.8 are supported.
Done! You now have a Spark session connected to executors running remotely in the cloud. No application packaging or job submission to the cluster is required.
Your Spark session is capable of interactive computing, which means you can use it in a Python REPL or in a notebook.
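For example, you can build a small DataFrame and inspect it right away; this uses only the standard PySpark API:

```python
# Quick interactive check: the computation runs on the remote executors,
# and the result is printed locally in your REPL or notebook.
df = spark.createDataFrame([("alice", 1), ("bob", 2)], ["name", "value"])
df.show()
```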
Note

It may take from a few seconds to a few minutes for the executors to become fully ready. See the next section to monitor the status of executors.
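In the meantime, a simple way to check readiness programmatically is to time a trivial job. This is only a sketch using the standard PySpark API; the readiness behavior of the serverless backend is an assumption here:

```python
import time

# Minimal readiness probe: a trivial job completes quickly once executors are up.
start = time.time()
spark.range(1000).count()  # forces a small job onto the executors
print(f"Executors responded in {time.time() - start:.1f}s")
```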
Stop a Spark cluster instance
In the Instance management menu, you can find the Stop (Start) and Terminate buttons.
- Stop

Stops all executors. The instance can be (re)started later, as shown in the sketch after this list. Data stored in the persistent volume is not removed.
The Python API equivalent is:

```python
# 'spark' is the Spark session created by 'spark = ods.spark("my-cluster").session()'
spark.stop()
```
- Terminate

Stops all executors permanently. The instance cannot be restarted, and data stored in the persistent volume is also removed.
The Python API equivalent is:

```python
ods.spark("my-cluster").delete()
```
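Since a stopped instance keeps its persistent volume, it can be started again later. A minimal sketch, assuming that requesting a session for the same cluster name (re)starts a stopped instance:

```python
import ods

ods.init(ske="my-ske")

# Stop the executors; data in the persistent volume is kept
spark = ods.spark("my-cluster").session()
spark.stop()

# Assumption: requesting a session for the same cluster name restarts it
spark = ods.spark("my-cluster").session()
```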