Spark Settings
Manage Apache Spark configurations for a Fabric workspace. Spark settings control compute pools, environment defaults, session behavior, and logging.
Overview
Spark settings are mutable objects that you read, modify, and commit back to the API:
import fabias
ws = fabias.workspace("Analytics")
# Get settings (lazy-loaded and cached)
settings = ws.spark.settings
# Modify settings
settings.log = True
settings.pool.customcompute = True
settings.pool.starter.nodes = 10
# Commit all changes
settings.commit()
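When many attributes change together, it can help to describe the changes as data and apply them before a single commit. The sketch below is not part of fabias: `apply_changes` is a hypothetical helper, and a `SimpleNamespace` stand-in replaces the real settings object so the snippet runs without a workspace.

```python
from functools import reduce
from types import SimpleNamespace


def apply_changes(settings, changes):
    """Set each dotted attribute path in `changes` on `settings`.

    Hypothetical helper, not part of fabias. Unknown attribute names raise
    AttributeError instead of silently creating new attributes.
    """
    for path, value in changes.items():
        *parents, leaf = path.split(".")
        # Walk down to the object that owns the final attribute.
        target = reduce(getattr, parents, settings)
        if not hasattr(target, leaf):
            raise AttributeError(f"unknown setting: {path}")
        setattr(target, leaf, value)


# Stand-in for ws.spark.settings, using attribute names from this page.
settings = SimpleNamespace(
    log=False,
    pool=SimpleNamespace(
        customcompute=False,
        starter=SimpleNamespace(nodes=1, executors=1),
    ),
)

apply_changes(settings, {
    "log": True,
    "pool.customcompute": True,
    "pool.starter.nodes": 10,
})
print(settings.log, settings.pool.customcompute, settings.pool.starter.nodes)
```

With a real settings object, the same call would be followed by one `settings.commit()`.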
Settings Structure
The SparkSettings object has five main configuration areas:
| Property | Type | Description |
|---|---|---|
| log | bool | Enable automatic Spark run logging |
| highconcurrency | HighConcurrencySettings | Shared session configuration |
| environment | SparkEnvironment | Default environment and runtime |
| jobs | JobSettings | Job admission and timeout settings |
| pool | PoolSettings | Pool configuration and constraints |
Logging
Enable or disable automatic logging of Spark runs:
settings = ws.spark.settings
# Enable logging
settings.log = True
settings.commit()
# Check current setting
print(f"Logging enabled: {settings.log}")
High Concurrency
Share Spark sessions across concurrent runs:
settings = ws.spark.settings
# Enable shared sessions for interactive notebooks
settings.highconcurrency.interactive = True
# Enable shared sessions for pipeline Spark activities (if supported by SKU)
settings.highconcurrency.pipelines = True
settings.commit()
SKU Requirements
The pipelines setting is only available on higher-tier Fabric capacity SKUs. It will be None when unsupported.
Default Environment
Set the default Spark environment and runtime version for workspace jobs:
from fabias import Environment
settings = ws.spark.settings
# Set default environment
settings.environment = Environment("ML Environment", "1.3")
settings.commit()
# Clear default environment
settings.environment = Environment(None, None)
settings.commit()
# Check current environment
print(f"Default: {settings.environment.name} v{settings.environment.version}")
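The final print above assumes a default environment is set. A minimal sketch of a formatter that also covers the cleared case follows. It assumes a cleared environment reports `None` for both `name` and `version` (mirroring `Environment(None, None)`), which this page does not confirm. `describe_environment` is a hypothetical helper, and `SimpleNamespace` stands in for the real environment object.

```python
from types import SimpleNamespace


def describe_environment(env):
    """Return a readable label for a default environment (hypothetical helper)."""
    # Assumption: a cleared default shows up as name=None.
    if env is None or env.name is None:
        return "Default: (none)"
    if env.version is None:
        return f"Default: {env.name}"
    return f"Default: {env.name} v{env.version}"


configured = SimpleNamespace(name="ML Environment", version="1.3")
cleared = SimpleNamespace(name=None, version=None)

print(describe_environment(configured))
print(describe_environment(cleared))
```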
Job Settings
Control Spark job admission and session timeout:
settings = ws.spark.settings
# Enable conservative job admission (reserve cores)
settings.jobs.reservecores = True
# Set session timeout (minutes)
settings.jobs.timeout = 30
settings.commit()
Job Properties
| Property | Type | Description |
|---|---|---|
| reservecores | bool | Conservative admission (limits concurrent jobs) |
| timeout | int | Idle session timeout in minutes |
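Because `timeout` is an integer number of minutes, durations expressed in other units need converting first. The helper below is a hypothetical sketch, not part of fabias. It assumes the value must be a positive whole number of minutes, which this page does not state explicitly.

```python
from datetime import timedelta


def timeout_minutes(duration):
    """Convert a timedelta to whole minutes for settings.jobs.timeout."""
    total_seconds = duration.total_seconds()
    # Assumption: zero or negative timeouts are not meaningful.
    if total_seconds <= 0:
        raise ValueError("timeout must be positive")
    # Reject durations that would be silently truncated.
    if total_seconds % 60:
        raise ValueError("timeout must be a whole number of minutes")
    return int(total_seconds // 60)


print(timeout_minutes(timedelta(hours=2)))  # 120
```

The result can then be assigned to `settings.jobs.timeout` before committing.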
Pool Settings
Configure compute pools for the workspace:
from fabias import PoolType
settings = ws.spark.settings
# Allow users to customize compute per session
settings.pool.customcompute = True
# Set default pool
settings.pool.default.name = "Production Pool"
settings.pool.default.type = PoolType.WORKSPACE
# Configure starter pool limits
settings.pool.starter.nodes = 10
settings.pool.starter.executors = 5
settings.commit()
Pool Properties
The pool object has three sub-properties:
customcompute (bool)
Whether users can customize compute settings per session/job.
default (DefaultPoolSettings)
Default pool selection:
- name (str): Pool display name
- type (PoolType): Pool type enum (WORKSPACE, STARTER, CUSTOM)
starter (StarterPoolSettings)
Starter pool size constraints:
- nodes (int): Maximum node count
- executors (int): Maximum executors per session
Complete Example
import fabias
from fabias import Environment
from fabias import PoolType
ws = fabias.workspace("Analytics")
settings = ws.spark.settings
# Configure all Spark settings
settings.log = True
settings.highconcurrency.interactive = True
settings.highconcurrency.pipelines = True
settings.environment = Environment("Data Engineering", "1.3")
settings.jobs.reservecores = False
settings.jobs.timeout = 60
settings.pool.customcompute = True
settings.pool.default.name = "Standard Pool"
settings.pool.default.type = PoolType.WORKSPACE
settings.pool.starter.nodes = 8
settings.pool.starter.executors = 4
# Commit all changes in one API call
settings.commit()
print("Spark settings updated!")
Refreshing Settings
Settings are cached after first access. To reload from the API, get a new workspace object and read the settings again:
# A new workspace object has no cached settings
ws = fabias.workspace("Analytics")
# First access on the new object fetches fresh data
settings = ws.spark.settings
Or reset the cache on the existing object:
# Reset the private cache attribute (advanced usage; private names may change)
ws.spark._settings_cache = None
settings = ws.spark.settings # Reloads from API
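The caching behavior described here matches a common lazy-property pattern. The sketch below illustrates that pattern with a stand-in class; it is not fabias source code, and how the library actually implements its cache is an assumption. Only the attribute name `_settings_cache` is taken from this page. `LazySettings` and `fake_fetch` are hypothetical.

```python
class LazySettings:
    """Stand-in illustrating a lazy-loaded, cached property (not fabias code)."""

    def __init__(self, fetch):
        self._fetch = fetch            # callable that loads settings from the API
        self._settings_cache = None    # nothing loaded yet

    @property
    def settings(self):
        # Fetch only when nothing is cached.
        if self._settings_cache is None:
            self._settings_cache = self._fetch()
        return self._settings_cache


calls = []


def fake_fetch():
    """Pretend API call that records how often it runs."""
    calls.append(1)
    return {"log": True, "fetch_number": len(calls)}


spark = LazySettings(fake_fetch)
first = spark.settings
second = spark.settings          # served from the cache, no second fetch
spark._settings_cache = None     # drop the cache
third = spark.settings           # fetched again
print(len(calls))  # 2
```

Under this pattern, uncommitted local edits live on the cached object, so resetting the cache would discard them.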
Reading Current Settings
settings = ws.spark.settings
print(f"Automatic Logging: {settings.log}")
print(f"Interactive Sharing: {settings.highconcurrency.interactive}")
print(f"Pipeline Sharing: {settings.highconcurrency.pipelines}")
print(f"Default Environment: {settings.environment.name}")
print(f"Environment Version: {settings.environment.version}")
print(f"Reserve Cores: {settings.jobs.reservecores}")
print(f"Session Timeout: {settings.jobs.timeout} minutes")
print(f"Custom Compute: {settings.pool.customcompute}")
print(f"Default Pool: {settings.pool.default.name} ({settings.pool.default.type})")
print(f"Starter Max Nodes: {settings.pool.starter.nodes}")
print(f"Starter Max Executors: {settings.pool.starter.executors}")
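The same attributes can be collected into a flat dictionary, which makes it easy to compare settings before and after a change. This is a sketch only: `snapshot` and `changes_between` are hypothetical helpers, not fabias functions, and a `SimpleNamespace` tree stands in for the real settings object. The attribute paths are the ones printed above.

```python
from types import SimpleNamespace

# Attribute paths taken from the print statements above.
PATHS = (
    "log",
    "highconcurrency.interactive",
    "highconcurrency.pipelines",
    "environment.name",
    "environment.version",
    "jobs.reservecores",
    "jobs.timeout",
    "pool.customcompute",
    "pool.default.name",
    "pool.default.type",
    "pool.starter.nodes",
    "pool.starter.executors",
)


def snapshot(settings, paths=PATHS):
    """Return {dotted path: current value} for each path."""
    result = {}
    for path in paths:
        value = settings
        for name in path.split("."):
            value = getattr(value, name)
        result[path] = value
    return result


def changes_between(before, after):
    """Return {path: (old, new)} for every value that differs."""
    return {k: (before[k], after[k]) for k in before if before[k] != after[k]}


# Stand-in for ws.spark.settings with illustrative values.
settings = SimpleNamespace(
    log=True,
    highconcurrency=SimpleNamespace(interactive=True, pipelines=None),
    environment=SimpleNamespace(name="ML Environment", version="1.3"),
    jobs=SimpleNamespace(reservecores=False, timeout=30),
    pool=SimpleNamespace(
        customcompute=True,
        default=SimpleNamespace(name="Production Pool", type="WORKSPACE"),
        starter=SimpleNamespace(nodes=10, executors=5),
    ),
)

before = snapshot(settings)
settings.jobs.timeout = 60
changed = changes_between(before, snapshot(settings))
print(changed)  # {'jobs.timeout': (30, 60)}
```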
Example: Enable High Concurrency
import fabias
ws = fabias.workspace("Analytics")
settings = ws.spark.settings
# Enable session sharing for better resource utilization
settings.highconcurrency.interactive = True
# Only enable pipelines if supported (check if not None)
if settings.highconcurrency.pipelines is not None:
    settings.highconcurrency.pipelines = True
settings.commit()
print("High concurrency enabled!")
Example: Production Pool Configuration
import fabias
from fabias import PoolType
ws = fabias.workspace("Production")
settings = ws.spark.settings
# Lock down compute to specific pool
settings.pool.customcompute = False
settings.pool.default.name = "Production Pool"
settings.pool.default.type = PoolType.WORKSPACE
# Conservative job admission
settings.jobs.reservecores = True
settings.jobs.timeout = 120 # 2 hours
settings.commit()
print("Production settings configured!")
See Also
- Workspaces - Workspace management
- Environments - Environment items
- Notebooks - Notebook execution