Spark Settings

Manage Apache Spark configurations for a Fabric workspace. Spark settings control compute pools, environment defaults, session behavior, and logging.

Overview

The Spark settings object is mutable: you read it, modify it locally, and call commit() to send the changes back to the API:

import fabias

ws = fabias.workspace("Analytics")

# Get settings (lazy-loaded and cached)
settings = ws.spark.settings

# Modify settings
settings.log = True
settings.pool.customcompute = True
settings.pool.starter.nodes = 10

# Commit all changes
settings.commit()
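
When the changes come from a config file or a dictionary, the read-modify-commit cycle above can be wrapped in a small helper. The sketch below is not part of fabias: `apply_changes` is a hypothetical helper that assumes only what the example above shows, namely nested attributes and a `commit()` method. A `SimpleNamespace` stand-in replaces the real settings object so the sketch runs without a workspace.

```python
from types import SimpleNamespace


def apply_changes(settings, changes):
    """Set dotted attribute paths on `settings`, then commit once.

    Hypothetical helper, not a fabias API. It assumes only nested
    attributes and a commit() method, as in the overview example.
    """
    for path, value in changes.items():
        *parents, leaf = path.split(".")
        target = settings
        for name in parents:
            target = getattr(target, name)  # walk down to the parent object
        if not hasattr(target, leaf):
            raise AttributeError(f"unknown setting: {path}")
        setattr(target, leaf, value)
    settings.commit()


# Stand-in for ws.spark.settings so the sketch runs offline.
commits = []
demo = SimpleNamespace(
    log=False,
    pool=SimpleNamespace(customcompute=False, starter=SimpleNamespace(nodes=1)),
    commit=lambda: commits.append("commit"),
)

apply_changes(demo, {"log": True, "pool.starter.nodes": 10})
```

Rejecting unknown paths guards against typos such as `pool.starter.node`, which plain attribute assignment would otherwise accept silently on many Python objects.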

Settings Structure

The SparkSettings object has five main configuration areas:

Property	Type	Description
`log`	bool	Enable automatic Spark run logging
`highconcurrency`	HighConcurrencySettings	Shared session configuration
`environment`	SparkEnvironment	Default environment and runtime
`jobs`	JobSettings	Job admission and timeout settings
`pool`	PoolSettings	Pool configuration and constraints

Logging

Enable or disable automatic logging of Spark runs:

settings = ws.spark.settings

# Enable logging
settings.log = True
settings.commit()

# Check current setting
print(f"Logging enabled: {settings.log}")

High Concurrency

Share Spark sessions across concurrent runs:

settings = ws.spark.settings

# Enable shared sessions for interactive notebooks
settings.highconcurrency.interactive = True

# Enable shared sessions for pipeline Spark activities (if supported by SKU)
settings.highconcurrency.pipelines = True

settings.commit()

SKU Requirements

The `pipelines` setting is available only on higher-tier Fabric capacity SKUs. When the SKU does not support it, `highconcurrency.pipelines` is None.

Default Environment

Set the default Spark environment and runtime version for workspace jobs:

from fabias import Environment

settings = ws.spark.settings

# Set default environment
settings.environment = Environment("ML Environment", "1.3")
settings.commit()

# Clear default environment
settings.environment = Environment(None, None)
settings.commit()

# Check current environment
print(f"Default: {settings.environment.name} v{settings.environment.version}")
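
Printing the environment right after clearing it would show `None vNone`. A small formatter can make that case readable. This is a sketch under an assumption: that a cleared environment reads back with `name` set to None, which the `Environment(None, None)` call above suggests but does not confirm. `describe_environment` is a hypothetical helper, and the `SimpleNamespace` objects stand in for real `Environment` instances.

```python
from types import SimpleNamespace


def describe_environment(env):
    """Return a readable label for a default-environment object.

    Hypothetical helper. Assumes `env` has `name` and `version`
    attributes and that a cleared environment has name None.
    """
    if env.name is None:
        return "no default environment"
    return f"{env.name} v{env.version}"


# Stand-ins for settings.environment in the two states shown above.
cleared = SimpleNamespace(name=None, version=None)
configured = SimpleNamespace(name="ML Environment", version="1.3")

print(describe_environment(cleared))
print(describe_environment(configured))
```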

Job Settings

Control Spark job admission and session timeout:

settings = ws.spark.settings

# Enable conservative job admission (reserve cores)
settings.jobs.reservecores = True

# Set session timeout (minutes)
settings.jobs.timeout = 30

settings.commit()

Job Properties

Property	Type	Description
`reservecores`	bool	Conservative admission (limits concurrent jobs)
`timeout`	int	Idle session timeout in minutes
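
Because `timeout` is expressed in minutes, durations written in hours or seconds need converting before assignment. The following sketch uses the standard library's `timedelta` for that; `to_timeout_minutes` is a hypothetical helper, not a fabias function, and the rule that the value must be a positive whole number of minutes is an assumption based on the `int` type listed in the table.

```python
from datetime import timedelta


def to_timeout_minutes(duration):
    """Convert a timedelta to whole minutes for a minutes-based setting.

    Hypothetical helper, not a fabias API. Rejects zero, negative,
    and fractional-minute durations rather than rounding them.
    """
    minutes, remainder = divmod(duration, timedelta(minutes=1))
    if minutes <= 0 or remainder:
        raise ValueError("timeout must be a positive whole number of minutes")
    return minutes


two_hours = to_timeout_minutes(timedelta(hours=2))
print(two_hours)
```

With a real settings object, the result would be assigned as `settings.jobs.timeout = to_timeout_minutes(timedelta(hours=2))` before calling `settings.commit()`.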

Pool Settings

Configure compute pools for the workspace:

from fabias import PoolType

settings = ws.spark.settings

# Allow users to customize compute per session
settings.pool.customcompute = True

# Set default pool
settings.pool.default.name = "Production Pool"
settings.pool.default.type = PoolType.WORKSPACE

# Configure starter pool limits
settings.pool.starter.nodes = 10
settings.pool.starter.executors = 5

settings.commit()

Pool Properties

The pool object has three sub-properties:

customcompute (bool)

Whether users can customize compute settings per session/job.

default (DefaultPoolSettings)

Default pool selection:

- name (str): Pool display name
- type (PoolType): Pool type enum (WORKSPACE, STARTER, CUSTOM)

starter (StarterPoolSettings)

Starter pool size constraints:

- nodes (int): Maximum node count
- executors (int): Maximum executors per session

Complete Example

import fabias
from fabias import Environment
from fabias import PoolType

ws = fabias.workspace("Analytics")
settings = ws.spark.settings

# Configure all Spark settings
settings.log = True
settings.highconcurrency.interactive = True
if settings.highconcurrency.pipelines is not None:  # None when the SKU lacks support
    settings.highconcurrency.pipelines = True
settings.environment = Environment("Data Engineering", "1.3")
settings.jobs.reservecores = False
settings.jobs.timeout = 60
settings.pool.customcompute = True
settings.pool.default.name = "Standard Pool"
settings.pool.default.type = PoolType.WORKSPACE
settings.pool.starter.nodes = 8
settings.pool.starter.executors = 4

# Commit all changes in one API call
settings.commit()

print("Spark settings updated!")

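Refreshing Settings

Settings are cached after first access, so later reads on the same object do not call the API again. To reload, create a new workspace object and read the settings from it:

# Create a new workspace object, which starts with an empty cache
ws = fabias.workspace("Analytics")

# First access on the new object fetches fresh data
settings = ws.spark.settings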

Or force a fresh load on the existing object:

# Reset the private cache attribute (advanced usage)
ws.spark._settings_cache = None
settings = ws.spark.settings  # Reloads from API
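
The cache reset can be kept in one place so that the private attribute name appears only once in your code. The sketch below wraps it in a hypothetical `reload_settings` helper. The `_FakeSpark` and `_FakeWorkspace` classes are stand-ins written for this example; they imitate the lazy-load-and-cache behavior described above and are not fabias code.

```python
def reload_settings(ws):
    """Drop the cached settings on `ws.spark` and fetch them again.

    Hypothetical helper. Relies on the private `_settings_cache`
    attribute shown above, which may change between versions.
    """
    ws.spark._settings_cache = None
    return ws.spark.settings


class _FakeSpark:
    """Stand-in that mimics lazy loading and caching; not fabias code."""

    def __init__(self):
        self._settings_cache = None
        self.loads = 0

    @property
    def settings(self):
        if self._settings_cache is None:
            self.loads += 1  # a real object would call the API here
            self._settings_cache = {"load": self.loads}
        return self._settings_cache


class _FakeWorkspace:
    def __init__(self):
        self.spark = _FakeSpark()


ws = _FakeWorkspace()
first = ws.spark.settings    # first access loads
again = ws.spark.settings    # second access is served from the cache
fresh = reload_settings(ws)  # cache reset forces another load
```

Any reference to the old settings object keeps pointing at the stale copy, so rebind your variable to the value the helper returns.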

Reading Current Settings

settings = ws.spark.settings

print(f"Automatic Logging: {settings.log}")
print(f"Interactive Sharing: {settings.highconcurrency.interactive}")
print(f"Pipeline Sharing: {settings.highconcurrency.pipelines}")
print(f"Default Environment: {settings.environment.name}")
print(f"Environment Version: {settings.environment.version}")
print(f"Reserve Cores: {settings.jobs.reservecores}")
print(f"Session Timeout: {settings.jobs.timeout} minutes")
print(f"Custom Compute: {settings.pool.customcompute}")
print(f"Default Pool: {settings.pool.default.name} ({settings.pool.default.type})")
print(f"Starter Max Nodes: {settings.pool.starter.nodes}")
print(f"Starter Max Executors: {settings.pool.starter.executors}")
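
The same reads can be collected into a dictionary, which makes it easy to compare settings before and after a change. The following is a minimal sketch: `snapshot` and `diff` are hypothetical helpers, the dotted paths are the properties documented on this page, and a `SimpleNamespace` stand-in with made-up values replaces the real settings object so the example runs offline.

```python
from types import SimpleNamespace

# Dotted paths taken from the properties documented on this page.
PATHS = [
    "log",
    "highconcurrency.interactive",
    "highconcurrency.pipelines",
    "environment.name",
    "environment.version",
    "jobs.reservecores",
    "jobs.timeout",
    "pool.customcompute",
    "pool.default.name",
    "pool.default.type",
    "pool.starter.nodes",
    "pool.starter.executors",
]


def snapshot(settings, paths=PATHS):
    """Return {dotted_path: value} for each path (hypothetical helper)."""
    result = {}
    for path in paths:
        value = settings
        for name in path.split("."):
            value = getattr(value, name)
        result[path] = value
    return result


def diff(before, after):
    """Return {path: (old, new)} for values that differ."""
    return {k: (before[k], after[k]) for k in before if before[k] != after[k]}


# Stand-in for ws.spark.settings; the values are illustrative only.
demo = SimpleNamespace(
    log=False,
    highconcurrency=SimpleNamespace(interactive=False, pipelines=None),
    environment=SimpleNamespace(name=None, version=None),
    jobs=SimpleNamespace(reservecores=False, timeout=20),
    pool=SimpleNamespace(
        customcompute=True,
        default=SimpleNamespace(name="Starter Pool", type="STARTER"),
        starter=SimpleNamespace(nodes=10, executors=5),
    ),
)

before = snapshot(demo)
demo.jobs.timeout = 60
changes = diff(before, snapshot(demo))
print(changes)
```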

Example: Enable High Concurrency

import fabias

ws = fabias.workspace("Analytics")
settings = ws.spark.settings

# Enable session sharing for better resource utilization
settings.highconcurrency.interactive = True

# Only enable pipelines if supported (check if not None)
if settings.highconcurrency.pipelines is not None:
    settings.highconcurrency.pipelines = True

settings.commit()
print("High concurrency enabled!")

Example: Production Pool Configuration

import fabias
from fabias import PoolType

ws = fabias.workspace("Production")
settings = ws.spark.settings

# Lock down compute to a specific pool
settings.pool.customcompute = False
settings.pool.default.name = "Production Pool"
settings.pool.default.type = PoolType.WORKSPACE

# Conservative job admission
settings.jobs.reservecores = True
settings.jobs.timeout = 120  # 2 hours

settings.commit()
print("Production settings configured!")