Spark Settings

Manage Apache Spark configurations for a Fabric workspace. Spark settings control compute pools, environment defaults, session behavior, and logging.

Overview

Spark settings are mutable objects that you read, modify, and commit back to the API:

import fabias

ws = fabias.workspace("Analytics")

# Get settings (lazy-loaded and cached)
settings = ws.spark.settings

# Modify settings
settings.log = True
settings.pool.customcompute = True
settings.pool.starter.nodes = 10

# Commit all changes
settings.commit()
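
The read, modify, commit cycle can be illustrated without a Fabric workspace. The sketch below is not the fabias implementation: FakeSettings and the server dictionary are hypothetical stand-ins, and it assumes that assignments stay local until commit() is called, which is how the description above reads.

```python
class FakeSettings:
    """Hypothetical stand-in for a settings object; not part of fabias."""

    def __init__(self, server: dict):
        self._server = server        # dict playing the role of the remote API
        self.log = server["log"]     # local copy, read once from the "API"

    def commit(self) -> None:
        # Push the local value back to the pretend remote state.
        self._server["log"] = self.log


server = {"log": False}
settings = FakeSettings(server)

settings.log = True                  # local change only
before_commit = server["log"]        # remote state is untouched so far

settings.commit()
after_commit = server["log"]         # remote state now reflects the change

print(before_commit, after_commit)
```

Under that assumption, a script that modifies settings but never calls commit() would leave the workspace unchanged.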

Settings Structure

The SparkSettings object has five main configuration areas:

| Property | Type | Description |
| --- | --- | --- |
| log | bool | Enable automatic Spark run logging |
| highconcurrency | HighConcurrencySettings | Shared session configuration |
| environment | SparkEnvironment | Default environment and runtime |
| jobs | JobSettings | Job admission and timeout settings |
| pool | PoolSettings | Pool configuration and constraints |

Logging

Enable or disable automatic logging of Spark runs:

settings = ws.spark.settings

# Enable logging
settings.log = True
settings.commit()

# Check current setting
print(f"Logging enabled: {settings.log}")

High Concurrency

Share Spark sessions across concurrent runs:

settings = ws.spark.settings

# Enable shared sessions for interactive notebooks
settings.highconcurrency.interactive = True

# Enable shared sessions for pipeline Spark activities (if supported by SKU)
settings.highconcurrency.pipelines = True

settings.commit()

SKU Requirements

The pipelines setting is available only on higher-tier Fabric capacity SKUs. When the capacity does not support it, settings.highconcurrency.pipelines is None, so check for None before assigning a value.

Default Environment

Set the default Spark environment and runtime version for workspace jobs:

from fabias import Environment

settings = ws.spark.settings

# Set default environment
settings.environment = Environment("ML Environment", "1.3")
settings.commit()

# Clear default environment
settings.environment = Environment(None, None)
settings.commit()

# Check current environment
print(f"Default: {settings.environment.name} v{settings.environment.version}")

Job Settings

Control Spark job admission and session timeout:

settings = ws.spark.settings

# Enable conservative job admission (reserve cores)
settings.jobs.reservecores = True

# Set session timeout (minutes)
settings.jobs.timeout = 30

settings.commit()

Job Properties

| Property | Type | Description |
| --- | --- | --- |
| reservecores | bool | Conservative admission (limits concurrent jobs) |
| timeout | int | Idle session timeout in minutes |
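
Because timeout is expressed in minutes, durations written in other units need converting first. The helper below is a small sketch using only the standard library; to_timeout_minutes is a hypothetical name, and the rule that the value must be a positive whole number of minutes is an assumption, since the accepted range is not stated here.

```python
from datetime import timedelta


def to_timeout_minutes(idle: timedelta) -> int:
    """Convert a duration to whole minutes (hypothetical helper, not part of fabias)."""
    # divmod on two timedeltas yields (int quotient, timedelta remainder).
    minutes, remainder = divmod(idle, timedelta(minutes=1))
    # Assumption: only positive, whole-minute values make sense for the setting.
    if minutes < 1 or remainder:
        raise ValueError(f"expected a positive whole number of minutes, got {idle}")
    return minutes


two_hours = to_timeout_minutes(timedelta(hours=2))
print(two_hours)
```

The result could then be assigned to settings.jobs.timeout before calling commit().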

Pool Settings

Configure compute pools for the workspace:

from fabias import PoolType

settings = ws.spark.settings

# Allow users to customize compute per session
settings.pool.customcompute = True

# Set default pool
settings.pool.default.name = "Production Pool"
settings.pool.default.type = PoolType.WORKSPACE

# Configure starter pool limits
settings.pool.starter.nodes = 10
settings.pool.starter.executors = 5

settings.commit()

Pool Properties

The pool object has three sub-properties:

customcompute (bool)

Whether users can customize compute settings per session/job.

default (DefaultPoolSettings)

Default pool selection:

- name (str): Pool display name
- type (PoolType): Pool type enum (WORKSPACE, STARTER, CUSTOM)

starter (StarterPoolSettings)

Starter pool size constraints:

- nodes (int): Maximum node count
- executors (int): Maximum executors per session
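
The nesting described above can be pictured as a set of plain data classes. This is an illustrative model only: the class names follow the types named in this section, but the field defaults and the enum values are assumptions, not the fabias definitions.

```python
from dataclasses import dataclass, field
from enum import Enum


class PoolType(Enum):
    # Member names come from the list above; the string values are assumed.
    WORKSPACE = "workspace"
    STARTER = "starter"
    CUSTOM = "custom"


@dataclass
class DefaultPoolSettings:
    name: str = ""                        # pool display name (assumed default)
    type: PoolType = PoolType.STARTER     # assumed default


@dataclass
class StarterPoolSettings:
    nodes: int = 1          # maximum node count (assumed default)
    executors: int = 1      # maximum executors per session (assumed default)


@dataclass
class PoolSettings:
    customcompute: bool = False
    default: DefaultPoolSettings = field(default_factory=DefaultPoolSettings)
    starter: StarterPoolSettings = field(default_factory=StarterPoolSettings)


pool = PoolSettings()
pool.default.name = "Production Pool"
pool.default.type = PoolType.WORKSPACE
pool.starter.nodes = 10
print(pool)
```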

Complete Example

import fabias
from fabias import Environment
from fabias import PoolType

ws = fabias.workspace("Analytics")
settings = ws.spark.settings

# Configure all Spark settings
settings.log = True
settings.highconcurrency.interactive = True
if settings.highconcurrency.pipelines is not None:  # None on unsupported SKUs
    settings.highconcurrency.pipelines = True
settings.environment = Environment("Data Engineering", "1.3")
settings.jobs.reservecores = False
settings.jobs.timeout = 60
settings.pool.customcompute = True
settings.pool.default.name = "Standard Pool"
settings.pool.default.type = PoolType.WORKSPACE
settings.pool.starter.nodes = 8
settings.pool.starter.executors = 4

# Commit all changes in one API call
settings.commit()

print("Spark settings updated!")
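
When the same configuration is applied to several workspaces, the assignments can be driven from a dictionary of dotted paths and followed by a single commit(). The sketch below assumes only that the settings object exposes the attributes shown on this page; apply_changes is a hypothetical helper, and the SimpleNamespace object is a stand-in so the example runs without a Fabric connection.

```python
from types import SimpleNamespace


def apply_changes(settings, changes: dict) -> None:
    """Set each dotted path on settings (hypothetical helper, not part of fabias)."""
    for path, value in changes.items():
        *parents, leaf = path.split(".")
        target = settings
        for name in parents:
            target = getattr(target, name)   # walk down to the owning object
        if not hasattr(target, leaf):
            raise AttributeError(f"unknown setting: {path}")
        setattr(target, leaf, value)


# Stand-in with a few of the attributes described above.
settings = SimpleNamespace(
    log=False,
    jobs=SimpleNamespace(reservecores=True, timeout=20),
    pool=SimpleNamespace(starter=SimpleNamespace(nodes=1, executors=1)),
)

apply_changes(settings, {
    "log": True,
    "jobs.timeout": 60,
    "pool.starter.nodes": 8,
})
print(settings.log, settings.jobs.timeout, settings.pool.starter.nodes)
```

With a real settings object, one settings.commit() call after apply_changes would send the whole batch.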

Refreshing Settings

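Settings are cached after first access. To reload from the API, obtain the workspace again and read the property from the new object:

# Re-create the workspace object; the previously cached settings are discarded
ws = fabias.workspace("Analytics")

# The first access on the new object fetches fresh data
settings = ws.spark.settings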

Or force a fresh load on the existing object:

# Reset the private cache attribute (advanced usage)
ws.spark._settings_cache = None
settings = ws.spark.settings  # Reloads from API
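
The caching behavior described above follows a common lazy-loading pattern, sketched here with a stand-in class. FakeSpark and fetch_count are hypothetical; only the _settings_cache name is taken from the snippet above, and the real property may be implemented differently.

```python
class FakeSpark:
    """Hypothetical stand-in showing a lazily loaded, cached property."""

    def __init__(self):
        self._settings_cache = None
        self.fetch_count = 0             # counts pretend API calls

    def _fetch(self) -> dict:
        self.fetch_count += 1
        return {"log": False}            # pretend API response

    @property
    def settings(self) -> dict:
        # Fetch only when nothing is cached yet.
        if self._settings_cache is None:
            self._settings_cache = self._fetch()
        return self._settings_cache


spark = FakeSpark()
first = spark.settings                   # triggers the first fetch
second = spark.settings                  # served from the cache
fetches_while_cached = spark.fetch_count

spark._settings_cache = None             # drop the cached object
third = spark.settings                   # triggers a second fetch
fetches_after_reset = spark.fetch_count

print(fetches_while_cached, fetches_after_reset)
```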

Reading Current Settings

settings = ws.spark.settings

print(f"Automatic Logging: {settings.log}")
print(f"Interactive Sharing: {settings.highconcurrency.interactive}")
print(f"Pipeline Sharing: {settings.highconcurrency.pipelines}")
print(f"Default Environment: {settings.environment.name}")
print(f"Environment Version: {settings.environment.version}")
print(f"Reserve Cores: {settings.jobs.reservecores}")
print(f"Session Timeout: {settings.jobs.timeout} minutes")
print(f"Custom Compute: {settings.pool.customcompute}")
print(f"Default Pool: {settings.pool.default.name} ({settings.pool.default.type})")
print(f"Starter Max Nodes: {settings.pool.starter.nodes}")
print(f"Starter Max Executors: {settings.pool.starter.executors}")
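
The same values can be collected into a dictionary, which is handy for comparing settings before and after a change. This sketch relies on operator.attrgetter from the standard library, which resolves dotted attribute names; snapshot is a hypothetical helper, and the SimpleNamespace object stands in for ws.spark.settings so the example runs on its own.

```python
from operator import attrgetter
from types import SimpleNamespace

# A subset of the attribute paths printed above.
PATHS = [
    "log",
    "highconcurrency.interactive",
    "highconcurrency.pipelines",
    "jobs.reservecores",
    "jobs.timeout",
    "pool.customcompute",
    "pool.starter.nodes",
    "pool.starter.executors",
]


def snapshot(settings) -> dict:
    """Read every path in PATHS into a flat dict (hypothetical helper)."""
    return {path: attrgetter(path)(settings) for path in PATHS}


# Stand-in object with made-up values.
settings = SimpleNamespace(
    log=True,
    highconcurrency=SimpleNamespace(interactive=True, pipelines=None),
    jobs=SimpleNamespace(reservecores=False, timeout=30),
    pool=SimpleNamespace(
        customcompute=True,
        starter=SimpleNamespace(nodes=10, executors=5),
    ),
)

before = snapshot(settings)
settings.jobs.timeout = 60
after = snapshot(settings)

# Keep only the paths whose value changed, as (old, new) pairs.
changed = {p: (before[p], after[p]) for p in PATHS if before[p] != after[p]}
print(changed)
```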

Example: Enable High Concurrency

import fabias

ws = fabias.workspace("Analytics")
settings = ws.spark.settings

# Enable session sharing for better resource utilization
settings.highconcurrency.interactive = True

# Only enable pipelines if supported (check if not None)
if settings.highconcurrency.pipelines is not None:
    settings.highconcurrency.pipelines = True

settings.commit()
print("High concurrency enabled!")

Example: Production Pool Configuration

import fabias
from fabias import PoolType

ws = fabias.workspace("Production")
settings = ws.spark.settings

# Lock down compute to specific pool
settings.pool.customcompute = False
settings.pool.default.name = "Production Pool"
settings.pool.default.type = PoolType.WORKSPACE

# Conservative job admission
settings.jobs.reservecores = True
settings.jobs.timeout = 120  # 2 hours

settings.commit()
print("Production settings configured!")

See Also