API Reference

This page is generated with sphinx.ext.autodoc and sphinx.ext.autosummary directly from the source docstrings.

Core pipeline

Core pipeline orchestrator for the Sensor2EventLog framework.

class core.pipeline.Sensor2EventLogPipeline(config_module=None)[source]

Bases: object

Main pipeline class for transforming sensor data to event logs.

This class orchestrates the entire Machine Teaching process:

1. Feature extraction
2. Model training (HMM with supervised/unsupervised modes)
3. Diagnostic analysis
4. Event log generation

Example

>>> pipeline = Sensor2EventLogPipeline(config)
>>> result = pipeline.run(
...     data_path="sensor_data.csv",
...     feature_plan=feature_plan,
...     mode="unsupervised"
... )
>>> event_log = result['event_log']
>>> event_log.to_xes("output.xes")

__init__(config_module=None)[source]

Initialize the pipeline with configuration.

Parameters:

config_module (module, optional): Configuration module with all parameters. If None, the default configuration is used.

run(data_path: str, feature_plan: Dict[str, list], mode: str = 'unsupervised', use_cip: bool = False, n_unsup: int | None = None, random_seed: int = 42, min_duration_seconds: float = 2.0, return_intermediate: bool = False) → Dict[str, Any][source]

Run the complete pipeline.

Parameters:

data_path (str): Path to the CSV data file
feature_plan (dict): Feature extraction plan with families and signals
mode (str): 'supervised' or 'unsupervised'
use_cip (bool): Whether to include CIP states
n_unsup (int, optional): Number of states for unsupervised mode
random_seed (int): Random seed for reproducibility
min_duration_seconds (float): Minimum duration, in seconds, used to filter out brief states
return_intermediate (bool): If True, also returns intermediate results (features, diagnostics)

Returns:

dict with keys:

event_log: EventLog object
model: trained HMM model
predictions: predicted state sequences
features (if return_intermediate): extracted features
diagnostics (if return_intermediate): diagnostic results

Machine Teaching loop

Machine Teaching loop for the Sensor2EventLog framework.

class abstraction.mt_loop.MachineTeachingLoop(model_type: str, feature_extractor, diagnostic_analyzer, config)[source]

Bases: object

Orchestrates feature extraction, diagnostics, and model training.

get_review_summary() → Dict[str, Any] | None[source]

run(df: DataFrame, feature_plan: Dict[str, list], mode: str = 'unsupervised', n_unsup: int | None = None, random_seed: int = 42) → Dict[str, Any][source]

Event log

Event log object with PM4Py compatibility

class contextualization.event_log.Event(case_id: str, activity: str, start_time: datetime, end_time: datetime, duration: float | None = None, **kwargs)[source]

Bases: object

Single event in an event log.

Attributes:

case_id (str): Identifier for the process case
activity (str): Name of the activity/state
start_time (datetime): Start timestamp of the event
end_time (datetime): End timestamp of the event
duration (float, optional): Duration in seconds

to_dict() → Dict[source]: Convert event to dictionary.

class contextualization.event_log.EventLog(data: DataFrame | List[Event])[source]

Bases: object

Event log container with PM4Py compatibility.

This class provides a standardized interface for event logs that can be exported to various formats (CSV, XES) and used with process mining tools like PM4Py.

Example

>>> log = EventLog(df)
>>> log.to_csv("event_log.csv")
>>> log.to_xes("event_log.xes")
>>> pm4py_log = log.to_pm4py()  # Use with PM4Py

__init__(data: DataFrame | List[Event])[source]

Initialize event log from DataFrame or list of Events.

Parameters:

data (pd.DataFrame or List[Event]): Input event log data

to_dataframe() → DataFrame[source]: Get event log as pandas DataFrame.

to_csv(path: str, filtered: bool = False) → None[source]

Export event log to CSV.

Parameters:

path (str): Output file path
filtered (bool): If True, saves the filtered version (if available)

to_xes(path: str, case_id_key: str = 'case:concept:name', timestamp_key: str = 'time:timestamp') → None[source]

Export event log to XES format using PM4Py.

Parameters:

path (str): Output file path
case_id_key (str): Column name to use as case identifier in XES
timestamp_key (str): Column name to use as timestamp in XES

to_pm4py(case_id_key: str = 'case:concept:name', timestamp_key: str = 'time:timestamp') → pm4py.objects.log.obj.EventLog[source]

Convert to PM4Py EventLog object for further analysis.

Parameters:

case_id_key (str): Column name to use as case identifier
timestamp_key (str): Column name to use as timestamp

Returns:

pm4py.objects.log.obj.EventLog: PM4Py event log object

filter_duration(min_seconds: float = 0, max_seconds: float = inf) → EventLog[source]

Filter events by duration.

Parameters:

min_seconds (float): Minimum duration in seconds
max_seconds (float): Maximum duration in seconds

Returns:

EventLog: Filtered event log
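The duration filter described above can be sketched on a plain pandas DataFrame. This is a minimal illustration, not the library's implementation: the helper name filter_by_duration, the column name 'duration', and the inclusive bounds are all assumptions.

```python
import math

import pandas as pd


def filter_by_duration(events: pd.DataFrame,
                       min_seconds: float = 0.0,
                       max_seconds: float = math.inf) -> pd.DataFrame:
    """Keep rows whose duration lies in [min_seconds, max_seconds]."""
    # Assumption: durations are stored in seconds in a 'duration' column,
    # and both bounds are inclusive.
    mask = events["duration"].between(min_seconds, max_seconds)
    return events.loc[mask].reset_index(drop=True)


events = pd.DataFrame({
    "case_id": ["b1", "b1", "b2"],
    "activity": ["heat", "hold", "heat"],
    "duration": [1.5, 30.0, 600.0],
})
kept = filter_by_duration(events, min_seconds=2.0, max_seconds=120.0)
print(kept)
```

Only the 30-second "hold" event survives here, because the other two fall outside the 2 to 120 second window.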

get_cases() → List[str][source]: Get list of unique case IDs.

get_activities() → List[str][source]: Get list of unique activities.

get_case(case_id: str) → EventLog[source]: Get all events for a specific case.

get_statistics() → Dict[str, Any][source]

Compute basic statistics about the event log.

Returns:

dict with:

total_cases: number of cases
total_events: number of events
unique_activities: number of distinct activities
avg_case_duration: average case duration in seconds
activity_frequencies: frequency of each activity
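As a rough sketch of how the statistics listed above could be derived from a DataFrame, the example below computes each key with pandas. The helper name basic_statistics, the column names, and the definition of case duration (last end time minus first start time per case) are assumptions, not confirmed behavior of get_statistics().

```python
import pandas as pd


def basic_statistics(events: pd.DataFrame) -> dict:
    """Compute the documented statistics keys from an event table."""
    grouped = events.groupby("case_id")
    # Assumption: a case lasts from its earliest start to its latest end.
    case_span = grouped["end_time"].max() - grouped["start_time"].min()
    return {
        "total_cases": int(events["case_id"].nunique()),
        "total_events": int(len(events)),
        "unique_activities": int(events["activity"].nunique()),
        "avg_case_duration": float(case_span.dt.total_seconds().mean()),
        "activity_frequencies": events["activity"].value_counts().to_dict(),
    }


events = pd.DataFrame({
    "case_id": ["b1", "b1", "b2"],
    "activity": ["heat", "hold", "heat"],
    "start_time": pd.to_datetime(
        ["2023-01-01 00:00:00", "2023-01-01 00:01:00", "2023-01-01 00:00:00"]),
    "end_time": pd.to_datetime(
        ["2023-01-01 00:01:00", "2023-01-01 00:03:00", "2023-01-01 00:02:00"]),
})
stats = basic_statistics(events)
print(stats)
```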

__len__() → int[source]: Return number of events.

__repr__() → str[source]: String representation.

head(n: int = 5) → DataFrame[source]: Return first n events.

contextualization.event_log.create_interval_event_log_normalized(df, y_pred, state_mapping, case_id_col='batch_id', timestamp_col='timestamp')[source]

Create interval-based event log using normalized timestamps.

This function is kept for backward compatibility.
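To illustrate what an interval-based event log means here, the sketch below collapses per-sample state predictions into one row per contiguous run of the same state within a case. It is an assumption-laden illustration rather than the function's actual code: the helper name intervals_from_predictions is hypothetical, timestamp normalization is skipped, and each interval is assumed to end at the last sample of its run.

```python
import pandas as pd


def intervals_from_predictions(df, y_pred, state_mapping,
                               case_id_col="batch_id",
                               timestamp_col="timestamp"):
    """Collapse per-sample predictions into (case, activity, start, end) rows."""
    work = df[[case_id_col, timestamp_col]].copy()
    work["activity"] = [state_mapping[int(s)] for s in y_pred]
    # A new segment starts whenever the activity or the case changes.
    changed = ((work["activity"] != work["activity"].shift())
               | (work[case_id_col] != work[case_id_col].shift()))
    work["segment"] = changed.cumsum()
    log = (work.groupby("segment", sort=True)
               .agg(case_id=(case_id_col, "first"),
                    activity=("activity", "first"),
                    start_time=(timestamp_col, "min"),
                    end_time=(timestamp_col, "max"))
               .reset_index(drop=True))
    # Assumption: duration is measured between first and last sample of a run.
    log["duration"] = (log["end_time"] - log["start_time"]).dt.total_seconds()
    return log


df = pd.DataFrame({
    "batch_id": ["b1", "b1", "b1", "b1", "b2", "b2"],
    "timestamp": pd.Timestamp("2023-01-01")
                 + pd.to_timedelta([0, 1, 2, 3, 0, 1], unit="s"),
})
log = intervals_from_predictions(df, [0, 0, 1, 1, 1, 1], {0: "idle", 1: "run"})
print(log)
```

The "run" state spans the boundary between b1 and b2 in the raw predictions, but the case check splits it into two separate intervals.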

Feature library

Modular feature extraction library with diagnostic capabilities

class features.feature_library.ModularFeatureLibrary(window_sizes=None, stability_eps=1, peak_threshold=0.1)[source]

Bases: object

Modular feature extraction library supporting multiple feature families with integrated rule diagnostics.

compute_features(df, feature_plan: Dict[str, List[str]])[source]: Compute features based on a feature plan.

analyze_rule_performance(df: DataFrame, feature_plan: Dict[str, List[str]]) → Dict[source]: Compute features and analyze rule performance.

Parameters:

df: Input data with sensor signals and state labels
feature_plan: Feature plan including event rules

Returns:

Dict with features and diagnostic results

Rule diagnostics

Rule diagnostic analyzer for evaluating rule performance metrics

class evaluation.rule_analyzer.RuleDiagnosticAnalyzer(coverage_threshold: float = 0.6, precision_threshold: float = 0.7, explainability_threshold: float = 0.3)[source]

Bases: object

Analyzes rule performance using coverage, precision, and explainability metrics.

compute_rule_metrics(df: DataFrame, event_features: DataFrame, state_column: str = 'state') → Dict[source]: Compute coverage, precision, and explainability metrics for all event features.

Parameters:

df: DataFrame with state labels
event_features: DataFrame containing event rule features (binary columns)
state_column: Column name containing state labels

Returns:

Dict with comprehensive diagnostic results
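The reference does not define the metrics themselves, so the sketch below uses one plausible reading for a single binary rule and a single target state: coverage as the share of the state's samples on which the rule fires, and precision as the share of rule firings that fall inside the state. Both definitions and the helper name rule_coverage_precision are assumptions. Explainability is left out because nothing here indicates how it is computed.

```python
import pandas as pd


def rule_coverage_precision(states: pd.Series, rule: pd.Series,
                            target_state: str) -> dict:
    """Assumed definitions: coverage = P(rule | state), precision = P(state | rule)."""
    fired = rule.astype(bool)
    in_state = states == target_state
    both = int((fired & in_state).sum())
    n_state = int(in_state.sum())
    n_fired = int(fired.sum())
    return {
        "coverage": both / n_state if n_state else 0.0,
        "precision": both / n_fired if n_fired else 0.0,
    }


states = pd.Series(["fill", "fill", "fill", "heat", "heat", "drain"])
rule = pd.Series([1, 1, 0, 1, 0, 0])  # binary event-rule feature column
metrics = rule_coverage_precision(states, rule, "fill")
print(metrics)
```

Under these assumed definitions both values are 2/3. If the constructor thresholds act as lower bounds, this rule would pass the default coverage_threshold of 0.6 and miss the default precision_threshold of 0.7.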

print_diagnostic_report(diagnostic_results: Dict)[source]: Print comprehensive diagnostic report.

Models

Base model interface for pluggable models

class models.base_model.BaseModel[source]

Bases: ABC

Abstract base class for all models in Sensor2EventLog.

This interface ensures that all models can be used interchangeably in the Machine Teaching loop.

abstract fit(X: ndarray, lengths: List[int], y: ndarray | None = None) → BaseModel[source]

Fit the model to training data.

Parameters:

X (np.ndarray): Feature matrix (n_samples, n_features)
lengths (List[int]): Lengths of each sequence
y (np.ndarray, optional): Labels for supervised learning

Returns:

self (BaseModel): Fitted model

abstract predict(X: ndarray, lengths: List[int]) → ndarray[source]

Predict states for new data.

Parameters:

X (np.ndarray): Feature matrix (n_samples, n_features)
lengths (List[int]): Lengths of each sequence

Returns:

predictions (np.ndarray): Predicted state indices (n_samples,)

abstract get_state_mapping() → Dict[int, str][source]

Get mapping from state indices to state names.

Returns:

Dict[int, str]: Mapping from index to state name
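A pluggable model only has to provide the three abstract methods above. The sketch below shows what a custom model might look like. To stay self-contained it declares a stand-in BaseModel that mirrors the documented signatures instead of importing models.base_model. NearestMeanModel is a hypothetical toy classifier and not part of the framework.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional

import numpy as np


class BaseModel(ABC):
    """Stand-in mirroring the documented interface (assumption, not the real class)."""

    @abstractmethod
    def fit(self, X: np.ndarray, lengths: List[int],
            y: Optional[np.ndarray] = None) -> "BaseModel": ...

    @abstractmethod
    def predict(self, X: np.ndarray, lengths: List[int]) -> np.ndarray: ...

    @abstractmethod
    def get_state_mapping(self) -> Dict[int, str]: ...


class NearestMeanModel(BaseModel):
    """Hypothetical toy model: assigns each sample to the closest class mean."""

    def __init__(self, state_names: List[str]):
        self.state_names = list(state_names)
        self.means_ = None

    def fit(self, X, lengths, y=None):
        if y is None:
            raise ValueError("this toy model requires labels")
        if sum(lengths) != len(X):
            raise ValueError("lengths must sum to n_samples")
        n_states = len(self.state_names)
        self.means_ = np.stack([X[y == k].mean(axis=0) for k in range(n_states)])
        return self

    def predict(self, X, lengths):
        # Distance from every sample to every class mean; sequence order is ignored.
        dist = np.linalg.norm(X[:, None, :] - self.means_[None, :, :], axis=2)
        return dist.argmin(axis=1)

    def get_state_mapping(self):
        return dict(enumerate(self.state_names))


X = np.array([[0.0], [0.2], [5.0], [5.2]])
y = np.array([0, 0, 1, 1])
model = NearestMeanModel(["idle", "run"]).fit(X, lengths=[2, 2], y=y)
pred = model.predict(np.array([[0.1], [4.9]]), lengths=[2])
print([model.get_state_mapping()[int(i)] for i in pred])
```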

Hidden Markov Model implementation

class models.hmm_model.HMMModel(config=None)[source]

Bases: BaseModel

Gaussian Hidden Markov Model for process state discovery.

Supports both supervised and unsupervised learning modes.

__init__(config=None)[source]

Initialize HMM model.

Parameters:

config (module): Configuration module with HMM parameters

fit(X: ndarray, lengths: List[int], y: ndarray | None = None) → HMMModel[source]

Fit HMM to data.

If y is provided, uses supervised initialization. Otherwise, uses unsupervised learning.

predict(X: ndarray, lengths: List[int]) → ndarray[source]: Predict state sequence using Viterbi algorithm.

get_state_mapping() → Dict[int, str][source]: Get mapping from state indices to state names.

train_supervised(X_train, lengths_train, X_test, lengths_test, y_train, y_test, state_list, idx_to_state) → Tuple[source]: Train supervised HMM with labeled data.

train_unsupervised(X_train, lengths_train, X_test, lengths_test, y_train, y_test, state_list, idx_to_state, n_unsup) → Tuple[source]: Train unsupervised HMM with state mapping.

Utilities

HMM utility functions for training, evaluation, and event log generation

utils.hmm_utils.empirical_start_trans(labels, lengths, n_states)[source]: Estimate startprob_ and transmat_ from labeled sequences.
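Estimating start and transition probabilities from labeled sequences amounts to counting and normalizing. The following is a minimal NumPy sketch of that idea, not the function's source: the helper name estimate_start_trans and the small additive smoothing term (which keeps rows for unseen states from dividing by zero) are assumptions.

```python
import numpy as np


def estimate_start_trans(labels, lengths, n_states, smoothing=1e-6):
    """Count first states and state-to-state transitions, then normalize."""
    start = np.full(n_states, smoothing)
    trans = np.full((n_states, n_states), smoothing)
    offset = 0
    for length in lengths:
        seq = labels[offset:offset + length]
        offset += length
        if len(seq) == 0:
            continue
        start[seq[0]] += 1
        # Transitions are counted only within a sequence, never across boundaries.
        for a, b in zip(seq[:-1], seq[1:]):
            trans[a, b] += 1
    start /= start.sum()
    trans /= trans.sum(axis=1, keepdims=True)
    return start, trans


labels = np.array([0, 0, 1, 1, 0, 1])  # two sequences: [0, 0, 1, 1] and [0, 1]
startprob, transmat = estimate_start_trans(labels, lengths=[4, 2], n_states=2)
print(startprob.round(3))
print(transmat.round(3))
```

Both sequences start in state 0, so nearly all start mass lands there. State 0 is followed once by itself and twice by state 1, giving a first row close to [1/3, 2/3].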

utils.hmm_utils.emissions_from_labels(X_np, labels_np, n_states)[source]: Compute means and covariances per labeled state.
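Per-state Gaussian emission parameters can be read straight off the labeled samples. The sketch below assumes full covariance matrices, the maximum-likelihood (divide by n) estimator, and a small diagonal regularizer; none of these choices is confirmed by the reference, and the helper name gaussian_emissions is hypothetical.

```python
import numpy as np


def gaussian_emissions(X, labels, n_states, reg=1e-6):
    """Return per-state means (K, D) and full covariances (K, D, D)."""
    n_features = X.shape[1]
    means = np.zeros((n_states, n_features))
    covars = np.zeros((n_states, n_features, n_features))
    for k in range(n_states):
        Xk = X[labels == k]
        if len(Xk) == 0:
            # Assumption: fall back to an identity covariance for unseen states.
            covars[k] = np.eye(n_features)
            continue
        means[k] = Xk.mean(axis=0)
        centered = Xk - means[k]
        covars[k] = centered.T @ centered / len(Xk) + reg * np.eye(n_features)
    return means, covars


X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]])
labels = np.array([0, 0, 1, 1])
means, covars = gaussian_emissions(X, labels, n_states=2)
print(means)
print(covars.round(3))
```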

utils.hmm_utils.viterbi_decode(model, X_np, lengths)[source]: Wrapper for HMM Viterbi decoding.
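For readers unfamiliar with the decoding step, here is a plain NumPy sketch of log-domain Viterbi applied to concatenated sequences. It illustrates the algorithm only and is not the wrapper's code. The function names and the choice to pass precomputed log emission scores are assumptions.

```python
import numpy as np


def viterbi(log_start, log_trans, log_emit):
    """Most likely state path for one sequence; log_emit has shape (T, K)."""
    T, K = log_emit.shape
    delta = np.empty((T, K))            # best log score of a path ending in each state
    back = np.zeros((T, K), dtype=int)  # best predecessor for each state
    delta[0] = log_start + log_emit[0]
    for t in range(1, T):
        # scores[i, j]: best path ending in state i at t-1, then moving to j.
        scores = delta[t - 1][:, None] + log_trans
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path


def viterbi_sequences(log_start, log_trans, log_emit, lengths):
    """Decode each sequence independently, then concatenate the paths."""
    paths, offset = [], 0
    for n in lengths:
        paths.append(viterbi(log_start, log_trans, log_emit[offset:offset + n]))
        offset += n
    return np.concatenate(paths)


log_start = np.log([0.5, 0.5])
log_trans = np.log([[0.9, 0.1], [0.1, 0.9]])  # sticky states
emit = np.array([[0.9, 0.1], [0.4, 0.6], [0.9, 0.1]])
log_emit = np.log(np.vstack([emit, emit]))    # two identical sequences
decoded = viterbi_sequences(log_start, log_trans, log_emit, lengths=[3, 3])
print(decoded)
```

The middle observation of each sequence slightly favors state 1 on its own, but the sticky transition matrix makes staying in state 0 the more likely path overall, so the decoded sequence is all zeros.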

utils.hmm_utils.print_evaluation(y_true_idx, y_pred_idx, idx_to_state, state_list, title='')[source]: Print classification report and confusion matrix.

utils.hmm_utils.normalize_timestamps(df, timestamp_col='timestamp', case_id_col='batch_id', base_date='2023-01-01')[source]: Normalize timestamps by handling different time units properly.

utils.hmm_utils.create_interval_event_log_normalized(df, y_pred, state_mapping, case_id_col='batch_id', timestamp_col='timestamp')[source]: Create interval-based event log using normalized timestamps.

utils.hmm_utils.filter_brief_states(event_log, min_duration_seconds=5.0)[source]: Remove state segments that are too brief by merging them with adjacent states.
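Merging brief segments into their neighbors can be done in one pass over a case's time-ordered segments. The sketch below is one possible policy, not the documented function's behavior: it assumes brief segments are absorbed by the preceding segment (or by the following one when they come first), uses plain dicts with numeric 'start' and 'end' seconds instead of an event log DataFrame, and the name merge_brief_segments is hypothetical.

```python
def merge_brief_segments(segments, min_duration_seconds=5.0):
    """Merge too-short segments into neighbors for one time-ordered case."""
    merged = []
    for seg in segments:
        seg = dict(seg)  # copy so the caller's data is left untouched
        brief = seg["end"] - seg["start"] < min_duration_seconds
        if merged and (brief or merged[-1]["activity"] == seg["activity"]):
            # Absorb into the previous segment by extending its end time.
            merged[-1]["end"] = seg["end"]
        else:
            merged.append(seg)
    # A brief segment at the very start has no predecessor: give it to the next one.
    if (len(merged) > 1
            and merged[0]["end"] - merged[0]["start"] < min_duration_seconds):
        merged[1]["start"] = merged[0]["start"]
        merged.pop(0)
    return merged


segments = [
    {"activity": "idle", "start": 0.0, "end": 10.0},
    {"activity": "run", "start": 10.0, "end": 12.0},   # 2 s: too brief
    {"activity": "idle", "start": 12.0, "end": 30.0},
    {"activity": "run", "start": 30.0, "end": 60.0},
]
cleaned = merge_brief_segments(segments, min_duration_seconds=5.0)
print(cleaned)
```

The 2-second "run" blip is absorbed by the surrounding "idle" state, and the two "idle" pieces then fuse into a single 0 to 30 second segment.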

utils.hmm_utils.create_gantt_chart(event_log, max_cases=10, figsize=(14, 8), color_map='Set3')[source]: Create Gantt chart visualization of process execution.

Configuration

Configuration parameters for HMM process analyzer