API Reference

This page is generated with sphinx.ext.autodoc and sphinx.ext.autosummary directly from the source docstrings.

Core pipeline

Core pipeline orchestrator for the Sensor2EventLog framework.

class core.pipeline.Sensor2EventLogPipeline(config_module=None)

Bases: object

Main pipeline class for transforming sensor data to event logs.

This class orchestrates the entire Machine Teaching process:
  1. Feature extraction
  2. Model training (HMM with supervised/unsupervised modes)
  3. Diagnostic analysis
  4. Event log generation

Example

>>> pipeline = Sensor2EventLogPipeline(config)
>>> result = pipeline.run(
...     data_path="sensor_data.csv",
...     feature_plan=feature_plan,
...     mode="unsupervised"
... )
>>> event_log = result['event_log']
>>> event_log.to_xes("output.xes")

__init__(config_module=None)

Initialize the pipeline with configuration.

Parameters:

config_module : module, optional

Configuration module with all parameters. If None, uses default config

run(data_path: str, feature_plan: Dict[str, list], mode: str = 'unsupervised', use_cip: bool = False, n_unsup: int | None = None, random_seed: int = 42, min_duration_seconds: float = 2.0, return_intermediate: bool = False) → Dict[str, Any]

Run the complete pipeline.

Parameters:

data_path : str

Path to CSV data file

feature_plan : dict

Feature extraction plan with families and signals

mode : str

“supervised” or “unsupervised”

use_cip : bool

Whether to include CIP states

n_unsup : int

Number of states for unsupervised mode

random_seed : int

Random seed for reproducibility

min_duration_seconds : float

Minimum duration for filtering brief states

return_intermediate : bool

If True, returns intermediate results (features, diagnostics)

Returns:

dict with keys:
  • event_log: EventLog object

  • model: trained HMM model

  • predictions: predicted state sequences

  • features (if return_intermediate): extracted features

  • diagnostics (if return_intermediate): diagnostic results

Machine Teaching loop

Machine Teaching loop for the Sensor2EventLog framework.

class abstraction.mt_loop.MachineTeachingLoop(model_type: str, feature_extractor, diagnostic_analyzer, config)

Bases: object

Orchestrates feature extraction, diagnostics, and model training.

get_review_summary() → Dict[str, Any] | None

run(df: DataFrame, feature_plan: Dict[str, list], mode: str = 'unsupervised', n_unsup: int | None = None, random_seed: int = 42) → Dict[str, Any]

Event log

Event log object with PM4Py compatibility

class contextualization.event_log.Event(case_id: str, activity: str, start_time: datetime, end_time: datetime, duration: float | None = None, **kwargs)

Bases: object

Single event in an event log.

Attributes:

case_id : str

Identifier for the process case

activity : str

Name of the activity/state

start_time : datetime

Start timestamp of the event

end_time : datetime

End timestamp of the event

duration : float

Duration in seconds

to_dict() → Dict

Convert event to dictionary.
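The constructor accepts duration=None. Below is a minimal sketch of how such an event could behave. It assumes that a missing duration is computed from end_time - start_time in seconds, and that extra keyword arguments are stored next to the core fields. EventSketch and its extra field are hypothetical and are not the package's class.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class EventSketch:
    """Hypothetical stand-in for contextualization.event_log.Event."""

    case_id: str
    activity: str
    start_time: datetime
    end_time: datetime
    duration: Optional[float] = None
    # Assumption: extra keyword attributes (**kwargs) are kept in this dict.
    extra: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Assumption: a missing duration is derived from the timestamps (seconds).
        if self.duration is None:
            self.duration = (self.end_time - self.start_time).total_seconds()

    def to_dict(self) -> Dict[str, Any]:
        # Core fields first, then any extra attributes.
        record: Dict[str, Any] = {
            "case_id": self.case_id,
            "activity": self.activity,
            "start_time": self.start_time,
            "end_time": self.end_time,
            "duration": self.duration,
        }
        record.update(self.extra)
        return record


# Usage: a 90-second "Heating" interval.
ev = EventSketch("batch_1", "Heating",
                 datetime(2023, 1, 1, 8, 0, 0), datetime(2023, 1, 1, 8, 1, 30))
# ev.duration == 90.0
```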

class contextualization.event_log.EventLog(data: DataFrame | List[Event])

Bases: object

Event log container with PM4Py compatibility.

This class provides a standardized interface for event logs that can be exported to various formats (CSV, XES) and used with process mining tools like PM4Py.

Example

>>> log = EventLog(df)
>>> log.to_csv("event_log.csv")
>>> log.to_xes("event_log.xes")
>>> pm4py_log = log.to_pm4py()  # Use with PM4Py

__init__(data: DataFrame | List[Event])

Initialize event log from DataFrame or list of Events.

Parameters:

data : pd.DataFrame or List[Event]

Input event log data

to_dataframe() → DataFrame

Get event log as pandas DataFrame.

to_csv(path: str, filtered: bool = False) → None

Export event log to CSV.

Parameters:

path : str

Output file path

filtered : bool

If True, saves the filtered version (if available)

to_xes(path: str, case_id_key: str = 'case:concept:name', timestamp_key: str = 'time:timestamp') → None

Export event log to XES format using PM4Py.

Parameters:

path : str

Output file path

case_id_key : str

Column name to use as case identifier in XES

timestamp_key : str

Column name to use as timestamp in XES

to_pm4py(case_id_key: str = 'case:concept:name', timestamp_key: str = 'time:timestamp') → pm4py.objects.log.obj.EventLog

Convert to PM4Py EventLog object for further analysis.

Parameters:

case_id_key : str

Column name to use as case identifier

timestamp_key : str

Column name to use as timestamp

Returns:

pm4py.objects.log.obj.EventLog

PM4Py event log object
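The default keys case:concept:name and time:timestamp are the standard XES attribute names that PM4Py works with. In XES the activity label conventionally goes in concept:name. The sketch below shows one way interval events could be flattened into XES-style records. It assumes that each interval becomes a start record and a complete record under the XES lifecycle extension (lifecycle:transition). The helper to_xes_records is hypothetical and belongs neither to this package nor to PM4Py.

```python
from datetime import datetime
from typing import Any, Dict, List


def to_xes_records(events: List[Dict[str, Any]],
                   case_id_key: str = "case:concept:name",
                   timestamp_key: str = "time:timestamp") -> List[Dict[str, Any]]:
    """Hypothetical helper: split each interval event into XES start/complete records."""
    records: List[Dict[str, Any]] = []
    for ev in events:
        for transition, ts in (("start", ev["start_time"]), ("complete", ev["end_time"])):
            records.append({
                case_id_key: ev["case_id"],
                "concept:name": ev["activity"],      # XES activity name attribute
                "lifecycle:transition": transition,  # XES lifecycle extension
                timestamp_key: ts,
            })
    # Stable sort by case, then time, so start precedes complete on equal timestamps.
    records.sort(key=lambda r: (r[case_id_key], r[timestamp_key]))
    return records


# Usage: one 5-minute "Filling" interval becomes two XES records.
recs = to_xes_records([
    {"case_id": "batch_1", "activity": "Filling",
     "start_time": datetime(2023, 1, 1, 0, 0), "end_time": datetime(2023, 1, 1, 0, 5)},
])
```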

filter_duration(min_seconds: float = 0, max_seconds: float = inf) → EventLog

Filter events by duration.

Parameters:

min_seconds : float

Minimum duration in seconds

max_seconds : float

Maximum duration in seconds

Returns:

EventLog

Filtered event log
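Here is a minimal sketch of the duration filter over plain event dicts. It assumes each event carries a duration in seconds and that both bounds are inclusive, which the docstring does not specify. filter_by_duration is a hypothetical helper. The documented method returns an EventLog rather than a list.

```python
import math
from typing import Any, Dict, List


def filter_by_duration(events: List[Dict[str, Any]], min_seconds: float = 0,
                       max_seconds: float = math.inf) -> List[Dict[str, Any]]:
    """Keep events whose 'duration' (seconds) lies within [min_seconds, max_seconds]."""
    # Assumption: both bounds are inclusive.
    return [ev for ev in events if min_seconds <= ev["duration"] <= max_seconds]
```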

get_cases() → List[str]

Get list of unique case IDs.

get_activities() → List[str]

Get list of unique activities.

get_case(case_id: str) → EventLog

Get all events for a specific case.

get_statistics() → Dict[str, Any]

Compute basic statistics about the event log.

Returns:

dict with:
  • total_cases: number of cases

  • total_events: number of events

  • unique_activities: number of distinct activities

  • avg_case_duration: average case duration in seconds

  • activity_frequencies: frequency of each activity
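The statistics listed above can be sketched over a list of event dicts. The sketch assumes a case lasts from its earliest start_time to its latest end_time, because the docstring only says "average case duration in seconds". event_log_statistics is a hypothetical helper, not the package's implementation.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Any, Dict, List


def event_log_statistics(events: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical re-implementation of the statistics listed above."""
    by_case: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for ev in events:
        by_case[ev["case_id"]].append(ev)
    # Assumption: a case lasts from its earliest start to its latest end.
    case_durations = [
        (max(e["end_time"] for e in evs) - min(e["start_time"] for e in evs)).total_seconds()
        for evs in by_case.values()
    ]
    return {
        "total_cases": len(by_case),
        "total_events": len(events),
        "unique_activities": len({ev["activity"] for ev in events}),
        "avg_case_duration": (sum(case_durations) / len(case_durations)
                              if case_durations else 0.0),
        "activity_frequencies": dict(Counter(ev["activity"] for ev in events)),
    }


# Usage: batch b1 lasts 40 minutes, batch b2 lasts 20 minutes.
t = datetime(2023, 1, 1)
demo = [
    {"case_id": "b1", "activity": "Fill", "start_time": t, "end_time": t.replace(minute=10)},
    {"case_id": "b1", "activity": "Heat", "start_time": t.replace(minute=10), "end_time": t.replace(minute=40)},
    {"case_id": "b2", "activity": "Fill", "start_time": t.replace(hour=1), "end_time": t.replace(hour=1, minute=20)},
]
stats = event_log_statistics(demo)
# stats["avg_case_duration"] == 1800.0
```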

__len__() → int

Return number of events.

__repr__() → str

String representation.

head(n: int = 5) → DataFrame

Return first n events.

contextualization.event_log.create_interval_event_log_normalized(df, y_pred, state_mapping, case_id_col='batch_id', timestamp_col='timestamp')

Create interval-based event log using normalized timestamps.

This function is kept for backward compatibility.

Feature library

Modular feature extraction library with diagnostic capabilities

class features.feature_library.ModularFeatureLibrary(window_sizes=None, stability_eps=1, peak_threshold=0.1)

Bases: object

Modular feature extraction library supporting multiple feature families with integrated rule diagnostics.

compute_features(df, feature_plan: Dict[str, List[str]])

Compute features based on a feature plan.

analyze_rule_performance(df: DataFrame, feature_plan: Dict[str, List[str]]) → Dict

Compute features and analyze rule performance.

Parameters:

df : DataFrame

Input data with sensor signals and state labels

feature_plan : Dict[str, List[str]]

Feature plan including event rules

Returns:

Dict with features and diagnostic results

Rule diagnostics

Rule diagnostic analyzer for evaluating rule performance metrics

class evaluation.rule_analyzer.RuleDiagnosticAnalyzer(coverage_threshold: float = 0.6, precision_threshold: float = 0.7, explainability_threshold: float = 0.3)

Bases: object

Analyzes rule performance using coverage, precision, and explainability metrics.

compute_rule_metrics(df: DataFrame, event_features: DataFrame, state_column: str = 'state') → Dict

Compute coverage, precision, and explainability metrics for all event features.

Parameters:

df : DataFrame

DataFrame with state labels

event_features : DataFrame

DataFrame containing event rule features (binary columns)

state_column : str

Column name containing state labels

Returns:

Dict with comprehensive diagnostic results
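The docstring names the metrics but does not define them. This sketch assumes one common reading for a binary rule column that targets a state s. Coverage is the share of samples in state s on which the rule fires. Precision is the share of rule firings that fall in state s. It compares both values with the constructor's default thresholds (0.6 and 0.7). Explainability is left out because its definition is not given. rule_coverage_precision is a hypothetical helper.

```python
from typing import Any, Dict, Sequence


def rule_coverage_precision(fires: Sequence[int], states: Sequence[str], target: str,
                            coverage_threshold: float = 0.6,
                            precision_threshold: float = 0.7) -> Dict[str, Any]:
    """Assumed metrics for one binary rule column against one target state."""
    in_target = sum(1 for s in states if s == target)
    fired = sum(1 for f in fires if f)
    hits = sum(1 for f, s in zip(fires, states) if f and s == target)
    # Coverage: share of target-state samples on which the rule fires.
    coverage = hits / in_target if in_target else 0.0
    # Precision: share of rule firings that fall inside the target state.
    precision = hits / fired if fired else 0.0
    return {
        "coverage": coverage,
        "precision": precision,
        "passes": coverage >= coverage_threshold and precision >= precision_threshold,
    }
```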

print_diagnostic_report(diagnostic_results: Dict)

Print comprehensive diagnostic report.

Models

Base model interface for pluggable models

class models.base_model.BaseModel

Bases: ABC

Abstract base class for all models in Sensor2EventLog.

This interface ensures that all models can be used interchangeably in the Machine Teaching loop.

abstract fit(X: ndarray, lengths: List[int], y: ndarray | None = None) → BaseModel

Fit the model to training data.

Parameters:

X : np.ndarray

Feature matrix (n_samples, n_features)

lengths : List[int]

Lengths of each sequence

y : np.ndarray, optional

Labels for supervised learning

Returns:

self : BaseModel

Fitted model

abstract predict(X: ndarray, lengths: List[int]) → ndarray

Predict states for new data.

Parameters:

X : np.ndarray

Feature matrix (n_samples, n_features)

lengths : List[int]

Lengths of each sequence

Returns:

predictions : np.ndarray

Predicted state indices (n_samples,)

abstract get_state_mapping() → Dict[int, str]

Get mapping from state indices to state names.

Returns:

Dict[int, str]

Mapping from index to state name
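A new model is plugged into the Machine Teaching loop by subclassing the interface and implementing fit, predict, and get_state_mapping. The sketch below re-declares a local BaseModelSketch with the same abstract methods so that it runs without the package. It then implements a toy nearest-centroid classifier that requires y and ignores lengths. Plain lists stand in for np.ndarray. Both class names are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Sequence

Matrix = Sequence[Sequence[float]]


class BaseModelSketch(ABC):
    """Local mirror of the abstract methods documented for models.base_model.BaseModel."""

    @abstractmethod
    def fit(self, X: Matrix, lengths: List[int],
            y: Optional[Sequence[int]] = None) -> "BaseModelSketch":
        """Fit to training data and return self."""

    @abstractmethod
    def predict(self, X: Matrix, lengths: List[int]) -> List[int]:
        """Return one state index per sample."""

    @abstractmethod
    def get_state_mapping(self) -> Dict[int, str]:
        """Map state indices to state names."""


class NearestCentroidModel(BaseModelSketch):
    """Toy supervised model: each sample gets the state with the closest mean vector."""

    def __init__(self, state_names: Sequence[str]):
        self.state_names = list(state_names)
        self.centroids: Dict[int, List[float]] = {}

    def fit(self, X: Matrix, lengths: List[int],
            y: Optional[Sequence[int]] = None) -> "NearestCentroidModel":
        if y is None:
            raise ValueError("this toy model requires labels y")
        for k in set(y):
            rows = [x for x, label in zip(X, y) if label == k]
            self.centroids[k] = [sum(col) / len(rows) for col in zip(*rows)]
        return self  # returning self matches the fit() contract

    def predict(self, X: Matrix, lengths: List[int]) -> List[int]:
        # lengths is ignored: samples are classified independently.
        def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

        return [min(self.centroids, key=lambda k: sq_dist(x, self.centroids[k])) for x in X]

    def get_state_mapping(self) -> Dict[int, str]:
        return dict(enumerate(self.state_names))


# Usage: 1-D features, states 0 ("Idle") and 1 ("Running").
model = NearestCentroidModel(["Idle", "Running"]).fit(
    [[0.0], [0.2], [5.0], [5.2]], lengths=[4], y=[0, 0, 1, 1])
```

A real sequence model such as an HMM would use lengths to keep transitions from crossing sequence boundaries. This toy model has no transitions, so it can safely ignore lengths.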

Hidden Markov Model implementation

class models.hmm_model.HMMModel(config=None)

Bases: BaseModel

Gaussian Hidden Markov Model for process state discovery.

Supports both supervised and unsupervised learning modes.

__init__(config=None)

Initialize HMM model.

Parameters:

config : module

Configuration module with HMM parameters

fit(X: ndarray, lengths: List[int], y: ndarray | None = None) → HMMModel

Fit HMM to data.

If y is provided, uses supervised initialization. Otherwise, uses unsupervised learning.

predict(X: ndarray, lengths: List[int]) → ndarray

Predict state sequence using Viterbi algorithm.

get_state_mapping() → Dict[int, str]

Get mapping from state indices to state names.

train_supervised(X_train, lengths_train, X_test, lengths_test, y_train, y_test, state_list, idx_to_state) → Tuple

Train supervised HMM with labeled data.

train_unsupervised(X_train, lengths_train, X_test, lengths_test, y_train, y_test, state_list, idx_to_state, n_unsup) → Tuple

Train unsupervised HMM with state mapping.
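The docstring does not say how the n_unsup discovered states are mapped to named states. This sketch assumes a common approach, majority voting. Each hidden state is labeled with the ground-truth state that co-occurs with it most often in the training data. majority_state_mapping is a hypothetical helper and may not match the package's method.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Sequence


def majority_state_mapping(hidden: Sequence[int], labels: Sequence[str]) -> Dict[int, str]:
    """Map each hidden-state index to its most frequent co-occurring label."""
    votes: DefaultDict[int, Counter] = defaultdict(Counter)
    for h, lab in zip(hidden, labels):
        votes[h][lab] += 1
    return {h: counts.most_common(1)[0][0] for h, counts in votes.items()}


# Usage: three hidden states, two named states.
mapping = majority_state_mapping([0, 0, 1, 1, 1, 2],
                                 ["Fill", "Fill", "Heat", "Heat", "Fill", "Heat"])
```

With this rule, several hidden states may map to the same label, so n_unsup does not have to equal the number of named states. Hidden states never visited in the training data get no entry.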

Utilities

HMM utility functions for training, evaluation, and event log generation

utils.hmm_utils.empirical_start_trans(labels, lengths, n_states)

Estimate startprob_ and transmat_ from labeled sequences.
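startprob_ and transmat_ are hmmlearn's attribute names for the start and transition probabilities. Below is a minimal count-based sketch. It assumes labels is a flat sequence of integer state indices that lengths splits into sequences, and it counts transitions only within a sequence. The additive smoothing constant alpha and the uniform fallback for unobserved rows are assumptions, and empirical_start_trans_sketch is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def empirical_start_trans_sketch(labels: Sequence[int], lengths: Sequence[int],
                                 n_states: int, alpha: float = 0.0
                                 ) -> Tuple[List[float], List[List[float]]]:
    """Count-based estimates of startprob_ and transmat_ (alpha = assumed smoothing)."""
    start = [alpha] * n_states
    trans = [[alpha] * n_states for _ in range(n_states)]
    pos = 0
    for n in lengths:
        seq = list(labels[pos:pos + n])
        pos += n
        if not seq:
            continue
        start[seq[0]] += 1  # first state of each sequence
        for a, b in zip(seq[:-1], seq[1:]):
            trans[a][b] += 1  # only transitions inside one sequence are counted

    def normalize(row: List[float]) -> List[float]:
        total = sum(row)
        # Assumption: rows with no observations fall back to a uniform distribution.
        return [v / total for v in row] if total > 0 else [1.0 / len(row)] * len(row)

    return normalize(start), [normalize(r) for r in trans]


# Usage: two sequences, [0, 0, 1, 1] and [0, 1].
start, trans = empirical_start_trans_sketch([0, 0, 1, 1, 0, 1], lengths=[4, 2], n_states=2)
# The 1-to-0 step across the sequence boundary is not counted, so trans[1] == [0.0, 1.0].
```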

utils.hmm_utils.emissions_from_labels(X_np, labels_np, n_states)

Compute means and covariances per labeled state.
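Below is a minimal sketch of per-state Gaussian emission parameters. For each state k it computes the mean and full covariance of the rows of X labeled k. Several choices here are assumptions, not the package's code: maximum-likelihood (divide-by-n) covariances, a small ridge reg added to the diagonal, and plain lists instead of arrays. States with no samples are returned as None. The function name is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def emissions_from_labels_sketch(X: Sequence[Sequence[float]], labels: Sequence[int],
                                 n_states: int, reg: float = 1e-6
                                 ) -> Tuple[List[Optional[List[float]]], List[Optional[Matrix]]]:
    """Per-state mean vectors and full covariance matrices from labeled rows."""
    means: List[Optional[List[float]]] = [None] * n_states
    covs: List[Optional[Matrix]] = [None] * n_states
    for k in range(n_states):
        rows = [x for x, lab in zip(X, labels) if lab == k]
        if not rows:
            continue  # no data for this state
        n, d = len(rows), len(rows[0])
        mu = [sum(r[j] for r in rows) / n for j in range(d)]
        # Maximum-likelihood covariance: divide by n, not n - 1.
        cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / n for j in range(d)]
               for i in range(d)]
        for i in range(d):
            cov[i][i] += reg  # ridge keeps the matrix positive definite
        means[k], covs[k] = mu, cov
    return means, covs


# Usage: two labeled states in 2-D; state 2 has no samples.
means, covs = emissions_from_labels_sketch([[0.0, 0.0], [2.0, 2.0], [10.0, 0.0]], [0, 0, 1], 3)
```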

utils.hmm_utils.viterbi_decode(model, X_np, lengths)

Wrapper for HMM Viterbi decoding.
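Viterbi decoding finds the single most likely state path. The sketch below works in log space on precomputed per-sample log-emission scores and decodes each sequence delimited by lengths independently. It does not show how viterbi_decode obtains emission scores from the model. If the model is an hmmlearn HMM (an assumption, since the docstring does not name the library), its decode(X, lengths) method already performs this per-sequence decoding. viterbi_sketch is a hypothetical name.

```python
import math
from typing import List, Sequence


def viterbi_sketch(log_start: Sequence[float], log_trans: Sequence[Sequence[float]],
                   log_emit: Sequence[Sequence[float]], lengths: Sequence[int]) -> List[int]:
    """log_emit[t][k] = log p(x_t | state k); returns the best path over all sequences."""
    n_states = len(log_start)
    path: List[int] = []
    pos = 0
    for n in lengths:
        emit = log_emit[pos:pos + n]
        pos += n
        if n == 0:
            continue
        score = [log_start[k] + emit[0][k] for k in range(n_states)]
        backptr: List[List[int]] = []
        for t in range(1, n):
            prev = score
            # Best predecessor of each state k at time t.
            best_prev = [max(range(n_states), key=lambda i: prev[i] + log_trans[i][k])
                         for k in range(n_states)]
            score = [prev[best_prev[k]] + log_trans[best_prev[k]][k] + emit[t][k]
                     for k in range(n_states)]
            backptr.append(best_prev)
        # Trace back from the best final state.
        state = max(range(n_states), key=lambda k: score[k])
        seq = [state]
        for bp in reversed(backptr):
            state = bp[state]
            seq.append(state)
        path.extend(reversed(seq))
    return path


# Usage: two "sticky" states; observations favor state 0 twice, then state 1 twice.
lg = math.log
demo_path = viterbi_sketch(
    [lg(0.5), lg(0.5)],
    [[lg(0.9), lg(0.1)], [lg(0.1), lg(0.9)]],
    [[lg(0.9), lg(0.1)]] * 2 + [[lg(0.1), lg(0.9)]] * 2,
    lengths=[4],
)
```

Impossible transitions should be passed as -math.inf rather than math.log(0), which raises an error.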

utils.hmm_utils.print_evaluation(y_true_idx, y_pred_idx, idx_to_state, state_list, title='')

Print classification report and confusion matrix.

utils.hmm_utils.normalize_timestamps(df, timestamp_col='timestamp', case_id_col='batch_id', base_date='2023-01-01')

Normalize timestamps, handling different time units.

utils.hmm_utils.create_interval_event_log_normalized(df, y_pred, state_mapping, case_id_col='batch_id', timestamp_col='timestamp')

Create interval-based event log using normalized timestamps.
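An interval-based event log is built mainly by run-length encoding. Within each case, consecutive samples with the same predicted state collapse into one event that spans from the first to the last timestamp of the run. The sketch below rests on several assumptions. Rows are already sorted by case and timestamp, timestamps are numeric seconds, and an interval ends at its last sample. The documented function also normalizes timestamps and may choose different boundaries. interval_events is a hypothetical helper.

```python
from itertools import groupby
from typing import Any, Dict, List, Sequence


def interval_events(case_ids: Sequence[str], timestamps: Sequence[float],
                    y_pred: Sequence[int], state_mapping: Dict[int, str]) -> List[Dict[str, Any]]:
    """Collapse runs of identical predicted states (per case) into interval events."""
    events: List[Dict[str, Any]] = []
    rows = zip(case_ids, timestamps, y_pred)
    # groupby merges only *consecutive* rows sharing the same (case, state) key.
    for (case, state), samples in groupby(rows, key=lambda r: (r[0], r[2])):
        samples = list(samples)
        start, end = samples[0][1], samples[-1][1]
        events.append({
            "case_id": case,
            "activity": state_mapping.get(state, str(state)),
            "start_time": start,
            "end_time": end,
            "duration": end - start,
        })
    return events


# Usage: batch b1 idles then runs; batch b2 runs. Runs never cross case boundaries.
events = interval_events(["b1"] * 5 + ["b2"] * 2, [0, 1, 2, 3, 4, 0, 1],
                         [0, 0, 1, 1, 1, 1, 1], {0: "Idle", 1: "Run"})
```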

utils.hmm_utils.filter_brief_states(event_log, min_duration_seconds=5.0)

Remove state segments that are too brief by merging them with adjacent states.
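The docstring only says that brief segments are merged with adjacent states. This minimal sketch picks one merge rule for a single case. It assumes segments are time-ordered dicts with activity, start_time, and end_time in seconds. A segment shorter than min_duration_seconds is absorbed by the preceding segment, or by the following one if nothing precedes it. Neighbors that end up with the same activity are then fused. merge_brief_segments and this absorption rule are assumptions.

```python
from typing import Any, Dict, List, Optional


def merge_brief_segments(segments: List[Dict[str, Any]],
                         min_duration_seconds: float = 5.0) -> List[Dict[str, Any]]:
    """Merge segments shorter than min_duration_seconds into a neighboring segment."""
    merged: List[Dict[str, Any]] = []
    pending_start: Optional[float] = None  # start of leading brief segments
    for original in segments:
        seg = dict(original)  # copy so the caller's data is not mutated
        brief = seg["end_time"] - seg["start_time"] < min_duration_seconds
        if brief:
            if merged:
                merged[-1]["end_time"] = seg["end_time"]  # absorb into previous segment
            elif pending_start is None:
                pending_start = seg["start_time"]  # hand over to the next long segment
            continue
        if pending_start is not None:
            seg["start_time"] = pending_start  # absorb leading brief segments
            pending_start = None
        if merged and merged[-1]["activity"] == seg["activity"]:
            merged[-1]["end_time"] = seg["end_time"]  # fuse equal neighbors
        else:
            merged.append(seg)
    if not merged:
        return [dict(s) for s in segments]  # nothing long enough: leave unchanged
    for seg in merged:
        seg["duration"] = seg["end_time"] - seg["start_time"]
    return merged
```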

utils.hmm_utils.create_gantt_chart(event_log, max_cases=10, figsize=(14, 8), color_map='Set3')

Create Gantt chart visualization of process execution.

Configuration

Configuration parameters for the HMM process analyzer