Incident Response in OT: Containment without shutting the plant

Prayukth K V

June 17, 2025

Operational Technology (OT) environments power the world’s most critical infrastructure, from energy grids and chemical refineries to manufacturing plants and water utilities. In such environments, uptime isn't just a metric; it's a mandate. Any cybersecurity incident that disrupts OT operations could cause physical damage, financial loss, environmental harm, or even loss of life.

Why OT Incident Response is different

In IT, containment can mean unplugging a server or killing a process. In OT, the same actions can trigger safety shutdowns, halt production, or damage equipment, because OT systems directly control physical processes.

This difference should inform every step of your OT incident response plan.

Step 1: Preparation with Industrial Context

Preparation is the foundation of an effective incident response capability. In OT, this starts with deep operational awareness.

Key preparatory actions:

· Asset Inventory: Maintain an up-to-date, detailed OT asset inventory (automated tools with proven asset discovery capability, such as Shieldworkz, can help).

· Network maps and data flows: Understand control loops, safety systems, HMI-PLC-SCADA interactions, and vendor remote access points.

· Baseline Behaviors: Use anomaly detection systems such as Shieldworkz to define “normal” for protocols, command patterns, and device behaviors.

· Incident Playbooks: Develop OT-specific IR playbooks with predefined actions per asset type (e.g., PLC infected vs. historian breach). Work with OT security vendors with proven capabilities, such as Shieldworkz.

· Stress Testing: Stress test all functions, processes and systems to ensure they are ready to manage an incident.
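The "Baseline Behaviors" item can be illustrated with a minimal Python sketch. It assumes a monitoring tool has already decoded traffic into (source, destination, protocol, function) tuples; the `Flow` type, the `min_count` threshold and the sample addresses are hypothetical, and real anomaly detection products model far richer behavior than this simple learned allowlist.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Flow(NamedTuple):
    """One decoded OT conversation (hypothetical schema)."""
    src: str
    dst: str
    protocol: str
    function: str


def learn_baseline(flows: Iterable[Flow], min_count: int = 3) -> set[Flow]:
    """Keep only flows seen at least `min_count` times in a known-good window."""
    counts = Counter(flows)
    return {flow for flow, n in counts.items() if n >= min_count}


def find_anomalies(observed: Iterable[Flow], baseline: set[Flow]) -> list[Flow]:
    """Return observed flows absent from the baseline, in first-seen order, without duplicates."""
    seen: set[Flow] = set()
    anomalies: list[Flow] = []
    for flow in observed:
        if flow not in baseline and flow not in seen:
            seen.add(flow)
            anomalies.append(flow)
    return anomalies


if __name__ == "__main__":
    hmi_poll = Flow("10.10.1.20", "10.10.2.5", "modbus", "read_holding_registers")
    baseline = learn_baseline([hmi_poll] * 50)  # learning window: HMI polling only
    live = [hmi_poll, Flow("10.10.9.99", "10.10.2.5", "modbus", "write_single_register")]
    for anomaly in find_anomalies(live, baseline):
        print("ANOMALY:", anomaly)
```

Requiring several sightings before a tuple enters the baseline keeps a one-off event during the learning window, such as a single stray write, from being learned as "normal".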

IEC 62443-2-1 calls for clearly defined security program policies, along with asset owner roles and responsibilities. The IR plan should provide for cross-functional collaboration between IT, OT, and safety departments.

Step 2: Detection and Initial Triage (aligned to NIST CSF: Detect)

Most OT attacks do not start with immediate disruption; they begin with reconnaissance, lateral movement, or unauthorized access attempts. Early detection is therefore critical, and this is where an OT incident-aware workforce and tested incident response playbooks become essential links in the containment chain.

Detection sources in OT:

· OT IDS/IPS: OT-aware tools, such as Shieldworkz, that understand ICS protocols such as Modbus, DNP3 and S7.

· Syslogs and Device Logs: Where available, from firewalls, HMIs, and engineering workstations.

· Anomaly Detection: Sudden changes in PLC programming, firmware uploads, or unauthorized commands.

· Human Reporting: Trained operators, who may notice unusual process behavior before automated systems do.
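One way to catch sudden changes in PLC programming or firmware without touching the controller is to hash logic or firmware exports, taken through the vendor's engineering tooling, and compare them against known-good digests. The sketch below assumes those exports are available as bytes; the asset names and the `known` digest store are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of an exported logic or firmware image."""
    return hashlib.sha256(data).hexdigest()


def detect_logic_changes(known_good: dict[str, str], exports: dict[str, bytes]) -> dict[str, str]:
    """Compare fresh exports against known-good digests.

    Returns asset -> status: "ok", "changed", or "unknown" (no known-good digest on record).
    """
    results: dict[str, str] = {}
    for asset, blob in exports.items():
        expected = known_good.get(asset)
        if expected is None:
            results[asset] = "unknown"
        elif sha256_hex(blob) == expected:
            results[asset] = "ok"
        else:
            results[asset] = "changed"
    return results


if __name__ == "__main__":
    baseline_logic = b"LD I0.0\nST Q0.0\n"  # placeholder logic text
    known = {"PLC-01": sha256_hex(baseline_logic), "PLC-02": sha256_hex(b"v1")}
    fresh = {"PLC-01": baseline_logic, "PLC-02": b"v1-tampered", "PLC-03": b"new"}
    print(detect_logic_changes(known, fresh))
```

An "unknown" result is worth as much attention as "changed": it usually means a device is missing from the asset inventory.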

Initial triage considerations:

· Safety Impact: Could the anomaly pose a risk to human life or equipment?

· Operational Impact: Is production currently affected?

· Propagation Risk: Could the threat spread laterally across control zones?

· Containment Urgency: Use a risk-based scoring system that factors in IEC 62443 zones and conduits to assess the urgency and scope of containment.

· Persistence: Could the threat persist in remote systems?

· External Intelligence: Are there reports of similar incidents from external sources?
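A risk-based triage score can be as simple as weighting the triage questions and scaling by the criticality of the affected zone. The sketch below is one hypothetical way to do it: the 0 to 3 answer scale, the `WEIGHTS`, the use of the zone's IEC 62443 target security level as a multiplier, and the urgency thresholds are all assumptions to be calibrated against your own playbooks.

```python
from dataclasses import dataclass


@dataclass
class TriageInput:
    """Analyst answers to the triage questions, each scored 0 (none) to 3 (severe)."""
    safety: int
    operational: int
    propagation: int
    persistence: int
    external_reports: int
    zone_target_sl: int  # IEC 62443 target security level of the affected zone, 1-4


# Hypothetical weights: safety dominates, as the triage questions suggest.
WEIGHTS = {"safety": 5, "operational": 3, "propagation": 3, "persistence": 2, "external_reports": 1}


def triage_score(t: TriageInput) -> int:
    """Weighted sum of the answers, scaled by the zone's target security level."""
    for name in WEIGHTS:
        value = getattr(t, name)
        if not 0 <= value <= 3:
            raise ValueError(f"{name} must be 0-3, got {value}")
    if not 1 <= t.zone_target_sl <= 4:
        raise ValueError("zone_target_sl must be 1-4")
    base = sum(weight * getattr(t, name) for name, weight in WEIGHTS.items())
    return base * t.zone_target_sl


def urgency(score: int) -> str:
    """Map a score to a containment urgency band (hypothetical thresholds)."""
    if score >= 80:
        return "immediate"
    if score >= 40:
        return "urgent"
    return "monitor"


if __name__ == "__main__":
    t = TriageInput(safety=2, operational=1, propagation=3, persistence=1,
                    external_reports=0, zone_target_sl=3)
    print(triage_score(t), urgency(triage_score(t)))
```

Keeping the scoring explicit and versioned alongside the playbooks makes it easier to explain afterwards why a given containment decision was made.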

Step 3: Containment without shutdown (aligned to NIST CSF: Respond)

This is where OT IR diverges dramatically from IT. The goal is surgical containment: neutralizing the threat while preserving process integrity.

Recommended containment techniques by asset type:

For engineering workstations / HMIs:

· Isolate the affected workstation/HMI from the network via switch port shutdown or ACL updates.

· Redirect operator functions to backup HMI if available.

· Run NDR tools, such as Shieldworkz, in monitoring (not blocking) mode to avoid disrupting operations.

For PLCs / RTUs:

· Avoid rebooting or reprogramming live PLCs unless coordinated with OT operations.

· Disable external write access using firewall rules or vendor-native controls.

· Use read-only monitoring to validate PLC state and logic changes.
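Read-only validation of PLC state can be sketched as a comparison of a polled register snapshot against expected ranges. This assumes the snapshot was collected by read-only polling (for example Modbus function code 3, Read Holding Registers) from a monitoring host; the register addresses and ranges below are hypothetical.

```python
from typing import Mapping


def validate_plc_state(
    snapshot: Mapping[int, int],
    expected: Mapping[int, tuple[int, int]],
) -> list[str]:
    """Check a read-only register snapshot against expected (low, high) ranges.

    `snapshot` maps register address -> value as read by a read-only poll.
    `expected` maps register address -> inclusive (low, high) bounds; for a
    setpoint that must not change, use (value, value).
    Returns findings; an empty list means the state looks normal.
    """
    findings = []
    for address, (low, high) in sorted(expected.items()):
        if address not in snapshot:
            findings.append(f"register {address}: missing from snapshot")
            continue
        value = snapshot[address]
        if not low <= value <= high:
            findings.append(f"register {address}: value {value} outside [{low}, {high}]")
    return findings


if __name__ == "__main__":
    # Hypothetical map: 100 = fixed temperature setpoint, 101 = valve position in percent.
    expected = {100: (1200, 1200), 101: (0, 100)}
    print(validate_plc_state({100: 1500, 101: 55}, expected))
```

Because the function only consumes values that have already been read, it never writes to the controller, which matches the "avoid reprogramming live PLCs" rule above.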

For Historian or SCADA Servers:

· If compromised, shift data acquisition to a predetermined backup system, if the architecture includes one.

· Throttle or block external communication channels to limit data exfiltration.

For OT Networks:

· Segment suspected subnets using inline DPI firewalls.

· Disable specific protocol functions (e.g., Modbus write commands) temporarily.

· Deploy virtual patching via network-based IPS where possible.
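Temporarily disabling Modbus write commands is normally configured in a DPI firewall, but the decision logic such a rule applies can be sketched in Python. This assumes access to the raw Modbus/TCP request payload (TCP port 502): the 7-byte MBAP header (transaction ID, protocol ID, length, unit ID) followed by the function code. The write function codes come from the Modbus Application Protocol specification; the allowlist of engineering workstation IPs is hypothetical.

```python
import struct
from typing import Optional

# Modbus public function codes that modify PLC data.
WRITE_FUNCTION_CODES = {
    5: "Write Single Coil",
    6: "Write Single Register",
    15: "Write Multiple Coils",
    16: "Write Multiple Registers",
    22: "Mask Write Register",
    23: "Read/Write Multiple Registers",
}


def modbus_function_code(payload: bytes) -> Optional[int]:
    """Return the function code of a Modbus/TCP request, or None if malformed."""
    if len(payload) < 8:
        return None
    _tid, protocol_id, length, _unit = struct.unpack(">HHHB", payload[:7])
    if protocol_id != 0 or length < 2:  # protocol ID is always 0 for Modbus
        return None
    return payload[7]


def allow_request(src_ip: str, payload: bytes, write_allowed_from: set[str]) -> bool:
    """Permit reads from anyone on the conduit; permit writes only from allowlisted hosts."""
    fc = modbus_function_code(payload)
    if fc is None:
        return False  # drop malformed traffic on a Modbus conduit
    if fc in WRITE_FUNCTION_CODES:
        return src_ip in write_allowed_from
    return True


if __name__ == "__main__":
    # Write Single Register: tid=2, proto=0, length=6, unit=1, fc=6, address=0, value=1234
    write_req = struct.pack(">HHHBBHH", 2, 0, 6, 1, 6, 0, 1234)
    print(allow_request("10.10.9.99", write_req, {"10.10.1.30"}))
```

During an incident, the allowlist can be emptied to block all writes while reads (and therefore operator visibility) continue.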

Step 4: Eradication and recovery with operational continuity

Once the threat is contained, the next step is to eradicate malicious code or access and restore full functionality without triggering plant-wide outages. This should be done according to the IR playbook, with assistance from experts and malware specialists.

Eradication steps:

· Engage IR malware specialists.

· Clean malware from affected devices using portable, approved tools.

· Remove unauthorized user accounts, scripts, or firmware artifacts.

· Reverse lateral movement mechanisms such as rogue VPN tunnels or RDP sessions.
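Finding unauthorized accounts, services or startup scripts is essentially a diff between what the host should have and what it actually has. A minimal sketch, assuming host artifacts have already been collected (for example by an approved portable tool) into sets keyed by (host, artifact kind); the host names, kinds and entries below are hypothetical.

```python
# (host, artifact kind) -> set of artifact names
Inventory = dict[tuple[str, str], set[str]]


def unauthorized_artifacts(approved: Inventory, observed: Inventory) -> list[tuple[str, str, str]]:
    """Return (host, kind, name) for every observed artifact missing from the approved list."""
    findings = []
    for (host, kind), names in sorted(observed.items(), key=lambda item: item[0]):
        for name in sorted(names - approved.get((host, kind), set())):
            findings.append((host, kind, name))
    return findings


if __name__ == "__main__":
    approved = {("EWS-01", "accounts"): {"operator", "engineer"}}
    observed = {
        ("EWS-01", "accounts"): {"operator", "engineer", "svc_tmp"},
        ("EWS-01", "services"): {"unknown_svc"},
    }
    for finding in unauthorized_artifacts(approved, observed):
        print("REVIEW:", finding)
```

Findings should be reviewed with operations before removal, since an "unknown" service may be a legitimate but undocumented vendor component.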

Step 5: Post-incident reporting and regulatory response

Cyber incidents in critical infrastructure typically must be reported to national regulators or CERTs within a prescribed time limit, especially if there is data loss, safety impact, or public service disruption. The focus should be on sending regulators the most accurate report possible, backed by verifiable data and in the applicable format.

What to document:

· Pre-incident anomalous activity

· Timeline

· Affected assets

· Type of attack

· Impact of attack

· Remedial measures undertaken

· Logs and forensic data
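Capturing these items in a structured record from the first hour makes the final report much easier to assemble. The sketch below is a hypothetical schema mirroring the list above, not a regulator's format; it keeps the timeline ordered and flags empty sections before submission.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TimelineEvent:
    at: datetime
    description: str


@dataclass
class IncidentRecord:
    pre_incident_activity: list[str] = field(default_factory=list)
    timeline: list[TimelineEvent] = field(default_factory=list)
    affected_assets: list[str] = field(default_factory=list)
    attack_type: str = "unknown"
    impact: str = ""
    remediation: list[str] = field(default_factory=list)
    evidence_files: list[str] = field(default_factory=list)  # paths to logs / forensic images

    def add_event(self, at: datetime, description: str) -> None:
        """Append an event and keep the timeline in chronological order."""
        if at.tzinfo is None:
            raise ValueError("use timezone-aware timestamps (UTC recommended)")
        self.timeline.append(TimelineEvent(at, description))
        self.timeline.sort(key=lambda e: e.at)

    def missing_sections(self) -> list[str]:
        """Names of sections still empty, to check before submitting a report."""
        checks = {
            "pre_incident_activity": self.pre_incident_activity,
            "timeline": self.timeline,
            "affected_assets": self.affected_assets,
            "attack_type": self.attack_type != "unknown",
            "impact": self.impact,
            "remediation": self.remediation,
            "evidence_files": self.evidence_files,
        }
        return [name for name, value in checks.items() if not value]


if __name__ == "__main__":
    rec = IncidentRecord(affected_assets=["HMI-02"])
    rec.add_event(datetime(2025, 6, 17, 9, 30, tzinfo=timezone.utc), "HMI-02 isolated via switch port")
    rec.add_event(datetime(2025, 6, 17, 8, 5, tzinfo=timezone.utc), "IDS alert: unexpected Modbus write")
    print([e.description for e in rec.timeline])
    print(rec.missing_sections())
```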

Reporting obligations:

Depending on your jurisdiction, you may need to report to:

· Regional or sectoral CERT

· India: NCIIPC or CERT-In for CIIs and critical infrastructure operators

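· Europe: NIS2 Directive (early warning within 24 hours of becoming aware of a significant incident, followed by an incident notification within 72 hours)

· U.S.: CISA reporting requirements under the Cyber Incident Reporting for Critical Infrastructure Act of 2022 (CIRCIA)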

· Sectoral bodies: Power, water, oil & gas often have their own regulators
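Reporting clocks are easy to miss in the middle of containment, so it helps to compute them the moment the incident is confirmed. The sketch below encodes only the two NIS2 deadlines named above (a final report is also due no later than one month after the incident notification); the `DEADLINES` table is meant to be extended with other regimes only after confirming the current rules for your jurisdiction and sector.

```python
from datetime import datetime, timedelta, timezone

# NIS2: early warning within 24 h and incident notification within 72 h
# of becoming aware of a significant incident.
DEADLINES = {
    "NIS2 early warning": timedelta(hours=24),
    "NIS2 incident notification": timedelta(hours=72),
}


def reporting_deadlines(aware_at: datetime) -> dict[str, datetime]:
    """Absolute due time for each configured report, counted from awareness."""
    if aware_at.tzinfo is None:
        raise ValueError("aware_at must be timezone-aware")
    return {name: aware_at + delta for name, delta in DEADLINES.items()}


def overdue(aware_at: datetime, now: datetime) -> list[str]:
    """Reports whose deadline has already passed at `now`."""
    return [name for name, due in reporting_deadlines(aware_at).items() if now > due]


if __name__ == "__main__":
    aware = datetime(2025, 6, 17, 8, 0, tzinfo=timezone.utc)
    for name, due in reporting_deadlines(aware).items():
        print(f"{name}: due {due.isoformat()}")
```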

IEC 62443 emphasizes secure supplier relationships, so vendor-originated incidents must also be reported and investigated. Dark web scans can also be conducted to check whether any data has been leaked.

Step 6: Asset-level incident response maturity

The IR posture shouldn’t stop at the network level; it should drill down to specific OT assets. Asset-level incident response enables a more coherent and consistent response to an event and avoids the need to shut down large parts of the network to contain it.

Very few OT operators have an IR plan that goes down to the asset level.

Asset Response Profiles can be created for:

· PLCs

· HMIs

· SCADA systems

· Historians

For each asset type:

· Define normal vs. anomalous behavior

· Document approved firmware versions

· Maintain known-good configurations

· List containment and failover actions

· Include contact protocols for OEM/vendor engagement

These can be stored in your OT IR runbook and integrated into SOAR workflows if your organization uses automation platforms.
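An Asset Response Profile is easiest to reuse when it is stored as structured data. A minimal sketch, assuming a hypothetical `AssetResponseProfile` schema built from the fields listed above and serialized to JSON so a runbook repository or SOAR playbook could consume it; all example values are placeholders.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AssetResponseProfile:
    asset_type: str                  # e.g. "PLC", "HMI", "SCADA", "Historian"
    approved_firmware: list[str]
    known_good_config_sha256: str
    containment_actions: list[str]   # ordered, least disruptive first
    failover: str
    vendor_contact: str
    normal_behavior: list[str] = field(default_factory=list)

    def check(self, firmware: str, config_sha256: str) -> list[str]:
        """Compare an asset's current state with the profile; empty list means compliant."""
        issues = []
        if firmware not in self.approved_firmware:
            issues.append(f"unapproved firmware {firmware}")
        if config_sha256 != self.known_good_config_sha256:
            issues.append("configuration differs from known-good")
        return issues

    def to_json(self) -> str:
        """Serialize for storage in a runbook repository or as SOAR playbook input."""
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    plc = AssetResponseProfile(
        asset_type="PLC",
        approved_firmware=["2.1.4"],
        known_good_config_sha256="<digest of known-good project export>",
        containment_actions=["block Modbus writes at conduit firewall", "switch to read-only monitoring"],
        failover="hot-standby PLC",
        vendor_contact="OEM support hotline (placeholder)",
    )
    print(plc.check("2.2.0", "<digest of known-good project export>"))
```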

Embedding IR into a resilient OT architecture

A well-defined incident response plan, resting on a strong foundation of institutional cyber resilience, must be tightly coupled with the OT network and system architecture.

Design Principle	Implementation Strategy
Segmentation	Enforce zones & conduits (IEC 62443-3-3)
Monitoring and detection	DPI-enabled IDS with ICS protocol support
Redundancy	Hot standby for critical HMIs and historians
Asset isolation	Use ACLs and managed switches per zone
Jump servers for access	Gate all remote/vendor access through a DMZ
Offline backups	Immutable, offline images per asset type
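The segmentation row can be expressed as a zone-and-conduit policy check. The sketch below is a hypothetical model: zones are IP subnets, conduits are permitted (source zone, destination zone, destination port) triples, and the subnets and ports are placeholders that also reflect the jump-server pattern from the table. Real enforcement belongs in the firewalls; a check like this is useful for validating observed flows or proposed rule changes.

```python
from ipaddress import ip_address, ip_network
from typing import Optional

# Hypothetical zone plan: zone name -> subnet.
ZONES = {
    "enterprise": ip_network("10.0.0.0/16"),
    "dmz": ip_network("10.50.0.0/24"),
    "supervisory": ip_network("10.10.1.0/24"),
    "control": ip_network("10.10.2.0/24"),
}

# Conduits: permitted (source zone, destination zone, destination port).
CONDUITS = {
    ("enterprise", "dmz", 443),       # remote/vendor access lands on the jump server
    ("dmz", "supervisory", 3389),     # jump server to engineering workstation
    ("supervisory", "control", 502),  # HMI polling PLCs over Modbus/TCP
}


def zone_of(ip: str) -> Optional[str]:
    """Name of the zone containing `ip`, or None if it belongs to no zone."""
    addr = ip_address(ip)
    for name, net in ZONES.items():
        if addr in net:
            return name
    return None


def conduit_allowed(src: str, dst: str, dport: int) -> bool:
    """True if the flow stays inside one zone or matches a defined conduit."""
    src_zone, dst_zone = zone_of(src), zone_of(dst)
    if src_zone is None or dst_zone is None:
        return False  # unknown hosts are denied by default
    if src_zone == dst_zone:
        return True  # intra-zone traffic is governed by per-zone ACLs, not conduits
    return (src_zone, dst_zone, dport) in CONDUITS


if __name__ == "__main__":
    print(conduit_allowed("10.0.5.5", "10.10.2.5", 502))   # enterprise straight to a PLC
    print(conduit_allowed("10.10.1.20", "10.10.2.5", 502))  # HMI to PLC via the conduit
```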

Step 7: Post-incident OT security risk assessment

An IEC 62443-based risk assessment can be conducted after the incident to identify security gaps and the additional controls needed to prevent similar incidents. The assessment can also validate the immediate security measures taken in response to the incident.
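IEC 62443 expresses security levels per zone as a vector over seven foundational requirements (FRs), so one concrete output of the assessment is the gap between the target level (SL-T) and the level actually achieved (SL-A). A minimal sketch; the example vectors are hypothetical and would come from your own assessment.

```python
FOUNDATIONAL_REQUIREMENTS = (
    "FR1 Identification and authentication control",
    "FR2 Use control",
    "FR3 System integrity",
    "FR4 Data confidentiality",
    "FR5 Restricted data flow",
    "FR6 Timely response to events",
    "FR7 Resource availability",
)


def sl_gaps(target: tuple[int, ...], achieved: tuple[int, ...]) -> dict[str, int]:
    """Per-FR shortfall of SL-A against SL-T; only FRs with a gap are returned."""
    if len(target) != 7 or len(achieved) != 7:
        raise ValueError("security level vectors need one entry per FR (7)")
    if any(not 0 <= level <= 4 for level in target + achieved):
        raise ValueError("security levels range from 0 to 4")
    return {
        fr: t - a
        for fr, t, a in zip(FOUNDATIONAL_REQUIREMENTS, target, achieved)
        if t > a
    }


if __name__ == "__main__":
    control_zone_target = (2, 2, 3, 1, 3, 2, 3)    # hypothetical SL-T
    control_zone_achieved = (2, 1, 3, 1, 1, 2, 2)  # hypothetical SL-A after the incident
    for fr, gap in sl_gaps(control_zone_target, control_zone_achieved).items():
        print(f"{fr}: short by {gap}")
```

Ranking the remaining gaps (here FR5, restricted data flow, is furthest behind) gives a defensible order for the additional controls the assessment recommends.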

Containment without catastrophe or concern

Industrial environments can’t afford cyber overreactions and unnecessary shutdowns. In IT, devices can be rebooted and reimaged; in OT, shutting down a PLC or isolating a switch could cost millions of dollars, trigger regulatory fines or, worse, lead to human injury.

The essence of OT incident response is control: control over the threat, over the process, and over the recovery timeline.

OT Incident Response Essentials Checklist

Component	Description
Incident Playbooks	Defined per asset and threat type
OT Asset Inventory	Detailed, continuously updated
Detection Tools	Protocol-aware IDS, behavioral baselines
Containment Strategies	Zone-based, asset-specific, safety-conscious
Recovery Plans	Tested backups, logic validation
Regulatory Reporting	Mapped to national and sectoral obligations
Cross-Functional Team	IT, OT, safety, vendor coordination

Need help developing or refining your OT incident response strategy or playbook? Our OT cybersecurity experts can guide you from blueprint to execution. Reach out now.