Requirement
NIST SP 800-171 Rev. 2 / CMMC 2.0 Level 2 - Control - AU.L2-3.3.4 – Alert in the event of an audit logging process failure.
Understanding the Requirement
This control (NIST SP 800-171 Rev. 2 / CMMC 2.0 Level 2) requires your organization to identify who must be alerted, define which logging failures trigger alerts, and ensure those people actually receive the alerts when an audit logging process fails. Audit logging commonly fails when a syslog/SIEM server runs out of storage, the logging service stops, or the server becomes unreachable. The goal is to detect the failure and notify the right people quickly, so the problem is resolved before logs, and the evidence they contain, are lost.
Technical Implementation
- Identify alert recipients and escalation paths: Document roles (e.g., system/network administrators, information security lead, and audit/accountability personnel) and build an escalation chain. Send primary alerts to the on-call admin, and escalate to a security officer or ticketing queue if the primary recipient doesn’t acknowledge within a defined SLA (for example, 15 minutes).
- Monitor storage and service health with thresholds: Configure your syslog server or SIEM to generate alerts when disk usage reaches conservative thresholds (e.g., 70% for planning, 85% for actionable warning, 95% for critical) and when the logging service process stops or becomes unresponsive. Use built-in alerting or a lightweight monitoring agent (Nagios, Zabbix, Prometheus, or cloud monitoring) if your syslog solution lacks these features.
- Detect connectivity and ingestion failures: Set up synthetic log heartbeat messages from representative systems (workstation, server, firewall) and alert when a heartbeat is missing for a defined period (for example, 10 minutes). Also alert on sudden drops in log volume from critical devices, which can indicate network issues or agent failures.
- Configure multiple alert channels and integrate with ticketing: Send alerts through at least two channels (email + SMS/phone or email + ticket creation) and integrate with your helpdesk or incident response system so an incident ticket is automatically created and assigned. The ticket history also gives you a record of each alert and the response to it, which is useful evidence when the control is assessed.
- Implement short-term mitigations and long-term failover: Automate log rotation and archival to a secondary disk or network share when thresholds are hit, and consider forwarding critical logs to a cloud storage endpoint or secondary syslog server for redundancy. Test failover procedures periodically to ensure alerts actually fire and logs remain available.
- Document runbooks and test alerts regularly: Create a simple runbook that describes initial response steps (acknowledge alert, free space, restart service, failover, validate integrity) and run quarterly tests of alerting and escalation to ensure people listed in your policy are reachable and know their responsibilities.
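The storage-threshold and heartbeat checks described in the list above can be sketched in a few lines of Python. This is a minimal illustration rather than a recommended tool: the function names, the threshold table, and the source labels are assumptions made for the example, and the percentages and the 10-minute window are simply the example figures from the bullets. In practice a monitoring agent or SIEM rule would normally perform these checks.

```python
import shutil
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

# Example thresholds from the text (percent of disk used), highest first.
# These values are illustrative; tune them to your own environment.
THRESHOLDS = [(95.0, "critical"), (85.0, "warning"), (70.0, "planning")]

# Example heartbeat window from the text.
HEARTBEAT_MAX_AGE = timedelta(minutes=10)


def disk_percent_used(path: str) -> float:
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100.0


def classify_disk_usage(percent_used: float) -> Optional[str]:
    """Return the most severe level whose threshold is met, or None."""
    for limit, severity in THRESHOLDS:
        if percent_used >= limit:
            return severity
    return None


def stale_heartbeats(last_seen: Dict[str, datetime], now: datetime) -> List[str]:
    """Return the sources whose last heartbeat is older than the allowed window."""
    return sorted(
        source
        for source, timestamp in last_seen.items()
        if now - timestamp > HEARTBEAT_MAX_AGE
    )


if __name__ == "__main__":
    # "." stands in for the log volume; point this at your real log path.
    severity = classify_disk_usage(disk_percent_used("."))
    print("disk severity:", severity)

    now = datetime.now(timezone.utc)
    # Hypothetical last-heartbeat times, as a log collector might record them.
    last_seen = {
        "firewall": now - timedelta(minutes=12),
        "file-server": now - timedelta(minutes=1),
    }
    print("missing heartbeats:", stale_heartbeats(last_seen, now))
```

Whatever raises the alert should run somewhere other than the syslog server itself, so that a full disk or a crashed host cannot also silence the check.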
Example in a Small or Medium Business
An SMB runs a central syslog server collecting logs from servers, workstations, and the perimeter firewall. The IT manager configures the syslog server and a small cloud-based monitoring service to send alerts when disk usage hits 80%, 90%, and 95%, and to send immediate alerts if the syslog service process stops or if synthetic heartbeat logs stop arriving. Alerts go first to the network administrator’s mobile and email; if they are unacknowledged after 15 minutes, they escalate to the information security officer and create a ticket in the company’s helpdesk. When an alert shows the disk at 92%, the network administrator moves older archived logs to a NAS, runs log rotation ahead of schedule to free space, and records the event in the ticket. On a separate occasion the syslog service crashes; the monitoring tool notifies the on-call admin, who restarts the service and checks for lost messages by looking for gaps in the heartbeat records and comparing them against the logs still held on the source devices. After each incident the team updates the runbook and adjusts thresholds, adding more frequent archival and forwarding critical firewall logs to a cloud endpoint so that a single server failure won’t cause data loss.
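The escalation rule in this example (notify the network administrator first, then involve the information security officer and open a helpdesk ticket if nobody acknowledges within 15 minutes) might be expressed as follows. This is a sketch under stated assumptions: the `Alert` class, the `recipients_for` function, and the recipient labels are hypothetical names invented for illustration, not part of any particular monitoring product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Acknowledgement window from the example above.
ACK_SLA = timedelta(minutes=15)


@dataclass
class Alert:
    """A single logging-failure alert (hypothetical structure)."""
    summary: str
    raised_at: datetime
    acknowledged_at: Optional[datetime] = None


def recipients_for(alert: Alert, now: datetime) -> List[str]:
    """Return who should have been notified about `alert` as of `now`."""
    recipients = ["network-admin"]  # the primary recipient is always notified
    unacknowledged = alert.acknowledged_at is None
    if unacknowledged and now - alert.raised_at >= ACK_SLA:
        # Escalate: add the security officer and open a helpdesk ticket.
        recipients += ["security-officer", "helpdesk-ticket"]
    return recipients


if __name__ == "__main__":
    raised = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
    alert = Alert("syslog disk at 92%", raised)
    print(recipients_for(alert, raised + timedelta(minutes=5)))
    print(recipients_for(alert, raised + timedelta(minutes=20)))
```

Keeping the acknowledgement time on the alert record means the same data that drives escalation can also be attached to the ticket as evidence of who was notified and when.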
Summary
By identifying the people to alert, defining the failure types to watch for, and implementing technical measures (threshold-based storage alerts, service health checks, heartbeat monitoring, dual alert channels, automated archival, and documented runbooks), SMBs can meet AU.L2-3.3.4. Together, these policy and technical controls ensure that logging failures are detected quickly, that the appropriate staff are notified and alerts are escalated when needed, and that logs are preserved or recovered so auditability and incident response capabilities remain intact.