[IMP] maintenance_service_http_monitoring: auto-close request on service recovery
Some checks failed
pre-commit / pre-commit (pull_request) Failing after 6m50s
Previously, maintenance requests created on HTTP failures were never automatically resolved. Operators had to close them manually, with no traceability of when or why the request was closed.

This commit adds automatic resolution when a service returns HTTP 200 while an open maintenance request exists for it.

**Detection logic** (in ``cron_check_http_services``):

Before pass 1, the cron takes a snapshot of all services that currently have an open (non-done) ``maintenance.request`` via ``http_maintenance_request``. After pass 1, services in that snapshot that are now OK (``http_status_ok = True``) are identified as recovered and passed to the new ``_close_http_maintenance_request()`` method.

**Closure logic** (new ``_close_http_maintenance_request`` method):

1. Finds the first ``maintenance.stage`` with ``done = True``. If none exists (misconfigured instance), the method is a no-op.
2. Moves the ``maintenance.request`` to that done stage via ``sudo()`` to bypass ACL restrictions from the cron user context.
3. Posts a chatter note on the request as OdooBot (``base.partner_root``) using ``subtype_xmlid="mail.mt_note"`` (internal note, not a follower notification) indicating the service URL and that the closure was performed automatically by the monitoring cron.
4. Clears ``http_maintenance_request`` on the ``service.instance``, allowing a fresh request to be created if the service fails again.

**Tests** (2 new, 16 total):

- ``test_service_recovery_closes_request``: full end-to-end scenario. The first cron run produces a KO request; the second cron run, with HTTP 200, asserts that the request is in a done stage, that the chatter note mentioning the service URL exists, and that ``http_maintenance_request`` is cleared.
- ``test_no_close_when_no_open_request``: calling ``_close_http_maintenance_request`` on a service with no open request is a no-op and does not raise.

**README**: "Automatic Maintenance Requests" section extended with the recovery behaviour (done stage, OdooBot note, field cleared).
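The detection step described above can be illustrated outside Odoo. The following is only a minimal sketch under stated assumptions: `Service`, `Request`, `find_recovered`, the `probe` callable, and the example URLs are hypothetical stand-ins introduced for illustration, not the module's models or API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Request:
    """Hypothetical stand-in for maintenance.request."""
    done: bool = False


@dataclass
class Service:
    """Hypothetical stand-in for service.instance."""
    url: str
    request: Optional[Request] = None
    status_ok: bool = True


def find_recovered(services: List[Service], probe: Callable[[str], int]) -> List[Service]:
    """Return services that had an open request before the check and are OK after it."""
    # Snapshot first: only services with a non-done request are candidates.
    with_open_request = [s for s in services if s.request and not s.request.done]
    # Pass 1: refresh the status of every service.
    for s in services:
        s.status_ok = probe(s.url) == 200
    # A candidate that is now OK counts as recovered.
    return [s for s in with_open_request if s.status_ok]


services = [
    Service("https://a.example", request=Request()),  # open request, recovers
    Service("https://b.example", request=Request()),  # open request, still failing
    Service("https://c.example"),                     # never had a request
]
codes = {"https://a.example": 200, "https://b.example": 500, "https://c.example": 200}
recovered = find_recovered(services, lambda url: codes[url])
print([s.url for s in recovered])  # ['https://a.example']
```

Only the service that both had an open request and now answers 200 is reported; a healthy service without a request is ignored, which matches the filter on the snapshot in the commit message.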
@@ -107,6 +107,12 @@ When a service fails HTTP checks:
 after 2 seconds. A maintenance request is only created if the service fails **both**
 checks, reducing noise from transient HTTP errors

+When a service recovers (returns HTTP 200 after having an open request):
+- The open maintenance request is automatically moved to the first **done** stage
+- A chatter note is posted on the request by OdooBot to record the automatic closure
+- The link between the service and the request is cleared, allowing a new request to
+  be created if the service fails again in the future
+
 ## Webhook notifications

 When a new maintenance request is created (HTTP check failure), the module can
@@ -74,13 +74,43 @@ class ServiceInstance(models.Model):
                 ko_records |= rec
         return ko_records

+    def _close_http_maintenance_request(self):
+        """
+        Close the open maintenance.request for each recovered service.
+
+        Moves the request to the first done stage, posts a chatter note as OdooBot, and
+        clears http_maintenance_request on the service instance.
+        """
+        done_stage = self.env["maintenance.stage"].search(
+            [("done", "=", True)], limit=1
+        )
+        if not done_stage:
+            return
+        odoobot = self.env.ref("base.partner_root")
+        for rec in self:
+            request = rec.http_maintenance_request
+            if not request or request.stage_id.done:
+                continue
+            request.sudo().write({"stage_id": done_stage.id})
+            request.sudo().message_post(
+                body=(
+                    f"Service {rec.service_url} is back online. "
+                    "This request has been automatically closed by the monitoring cron."
+                ),
+                author_id=odoobot.id,
+                message_type="comment",
+                subtype_xmlid="mail.mt_note",
+            )
+            rec.http_maintenance_request = False
+
     @api.model
     def cron_check_http_services(self):
         """
         Check all active services with a URL, with one retry on failure.

         Pass 1: test every eligible service.
-        If any fail, wait HTTP_RETRY_DELAY seconds then retest only the KO ones.
+        - Services that had an open request and are now OK are auto-resolved.
+        - Services still KO after pass 1 are retested after HTTP_RETRY_DELAY seconds.
         maintenance.request is created only for services that fail both passes,
         reducing noise from transient HTTP errors.
         """
@@ -93,8 +123,19 @@ class ServiceInstance(models.Model):
             lambda s: not s.equipment_id.maintenance_mode
         )

+        # Snapshot services that currently have an open request before pass 1
+        services_with_open_request = services.filtered(
+            lambda s: s.http_maintenance_request
+            and not s.http_maintenance_request.stage_id.done
+        )
+
         ko_after_pass1 = services.check_http_status()

+        # Auto-resolve services that recovered during pass 1
+        recovered = services_with_open_request.filtered(lambda s: s.http_status_ok)
+        if recovered:
+            recovered._close_http_maintenance_request()
+
         if not ko_after_pass1:
             return
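The four closure steps in `_close_http_maintenance_request` can be modelled in plain Python to make the early-exit and skip conditions easier to see. This is a sketch only: `Stage`, `Request`, `Service`, and `close_requests` are hypothetical names, and the note text is shortened; none of it is the Odoo ORM.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Stage:
    """Hypothetical stand-in for maintenance.stage."""
    name: str
    done: bool = False


@dataclass
class Request:
    """Hypothetical stand-in for maintenance.request."""
    stage: Stage
    notes: List[str] = field(default_factory=list)


@dataclass
class Service:
    """Hypothetical stand-in for service.instance."""
    url: str
    request: Optional[Request] = None


def close_requests(services: List[Service], stages: List[Stage]) -> int:
    """Close open requests for the given services; return how many were closed."""
    # Step 1: pick the first done stage, or do nothing if none is configured.
    done_stage = next((st for st in stages if st.done), None)
    if done_stage is None:
        return 0
    closed = 0
    for svc in services:
        req = svc.request
        # Skip services with no request or with a request that is already done.
        if req is None or req.stage.done:
            continue
        req.stage = done_stage                                  # step 2: move to done
        req.notes.append(f"Service {svc.url} is back online.")  # step 3: audit note
        svc.request = None                                      # step 4: clear the link
        closed += 1
    return closed


stages = [Stage("New"), Stage("Repaired", done=True)]
svc = Service("https://a.example", Request(stage=stages[0]))
req = svc.request
first = close_requests([svc], stages)   # closes the open request
second = close_requests([svc], stages)  # nothing left to close
print(first, second)  # 1 0
```

Calling the function a second time is harmless because the link was cleared on the first call, which is the same property the second new test checks for the real method.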
@@ -312,3 +312,49 @@ class TestHttpMonitoring(TransactionCase):
             ),
             2,
         )
+
+    # ------------------------------------------------------------------
+    # Test 15 -- Service recovery closes the open request and posts a note
+    # ------------------------------------------------------------------
+    def test_service_recovery_closes_request(self):
+        # First cron run: service is KO -> request created
+        with (
+            patch(SERVICE_INSTANCE_REQUESTS) as mock_requests,
+            patch(SERVICE_INSTANCE_SLEEP),
+        ):
+            mock_requests.get.return_value = _mock_response(500)
+            mock_requests.exceptions.RequestException = Exception
+            self.env["service.instance"].cron_check_http_services()
+
+        request = self.service_instance.http_maintenance_request
+        self.assertTrue(request)
+        self.assertFalse(request.stage_id.done)
+
+        # Second cron run: service is back OK -> request auto-closed
+        with (
+            patch(SERVICE_INSTANCE_REQUESTS) as mock_requests,
+            patch(SERVICE_INSTANCE_SLEEP),
+        ):
+            mock_requests.get.return_value = _mock_response(200)
+            mock_requests.exceptions.RequestException = Exception
+            self.env["service.instance"].cron_check_http_services()
+
+        # Request must be in a done stage
+        self.assertTrue(request.stage_id.done)
+        # http_maintenance_request must be cleared on the service instance
+        self.assertFalse(self.service_instance.http_maintenance_request)
+        # A chatter note must have been posted mentioning the service URL
+        notes = request.message_ids.filtered(
+            lambda m: self.service_instance.service_url in (m.body or "")
+        )
+        self.assertTrue(notes)
+
+    # ------------------------------------------------------------------
+    # Test 16 -- No open request -> _close_http_maintenance_request is a no-op
+    # ------------------------------------------------------------------
+    def test_no_close_when_no_open_request(self):
+        # Service is OK from the start, no request exists
+        self.assertFalse(self.service_instance.http_maintenance_request)
+        # Calling close directly must not raise
+        self.service_instance._close_http_maintenance_request()
+        self.assertFalse(self.service_instance.http_maintenance_request)