[IMP] maintenance_service_http_monitoring: auto-close request on service recovery

Previously, maintenance requests created on HTTP failures were never
automatically resolved. Operators had to close them manually, with no
traceability of when or why the request was closed.

This commit adds automatic resolution when a service returns HTTP 200
while an open maintenance request exists for it.

**Detection logic** (in ``cron_check_http_services``):

Before pass 1, the cron takes a snapshot of all services that currently
have an open (non-done) ``maintenance.request`` via
``http_maintenance_request``. After pass 1, services in that snapshot
that are now OK (``http_status_ok = True``) are identified as recovered
and passed to the new ``_close_http_maintenance_request()`` method.
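
The snapshot-then-filter step can be sketched outside Odoo as plain Python. This is only an illustration under stated assumptions: ``Service``, ``Request``, ``find_recovered`` and the ``check`` callable are hypothetical stand-ins for the recordsets and ``check_http_status()``, not the module's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Request:
    """Hypothetical stand-in for a maintenance.request record."""
    done: bool = False


@dataclass
class Service:
    """Hypothetical stand-in for a service.instance record."""
    url: str
    http_status_ok: bool = False
    request: Optional[Request] = None


def find_recovered(
    services: List[Service], check: Callable[[Service], bool]
) -> List[Service]:
    # Snapshot: services that have an open (non-done) request before pass 1.
    with_open_request = [s for s in services if s.request and not s.request.done]
    # Pass 1: refresh the status flag of every service.
    for s in services:
        s.http_status_ok = check(s)
    # Recovered: present in the snapshot and OK after pass 1.
    return [s for s in with_open_request if s.http_status_ok]


services = [
    Service("https://a.example", request=Request()),  # open request, recovers
    Service("https://b.example", request=Request()),  # open request, still KO
    Service("https://c.example"),                     # never had a request
]
recovered = find_recovered(services, lambda s: s.url != "https://b.example")
print([s.url for s in recovered])
```

Only the first service is reported as recovered: the second is still KO, and the third was never in the snapshot.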

**Closure logic** (new ``_close_http_maintenance_request`` method):

1. Finds the first ``maintenance.stage`` with ``done = True``.
   If none exists (misconfigured instance), the method is a no-op.
2. Moves the ``maintenance.request`` to that done stage via ``sudo()``
   to bypass ACL restrictions from the cron user context.
3. Posts a chatter note on the request as OdooBot (``base.partner_root``)
   using ``subtype_xmlid="mail.mt_note"`` (internal note, not a follower
   notification) indicating the service URL and that the closure was
   performed automatically by the monitoring cron.
4. Clears ``http_maintenance_request`` on the ``service.instance``,
   allowing a fresh request to be created if the service fails again.
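
The four steps above can be sketched in plain Python, leaving out ``sudo()`` and the chatter machinery. The ``Stage``, ``Request`` and ``Service`` dataclasses, the ``close_requests`` helper and the stage names are hypothetical and only mirror the shape of the Odoo records.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Stage:
    """Hypothetical stand-in for maintenance.stage."""
    name: str
    done: bool = False


@dataclass
class Request:
    """Hypothetical stand-in for maintenance.request."""
    stage: Stage
    notes: List[str] = field(default_factory=list)


@dataclass
class Service:
    """Hypothetical stand-in for service.instance."""
    url: str
    request: Optional[Request] = None


def close_requests(services: List[Service], stages: List[Stage]) -> None:
    # Step 1: pick the first done stage; without one the call is a no-op.
    done_stage = next((st for st in stages if st.done), None)
    if done_stage is None:
        return
    for svc in services:
        req = svc.request
        # Skip services with no request or with an already closed one.
        if req is None or req.stage.done:
            continue
        req.stage = done_stage  # step 2: move to the done stage
        req.notes.append(       # step 3: record why it was closed
            f"Service {svc.url} is back online. "
            "This request has been automatically closed by the monitoring cron."
        )
        svc.request = None      # step 4: clear the link on the service


stages = [Stage("New"), Stage("Repaired", done=True)]
service = Service("https://a.example", request=Request(stage=stages[0]))
request = service.request
close_requests([service], stages)
print(request.stage.name, len(request.notes), service.request)
```

Calling ``close_requests`` a second time on the same service changes nothing, because the link was cleared in step 4.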

**Tests** (2 new, 16 total):

- ``test_service_recovery_closes_request``: full end-to-end scenario. The
  first cron run (HTTP 500) creates a request; after a second cron run with
  HTTP 200, the test asserts that the request is in a done stage, that a
  chatter note mentioning the service URL exists, and that
  ``http_maintenance_request`` is cleared.
- ``test_no_close_when_no_open_request``: calling
  ``_close_http_maintenance_request`` on a service with no open request
  is a no-op and does not raise.
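
The failure-then-recovery pattern used by the first test can be illustrated with ``unittest.mock`` alone. This is a simplified sketch, not the module's test: ``run_check`` and the ``state`` dict are hypothetical, and the retry pass is deliberately left out.

```python
from unittest.mock import Mock


def run_check(state, http_get, url):
    """Hypothetical one-service cron pass: open on failure, close on recovery."""
    ok = http_get(url).status_code == 200
    if not ok:
        state["open_request"] = True
    elif state.get("open_request"):
        state["open_request"] = False
        state["note"] = f"Service {url} is back online."
    return ok


http_get = Mock()
state = {}
url = "https://service.example"

# First run: the mocked HTTP call answers 500, so a request is opened.
http_get.return_value = Mock(status_code=500)
first_ok = run_check(state, http_get, url)
opened_after_first = state["open_request"]

# Second run: the mocked HTTP call answers 200, so the request is closed.
http_get.return_value = Mock(status_code=200)
second_ok = run_check(state, http_get, url)
```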

**README**: "Automatic Maintenance Requests" section extended with the
recovery behaviour (done stage, OdooBot note, field cleared).
Stéphan Sainléger
2026-06-15 18:03:31 +02:00
parent c238e54808
commit c4d7e9b8a9
3 changed files with 94 additions and 1 deletions


@@ -107,6 +107,12 @@ When a service fails HTTP checks:
after 2 seconds. A maintenance request is only created if the service fails **both**
checks, reducing noise from transient HTTP errors
When a service recovers (returns HTTP 200 after having an open request):
- The open maintenance request is automatically moved to the first **done** stage
- A chatter note is posted on the request by OdooBot to record the automatic closure
- The link between the service and the request is cleared, allowing a new request to
be created if the service fails again in the future
## Webhook notifications
When a new maintenance request is created (HTTP check failure), the module can


@@ -74,13 +74,43 @@ class ServiceInstance(models.Model):
ko_records |= rec
return ko_records
    def _close_http_maintenance_request(self):
        """
        Close the open maintenance.request for each recovered service.

        Moves the request to the first done stage, posts a chatter note as OdooBot, and
        clears http_maintenance_request on the service instance.
        """
        done_stage = self.env["maintenance.stage"].search(
            [("done", "=", True)], limit=1
        )
        if not done_stage:
            return

        odoobot = self.env.ref("base.partner_root")
        for rec in self:
            request = rec.http_maintenance_request
            if not request or request.stage_id.done:
                continue
            request.sudo().write({"stage_id": done_stage.id})
            request.sudo().message_post(
                body=(
                    f"Service {rec.service_url} is back online. "
                    "This request has been automatically closed by the monitoring cron."
                ),
                author_id=odoobot.id,
                message_type="comment",
                subtype_xmlid="mail.mt_note",
            )
            rec.http_maintenance_request = False

    @api.model
    def cron_check_http_services(self):
        """
        Check all active services with a URL, with one retry on failure.

        Pass 1: test every eligible service.
        If any fail, wait HTTP_RETRY_DELAY seconds then retest only the KO ones.
        - Services that had an open request and are now OK are auto-resolved.
        - Services still KO after pass 1 are retested after HTTP_RETRY_DELAY seconds.
        maintenance.request is created only for services that fail both passes,
        reducing noise from transient HTTP errors.
        """
@@ -93,8 +123,19 @@ class ServiceInstance(models.Model):
            lambda s: not s.equipment_id.maintenance_mode
        )

        # Snapshot services that currently have an open request before pass 1
        services_with_open_request = services.filtered(
            lambda s: s.http_maintenance_request
            and not s.http_maintenance_request.stage_id.done
        )

        ko_after_pass1 = services.check_http_status()

        # Auto-resolve services that recovered during pass 1
        recovered = services_with_open_request.filtered(lambda s: s.http_status_ok)
        if recovered:
            recovered._close_http_maintenance_request()

        if not ko_after_pass1:
            return


@@ -312,3 +312,49 @@ class TestHttpMonitoring(TransactionCase):
),
2,
)
    # ------------------------------------------------------------------
    # Test 15 -- Service recovery closes the open request and posts a note
    # ------------------------------------------------------------------
    def test_service_recovery_closes_request(self):
        # First cron run: service is KO -> request created
        with (
            patch(SERVICE_INSTANCE_REQUESTS) as mock_requests,
            patch(SERVICE_INSTANCE_SLEEP),
        ):
            mock_requests.get.return_value = _mock_response(500)
            mock_requests.exceptions.RequestException = Exception
            self.env["service.instance"].cron_check_http_services()

        request = self.service_instance.http_maintenance_request
        self.assertTrue(request)
        self.assertFalse(request.stage_id.done)

        # Second cron run: service is back OK -> request auto-closed
        with (
            patch(SERVICE_INSTANCE_REQUESTS) as mock_requests,
            patch(SERVICE_INSTANCE_SLEEP),
        ):
            mock_requests.get.return_value = _mock_response(200)
            mock_requests.exceptions.RequestException = Exception
            self.env["service.instance"].cron_check_http_services()

        # Request must be in a done stage
        self.assertTrue(request.stage_id.done)
        # http_maintenance_request must be cleared on the service instance
        self.assertFalse(self.service_instance.http_maintenance_request)
        # A chatter note must have been posted mentioning the service URL
        notes = request.message_ids.filtered(
            lambda m: self.service_instance.service_url in (m.body or "")
        )
        self.assertTrue(notes)

    # ------------------------------------------------------------------
    # Test 16 -- No open request -> _close_http_maintenance_request is a no-op
    # ------------------------------------------------------------------
    def test_no_close_when_no_open_request(self):
        # Service is OK from the start, no request exists
        self.assertFalse(self.service_instance.http_maintenance_request)
        # Calling close directly must not raise
        self.service_instance._close_http_maintenance_request()
        self.assertFalse(self.service_instance.http_maintenance_request)