August 30, 2025

Major European Fleet Provider — Fleet Core & Workforce Backbone

Fleet and workforce backbone—allocation, compliance, and incident orchestration

Summary

Built a fleet & workforce backbone unifying vehicle lifecycle, allocation & inventory, and driver compliance on a single API-led orchestration layer. The platform coordinated insurers, police, repairers, and mobile apps using SOAP/XML over JMS queues alongside REST APIs, improving utilization and SLA adherence through right-vehicle/right-loadout allocation and governed first notice of loss (FNOL) and repair workflows.


Problem

  • Operational fragmentation: Separate systems for vehicle lifecycle (acquisition→refuel→repair→resale), allocation, and driver management led to slow dispatch, stranded repairs, and unclear ownership.
  • Compliance gaps: Manual checks of licenses/permits/endorsements and vehicle suitability caused misassignments and SLA breaches.
  • Partner variability: Insurer and police integrations used SOAP/XML with differing codes and timelines, callbacks were missed, and evidence was scattered across emails and drives.
  • On-prem constraints: VMware-hosted apps, Oracle persistence, JMS (TIBCO EMS) integration, and legacy SOAP endpoints limited architectural options.

Solution Mechanics

Primary pattern: API-led orchestration (Java + Spring Boot microservices on-prem).
Secondary pattern (targeted): Rules/validation (Drools) for driver/vehicle compliance and loadout checks.
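
To make the rules pattern concrete, here is a minimal sketch of eligibility checks written as plain Java predicates rather than Drools DRL. The class names, fields, reason codes, and the three rules are assumptions for illustration; they are not the production rule set.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical shapes; the field names are illustrative, not the production model.
final class Driver {
    final Set<String> licenseClasses;
    final boolean adrCertified;
    final LocalDate licenseExpiry;

    Driver(Set<String> licenseClasses, boolean adrCertified, LocalDate licenseExpiry) {
        this.licenseClasses = licenseClasses;
        this.adrCertified = adrCertified;
        this.licenseExpiry = licenseExpiry;
    }
}

final class Vehicle {
    final String requiredLicenseClass;
    final boolean carriesDangerousGoods;

    Vehicle(String requiredLicenseClass, boolean carriesDangerousGoods) {
        this.requiredLicenseClass = requiredLicenseClass;
        this.carriesDangerousGoods = carriesDangerousGoods;
    }
}

// A rule returns a failure reason, or null when the assignment passes the rule.
interface EligibilityRule {
    String check(Driver driver, Vehicle vehicle, LocalDate jobDate);
}

final class EligibilityChecker {
    // Assumed rules: license validity, license class, and ADR certification.
    private final List<EligibilityRule> rules = List.of(
        (d, v, on) -> d.licenseExpiry.isBefore(on) ? "LICENSE_EXPIRED" : null,
        (d, v, on) -> d.licenseClasses.contains(v.requiredLicenseClass) ? null : "LICENSE_CLASS_MISSING",
        (d, v, on) -> (v.carriesDangerousGoods && !d.adrCertified) ? "ADR_REQUIRED" : null
    );

    // Evaluates every rule and collects all failure reasons; an empty list means eligible.
    List<String> failures(Driver driver, Vehicle vehicle, LocalDate jobDate) {
        List<String> reasons = new ArrayList<>();
        for (EligibilityRule rule : rules) {
            String reason = rule.check(driver, vehicle, jobDate);
            if (reason != null) {
                reasons.add(reason);
            }
        }
        return reasons;
    }
}
```

Collecting every failure reason, instead of stopping at the first, is one way to feed a per-category failure metric and to show planners exactly why an assignment was rejected.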

  • API Orchestration Layer (Spring Boot)

    • Normalizes REST/SOAP across channels (Planner UI, Mobile App, batch imports).
    • Correlation IDs, idempotency keys, timeout/retry budgets, circuit breakers.
    • Translates partner-specific SOAP/XML into internal canonical objects.
  • Fleet Core Service (Spring Boot + Oracle)

    • Vehicle master, status transitions (in-service, repair, pool, decommission).
    • Lifecycle events (acquisition, fueling, maintenance, repairs, disposal).
    • PL/SQL packages for incident FNOL, progress queries, image URL capture (URLs stored instead of binaries for 3G/4G reliability).
  • Allocation & Inventory Service

    • Computes right-vehicle/right-loadout by job type, terrain, weight, and kit.
    • Pulls inventory/vehicle fitment and driver proximity; reserves vehicle and tool kits.
    • Publishes allocation events to JMS for downstream notifications.
  • Driver & Workforce Service

    • Central driver roster, license/endorsement validity, training/permit expiries.
    • Drools rules evaluate assignment eligibility (license class, ADR/HV, towing limits).
    • HR/LDAP lookups for role-based access (Driver, Fleet Manager).
  • Incident & Repair Orchestration

    • Mobile FNOL → create incident; attach photos via URL collection; capture police details; set liability and roadworthy flags.
    • Insurer integration over SOAP/XML via JMS request/reply; repairer authorizations and status transitions (AWAITING ESTIMATE, AUTHORISED, JOB COMPLETE, etc.).
    • Progress API for mobile and manager views; validates user/role against vehicle access.
  • Integration Layer (on-prem)

    • JMS (TIBCO EMS) queues with main + DLQ + replay workers.
    • SOAP/XML partner connectors for insurers and police notifications; REST for internal UIs.
    • Custom dashboards built on JMX-exposed metrics, with AppDynamics/Dynatrace integration for SLA and error tracking.
  • Data & Storage

    • Oracle for canonical fleet, incidents, allocations, repairers, insurance profiles.
    • Append-only audit tables for decisions, rule versions, and partner payload hashes.
    • NAS-backed media service hosting evidence images, with their URLs referenced from incidents.
  • Governance & Observability

    • Service catalog with versioned contracts; backward-compatible schemas.
    • PII scoping and masked logs; structured auditing for insurer/police evidence packs.
    • Runbooks for DLQ triage, replay, and rule rollbacks.
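
The idempotency keys mentioned for the orchestration layer can be illustrated with a small sketch. It assumes an in-memory map as a stand-in for a durable store (a real deployment would need persistence and key expiry), and the class name `IdempotentExecutor` is hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// In-memory stand-in for a durable idempotency store (an assumption for this sketch).
final class IdempotentExecutor<T> {
    private final Map<String, T> completed = new ConcurrentHashMap<>();

    // Runs the action at most once per key; a repeated call with the same key
    // returns the result recorded by the first call instead of re-executing.
    T executeOnce(String idempotencyKey, Supplier<T> action) {
        return completed.computeIfAbsent(idempotencyKey, key -> action.get());
    }
}
```

With this shape, a mobile client that retries an FNOL submission after a dropped connection would get the original incident reference back rather than creating a second incident, provided it resends the same key.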

Diagram 1 - Context Diagram — Fleet & workforce backbone with partner orchestration


Diagram 2 - Sequence — Mobile FNOL to insurer/police with async progress updates


Diagram 3 - Operations — JMS DLQ/replay, rule lifecycle, and idempotency keys



Process Flow

  1. Planner or Mobile App requests an assignment → Orchestration invokes Allocation to find eligible driver + vehicle + loadout using Drools rules.
  2. Driver access validation (role + vehicle link) and license/permit checks run synchronously; allocation is persisted; notifications go via JMS.
  3. Engineer departs; telemetry and status updates mark the vehicle as in-service and track SLA clocks.
  4. Incident (FNOL) raised in Mobile App → Orchestration calls Incident PL/SQL package to create the case, store image URLs, and capture police details.
  5. Orchestration posts insurer notification over SOAP/XML via JMS; awaits reply or timeout; retries per policy.
  6. Repairer authorization flows back (estimate received → authorised → repair start/completion); progress is exposed to Mobile/Manager via Progress API with user/role filters.
  7. Vehicle lifecycle transitions (replacement vehicle issued, roadworthy check, return to service) update Fleet Core; allocation reconciliation ensures inventory/tooling match.
  8. Dashboards reflect p95 assignment time, FNOL completeness, repair cycle times, and SLA adherence; DLQ items are triaged/replayed.
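
The repair progression in step 6 can be sketched as a small state machine. AWAITING_ESTIMATE, AUTHORISED, and JOB_COMPLETE come from the statuses named earlier; ESTIMATE_RECEIVED and IN_REPAIR, along with the strictly linear transition table, are assumptions made for illustration.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// ESTIMATE_RECEIVED and IN_REPAIR are assumed intermediate states.
enum RepairStatus { AWAITING_ESTIMATE, ESTIMATE_RECEIVED, AUTHORISED, IN_REPAIR, JOB_COMPLETE }

final class RepairStateMachine {
    // Allowed next states for each state; assumed to be a simple linear flow.
    private static final Map<RepairStatus, Set<RepairStatus>> ALLOWED = new EnumMap<>(RepairStatus.class);

    static {
        ALLOWED.put(RepairStatus.AWAITING_ESTIMATE, EnumSet.of(RepairStatus.ESTIMATE_RECEIVED));
        ALLOWED.put(RepairStatus.ESTIMATE_RECEIVED, EnumSet.of(RepairStatus.AUTHORISED));
        ALLOWED.put(RepairStatus.AUTHORISED, EnumSet.of(RepairStatus.IN_REPAIR));
        ALLOWED.put(RepairStatus.IN_REPAIR, EnumSet.of(RepairStatus.JOB_COMPLETE));
        ALLOWED.put(RepairStatus.JOB_COMPLETE, EnumSet.noneOf(RepairStatus.class));
    }

    static boolean canTransition(RepairStatus from, RepairStatus to) {
        return ALLOWED.get(from).contains(to);
    }

    // Returns the new status, or rejects an out-of-order update from a partner.
    static RepairStatus transition(RepairStatus from, RepairStatus to) {
        if (!canTransition(from, to)) {
            throw new IllegalStateException("Illegal transition " + from + " -> " + to);
        }
        return to;
    }
}
```

Rejecting out-of-order updates at this point is one way to keep late or duplicated partner callbacks from moving a case backwards; a rejected message could then be routed to an exception queue for review.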

Outcomes

  • Higher utilization via right-vehicle/right-loadout and fewer misassignments.
  • Improved SLA adherence through governed FNOL and repair orchestration with retries/callbacks.
  • Auditability & compliance with driver eligibility checks, liability flags, and evidence linkage (image URLs).

Strategic Business Impact

  • +5–9% fleet utilization (Modeled) — assumes baseline idle time, mix of job types, and reduction in misassignments after rule enforcement.
  • −20–35% repair cycle variance (Proxy) — stabilization from standardized insurer/repairer steps and DLQ/replay discipline.
  • +8–12% SLA on-time arrivals (Modeled) — driven by allocation latency reduction and eligibility gating.

Role & Scope

Owned architecture & delivery of the orchestration layer, Spring services (Fleet Core, Allocation, Driver, Incident), Oracle schema extensions, JMS patterns (DLQ/replay), partner SOAP connectors, mobile/manager APIs, rule governance, and runtime SLO dashboards.


Key Decisions & Trade-offs

  • API-led orchestration over point-to-point calls → adds a layer but yields uniform SLAs, retries, and auditing.
  • On-prem VMware + Oracle + JMS to fit 2017–2019 constraints → slower to scale than cloud, mitigated with horizontal VM scaling and connection pooling.
  • SOAP/XML partner adapters retained for insurers/police → canonical mapping reduces coupling at the cost of transformation logic.
  • Image URLs over binary upload for FNOL → faster on unreliable networks; demands URL governance and expiry.
  • Drools rules for eligibility → transparency and fast iteration; requires versioning and test harness.
  • Read-optimised projections for allocation lookups → extra storage/ETL but sub-second selection.
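
The canonical-mapping trade-off can be pictured with a short sketch: each partner keeps its own status vocabulary, and one lookup table per partner translates it into internal values. The partner identifiers (`insurerA`, `insurerB`) and their codes are invented for illustration; the real partner code lists are not given here.

```java
import java.util.Map;
import java.util.Optional;

enum CanonicalStatus { AWAITING_ESTIMATE, AUTHORISED, JOB_COMPLETE }

final class PartnerStatusMapper {
    // Hypothetical partner IDs and codes; one translation table per partner.
    private static final Map<String, Map<String, CanonicalStatus>> TABLES = Map.of(
        "insurerA", Map.of(
            "EST_PENDING", CanonicalStatus.AWAITING_ESTIMATE,
            "AUTH", CanonicalStatus.AUTHORISED,
            "DONE", CanonicalStatus.JOB_COMPLETE),
        "insurerB", Map.of(
            "10", CanonicalStatus.AWAITING_ESTIMATE,
            "20", CanonicalStatus.AUTHORISED,
            "90", CanonicalStatus.JOB_COMPLETE)
    );

    // Returns the canonical status, or empty when the partner or code is unknown,
    // so that callers must decide explicitly how to handle unmapped values.
    static Optional<CanonicalStatus> toCanonical(String partnerId, String partnerCode) {
        if (partnerId == null || partnerCode == null) {
            return Optional.empty();
        }
        Map<String, CanonicalStatus> table = TABLES.getOrDefault(partnerId, Map.of());
        return Optional.ofNullable(table.get(partnerCode));
    }
}
```

The transformation cost shows up as these tables, which have to be maintained per partner; the benefit is that downstream services only ever see `CanonicalStatus`.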

Risks & Mitigations

  • Missed partner callbacks / timeouts → JMS DLQ + replay, exponential backoff, and partner health scoring.
  • Data quality (licenses, fitment, inventory) → nightly reconciliation jobs and exception queues with owner assignment.
  • PII exposure in logs → structured logging with field masking and redaction; restricted evidence access.
  • Rule drift / regressions → rule version pinning per release; golden-path test packs and canary evaluation.
  • Oracle contention under spikes → connection pools, partitioning for hot tables, and batching for progress writes.
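
As an illustration of the first mitigation, the sketch below combines capped exponential backoff with a dead-letter fallback. It does not use the JMS API: the dead-letter queue is a plain list, the partner call is a predicate, and waits are recorded instead of slept so the logic stays easy to test. The class names and the policy values in the usage note are assumptions.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class BackoffPolicy {
    final Duration base;
    final Duration cap;
    final int maxAttempts;

    BackoffPolicy(Duration base, Duration cap, int maxAttempts) {
        this.base = base;
        this.cap = cap;
        this.maxAttempts = maxAttempts;
    }

    // Delay after failed attempt number `attempt` (1-based): base * 2^(attempt-1), capped.
    Duration delayFor(int attempt) {
        long factor = 1L << Math.min(attempt - 1, 20);
        Duration delay = base.multipliedBy(factor);
        return delay.compareTo(cap) > 0 ? cap : delay;
    }
}

final class RetryingSender {
    final List<String> deadLetters = new ArrayList<>(); // stand-in for a JMS DLQ
    final List<Duration> waits = new ArrayList<>();      // recorded rather than slept

    // Tries the partner call up to maxAttempts times; on exhaustion the message
    // is parked in the dead-letter list for later triage and replay.
    boolean send(String message, Predicate<String> partnerCall, BackoffPolicy policy) {
        for (int attempt = 1; attempt <= policy.maxAttempts; attempt++) {
            if (partnerCall.test(message)) {
                return true;
            }
            if (attempt < policy.maxAttempts) {
                waits.add(policy.delayFor(attempt));
            }
        }
        deadLetters.add(message);
        return false;
    }
}
```

For example, with a 1-second base, a 10-second cap, and 4 attempts, a partner that never answers produces waits of 1, 2, and 4 seconds before the message is dead-lettered. Production schemes commonly add random jitter to avoid synchronized retries; that is omitted here for brevity.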

Suggested Metrics (run-time SLOs)

  • p95 allocation decision time (request→reservation).
  • p95 FNOL create time and evidence link attach rate.
  • Repair status transition latency (estimate→authorised→start→complete).
  • Driver eligibility failure rate (by rule category) and override count.
  • Callback success rate & JMS DLQ depth / replay age.
  • Incident progress API latency and role-validation error rate.
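
Several of these metrics are p95 latencies. The sketch below shows one common way to compute a percentile from raw samples, the nearest-rank method. Whether the dashboards used this definition or an interpolating or histogram-based one is not stated, so treat the choice as an assumption.

```java
import java.util.Arrays;

final class Percentiles {
    // Nearest-rank percentile: the smallest sample such that at least p percent
    // of all samples are less than or equal to it.
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }
}
```

With twenty allocation timings of 1 to 20 ms, `Percentiles.percentile(samples, 95)` selects rank 19 and returns 19; a single slow outlier beyond that rank would not move the p95, which is why it is a steadier SLO signal than the maximum.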

Closing principle

Build one governed backbone for assets and people—let rules, not habits, decide who drives what, where, and when.

