Major European Fleet Provider — Fleet Core & Workforce Backbone

Summary
Built a fleet & workforce backbone unifying vehicle lifecycle, allocation & inventory, and driver compliance on a single API-led orchestration layer. The platform coordinated insurers, police, repairers, and mobile apps, using SOAP/XML over JMS queues for partner traffic and REST for internal channels. Utilization and SLA adherence improved through right-vehicle/right-loadout allocation and governed first-notification-of-loss (FNOL) and repair workflows.
Problem
- Operational fragmentation: Separate systems for vehicle lifecycle (acquisition→refuel→repair→resale), allocation, and driver management led to slow dispatch, stranded repairs, and unclear ownership.
- Compliance gaps: Manual checks of licenses/permits/endorsements and vehicle suitability caused misassignments and SLA breaches.
- Partner variability: Insurer and police interactions were SOAP/XML with differing codes and timelines; callbacks were missed; evidence was scattered across emails/drives.
- On-prem constraints: VMware-hosted apps, Oracle persistence, JMS (TIBCO EMS) integration, and legacy SOAP endpoints limited architectural options.
Solution Mechanics
Primary pattern: API-led orchestration (Java + Spring Boot microservices on-prem).
Secondary pattern (targeted): Rules/validation (Drools) for driver/vehicle compliance and loadout checks.
API Orchestration Layer (Spring Boot)
- Normalizes REST/SOAP across channels (Planner UI, Mobile App, batch imports).
- Correlation IDs, idempotency keys, timeout/retry budgets, circuit breakers.
- Translates partner-specific SOAP/XML into internal canonical objects.
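The idempotency-key behaviour listed above can be sketched as follows. This is a minimal in-memory illustration, not the platform's code: the class name `IdempotencyStore` and the key format are assumptions, and a real deployment would presumably persist keys (for example in Oracle) with an expiry.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

/** Hypothetical in-memory store: the first call for a key runs the action, repeats get the stored result. */
final class IdempotencyStore<T> {
    private final ConcurrentMap<String, T> results = new ConcurrentHashMap<>();

    /** Runs the action at most once per key and returns the stored result on retries. */
    T execute(String idempotencyKey, Supplier<T> action) {
        // computeIfAbsent invokes the supplier only when the key has no stored result yet.
        return results.computeIfAbsent(idempotencyKey, key -> action.get());
    }
}
```

A retried mobile request that reuses the same key therefore receives the original result instead of creating a second incident or reservation.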
Fleet Core Service (Spring Boot + Oracle)
- Vehicle master, status transitions (in-service, repair, pool, decommission).
- Lifecycle events (acquisition, fueling, maintenance, repairs, disposal).
- PL/SQL packages for incident FNOL, progress queries, and image URL capture (URLs are stored instead of binaries so that submissions stay reliable on 3G/4G connections).
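A guarded status transition for the vehicle master might look like the sketch below. The four statuses come from the list above; the allowed-transition table is an assumption made for illustration, since the actual rules are not stated here.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

enum VehicleStatus { POOL, IN_SERVICE, REPAIR, DECOMMISSIONED }

final class VehicleLifecycle {
    // Assumed transition table, for illustration only.
    private static final Map<VehicleStatus, Set<VehicleStatus>> ALLOWED = Map.of(
        VehicleStatus.POOL, EnumSet.of(VehicleStatus.IN_SERVICE, VehicleStatus.REPAIR, VehicleStatus.DECOMMISSIONED),
        VehicleStatus.IN_SERVICE, EnumSet.of(VehicleStatus.POOL, VehicleStatus.REPAIR),
        VehicleStatus.REPAIR, EnumSet.of(VehicleStatus.POOL, VehicleStatus.DECOMMISSIONED),
        VehicleStatus.DECOMMISSIONED, EnumSet.noneOf(VehicleStatus.class));

    /** True when the table permits moving from one status to the other. */
    static boolean canTransition(VehicleStatus from, VehicleStatus to) {
        return ALLOWED.get(from).contains(to);
    }

    /** Returns the new status, or rejects a move the table does not permit. */
    static VehicleStatus transition(VehicleStatus from, VehicleStatus to) {
        if (!canTransition(from, to)) {
            throw new IllegalStateException("Illegal transition " + from + " -> " + to);
        }
        return to;
    }
}
```

In this sketch a repaired vehicle must return to the pool before it can go back in service, which is one way to force a roadworthy check into the path.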
Allocation & Inventory Service
- Computes right-vehicle/right-loadout by job type, terrain, weight, and kit.
- Pulls inventory/vehicle fitment and driver proximity; reserves vehicle and tool kits.
- Publishes allocation events to JMS for downstream notifications.
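The right-vehicle/right-loadout selection can be pictured as a filter followed by a proximity ranking. The sketch below is a simplified stand-in: the record fields, the `Allocator` name, and the use of vehicle distance as the proximity measure are all assumptions, and reservation and JMS publishing are left out.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical shapes for a job request and a candidate vehicle.
record AllocationJob(String terrain, int payloadKg, Set<String> requiredKit) {}
record FleetVehicle(String id, Set<String> terrains, int maxPayloadKg, Set<String> fittedKit, double distanceKm) {}

final class Allocator {
    /** Picks the nearest vehicle that suits terrain, payload, and kit; empty if none qualifies. */
    static Optional<FleetVehicle> select(AllocationJob job, List<FleetVehicle> candidates) {
        return candidates.stream()
            .filter(v -> v.terrains().contains(job.terrain()))          // terrain suitability
            .filter(v -> v.maxPayloadKg() >= job.payloadKg())           // weight limit
            .filter(v -> v.fittedKit().containsAll(job.requiredKit()))  // loadout fitment
            .min(Comparator.comparingDouble(FleetVehicle::distanceKm)); // closest eligible vehicle
    }
}
```

Returning an empty `Optional` rather than a best-effort match keeps misassignments visible as explicit allocation failures.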
Driver & Workforce Service
- Central driver roster, license/endorsement validity, training/permit expiries.
- Drools rules evaluate assignment eligibility (license class, ADR/HV, towing limits).
- HR/LDAP lookups for role-based access (Driver, Fleet Manager).
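The eligibility checks above ran in Drools; the sketch below expresses the same kind of rule in plain Java so that it stays self-contained. It is not Drools syntax, the violation codes and record fields are hypothetical, and the exact-match license comparison is a simplification (real license classes often cover lower ones).

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical inputs to an eligibility decision.
record DriverProfile(String licenseClass, LocalDate licenseExpiry, Set<String> endorsements) {}
record DriverAssignment(String requiredLicenseClass, boolean hazardousLoad, LocalDate jobDate) {}

final class EligibilityRules {
    /** Returns the violated rules in evaluation order; an empty list means the driver is eligible. */
    static List<String> evaluate(DriverProfile driver, DriverAssignment assignment) {
        List<String> violations = new ArrayList<>();
        if (!driver.licenseClass().equals(assignment.requiredLicenseClass())) {
            violations.add("LICENSE_CLASS_MISMATCH");
        }
        if (driver.licenseExpiry().isBefore(assignment.jobDate())) {
            violations.add("LICENSE_EXPIRED");
        }
        if (assignment.hazardousLoad() && !driver.endorsements().contains("ADR")) {
            violations.add("ADR_ENDORSEMENT_MISSING");
        }
        return violations;
    }
}
```

Returning every violation, rather than stopping at the first, gives planners the full reason for a rejection and feeds failure-rate reporting by rule category.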
Incident & Repair Orchestration
- Mobile FNOL → create incident; attach photos via URL collection; capture police details; set liability and roadworthy flags.
- Insurer integration over SOAP/XML via JMS request/reply; repairer authorizations and status transitions (AWAITING ESTIMATE, AUTHORISED, JOB COMPLETE, etc.).
- Progress API for mobile and manager views; validates user/role against vehicle access.
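The user/role check in front of the Progress API could be reduced to something like the sketch below. The two roles are the ones named earlier (Driver, Fleet Manager); the rule that fleet managers may view every vehicle, and the `ProgressAccessPolicy` name, are assumptions for illustration.

```java
import java.util.Map;
import java.util.Set;

enum UserRole { DRIVER, FLEET_MANAGER }

final class ProgressAccessPolicy {
    // Maps a user ID to the vehicle IDs linked to that user.
    private final Map<String, Set<String>> vehiclesByUser;

    ProgressAccessPolicy(Map<String, Set<String>> vehiclesByUser) {
        this.vehiclesByUser = vehiclesByUser;
    }

    /** Fleet managers see every vehicle (assumption); drivers see only vehicles linked to them. */
    boolean canView(String userId, UserRole role, String vehicleId) {
        if (role == UserRole.FLEET_MANAGER) {
            return true;
        }
        return vehiclesByUser.getOrDefault(userId, Set.of()).contains(vehicleId);
    }
}
```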
Integration Layer (on-prem)
- JMS (TIBCO EMS) queues, each main queue paired with a dead-letter queue (DLQ) and replay workers.
- SOAP/XML partner connectors for insurers and police notifications; REST for internal UIs.
- Custom dashboards built on JMX-exposed metrics, with AppDynamics/Dynatrace integration for SLA and error tracking.
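The main-queue, DLQ, and replay-worker arrangement can be modelled in memory as below. This sketch deliberately avoids the JMS API so that it runs standalone; `DlqProcessor` and its retry count are hypothetical, and a real worker would consume from TIBCO EMS destinations instead of in-process deques.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

final class DlqProcessor {
    private final Deque<String> mainQueue = new ArrayDeque<>();
    private final Deque<String> deadLetters = new ArrayDeque<>();
    private final int maxAttempts;

    DlqProcessor(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    void publish(String message) {
        mainQueue.addLast(message);
    }

    /** Drains the main queue; a message that fails maxAttempts times moves to the DLQ. Returns the delivered count. */
    int drain(Predicate<String> handler) {
        int delivered = 0;
        while (!mainQueue.isEmpty()) {
            String message = mainQueue.pollFirst();
            boolean ok = false;
            for (int attempt = 1; attempt <= maxAttempts && !ok; attempt++) {
                ok = handler.test(message); // handler returns false on failure
            }
            if (ok) {
                delivered++;
            } else {
                deadLetters.addLast(message);
            }
        }
        return delivered;
    }

    /** Replay worker: moves every dead letter back onto the main queue and returns how many were moved. */
    int replay() {
        int moved = deadLetters.size();
        while (!deadLetters.isEmpty()) {
            mainQueue.addLast(deadLetters.pollFirst());
        }
        return moved;
    }

    int dlqDepth() {
        return deadLetters.size();
    }
}
```

Exposing `dlqDepth()` mirrors the idea of publishing queue depth as a metric for dashboards and triage.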
Data & Storage
- Oracle for canonical fleet, incidents, allocations, repairers, insurance profiles.
- Append-only audit tables for decisions, rule versions, and partner payload hashes.
- NAS-backed media service for evidence image URLs referenced from incidents.
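An append-only audit record that stores a hash of the partner payload instead of the payload itself might be sketched as follows. SHA-256 is an assumed choice (the hash algorithm is not stated above), and `AuditLog` is an in-memory stand-in for the Oracle audit tables.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical audit row: the decision taken, the rule version in force, and the payload hash.
record AuditEntry(String decision, String ruleVersion, String payloadSha256) {}

final class AuditLog {
    private final List<AuditEntry> entries = new ArrayList<>();

    /** Appends a new entry; there is intentionally no update or delete method. */
    void append(String decision, String ruleVersion, String partnerPayload) {
        entries.add(new AuditEntry(decision, ruleVersion, sha256Hex(partnerPayload)));
    }

    /** Read-only view, so callers cannot rewrite history. */
    List<AuditEntry> entries() {
        return Collections.unmodifiableList(entries);
    }

    /** Lower-case hex SHA-256 of the UTF-8 bytes of the text. */
    static String sha256Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }
}
```

Storing the hash lets an evidence pack prove which payload a decision was based on without copying partner data into the audit table.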
Governance & Observability
- Service catalog with versioned contracts; backward-compatible schemas.
- PII scoping and masked logs; structured auditing for insurer/police evidence packs.
- Runbooks for DLQ triage, replay, and rule rollbacks.
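Field masking for logs can be illustrated with a small regex-based redactor. The field names (`licenseNumber`, `phone`, `email`) and the `key=value` log layout are assumptions; the actual log format is not described above.

```java
import java.util.regex.Pattern;

final class LogMasker {
    // Hypothetical PII field names; matches key=value pairs such as licenseNumber=ABC123.
    private static final Pattern PII_FIELD =
        Pattern.compile("\\b(licenseNumber|phone|email)=([^\\s,]+)");

    /** Replaces the value of each listed field with *** and leaves other fields untouched. */
    static String mask(String logLine) {
        return PII_FIELD.matcher(logLine).replaceAll("$1=***");
    }
}
```

For example, `LogMasker.mask("driver=42 licenseNumber=SMITH901 status=OK")` returns `driver=42 licenseNumber=*** status=OK`.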
Diagram 1 - Context Diagram — Fleet & workforce backbone with partner orchestration
Diagram 2 - Sequence — Mobile FNOL to insurer/police with async progress updates
Diagram 3 - Operations — JMS DLQ/replay, rule lifecycle, and idempotency keys
Process Flow
- Planner or Mobile App requests an assignment → Orchestration invokes Allocation to find eligible driver + vehicle + loadout using Drools rules.
- Driver access validation (role + vehicle link) and license/permit checks run synchronously; allocation is persisted; notifications go via JMS.
- The assigned driver departs; telemetry/status updates mark the vehicle in-service and start tracking SLA clocks.
- Incident (FNOL) raised in Mobile App → Orchestration calls Incident PL/SQL package to create the case, store image URLs, and capture police details.
- Orchestration posts insurer notification over SOAP/XML via JMS; awaits reply or timeout; retries per policy.
- Repairer authorization flows back (estimate received → authorised → repair start/completion); progress is exposed to Mobile/Manager via Progress API with user/role filters.
- Vehicle lifecycle transitions (replacement vehicle issued, roadworthy check, return to service) update Fleet Core; allocation reconciliation ensures inventory/tooling match.
- Dashboards reflect p95 assignment time, FNOL completeness, repair cycle times, and SLA adherence; DLQ items are triaged/replayed.
Outcomes
- Higher utilization via right-vehicle/right-loadout and fewer misassignments.
- Improved SLA adherence through governed FNOL and repair orchestration with retries/callbacks.
- Auditability & compliance with driver eligibility checks, liability flags, and evidence linkage (image URLs).
Strategic Business Impact
- +5–9% fleet utilization (Modeled) — assumes baseline idle time, mix of job types, and reduction in misassignments after rule enforcement.
- −20–35% repair cycle variance (Proxy) — stabilization from standardized insurer/repairer steps and DLQ/replay discipline.
- +8–12% SLA on-time arrivals (Modeled) — driven by allocation latency reduction and eligibility gating.
Role & Scope
Owned architecture & delivery of the orchestration layer, Spring services (Fleet Core, Allocation, Driver, Incident), Oracle schema extensions, JMS patterns (DLQ/replay), partner SOAP connectors, mobile/manager APIs, rule governance, and runtime SLO dashboards.
Key Decisions & Trade-offs
- API-led orchestration over point-to-point calls → adds a layer but yields uniform SLAs, retries, and auditing.
- On-prem VMware + Oracle + JMS to fit 2017–2019 constraints → slower to scale than cloud, mitigated with horizontal VM scaling and connection pooling.
- SOAP/XML partner adapters retained for insurers/police → canonical mapping reduces coupling at the cost of transformation logic.
- Image URLs over binary upload for FNOL → faster on unreliable networks; demands URL governance and expiry.
- Drools rules for eligibility → transparency and fast iteration; requires versioning and test harness.
- Read-optimised projections for allocation lookups → extra storage/ETL but sub-second selection.
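The canonical-mapping trade-off for partner adapters can be made concrete with a lookup from partner-specific codes to one internal status set. The three canonical statuses are taken from the repair workflow described earlier; the partner IDs and codes below are invented for illustration.

```java
import java.util.Map;
import java.util.Optional;

enum RepairStatus { AWAITING_ESTIMATE, AUTHORISED, JOB_COMPLETE }

final class PartnerStatusMapper {
    // Hypothetical partner IDs and codes, for illustration only.
    private static final Map<String, Map<String, RepairStatus>> CODES = Map.of(
        "insurerA", Map.of(
            "EST_PENDING", RepairStatus.AWAITING_ESTIMATE,
            "AUTH", RepairStatus.AUTHORISED,
            "DONE", RepairStatus.JOB_COMPLETE),
        "insurerB", Map.of(
            "10", RepairStatus.AWAITING_ESTIMATE,
            "20", RepairStatus.AUTHORISED,
            "90", RepairStatus.JOB_COMPLETE));

    /** Maps a partner code to the canonical status; empty when the partner or code is unknown (inputs must be non-null). */
    static Optional<RepairStatus> toCanonical(String partnerId, String partnerCode) {
        return Optional.ofNullable(CODES.getOrDefault(partnerId, Map.of()).get(partnerCode));
    }
}
```

Unknown codes surface as an empty result rather than a guess, so they can be routed to an exception queue instead of silently corrupting status.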
Risks & Mitigations
- Missed partner callbacks / timeouts → JMS DLQ + replay, exponential backoff, and partner health scoring.
- Data quality (licenses, fitment, inventory) → nightly reconciliation jobs and exception queues with owner assignment.
- PII exposure in logs → structured logging with field masking and redaction; restricted evidence access.
- Rule drift / regressions → rule version pinning per release; golden-path test packs and canary evaluation.
- Oracle contention under spikes → connection pools, partitioning for hot tables, and batching for progress writes.
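The exponential backoff used against missed callbacks can be sketched as a capped doubling schedule. The base delay, cap, and retry count are illustrative parameters rather than the project's settings, and jitter is omitted for brevity.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

final class BackoffPolicy {
    /** Returns maxRetries delays that start at base, double each time, and never exceed cap. */
    static List<Duration> schedule(Duration base, Duration cap, int maxRetries) {
        List<Duration> delays = new ArrayList<>();
        Duration delay = base.compareTo(cap) > 0 ? cap : base;
        for (int i = 0; i < maxRetries; i++) {
            delays.add(delay);
            Duration doubled = delay.multipliedBy(2);
            // Clamp to the cap so the delay stops growing once it is reached.
            delay = doubled.compareTo(cap) > 0 ? cap : doubled;
        }
        return delays;
    }
}
```

With a 1-second base, a 10-second cap, and five retries, the schedule is 1 s, 2 s, 4 s, 8 s, 10 s; a message still failing after the last delay would move to the DLQ.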
Suggested Metrics (run-time SLOs)
- p95 allocation decision time (request→reservation).
- p95 FNOL create time and evidence link attach rate.
- Repair status transition latency (estimate→authorised→start→complete).
- Driver eligibility failure rate (by rule category) and override count.
- Callback success rate & JMS DLQ depth / replay age.
- Incident progress API latency and role-validation error rate.
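Several of the metrics above are p95 values. One common way to compute them from raw samples is the nearest-rank method, sketched below; the dashboards may well have used a different estimator (for example histogram buckets), so treat this only as a reference definition.

```java
import java.util.Arrays;

final class Percentiles {
    /** Nearest-rank percentile: the smallest sample with at least p percent of all samples at or below it. */
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samplesMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

With 20 latency samples of 10, 20, ..., 200 ms, `percentile(samples, 95)` returns 190: the rank is ceil(95 × 20 / 100) = 19, and the 19th smallest sample is 190 ms.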
Closing principle
Build one governed backbone for assets and people—let rules, not habits, decide who drives what, where, and when.