Fortune 500 Logistics Provider — Tender Orchestration Gateway

Summary
Built an API-led Tender Orchestration Gateway on Azure APIM + AKS that normalizes partner tender requests, returns immediate acknowledgements, and delivers final results asynchronously via secure callbacks. Per-customer mappings are externalized (no code changes), and onboarding is governed through subscription keys and OAuth2 — enabling faster partner activation and steadier fulfillment.
Problem
- Partner/aggregator payloads varied widely; small changes broke integrations and created tender fallout.
- Some partners required async response patterns; teams lacked a clean way to acknowledge immediately and deliver results later without timeouts.
- Onboarding new partners was slow: each needed subscription keys, customer-specific mappings, and environment-specific endpoints.
Solution Mechanics
Primary pattern: API-led orchestration (Java + Spring Boot on AKS).
Entry & security
- Azure API Management front door with OAuth2 client credentials and subscription key headers, enforced per environment (external managed gateway).
- Immediate ACK on tender create/cancel; long-running work shifts to async flow.
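The immediate-ACK pattern can be sketched in plain Java as below. This is a minimal illustration, not the gateway's actual code: the names TenderAck and ImmediateAckIntake, the 202 status, and the ACK fields are assumptions introduced for the example.

```java
import java.util.UUID;
import java.util.concurrent.Executor;
import java.util.function.BiConsumer;

/** Hypothetical ACK shape; field names are illustrative, not the published contract. */
record TenderAck(int httpStatus, String correlationId) {}

/** Hypothetical intake component: acknowledge first, do the slow work elsewhere. */
final class ImmediateAckIntake {
    private final Executor worker;                       // e.g. a thread pool in a real service
    private final BiConsumer<String, String> tenderWork; // (correlationId, payload) -> long-running call

    ImmediateAckIntake(Executor worker, BiConsumer<String, String> tenderWork) {
        this.worker = worker;
        this.tenderWork = tenderWork;
    }

    /** Hands the payload to the executor and returns a 202-style ACK carrying the correlation ID. */
    TenderAck accept(String tenderPayload) {
        String correlationId = UUID.randomUUID().toString();
        // With an asynchronous executor this call returns before the work runs.
        worker.execute(() -> tenderWork.accept(correlationId, tenderPayload));
        return new TenderAck(202, correlationId);
    }
}
```

With a pool-backed executor, accept returns as soon as the task is queued, so partner latency is decoupled from TMS latency. The correlation ID in the ACK is what the later callback is matched against.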
API Orchestration Layer (AKS / Spring Boot)
- Tender Gateway: validates request against published OpenAPI, routes create/cancel, and correlates responses.
- Mapping Engine: loads per-customer Jolt specs (request/response/error) keyed by subscription; mapping files live in Blob Storage so changes don’t require redeploys.
- Callback Handler: exposes secure notify endpoints for final tender results; correlates and persists outcomes.
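A rough sketch of how the Mapping Engine might resolve specs per customer follows. It is illustrative only: the blob reader is a plain function standing in for a Blob Storage client, the path layout mappings/{subscriptionId}/{kind}.json is hypothetical, and the spec is treated as an opaque string rather than run through Jolt.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

enum SpecKind { REQUEST, RESPONSE, ERROR }

/** Hypothetical resolver: loads a customer's mapping spec once and caches it until evicted. */
final class MappingSpecResolver {
    private final Function<String, Optional<String>> blobReader; // path -> spec JSON (stand-in for Blob Storage)
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    MappingSpecResolver(Function<String, Optional<String>> blobReader) {
        this.blobReader = blobReader;
    }

    /** Assumed path convention; the real container layout is not described in this document. */
    static String pathFor(String subscriptionId, SpecKind kind) {
        return "mappings/" + subscriptionId + "/" + kind.name().toLowerCase(Locale.ROOT) + ".json";
    }

    /** Returns the cached spec, reading it from the store on first use; fails loudly if it is missing. */
    String resolve(String subscriptionId, SpecKind kind) {
        return cache.computeIfAbsent(pathFor(subscriptionId, kind), path ->
                blobReader.apply(path).orElseThrow(
                        () -> new IllegalStateException("No mapping spec at " + path)));
    }

    /** Drops cached specs for one customer so the next request picks up an updated file. */
    void evict(String subscriptionId) {
        String prefix = "mappings/" + subscriptionId + "/";
        cache.keySet().removeIf(path -> path.startsWith(prefix));
    }
}
```

Evicting on a change signal (or on a timer) is one way a mapping update in Blob Storage could take effect without a redeploy. The sketch keys on a subscription identifier rather than the secret key value so that secrets never appear in storage paths.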
Downstream & data
- Calls internal Tender/TMS APIs; where required, JSON→XML translation is applied before hitting the transportation system.
- Azure SQL for audit, correlation, and idempotency records.
- Azure Service Bus for reliable fan-out to CRM/analytics and for callback retries (DLQ + replay).
- Azure Monitor / App Insights for logs/metrics across namespaces and clusters.
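For the JSON→XML step, one possible shape is shown below using only the JDK's StAX writer. It assumes the mapped tender has already been parsed into nested maps. The class name and the element names in the usage note are hypothetical, since the transportation system's schema is not given here.

```java
import java.io.StringWriter;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical translator: turns a nested map (parsed JSON object) into an XML string. */
final class TenderXmlWriter {

    static String toXml(String rootName, Map<String, ?> fields) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writeElement(xml, rootName, fields);
            xml.flush();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not serialize tender to XML", e);
        }
        return out.toString();
    }

    /** Maps become nested elements; any other non-null value becomes escaped text content. */
    private static void writeElement(XMLStreamWriter xml, String name, Object value)
            throws XMLStreamException {
        xml.writeStartElement(name); // keys are assumed to be valid XML element names
        if (value instanceof Map<?, ?> nested) {
            for (Map.Entry<?, ?> entry : nested.entrySet()) {
                writeElement(xml, String.valueOf(entry.getKey()), entry.getValue());
            }
        } else if (value != null) {
            xml.writeCharacters(String.valueOf(value));
        }
        xml.writeEndElement();
    }
}
```

Calling TenderXmlWriter.toXml("Tender", fields) with an insertion-ordered map keeps element order stable. Arrays, attributes, and namespaces are deliberately left out of this sketch and would need handling against the real schema.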
Onboarding & governance
- Partners/aggregators are onboarded with Liable Party IDs and APIM subscription keys; requests must carry the key in headers.
- Contract-first API; per-partner differences live in mapping specs (request/response/error) rather than code.
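The header requirement could be double-checked in the service as in the sketch below. APIM is described above as the enforcement point, so this is only a defensive shape check with remediation hints. It assumes the APIM default header name Ocp-Apim-Subscription-Key; the class name and hint wording are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical pre-check: reports which onboarding headers are missing or malformed. */
final class IntakeHeaderCheck {
    // APIM's default subscription key header; a deployment may rename it (assumption).
    static final String SUBSCRIPTION_HEADER = "Ocp-Apim-Subscription-Key";

    /** Returns remediation hints; an empty list means both headers are present and well-formed. */
    static List<String> problems(Map<String, String> headers) {
        // HTTP header names are case-insensitive, so normalize lookups.
        Map<String, String> byName = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        byName.putAll(headers);

        List<String> hints = new ArrayList<>();
        String auth = byName.get("Authorization");
        // Shape check only: token validity is left to APIM / the identity provider.
        if (auth == null || !auth.startsWith("Bearer ") || auth.length() == "Bearer ".length()) {
            hints.add("Send 'Authorization: Bearer <token>' obtained via OAuth2 client credentials.");
        }
        String key = byName.get(SUBSCRIPTION_HEADER);
        if (key == null || key.isBlank()) {
            hints.add("Send the '" + SUBSCRIPTION_HEADER + "' header issued during onboarding.");
        }
        return hints;
    }
}
```

Returning every problem at once, instead of failing on the first, lets a 4xx response tell a newly onboarded partner everything that needs fixing in one round trip.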
Diagram 1 - Context Diagram — Tender orchestration gateway on Azure
Diagram 2 - Sequence — Tender create/cancel with async notify
Diagram 3 - Config & Onboarding — Keys and mapping specs
Process Flow
- Partner/aggregator sends tender create/cancel to Azure APIM with OAuth2 token and subscription key; gateway validates and immediately acknowledges the request.
- APIM forwards to Tender Gateway (AKS). The gateway validates schema, assigns a correlation ID, and persists intake metadata.
- Mapping Engine fetches the customer’s Jolt mapping JSONs (request/response/error) based on the subscription key; request is transformed to the internal tender format.
- For create, gateway calls internal Tender/TMS API; for cancel, it invokes the cancel endpoint with identifiers (e.g., SCAC + tenderId).
- The call flow then becomes asynchronous: the final tender result is produced later by the back-end process.
- The back-end posts to a notify endpoint (Callback Handler). Handler verifies headers, correlates to the intake record, and saves the result.
- Handler publishes events to Azure Service Bus for CRM/analytics; failures land in DLQ for replay.
- Observability: teams review logs/metrics in Azure Monitor/App Insights and AKS logs; environment URLs follow the APIM/AKS conventions.
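The correlation between intake and the later notify call can be pictured with the in-memory sketch below. In the described design these records live in Azure SQL; the map here is only a stand-in. The status names and the CorrelationStore class are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical lifecycle states; the real status vocabulary is not given in this document. */
enum TenderStatus { ACKNOWLEDGED, ACCEPTED, REJECTED }

/** Hypothetical store: an in-memory stand-in for the SQL correlation table. */
final class CorrelationStore {
    private final Map<String, TenderStatus> byCorrelationId = new ConcurrentHashMap<>();

    /** Called at intake, right before the ACK is returned. */
    void recordIntake(String correlationId) {
        byCorrelationId.putIfAbsent(correlationId, TenderStatus.ACKNOWLEDGED);
    }

    /** Called by the callback handler; returns false when no intake record matches. */
    boolean applyCallback(String correlationId, TenderStatus finalStatus) {
        if (finalStatus == TenderStatus.ACKNOWLEDGED) {
            throw new IllegalArgumentException("Callback must carry a final status");
        }
        return byCorrelationId.computeIfPresent(correlationId, (id, current) -> finalStatus) != null;
    }

    Optional<TenderStatus> statusOf(String correlationId) {
        return Optional.ofNullable(byCorrelationId.get(correlationId));
    }
}
```

A callback that matches no intake record returns false rather than creating a new row, so stray or misrouted notifications surface as a signal instead of silently polluting the audit data.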
Outcomes
- Reduced tender fallout via contract-first intake, enforced headers, and per-customer mapping outside code. (Proxy; mapping defects detected early vs. runtime.)
- Predictable partner experience: immediate ACKs and async callbacks prevent gateway timeouts and replays. (Verified in environments using notify endpoints.)
- Faster onboarding: subscription-key onboarding + Blob-stored mappings shorten partner activation cycles. (Proxy; onboarding steps codified in APIM docs and mapping procedure.)
Strategic Business Impact
- Steadier fulfillment pipeline (Proxy): fewer dropped/corrupted tenders improve downstream planning.
- Partner onboarding speed-up (Modeled): contract discipline + externalized mappings reduce lead time for new partners.
- Lower support load (Proxy): immediate ACK + notify pattern cuts “where is my tender?” tickets.
Method tags: Verified (observed in env tests), Modeled (capacity/flow sims), Proxy (leading indicators: mapping errors found pre-deploy, callback success rate).
Role & Scope
Owned architecture for APIM policies, AKS services (Gateway, Mapping Engine, Callback Handler), mapping governance (Blob), SQL schema for audit, Service Bus topics/queues, and observability; aligned onboarding steps and headers with platform guidance.
Key Decisions & Trade-offs
- API-led front door vs direct partner→TMS integration: contract stability & security at the cost of an extra hop.
- Externalized mappings (Jolt in Blob) vs code transforms: faster changes, but requires strong versioning and tests.
- Async notify vs synchronous blocking: resilient under partner/TMS latency, but demands correlation and idempotency.
- Azure Service Bus for fan-out/retries vs direct writes: operational safety with DLQs, traded for extra components.
- Environment URL discipline for APIM/AKS to reduce drift across UNT/UAT/PRD.
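The retry and dead-letter trade-off can be illustrated with a small in-memory model. It imitates the general behavior of a broker that dead-letters a message after a maximum number of delivery attempts. It does not use the Azure Service Bus SDK, and RetryingDispatcher is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical in-memory model of bounded retries with a dead-letter list and replay. */
final class RetryingDispatcher<T> {
    private final int maxDeliveries;
    private final Predicate<T> handler; // returns true when the message was processed
    private final List<T> deadLetters = new ArrayList<>();

    RetryingDispatcher(int maxDeliveries, Predicate<T> handler) {
        if (maxDeliveries < 1) throw new IllegalArgumentException("maxDeliveries must be >= 1");
        this.maxDeliveries = maxDeliveries;
        this.handler = handler;
    }

    /** Tries the handler up to maxDeliveries times; parks the message if every attempt fails. */
    boolean dispatch(T message) {
        for (int attempt = 1; attempt <= maxDeliveries; attempt++) {
            try {
                if (handler.test(message)) return true;
            } catch (RuntimeException e) {
                // A thrown exception counts as a failed delivery attempt.
            }
        }
        deadLetters.add(message);
        return false;
    }

    /** Re-dispatches everything currently dead-lettered; returns how many messages recovered. */
    int replay() {
        List<T> pending = new ArrayList<>(deadLetters);
        deadLetters.clear();
        int recovered = 0;
        for (T message : pending) {
            if (dispatch(message)) recovered++;
        }
        return recovered;
    }

    List<T> deadLetters() {
        return List.copyOf(deadLetters);
    }
}
```

The extra component buys a place for failed messages to wait until the downstream system is healthy again, which a direct write cannot offer. A real deployment would typically add backoff between attempts, which this sketch omits.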
Risks & Mitigations
- Missing/incorrect mapping specs → schema validation in CI, sample payload tests, and blue/green mapping rollout.
- Partner headers misconfigured (OAuth2/keys) → APIM policy checks + clear 4xx with remediation hints.
- Callback lost or duplicated → signed callbacks, idempotent upserts, Service Bus DLQ + replay tooling; alert on gaps.
- TMS latency/format mismatch → timeouts and retries set per adapter; JSON↔XML validation before dispatch.
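The "callback lost or duplicated" mitigation might look like the following. The document says callbacks are signed but does not name a scheme, so HMAC-SHA256 over the body with a hex-encoded signature is an assumption. The event ID used for deduplication is also hypothetical, and the in-memory set stands in for idempotency records in SQL.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.HexFormat;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical guard: verifies a callback signature, then applies each event at most once. */
final class SignedCallbackGuard {
    enum Outcome { APPLIED, DUPLICATE, BAD_SIGNATURE }

    private final byte[] secret;
    private final Set<String> seenEventIds = ConcurrentHashMap.newKeySet();

    SignedCallbackGuard(byte[] secret) {
        this.secret = secret.clone();
    }

    /** Hex-encoded HMAC-SHA256 of the body (assumed signing scheme). */
    static String sign(byte[] secret, String body) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return HexFormat.of().formatHex(mac.doFinal(body.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }

    Outcome accept(String eventId, String body, String signatureHex) {
        byte[] expected = sign(secret, body).getBytes(StandardCharsets.UTF_8);
        byte[] given = signatureHex.getBytes(StandardCharsets.UTF_8);
        // MessageDigest.isEqual compares in constant time, avoiding timing leaks.
        if (!MessageDigest.isEqual(expected, given)) return Outcome.BAD_SIGNATURE;
        // Set.add returns false when the event ID was already recorded.
        return seenEventIds.add(eventId) ? Outcome.APPLIED : Outcome.DUPLICATE;
    }
}
```

Checking the signature before touching the idempotency set means a forged request cannot "use up" an event ID. A duplicate can be answered with success so the sender stops retrying.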
Suggested Metrics (run-time SLOs)
- Intake p95 latency (APIM→Gateway ACK) and error rate by partner.
- Callback delivery success rate and end-to-end tender turnaround p95.
- Mapping failure rate (request/response/error), Blob version drift incidence.
- Service Bus: queue depth, DLQ size, retry success.
- Environment conformance: % traffic using correct APIM/AKS base URLs.
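For the p95 figures above, a nearest-rank percentile is one simple definition, sketched below. Monitoring tools may interpolate differently, so treat this as a way to pin down what "p95" means and not as a description of how Azure Monitor computes it. The class name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper: nearest-rank percentile over a batch of latency samples. */
final class LatencyPercentile {

    /** Returns the smallest sample such that at least p percent of samples are <= it. */
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) throw new IllegalArgumentException("no samples");
        if (p <= 0 || p > 100) throw new IllegalArgumentException("p must be in (0, 100]");
        long[] sorted = samplesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        // Nearest-rank: 1-based rank = ceil(p * n / 100).
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }
}
```

With 20 samples, p95 resolves to the 19th smallest value, so a single slow outlier does not move the metric while a second one does.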
Closing principle
Contract-first gateway, async by default. Lock payloads and headers at the edge, keep mappings outside code, and deliver results via reliable callbacks.