VOG AI Pilot — Phase 1 Assessment at a Glance

Why this is the right first move

The biggest source of manual checking in the worksheet is also the cleanest to scope: one team, one document set, one acceptance metric — and the same engine later powers the next phases.

Largest, most-reported pain. Document work is 21 items / ~124 hrs per week — the team's biggest category. Cross-check pain is named independently by Sophia, Jessica, Eva and Kelly, so it is structural rather than personal.

Bounded, measurable, easy to hand over. Read → compare → a person approves. No ERP write-back, no full automation. Acceptance is a number on an evaluation set, not a feeling.

Builds a reusable foundation. The DCSA-aligned extraction this pilot produces is exactly what later unlocks finance follow-up (C), supplier tracking (D), and a real dashboard (G).

De-risked by precedent. A 2024 production deployment at a Silicon Valley fintech runs the same extract-and-cross-check pattern on heterogeneous shipping documents, confirming feasibility at production scale.

Seven areas, ranked transparently

Each area is scored 1–5 against six weighted criteria (business value ×3, feasibility ×2, time-to-value ×1, measurability ×2, scope & handover ×1, strategic leverage ×2). The weights sum to 11, so the maximum weighted total is 55. Bars below are the resulting weighted totals.

Legend: Lead pilot · Part of lead · Fast-follow (1b) · Add-on · Phase 2 · Defer

A · Document cross-check

PI / Invoice / PL / IC · 13–20 d · Tomek 3–5 / Lulu 10–15

Lead pilot: High feasibility · clean acceptance

B · Shipping-doc review

Extends A · +3.5–6 d · Tomek 0.5–1 / Lulu 3–5

Part of A: Same engine, more doc types

—

C · Finance & payment follow-up

6–10 d · Tomek 1–2 / Lulu 5–8 · ERP-dependent

Fast-follow (Phase 1b): High value · gated by ERP access

F · English email support

1.5–3.5 d · Tomek ≈ 0.5 / Lulu 1–3

Add-on: Fast · saves 8 hrs/wk · not approval-gated

D · Supplier & delivery tracking

10–15 d · Tomek 2–3 / Lulu 8–12

Phase 2: Data consolidation first

G · Reporting / visibility

4–7 d · Tomek ≈ 1 / Lulu 3–6

Add-on / Phase 2: Best as a layer on A's output

E · Complaint summaries

3.5–6 d · Tomek 0.5–1 / Lulu 3–5

Defer: Low volume → low ROI as a lead pilot

The Excel worksheet's own Top-20 is hours-weighted, which puts design/CAD at the top despite being marked low AI-fit. Ranking by AI-fit (above) corrects for that bias — and surfaces the genuine document/finance/email cluster.

What the workflow actually does

End-to-end, six steps. The model reads, code compares, a person decides — no step does more than one job.

1 · Ingest (code)

Pick up a shipment's document set (PI, Commercial Invoice, Packing List, Import Certificate) from an agreed location.

2 · Classify (model)

Identify each document's type and route it to a type-specific extraction prompt.

3 · Extract → DCSA schema (model)

Read key fields into a standardised JSON record aligned to the DCSA eBL 3.0 trade-document model.

4 · Compare · 5 anchors (code)

Deterministic field-by-field check against five cross-document anchors. Item-code aliases normalised first (XXL ≡ 2XL).

5 · Review & approve (human)

A reviewer sees the documents, extracted fields, and any flagged discrepancies side-by-side; approves, rejects or corrects.

6 · Log & report (code)

Outcome is logged and surfaced in a thin reporting view: flagged vs cleared, by discrepancy category, over time.

Model: language understanding only
Code: deterministic, auditable, no hallucination
Human: final decision authority
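
As an illustration of the deterministic compare step, the sketch below normalises item-code aliases and then compares per-item quantities between two documents. It is a minimal sketch under stated assumptions, not the pilot's implementation: the alias table, the hyphen-separated item-code format (`JKT-XXL`), and the names `normalise_item_code`, `totals_by_code`, `compare_quantities` and `Discrepancy` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical alias table for illustration; the real one would come from VOG.
ALIASES = {"XXL": "2XL", "XXXL": "3XL"}


def normalise_item_code(code: str) -> str:
    """Trim, upper-case, and map each hyphen-separated token to its canonical alias."""
    tokens = code.strip().upper().split("-")
    return "-".join(ALIASES.get(token, token) for token in tokens)


@dataclass(frozen=True)
class Discrepancy:
    item_code: str
    field: str
    invoice_value: object
    packing_list_value: object


def totals_by_code(lines: dict[str, int]) -> dict[str, int]:
    """Sum quantities per canonical item code, so XXL and 2XL lines merge."""
    totals: dict[str, int] = {}
    for code, qty in lines.items():
        key = normalise_item_code(code)
        totals[key] = totals.get(key, 0) + qty
    return totals


def compare_quantities(invoice: dict[str, int], packing_list: dict[str, int]) -> list[Discrepancy]:
    """Return one Discrepancy per item whose quantity differs or is missing on one side."""
    left, right = totals_by_code(invoice), totals_by_code(packing_list)
    return [
        Discrepancy(code, "qty", left.get(code), right.get(code))
        for code in sorted(left.keys() | right.keys())
        if left.get(code) != right.get(code)
    ]


# Made-up sample data: the XXL / 2XL lines match after normalisation, JKT-M does not.
invoice = {"JKT-XXL": 10, "JKT-M": 5}
packing_list = {"JKT-2XL": 10, "JKT-M": 4}
flags = compare_quantities(invoice, packing_list)
```

Here `flags` holds a single quantity discrepancy for `JKT-M`. Because the comparison is plain code, the same inputs always produce the same flags, which is what makes the step auditable.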

The five cross-document anchors

The comparison checks these consistency points across every shipment. They are the contract between documents — and the basis of acceptance testing.

Reference / PO

primary alignment key

Parties

name · tax id · role

Cargo

desc · HS · qty · weight · volume

Charges

unit × qty = amount · ccy · term

Container / seal

container no. · seal no.
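
The Charges anchor lends itself to a small worked check. The sketch below tests that unit price × quantity equals the stated amount and that all documents agree on currency. It assumes amounts are compared to the cent with half-up rounding, which is an illustrative choice and not a rule from the worksheet; the function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")  # assumed comparison precision


def check_charge_line(unit_price: str, qty: int, amount: str) -> bool:
    """True when unit price x quantity equals the stated amount to the cent."""
    expected = (Decimal(unit_price) * qty).quantize(CENT, rounding=ROUND_HALF_UP)
    stated = Decimal(amount).quantize(CENT, rounding=ROUND_HALF_UP)
    return expected == stated


def check_currency(currency_by_document: dict[str, str]) -> bool:
    """True when every document states the same currency code."""
    codes = {code.strip().upper() for code in currency_by_document.values()}
    return len(codes) == 1


line_ok = check_charge_line("12.50", 40, "500.00")
line_bad = check_charge_line("12.50", 40, "505.00")
same_ccy = check_currency({"PI": "USD", "Invoice": "usd"})
```

`Decimal` is used instead of floats so that values such as 0.1 × 3 compare exactly against 0.30 rather than failing on binary rounding.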

In scope · what it does

Reads documents, extracts fields, flags discrepancies
Presents flagged items for human approval
Writes outcomes to a thin reporting view
First cut: 1 doc pair × 1 supplier lane (e.g. IXS), digital-first
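
The thin reporting view listed above (flagged vs cleared, by discrepancy category, over time) amounts to a few counts over the review log. The sketch below shows one way to compute them. The log row shape, the sample dates and the `summarise` function are assumptions made for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical log rows: (review date, outcome, discrepancy category or None).
review_log = [
    (date(2025, 1, 6), "flagged", "qty"),
    (date(2025, 1, 7), "cleared", None),
    (date(2025, 1, 14), "flagged", "amount"),
    (date(2025, 1, 15), "flagged", "qty"),
]


def summarise(log):
    """Count outcomes overall, flagged items by category, and outcomes per ISO week."""
    by_outcome = Counter(outcome for _, outcome, _ in log)
    by_category = Counter(cat for _, outcome, cat in log if outcome == "flagged")
    by_week = Counter(
        (tuple(day.isocalendar()[:2]), outcome) for day, outcome, _ in log
    )
    return by_outcome, by_category, by_week


by_outcome, by_category, by_week = summarise(review_log)
```

Keying the time series on (ISO year, ISO week) keeps weeks from different years apart.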

Out of scope · what it does not do

No write-back to ERP or any system of record
No automatic emails or actions without human approval
No action on flagged shipments without a person's decision
Other doc types / supplier lanes added later via add-on B

One month, gated and parallelised

Discovery is a real step with an exit gate, not a kickoff. The build clock starts only when sample documents, schema, access, a single point of contact and tooling are confirmed.

Timeline chart (Weeks 1–4), one row per track:

Phase 0 · Gated discovery: samples · schema · access · SPOC · tooling
Tomek (Architect): architecture · DCSA schema · data-source audit
Lulu (Implementer): extraction · comparison engine; review UI · reporting view
Both · Acceptance & handover: eval · acceptance · training · handover

Why the gate matters

The biggest risk to the timeline is not work effort but elapsed time — many stakeholders, scattered documents, format surprises. The gated discovery and a single point of contact at VOG turn this from silent overrun into a visible client dependency: if those items are delayed, the timeline shifts via change control, not by squeezing the build window.

Phase 1 component	Low (days)	High (days)	Mid (days)	Notes
Phase 0 — Discovery & setup (gated)	3	5	4	Samples, schema, access, SPOC, tooling
Core A — extraction + comparison + review UI	13	20	16.5	First cut: 1 doc pair × 1 supplier lane
Add-on — thin reporting view	2.5	3.5	3	Rides on the structured output
Cross-cutting — eval, testing, docs, handover	3	5	4	Including team training
Contingency / coordination buffer	2	3	2.5	Scan quality, ERP access unknowns
Phase 1 total (developer-days)	23.5	36.5	≈ 30	~25–30% Tomek · ~70–75% Lulu
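
The totals in the table can be re-derived from its rows. The sketch below copies the low and high figures from the table, sums them, and takes the midpoint, which matches the Mid column for every row. The per-person split applies the stated share ranges to the mid total and is only an approximation; the variable names are illustrative.

```python
# (low, high) developer-days per component, copied from the table above.
COMPONENTS = {
    "Phase 0 discovery & setup": (3, 5),
    "Core A": (13, 20),
    "Thin reporting view": (2.5, 3.5),
    "Cross-cutting": (3, 5),
    "Contingency buffer": (2, 3),
}

low = sum(lo for lo, _ in COMPONENTS.values())
high = sum(hi for _, hi in COMPONENTS.values())
mid = (low + high) / 2

# Midpoint per row, for comparison with the table's Mid column.
row_mids = {name: (lo + hi) / 2 for name, (lo, hi) in COMPONENTS.items()}

# Approximate split of the mid total using the stated ~25-30% share for Tomek.
tomek_days = (0.25 * mid, 0.30 * mid)
lulu_days = (mid - tomek_days[1], mid - tomek_days[0])

print(f"low={low} high={high} mid={mid}")
```

With the table's figures this gives 23.5 low, 36.5 high and 30 mid, or roughly 7.5 to 9 days for Tomek and 21 to 22.5 days for Lulu.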

Acceptance is a number, not a feeling

Targets are proposed below and confirmed jointly with VOG during Phase 0 against an agreed evaluation set built from real historical documents (including the known problem cases).

≥ 95%

Discrepancy recall on the eval set — share of real issues we catch

≥ 90%

Discrepancy precision — share of flags that are real (not false alarms)

False positives on a clean control set — reviewers are not flooded

≥ 70%

Reduction in per-shipment check time vs Phase 0 baseline

≥ 0.95

Field-extraction F1 on the key cross-document anchor fields

Acceptance is signed off when all proposed targets are met on the agreed eval set and the workflow has been demonstrated end-to-end with a VOG reviewer. A regression set of top templates (IXS / HY / Bangladesh) is run before any prompt change ships.
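
To make the sign-off rule concrete, the sketch below computes precision, recall and F1 from confusion counts and checks measured values against the proposed targets. It is a sketch under assumptions: the counts are invented, the metric keys are hypothetical names, and the false-positive target is left out because its threshold is not stated above.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard definitions: precision = tp/(tp+fp), recall = tp/(tp+fn)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Proposed minimums from the targets above (to be confirmed in Phase 0).
TARGETS = {
    "discrepancy_recall": 0.95,
    "discrepancy_precision": 0.90,
    "check_time_reduction": 0.70,
    "field_extraction_f1": 0.95,
}


def meets_targets(measured: dict[str, float]) -> bool:
    """True only when every target metric is at or above its minimum."""
    return all(measured[name] >= floor for name, floor in TARGETS.items())


# Invented counts: 58 real discrepancies caught, 4 false alarms, 2 missed.
precision, recall, _ = precision_recall_f1(tp=58, fp=4, fn=2)
accepted = meets_targets({
    "discrepancy_recall": recall,
    "discrepancy_precision": precision,
    "check_time_reduction": 0.72,
    "field_extraction_f1": 0.96,
})
```

The same check could be run against the regression set before any prompt change ships, so a change that drops a metric below its floor is caught before release.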

SOW essentials at a glance

Mapped one-to-one to the items the client asked the Statement of Work to define. The full text lives in the Phase 1 Assessment document.

10.1 · Deliverables

What VOG receives

Live cross-check workflow
Review & approval interface
Thin reporting view
Labelled eval set + test report
SOP, prompt library, runbook
Two training sessions (recorded)

10.2 · Selected workflow

The specific pilot

Shipping-document cross-check for PI / Invoice / Packing List / Import Certificate. First cut: 1 doc pair × 1 supplier lane, digital-first, locked at the end of Phase 0.

10.3 · Acceptance

Sign-off basis

Targets in "Acceptance is a number, not a feeling" above, met on the agreed eval set, plus an end-to-end demonstration with a VOG reviewer. Final targets are fixed in Phase 0.

10.4 · IP ownership

VOG owns everything built

Workflow, prompts, schema, review interface, reporting view, dashboards, code, documentation, configurations and training materials. IP assigns to VOG on payment.

10.5 · Data security

No training on VOG data

AI tool agreed in Phase 0 on a contractual no-training tier
Confidentiality, least-privilege access, residency confirmed
Access revoked at handover

10.6 · Fixed fee includes

What is covered

Phase 0 discovery and setup
Core build of the first-cut workflow
Thin reporting view
Eval set + acceptance testing
Handover pack & two training sessions
Tomek & Lulu time during Phase 1

Not included: third-party LLM / cloud subscriptions (VOG-owned account) and any travel.

10.7 · Out of scope

Explicitly not in Phase 1

ERP write-back / full automation
Doc types or supplier lanes beyond the first cut
Phase 2: C finance, D tracking, full G dashboard
Historical-data cleanup beyond what the pilot needs
Production SLAs / 24-7 / managed service

10.8 · Change control

How extras get approved

Either party proposes a change
ATS notes impact: days, schedule, fee
VOG approves in writing (email is fine)
Only then does ATS start the extra work

10.9 · Training & handover

So VOG can run it

Session 1 (90 min): operating the workflow
Session 2 (90 min): configuration & maintenance
SOP, recordings, runbook, FAQ
Two weeks of post-handover support included

How decisions were made

Scoring criteria & weights

Designed so high scores are good for VOG and ATS alike — the best first pilot maximises client value while minimising delivery risk.

Business value

Hours saved + cost of errors avoided (freight/customs rework, payment disputes)

×3 (highest)

Feasibility / confidence

Capability maturity × data accessibility — discounts high-value but risky items

×2

Time-to-value

Less effort to a working result scores higher

×1

Measurability

Whether a clean acceptance metric exists — protects both sides

×2

Scope & handover

Bounded to one team so VOG can own it after handover

×1

Strategic leverage

Builds a reusable foundation that unlocks later phases

×2
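
The weighted total for an area follows directly from the six weights above. The sketch below computes it and confirms the 11 to 55 range. The criterion keys and the example score profile are hypothetical and are not the scores assigned to any area in this assessment.

```python
# Weights as listed above; key names are illustrative.
WEIGHTS = {
    "business_value": 3,
    "feasibility": 2,
    "time_to_value": 1,
    "measurability": 2,
    "scope_handover": 1,
    "strategic_leverage": 2,
}


def weighted_total(scores: dict[str, int]) -> int:
    """Sum of weight x score over all six criteria; each score must be 1-5."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the six criteria")
    if any(not 1 <= s <= 5 for s in scores.values()):
        raise ValueError("each score must be between 1 and 5")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


max_total = weighted_total({name: 5 for name in WEIGHTS})  # 5 x 11 = 55
min_total = weighted_total({name: 1 for name in WEIGHTS})  # 1 x 11 = 11

# Hypothetical profile, for illustration only.
example_total = weighted_total({
    "business_value": 5,
    "feasibility": 4,
    "time_to_value": 3,
    "measurability": 5,
    "scope_handover": 4,
    "strategic_leverage": 5,
})  # 15 + 8 + 3 + 10 + 4 + 10 = 50
```

Because business value carries weight 3, a one-point change there moves the total as much as three points of time-to-value.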

Open questions — what to confirm next

These cross-cutting items most affect the estimate. They are sent to the client during Phase 0 and the answers feed the SOW.

Document format — digital PDF vs scanned ratio?

scopes A

Document volume — per week / month?

ROI

ERP system — API or export available?

gates C / D / G

Where documents live — NAS / Google / email?

access

Approved AI tooling — security & no-training tier?

security

Item-code alias table — does one already exist?

comparison

Excel source references

For cross-checking against the VOG bottleneck worksheet. Each entry points to a sheet, item number and priority score so the assessment can be verified at the source.

Aggregate	'Team Overview' / '`團隊總覽`' sheets — 139 items, 11 staff · ≈ 711 hrs/wk · 28 High AI-fit · 39 scored ≥ 15 · Documents 21 items / 124h, Finance 49h, Email 38.5h.
Method caveat	'Team Overview EN' ranks 1, 2, 6, 7, 8, 16 = Ken (design / CAD), score 18–22, AI-fit Low / "No" — the hours-weighted bias the assessment corrects for.
A	`Sophia #18` IXS IC+PI+HY PL cross-check, ≈ 10 h/wk, score 17; document revised 6× · `Jessica #5` Invoice / PL compare, 3 h/wk, 'Team Overview EN' rank 5 / 20.2 · `Eva` invoice / packing qty, amount, model errors.
B	`Sophia #2` IN+PL (score 13), `#4` customs-declaration check, `#5` cert-of-origin check, `#6` BL draft, `#9` advance-payment PI.
C	`Jessica #4` payment tracking (rank 9 / 19), `#9` accounts receivable (13 / 18.5), `#7` bookkeeping review (12 / 18.5) · `Jane` AR + cash-level alert · `Kelly #9` rank 4 / 20.4.
D	`Jessica #3` shipment tracking (11 / 18.5) · `Kelly #5` lead-time · `Eva #5 / #21` · `Jason` data scattered across ERP / Excel / Email / Google / Evernote.
E	`Jessica #11` complaint triage (14 / 18.5).
F	`Jessica #1` English email, 8 h/wk, 'Team Overview EN' rank 3 / 21.5 — top high-fit item · `Eva` email / translation · `Noella` translation.
G	`Jessica #3` dashboard · `Sophia #10` (score 15) · `Jane` P&L / cashflow.