From the seven candidate areas in VOG's bottleneck worksheet, we recommend a shipping-document cross-check pilot covering the Proforma Invoice (PI), Commercial Invoice, Packing List and Import Certificate. It is the highest-value, lowest-risk first move, and it builds the foundation that later unlocks finance follow-up, supplier tracking and reporting.
The biggest source of manual checking in the worksheet is also the cleanest to scope: one team, one document set, one acceptance metric — and the same engine later powers the next phases.
Largest, most-reported pain. Document work is 21 items / ~124 hrs per week — the team's biggest category. Cross-check pain is named independently by Sophia, Jessica, Eva and Kelly, so it is structural rather than personal.
Bounded, measurable, easy to hand over. Read → compare → a person approves. No ERP write-back, no full automation. Acceptance is a number on an evaluation set, not a feeling.
Builds a reusable foundation. The DCSA-aligned extraction this pilot produces is exactly what later unlocks finance follow-up (C), supplier tracking (D), and a real dashboard (G).
De-risked by precedent. A 2024 production deployment at a Silicon Valley fintech runs the same extract-and-cross-check pattern on heterogeneous shipping documents, confirming feasibility at production scale.
Each area is scored 1–5 against six weighted criteria (business value ×3, feasibility ×2, time-to-value ×1, measurability ×2, scope & handover ×1, strategic leverage ×2). The weights sum to 11, so the maximum weighted total is 55. Bars below are the resulting weighted totals.
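The weighting above can be sketched in a few lines of Python. The weights come from the text; the criterion keys and the example scores are illustrative assumptions, not figures from the assessment.

```python
# Weights as stated in the scoring method; the key names are illustrative.
WEIGHTS = {
    "business_value": 3,
    "feasibility": 2,
    "time_to_value": 1,
    "measurability": 2,
    "scope_handover": 1,
    "strategic_leverage": 2,
}

# Each criterion is scored 1-5, so the ceiling is 5 x (sum of weights) = 55.
MAX_SCORE = 5 * sum(WEIGHTS.values())


def weighted_total(scores: dict) -> int:
    """Return the weighted total for one candidate area."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the six criteria")
    for name, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for one area (NOT taken from the assessment).
example_scores = {
    "business_value": 5,
    "feasibility": 4,
    "time_to_value": 4,
    "measurability": 5,
    "scope_handover": 4,
    "strategic_leverage": 5,
}
example_total = weighted_total(example_scores)
print(f"{example_total} / {MAX_SCORE}")
```

Keeping the weights in one table makes it easy to re-run the ranking if VOG wants to test a different emphasis, for example a heavier weight on time-to-value.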
End-to-end, six steps. The model reads, code compares, a person decides — no step does more than one job.
The comparison checks these consistency points across every shipment. They are the contract between documents — and the basis of acceptance testing.
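A minimal sketch of the "code compares" step, assuming the extraction step has already produced one dictionary of fields per document. The field names (`model`, `quantity`, `amount`) are assumptions borrowed from the error types the worksheet mentions; the real list of consistency points is agreed in Phase 0.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical consistency points; the real list is fixed during discovery.
CHECK_FIELDS = ("model", "quantity", "amount")


@dataclass(frozen=True)
class Mismatch:
    field: str
    left: object
    right: object


def normalise(value):
    """Collapse whitespace and case so formatting noise is not flagged."""
    if isinstance(value, str):
        return " ".join(value.split()).upper()
    return value


def cross_check(left: dict, right: dict, fields=CHECK_FIELDS) -> list:
    """Return every field that is missing or differs between two documents."""
    mismatches = []
    for name in fields:
        a, b = normalise(left.get(name)), normalise(right.get(name))
        if a is None or b is None or a != b:
            mismatches.append(Mismatch(name, left.get(name), right.get(name)))
    return mismatches


# Illustrative extracted values, not real shipment data.
proforma = {"model": "ab-100", "quantity": 1200, "amount": Decimal("8400.00")}
invoice = {"model": "AB-100 ", "quantity": 1180, "amount": Decimal("8400.00")}

issues = cross_check(proforma, invoice)
for issue in issues:
    print(f"{issue.field}: {issue.left!r} vs {issue.right!r}")
```

The comparison is deliberately plain, deterministic code: it only lists differences, and the reviewer decides what to do with each one.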
Discovery is a real step with an exit gate, not a kickoff. The build clock starts only when sample documents, schema, access, a single point of contact and tooling are confirmed.
The biggest risk to the timeline is not work effort but elapsed time — many stakeholders, scattered documents, format surprises. The gated discovery and a single point of contact at VOG turn this from silent overrun into a visible client dependency: if those items are delayed, the timeline shifts via change control, not by squeezing the build window.
| Phase 1 component | Low (dev-days) | High (dev-days) | Mid (dev-days) | Notes |
|---|---|---|---|---|
| Phase 0 — Discovery & setup (gated) | 3 | 5 | 4 | Samples, schema, access, SPOC, tooling |
| Core A — extraction + comparison + review UI | 13 | 20 | 16.5 | First cut: 1 doc pair × 1 supplier lane |
| Add-on — thin reporting view | 2.5 | 3.5 | 3 | Rides on the structured output |
| Cross-cutting — eval, testing, docs, handover | 3 | 5 | 4 | Including team training |
| Contingency / coordination buffer | 2 | 3 | 2.5 | Scan quality, ERP access unknowns |
| Phase 1 total (developer-days) | 23.5 | 36.5 | ≈ 30 | ~25–30% Tomek · ~70–75% Lulu |
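The totals row can be re-derived from the component rows. The short sketch below uses the low and high figures from the table and takes the mid value as the midpoint of the two; the dictionary keys are shortened labels chosen for illustration.

```python
# (low, high) developer-days per component, copied from the estimate table.
ESTIMATE = {
    "Phase 0 discovery & setup": (3, 5),
    "Core A extraction + comparison + review UI": (13, 20),
    "Add-on thin reporting view": (2.5, 3.5),
    "Cross-cutting eval, testing, docs, handover": (3, 5),
    "Contingency / coordination buffer": (2, 3),
}

low_total = sum(low for low, _ in ESTIMATE.values())
high_total = sum(high for _, high in ESTIMATE.values())
mid_total = (low_total + high_total) / 2

print(f"Low {low_total} / High {high_total} / Mid {mid_total} developer-days")
```

Keeping the estimate as data makes it simple to re-total if a component is re-scoped under change control.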
Targets are proposed below and confirmed jointly with VOG during Phase 0 against an agreed evaluation set built from real historical documents (including the known problem cases).
Acceptance is signed off when all proposed targets are met on the agreed eval set and the workflow has been demonstrated end-to-end with a VOG reviewer. A regression set of top templates (IXS / HY / Bangladesh) is run before any prompt change ships.
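One way to turn "all proposed targets are met on the agreed eval set" into a repeatable check is sketched below. It assumes field-level exact-match accuracy as the metric; the field names and target values are placeholders, since the real targets are fixed jointly in Phase 0. The same function could be run on the regression set before a prompt change ships.

```python
def field_accuracy(predictions: list, gold: list, field: str) -> float:
    """Share of documents where the extracted field equals the gold value."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("predictions and gold must be non-empty and aligned")
    hits = sum(1 for p, g in zip(predictions, gold) if p.get(field) == g[field])
    return hits / len(gold)


def acceptance_report(predictions: list, gold: list, targets: dict) -> dict:
    """Map each field to (accuracy, target_met)."""
    report = {}
    for field, target in targets.items():
        accuracy = field_accuracy(predictions, gold, field)
        report[field] = (accuracy, accuracy >= target)
    return report


def accepted(report: dict) -> bool:
    """Acceptance requires every target to be met, not an average."""
    return all(met for _, met in report.values())


# Placeholder targets and toy data, NOT the agreed figures or real documents.
TARGETS = {"quantity": 0.75, "amount": 1.0}
gold = [{"quantity": q, "amount": a} for q, a in [(10, 100), (20, 200), (30, 300), (40, 400)]]
predicted = [dict(doc) for doc in gold]
predicted[3]["quantity"] = 41  # one deliberate extraction error

report = acceptance_report(predicted, gold, TARGETS)
print(report, accepted(report))
```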
Mapped one-to-one to the items the client asked the Statement of Work to define. The full text lives in the Phase 1 Assessment document.
Shipping-document cross-check for PI / Commercial Invoice / Packing List / Import Certificate. First cut: 1 doc pair × 1 supplier lane, digital-first, locked at the end of Phase 0.
Targets in section 05 above, met on the agreed eval set, plus end-to-end demonstration with a VOG reviewer. Final targets fixed in Phase 0.
Workflow, prompts, schema, review interface, reporting view, dashboards, code, documentation, configurations and training materials. IP assigns to VOG on payment.
Not included: third-party LLM / cloud subscriptions (VOG-owned account) and any travel.
Designed so high scores are good for VOG and ATS alike — the best first pilot maximises client value while minimising delivery risk.
These cross-cutting items most affect the estimate. They are sent to the client during Phase 0 and the answers feed the SOW.
For cross-checking against the VOG bottleneck worksheet. Each entry points to a sheet, item number and priority score so the assessment can be verified at the source.
| Area | Worksheet source (sheet, item number, priority score) |
|---|---|
| Aggregate | 'Team Overview' / '團隊總覽' sheets — 139 items, 11 staff · ≈ 711 hrs/wk · 28 High AI-fit · 39 scored ≥ 15 · Documents 21 items / 124h, Finance 49h, Email 38.5h. |
| Method caveat | 'Team Overview EN' ranks 1, 2, 6, 7, 8, 16 = Ken (design / CAD), score 18–22, AI-fit Low / "No" — the hours-weighted bias the assessment corrects for. |
| A | Sophia #18 IXS IC+PI+HY PL cross-check, ≈ 10 h/wk, score 17; document revised 6× · Jessica #5 Invoice / PL compare, 3 h/wk, 'Team Overview EN' rank 5 / 20.2 · Eva invoice / packing qty, amount, model errors. |
| B | Sophia #2 IN+PL (score 13), #4 customs-declaration check, #5 cert-of-origin check, #6 BL draft, #9 advance-payment PI. |
| C | Jessica #4 payment tracking (rank 9 / 19), #9 accounts receivable (13 / 18.5), #7 bookkeeping review (12 / 18.5) · Jane AR + cash-level alert · Kelly #9 rank 4 / 20.4. |
| D | Jessica #3 shipment tracking (11 / 18.5) · Kelly #5 lead-time · Eva #5 / #21 · Jason data scattered across ERP / Excel / Email / Google / Evernote. |
| E | Jessica #11 complaint triage (14 / 18.5). |
| F | Jessica #1 English email, 8 h/wk, 'Team Overview EN' rank 3 / 21.5 — top high-fit item · Eva email / translation · Noella translation. |
| G | Jessica #3 dashboard · Sophia #10 (score 15) · Jane P&L / cashflow. |