The 1% Problem

How domain expertise + Claude let a 2-person team hit #1 on a global classification benchmark.

11:30 - 12:00 Gahee Seo / Federation Founder stage IMG_7424 to IMG_7430

Index

This v3 page reconstructs the visible slide content from IMG_7424, IMG_7425, IMG_7426, IMG_7427, IMG_7428, and IMG_7430. IMG_7429 is not present in the source photo folder. Original images are appended beneath each reconstructed slide.

Session Frame

Source: session page

The session describes Federation's approach to HS (Harmonized System) classification in trade compliance: a two-person team using Claude plus domain expertise to compete on a global classification benchmark, in a domain where a wrong answer can be expensive.

Original Code w/ Claude session page

Each System Needed A Different Capability

Source: IMG_7424

System | The challenge | Claude capability | What failed before
HS code | GRI 1-6 reasoning + citation discipline | Long context + structured reasoning + reliable citations | Other AI systems misclassified Kitty Charm.
Regulatory | Parse 100+ page CBAM x 4 jurisdictions | Extended context, multi-jurisdiction reasoning | Smaller models hallucinated citations. Rule engines could not handle ambiguity.
Supply chain | 80-step chain: HFI → WGI → 5-factor → sourcing | Multi-step instruction following + reliable tool use | LLM-wrapped automation tools silently halt at step 3-4. Not errors. Just stops.
Trade assistant | Loop: Think → Act → Observe → Decide; know when to stop | Agent loop discipline + self-verification | Others default to "do everything" without compliance discipline.
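
The Think → Act → Observe → Decide loop in the last row, together with "know when to stop", can be sketched roughly as follows. This is a minimal illustration, not Federation's implementation: `LoopState`, `run_loop`, `toy_think`, and `toy_act` are hypothetical names, and the toy functions stand in for real model and tool calls. The one design point it tries to show is that every exit carries an explicit stop reason, so a run that ends early is reported instead of halting silently.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LoopState:
    goal: str
    observations: list[str] = field(default_factory=list)
    done: bool = False
    stop_reason: str = ""


def run_loop(
    goal: str,
    think: Callable[[LoopState], Optional[str]],
    act: Callable[[str], str],
    max_steps: int = 5,
) -> LoopState:
    state = LoopState(goal=goal)
    for _ in range(max_steps):
        plan = think(state)  # Think: pick the next action, or None
        if plan is None:  # Decide: nothing left to do, so stop
            state.done = True
            state.stop_reason = "goal satisfied"
            return state
        observation = act(plan)  # Act: run the chosen action
        state.observations.append(observation)  # Observe: record the result
    # Budget exhausted: report it instead of stopping silently.
    state.stop_reason = "step budget exhausted"
    return state


# Toy stand-ins (hypothetical): the "agent" needs two facts about the product.
def toy_think(state: LoopState) -> Optional[str]:
    for fact in ("material", "function"):
        if not any(obs.startswith(fact) for obs in state.observations):
            return fact
    return None


def toy_act(plan: str) -> str:
    return f"{plan}: looked up"


final = run_loop("classify product", toy_think, toy_act)
cut_short = run_loop("classify product", toy_think, toy_act, max_steps=1)
print(final.stop_reason, "|", cut_short.stop_reason)
```

A caller can then treat any result with `done == False` as incomplete and route it to review, which is one way to avoid the "Not errors. Just stops." failure described in the supply chain row.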

Single Agent Worked, But Took Too Long

Source: IMG_7425

Single agent

Classified correctly, but took 6+ minutes per answer.

Decomposed parallel

Target path shown as approximately 30 seconds.

The problem

  • Single agent classified correctly - just unusably slow.
  • Against a 1-week broker baseline, the system had to be orders of magnitude faster.

The lesson

  • Decompose by stage: Section → National.
  • Each stage gets its own retrieval context.
  • Parallel where independent, sequential where dependent.
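
One way to picture "parallel where independent, sequential where dependent" is the sketch below. It rests on several assumptions that are not on the slide: the intermediate stage names follow the standard HS hierarchy (section, chapter, heading, subheading, national tariff line), retrieval for each stage is treated as independent of the other stages, and each stage's decision depends on its parent's. `retrieve_context` and `classify_stage` are hypothetical stubs that sleep briefly in place of real retrieval and model calls.

```python
import asyncio

# Assumed stage list; the slide shows only "Section → National".
STAGES = ["section", "chapter", "heading", "subheading", "national"]


async def retrieve_context(stage: str, description: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for a retrieval call
    return f"[{stage} notes for: {description}]"


async def classify_stage(stage: str, context: str, parent: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for a model call that reads `context`
    return f"{parent}>{stage}" if parent else stage


async def classify(description: str) -> list[str]:
    # Independent work: fetch every stage's retrieval context concurrently.
    contexts = await asyncio.gather(
        *(retrieve_context(stage, description) for stage in STAGES)
    )
    # Dependent work: each decision needs its parent's result, so run in order.
    path: list[str] = []
    parent = ""
    for stage, context in zip(STAGES, contexts):
        parent = await classify_stage(stage, context, parent)
        path.append(parent)
    return path


result = asyncio.run(classify("industrial robot arm"))
print(result[-1])
```

If retrieval for a later stage actually depends on an earlier decision, that call would move inside the sequential loop; the split shown here is only one possible arrangement.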

The System Did Not Fail, Customer-Written Descriptions Did

Source: IMG_7426

Supplier invoice: "Industrial machinery part"

System classifies: HS 8479.89 - generic machinery

Actual code: HS 8418.69 - medical refrigeration

The problem

  • The model did not fail. The input did.
  • Customers write in shorthand they understand, not HS-grade specificity.

The lesson

  • Build an input-quality gate as its own agent.
  • When ambiguous, ask follow-up or flag for human review.
  • Self-aware system > confident system.
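
A very rough sketch of an input-quality gate follows. The slide describes the gate as its own agent, so a real version would presumably be a model call; the keyword heuristic here is only a stand-in to show the control flow (pass, or return follow-up questions). `GENERIC_TERMS`, `GateResult`, `quality_gate`, and the questions themselves are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical vocabulary of words that carry no classification signal.
GENERIC_TERMS = {
    "part", "parts", "machinery", "equipment", "industrial", "item",
    "goods", "accessory", "for", "of", "the", "and", "a",
}

FOLLOW_UP_QUESTIONS = [
    "What does the product do (its function)?",
    "What machine or system is it used in?",
    "What is it made of?",
]


@dataclass
class GateResult:
    ok: bool
    questions: list[str]


def quality_gate(description: str, min_specific_words: int = 3) -> GateResult:
    words = [w.strip(".,;:").lower() for w in description.split()]
    specific = [w for w in words if w and w not in GENERIC_TERMS]
    if len(specific) >= min_specific_words:
        return GateResult(ok=True, questions=[])
    # Too vague to classify: ask instead of guessing.
    return GateResult(ok=False, questions=list(FOLLOW_UP_QUESTIONS))


vague = quality_gate("Industrial machinery part")
specific = quality_gate("Compressor unit for medical vaccine refrigerator")
print(vague.ok, specific.ok)
```

With the invoice text from the slide, the gate stops the request before classification and returns questions, rather than letting a confident but wrong code through.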

Delete The Stuffed Prompt

Source: IMG_7427

The team first put every trade rule into the system prompt, then removed that approach.

Stuffed prompt

  • WCO rules, TBT, FTA matrix, and all regulatory compliance rules.
  • Opaque. One edit broke unrelated cases.
  • Slow and expensive.

Retrieval + verifier

  1. Agent query
  2. Knowledge base returns
  3. Verifier agent cross-checks citations
  4. Output: code + citations + trace
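
The four steps above can be sketched as a small pipeline. Everything here is illustrative: `KNOWLEDGE_BASE` is a two-entry toy with paraphrased (not quoted) rule text, `retrieve` is naive keyword overlap in place of a real retriever, and `draft_answer` is a hypothetical stand-in for the classifying agent that deliberately cites one nonexistent rule so the verifier has something to catch.

```python
from dataclasses import dataclass, field

# Toy knowledge base; rule texts are paraphrases for illustration only.
KNOWLEDGE_BASE = {
    "GRI-1": "Classification is determined by the terms of the headings "
             "and any relative section or chapter notes.",
    "GRI-6": "Classification in subheadings is determined by the terms of "
             "those subheadings and any related subheading notes.",
}


@dataclass
class Answer:
    code: str
    citations: list[str]
    trace: list[str] = field(default_factory=list)


def _words(text: str) -> set[str]:
    return {w.strip(".,;").lower() for w in text.split()}


def retrieve(query: str, kb: dict[str, str]) -> dict[str, str]:
    # Steps 1-2: the agent queries, the knowledge base returns matching rules.
    terms = _words(query)
    return {rule_id: text for rule_id, text in kb.items() if terms & _words(text)}


def draft_answer(retrieved: dict[str, str]) -> Answer:
    # Stand-in for the agent; "GRI-9" is a made-up citation on purpose.
    return Answer(code="<draft code>", citations=list(retrieved) + ["GRI-9"])


def verify(answer: Answer, retrieved: dict[str, str]) -> Answer:
    # Step 3: keep only citations that were actually retrieved.
    kept = [c for c in answer.citations if c in retrieved]
    rejected = [c for c in answer.citations if c not in retrieved]
    trace = answer.trace + [
        f"retrieved {sorted(retrieved)}",
        f"rejected citations {rejected}",
    ]
    # Step 4: output code + citations + trace.
    return Answer(code=answer.code, citations=kept, trace=trace)


retrieved = retrieve("subheading notes", KNOWLEDGE_BASE)
result = verify(draft_answer(retrieved), retrieved)
print(result.citations, result.trace[-1])
```

Because the rules live in the knowledge base and not in the prompt, editing one entry changes only the answers that retrieve it, which is the property the stuffed prompt lacked.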

Demo Output: Final Classification Decision

Source: IMG_7428

The photo shows the Federation assistant producing a final HS classification decision with rationale, factors, citations, and tariff summary.

Field | Visible value
Final HS code | 8479.50.00.00
Description | Machines and mechanical appliances having individual functions, not specified or included elsewhere in this chapter; parts thereof.
Decision rationale | Classified under subheading 8479.50 as an industrial robot not elsewhere specified or included.
Key determining factors | Multi-purpose programmable industrial robot arm; absence of a more specific heading in Chapter 84 or 85; Section XVI and Chapter 84 classification rules.
Tariff summary | Visible text states a 2.5% MFN tariff rate for imports into the United States.
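
As a reading aid for the final code shown above, the sketch below splits a dotted code into its levels, assuming the standard HS layout: the first two digits are the chapter, the first four the heading, the first six the subheading, and any remaining digits a national extension. `split_hs_code` is a hypothetical helper, not part of the demoed assistant.

```python
def split_hs_code(code: str) -> dict[str, str]:
    digits = code.replace(".", "")
    if not digits.isdigit() or len(digits) < 6:
        raise ValueError(f"expected at least 6 digits, got {code!r}")
    return {
        "chapter": digits[:2],
        "heading": digits[:4],
        "subheading": f"{digits[:4]}.{digits[4:6]}",
        "national_suffix": digits[6:],  # digits beyond the 6-digit HS level
    }


parts = split_hs_code("8479.50.00.00")
print(parts)
```

A check like this can confirm that the subheading named in the decision rationale (8479.50) matches the prefix of the final code before the result is shown to a user.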

Five Lessons Learned

Source: IMG_7430

Decompose by stage

Match model to profile

Do not trust input

Outside knowledge

Teach the workflow