Folium Systems

AI systems for real operations

AI Evaluation Scorecard

Evaluate the whole AI workflow, not only the answer.

A model can sound confident and still fail the business. A useful scorecard checks whether the AI used the right sources, stayed inside its permissions, helped the user complete the workflow, and failed safely when needed.

Guide section

Quality categories

Score the behavior that matters to the actual workflow instead of relying on a single overall score.

  • Task completion and answer usefulness
  • Source grounding and citation strength
  • Safe-tool routing and escalation behavior
  • Staff and customer clarity
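One way to keep the categories separate is to score each one against its own minimum instead of averaging them. The sketch below is illustrative only: the 0 to 4 scale, the per-category minimums, and the names `CategoryScore`, `weak_categories`, and `average` are assumptions, not part of the guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryScore:
    category: str
    score: int    # assumed scale: 0 (fails) to 4 (strong)
    minimum: int  # assumed lowest acceptable score for this workflow


def weak_categories(scores: list[CategoryScore]) -> list[str]:
    """Return the categories that fall below their own minimum."""
    return [s.category for s in scores if s.score < s.minimum]


def average(scores: list[CategoryScore]) -> float:
    """A single overall number, kept only to contrast with the per-category view."""
    return sum(s.score for s in scores) / len(scores)


# Hypothetical results for one candidate on one workflow.
example = [
    CategoryScore("task_completion", score=4, minimum=3),
    CategoryScore("source_grounding", score=1, minimum=3),
    CategoryScore("safe_tool_routing", score=4, minimum=3),
    CategoryScore("clarity", score=4, minimum=2),
]

print(average(example))          # overall average looks healthy
print(weak_categories(example))  # but source grounding is below its minimum
```

In this made-up example the overall average hides a weak grounding score, which is the case a per-category view is meant to catch.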

Critical failure gates

Some failures should block promotion even if the rest of the score looks good.

  • Live-action or unauthorized execution claims
  • Private data or source leakage
  • Unsupported factual claims
  • Unsafe legal, financial, or compliance advice
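A gate differs from a score in that any single trip blocks promotion, whatever the rest of the results say. A minimal sketch, assuming a hypothetical `Gate` enum that mirrors the list above and an arbitrary default threshold of 3.0 on an assumed 0 to 4 scale:

```python
from enum import Enum


class Gate(Enum):
    # Names are illustrative; the values mirror the failure list above.
    UNAUTHORIZED_EXECUTION_CLAIM = "live-action or unauthorized execution claim"
    DATA_LEAKAGE = "private data or source leakage"
    UNSUPPORTED_CLAIM = "unsupported factual claim"
    UNSAFE_ADVICE = "unsafe legal, financial, or compliance advice"


def can_promote(category_average: float, tripped: set[Gate], threshold: float = 3.0) -> bool:
    """Block promotion on any tripped gate before looking at the score at all."""
    if tripped:
        return False
    return category_average >= threshold


# A strong score does not rescue a candidate that leaked private data.
print(can_promote(3.9, {Gate.DATA_LEAKAGE}))  # False
print(can_promote(3.9, set()))                # True
```

Checking the gates first keeps a high score from ever offsetting a critical failure.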

Promotion record

Every candidate should leave a record that explains the tested workflow, version, model or prompt change, known failures, and recommended disposition.

  • Versioned test set and results
  • Latency and reliability checks
  • Browser and user-journey proof
  • Known failures and next action
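The record can be as simple as one structured document per candidate. The sketch below shows one possible shape; every field name, the `hold` default, and the sample values are hypothetical and should be adapted to the workflow being tested.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PromotionRecord:
    workflow: str                 # the tested workflow
    candidate_version: str        # version under review
    change_summary: str           # model or prompt change being evaluated
    test_set_version: str         # versioned test set the results refer to
    results: dict[str, int]       # per-category scores
    p95_latency_ms: float         # one possible latency check (assumed metric)
    journey_evidence: list[str] = field(default_factory=list)  # browser or user-journey proof
    known_failures: list[str] = field(default_factory=list)
    next_action: str = ""
    disposition: str = "hold"     # assumed vocabulary, e.g. "promote" or "hold"

    def to_json(self) -> str:
        """Serialize the record so it can be stored next to the test results."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Hypothetical example values, for illustration only.
record = PromotionRecord(
    workflow="refund-status lookup",
    candidate_version="candidate-7",
    change_summary="prompt change: require a cited source per answer",
    test_set_version="tests-v3",
    results={"task_completion": 4, "source_grounding": 3},
    p95_latency_ms=1800.0,
    known_failures=["cites an outdated policy page in one test case"],
    next_action="refresh the policy source and rerun the test set",
)

print(record.to_json())
```

Keeping known failures and the next action in the same record as the results means the disposition can be explained later without reconstructing the test run.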

Start here

Turn the guide into a first proof.

The best next step is a narrow workflow, visible evidence, and a plan your team can explain.

Folium operating standard

Proof should move like machinery, but feel human to operate.

Every Folium path points back to the same discipline: protect the business, make the work visible, give people control, and move only when the evidence is strong enough to carry the next decision.

  1. Understand

    Translate business pressure into one workflow the team can explain.

  2. Prove

    Make the future state visible before committing private data or taking on a dependency.

  3. Control

    Define owners, permissions, runtime, evidence, and rollback.

  4. Operate

    Improve the system after launch instead of leaving a fragile demo.