AI Solutions · Philippines

Shipping Our First AI Feature Inside a Clinic Product

May 30, 2023 · 3 min read

The first time you integrate AI into a production healthcare product, you discover very quickly that the margin for error is different from a typical web application. This is the account of our first AI feature in a clinic management system, what we got right, what nearly went wrong, and what we now do differently as a result.

What We Built and Why

The client was a multi-branch clinic running a proprietary patient management system we had built for them the year prior. The intake process - new patient registration, symptom collection, triage priority assignment - was handled by a combination of front desk staff and paper forms, then entered into the system manually.

The brief was to reduce the manual data entry load on front desk staff and flag high-priority patients faster. We scoped an AI-assisted intake: a structured conversation that collects symptoms and patient history through a guided interface, with a language model processing the responses and generating a structured intake summary for the attending staff.

The feature does not diagnose. It summarizes and flags. That distinction was fundamental to everything that followed.

What We Got Right

The most important early decision was defining what the AI could and could not say. We wrote an explicit constraint set before we wrote a single prompt: the system could summarize what the patient reported, it could flag keywords associated with high-priority conditions according to the clinic's own triage criteria, and it could generate structured data for staff review. It could not make diagnostic statements, suggest medications, or present any output as clinical assessment.
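A constraint set like this can be enforced mechanically as well as in the prompt. The sketch below is a minimal illustration of that idea, not our production code; the diagnostic patterns and triage keywords are hypothetical stand-ins for the clinic's own criteria.

```python
# Minimal post-generation guard: block any summary containing diagnostic
# language, and surface triage flags for staff review. All patterns and
# keywords here are illustrative placeholders, not the clinic's actual set.

DIAGNOSTIC_PATTERNS = [
    "diagnosis", "you have", "likely suffering from", "we recommend taking",
]

TRIAGE_KEYWORDS = {
    "chest pain": "high",
    "difficulty breathing": "high",
    "fever": "standard",
}

def check_summary(summary: str) -> dict:
    text = summary.lower()
    violations = [p for p in DIAGNOSTIC_PATTERNS if p in text]
    flags = {k: level for k, level in TRIAGE_KEYWORDS.items() if k in text}
    return {
        "passes": not violations,  # anything that reads as diagnosis is rejected
        "violations": violations,
        "triage_flags": flags,
    }
```

A check like this runs after generation, so even a prompt failure cannot put diagnostic language in front of staff unreviewed.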

This sounds like obvious caution. In practice, it required pushing back on some initial scope ideas from the client, who wanted the system to "tell staff what was probably wrong." That framing was not appropriate for this feature, and we had to explain why clearly enough that the client agreed to reframe it.

The second good decision was keeping a human review step mandatory. Every AI-generated intake summary went to a front desk staff member before it touched the patient record. The staff member could edit, reject, or approve the summary. The AI was a drafting tool, not an autonomous input.

What Almost Broke Us

We underestimated the variability of patient language. Our initial prompts were tuned on structured symptom descriptions. Real patients describe symptoms in ways that are colloquial, sometimes vague, sometimes in Tagalog, sometimes mixing languages mid-sentence. The model's outputs on highly unstructured input were significantly worse than on our test cases.

The fix required a better pre-processing step and more careful prompt engineering than we had initially scoped. We had not budgeted sufficient time for this, and the feature launch was delayed by three weeks while we worked through it.
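One piece of that pre-processing was mapping common colloquial and Tagalog symptom terms to canonical labels before the text reached the model. The real pipeline involved considerably more than string replacement; the sketch below only illustrates the idea, and the glossary is a tiny hypothetical sample of what we actually built from transcripts with clinic staff.

```python
# Illustrative normalization step: canonicalize known colloquial and
# Tagalog symptom terms so the prompt sees consistent vocabulary.
# The glossary here is a hypothetical sample, not the real one.

GLOSSARY = {
    "masakit ang ulo": "headache",
    "lagnat": "fever",
    "hilo": "dizziness",
}

def normalize(text: str) -> str:
    out = text.lower()
    for term, canonical in GLOSSARY.items():
        out = out.replace(term, canonical)
    return out
```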

We also had one incident during testing where the model generated a summary that technically fit within our constraints but used phrasing the clinic's medical director felt was too clinical in tone - implying more certainty than a patient summary should carry. We revised the output formatting to make the provisional nature of every summary explicit.
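Making that provisional nature explicit turned out to be a formatting concern as much as a prompting one. A sketch of the kind of framing we settled on, with illustrative wording:

```python
def format_for_staff(summary_text: str, flags: list[str]) -> str:
    # Every summary is wrapped in explicit patient-reported framing so it
    # cannot read as a clinical assessment. Exact wording is illustrative.
    header = "PATIENT-REPORTED INTAKE SUMMARY (unverified, for staff review)"
    flag_line = "Triage flags: " + (", ".join(flags) if flags else "none")
    return f"{header}\n\nPatient reports: {summary_text}\n\n{flag_line}"
```

The header and the "patient reports" attribution travel with the text everywhere it is displayed, so the tone problem cannot recur silently in a new view.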

What We Do Differently Now

When we scope AI solutions for healthcare or any high-stakes workflow, we now include an explicit "red line" session in discovery. We work with the client to define the exact outputs that are acceptable and the exact outputs that are not, before we write a prompt. That session creates the constraint document that guides prompt engineering and testing.

We also scope language variability as a first-class concern, not a secondary edge case. If a product will be used by people who do not communicate in formal, structured language, the AI feature needs to be tested against that full range of input before it goes live.

The feature is working well now. Front desk processing time dropped measurably and high-priority cases are flagged more consistently. It took longer to get there than we planned, but it is reliable.

If you are thinking about an AI integration for a clinical or operational workflow, the most important conversation is about constraints and error modes before it is about capabilities.

