AI Health Guide

How We Evaluate AI Health Tools

Health is a Your Money or Your Life (YMYL) topic. The tools we review can affect people's physical health, mental wellbeing, and medical care. We hold ourselves to a higher standard of editorial rigor because of this responsibility.

Our Evaluation Criteria

Every platform is evaluated across six criteria, adapted for the audience type:

For Consumer Health Apps (B2C)

  1. Clinical Validation — Does the app have peer-reviewed randomized controlled trials (RCTs)? FDA clearance or Breakthrough Device Designation? We assign evidence tiers: Gold (14+ RCTs), Silver (FDA designation or independent studies), Bronze (outcome data only), None (no validation).
  2. User Experience & Ratings — App Store and Google Play ratings, Trustpilot scores, and common user complaints. We report negative ratings (e.g., Calm's 1.4/5 Trustpilot) alongside positive ones.
  3. Pricing Transparency — Are prices clearly listed? Are there hidden charges, difficult cancellation processes, or aggressive auto-renewal? We flag platforms with BBB warnings or FTC scrutiny.
  4. AI Sophistication — Does the AI provide genuinely personalized responses, or are interactions scripted and generic? We test for adaptive behavior and conversation quality.
  5. Safety Protocols — Crisis detection capabilities, escalation procedures, and clear communication of limitations. Essential for mental health apps.
  6. Privacy & Data Handling — Privacy policy review, HIPAA compliance (where applicable), data sharing practices, and history of privacy incidents (e.g., Cerebral's $7M FTC settlement).
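The evidence tiers in criterion 1 can be read as a simple ordered rule. The following is a minimal sketch of that reading, not our production tooling: the `Evidence` record and its field names are hypothetical, and because the tier definitions do not say how an app with fewer than 14 RCTs is treated, the sketch leaves that to whoever fills in `independent_studies`.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical record of what validation an app can show."""
    rct_count: int = 0               # peer-reviewed RCTs
    fda_designation: bool = False    # FDA clearance or Breakthrough Device Designation
    independent_studies: int = 0     # independent (non-vendor) studies
    has_outcome_data: bool = False   # outcome data only, no formal studies


def evidence_tier(e: Evidence) -> str:
    """Return the highest tier whose condition is met, checked top down."""
    if e.rct_count >= 14:
        return "Gold"
    if e.fda_designation or e.independent_studies > 0:
        return "Silver"
    if e.has_outcome_data:
        return "Bronze"
    return "None"
```

Checking the tiers from Gold downward means an app that qualifies for several tiers is always reported at the highest one.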

For Clinical AI Tools (B2B)

  1. Clinical Validation & KLAS Scores — KLAS performance scores, Best in KLAS awards, independent customer satisfaction data, and peer-reviewed research on clinical outcomes.
  2. EHR Integration Depth — Native API integration vs. browser extension. Number and depth of supported EHR platforms (Epic, Cerner, athenahealth, MEDITECH).
  3. Pricing Transparency — Is pricing publicly available? Most B2B clinical tools hide pricing — we note this as a weakness and provide verified pricing ranges where available.
  4. AI Documentation Quality — Accuracy of generated notes, specialty template coverage, coding suggestion quality (ICD-10, CPT), and error rates reported by clinicians.
  5. FDA Regulatory Status — Is the tool classified as a medical device? Does it have FDA clearance (relevant for diagnostic AI)? Is it a documentation aid exempt from FDA oversight?
  6. Deployment Scale & Support — Number of health system deployments, practice sizes served, onboarding process, and ongoing support quality.

Third-Party Data Sources

  • KLAS Research — Independent healthcare IT vendor evaluation. KLAS scores are particularly relevant for B2B clinical tools.
  • G2 — Software review platform with verified user reviews.
  • App Store / Google Play — Consumer app ratings and review volume.
  • Trustpilot — Consumer-facing reviews. We cite low scores explicitly.
  • PubMed / JMIR — Peer-reviewed clinical evidence for health apps.
  • FDA.gov — FDA clearance status, 510(k) summaries, Breakthrough Device designations.

Affiliate Relationship Policy

Our monetization model differs by audience:

  • B2C (Consumer Apps): Several apps offer affiliate programs — BetterHelp ($40-$100 CPA), Headspace (20%), Cronometer (35%), Hims & Hers (up to 30%), Cerebral ($12/signup), Noom ($10-$30/trial). We participate in these programs.
  • B2B (Clinical Tools): Almost no affiliate programs exist. Freed ($50/subscriber) is the only real affiliate opportunity. Nuance, Suki, and athenahealth have partner programs designed for technology resellers, not content publishers. Our B2B rankings are essentially unmonetized.

This matters because it gives the B2B side structural editorial independence: with almost no affiliate revenue available, there is little monetization that could influence our rankings. On the B2C side, we apply the same standard: affiliate commissions never influence rankings. We cite negative data about high-CPA platforms (BetterHelp's billing confusion, Cerebral's FTC settlement) alongside positive findings.

Inclusion criteria — which platforms make our list

We do not review every platform that exists. A review must satisfy the following inclusion criteria:

  • Material market presence. Platform is operational, has a real user base or installed base, and is reasonably likely to appear in clinician or consumer evaluation sets.
  • Substantive AI component. Platforms that brand themselves "AI" but use it superficially (basic rule-based logic, OCR, simple chatbots without ML) are deprioritized.
  • Editorial independence. We do not accept paid placement, sponsored review slots, or pay-to-rank arrangements. Our affiliate participation is disclosed; rankings are not influenced by it.
  • Public accessibility or verifiable enterprise contracts. We can write a useful review only if we can either sign up directly or verify enterprise terms via documented customer reports (KLAS interviews, published case studies, FOIA-released contracts).

Platforms that fail one or more of these criteria are excluded. We periodically re-evaluate excluded platforms when their status changes.

Hands-on testing process

Where the platform allows direct evaluation (most B2C apps, Freed, Heidi Health, Suki self-serve tier), we sign up using a real account and test against a standardized workflow:

  1. Onboarding capture. Time the sign-up flow; screenshot each step; document what data is collected at each stage.
  2. Core-feature exercise. Run the platform's primary use case end-to-end (a meditation session, a therapy intake, a clinical encounter, a meal log) and capture screenshots.
  3. Edge case probe. Test 2–3 edge cases relevant to the audience (crisis-flagged input on mental-health apps; specialty-template stretch on AI scribes; mixed-dish photo-logging on nutrition apps).
  4. Pricing verification. Confirm displayed pricing against the platform's own pricing page on the date of testing; record any discrepancies between marketing and checkout.
  5. Cancellation flow. Cancel any active subscription via the documented in-product path; record steps and any retention friction.

For enterprise B2B platforms where direct sign-up is not available (Abridge, Nuance DAX, athenahealth at full deployment), we synthesize from KLAS interview data, public case studies, vendor documentation, and clinician-reported pricing in the trade press and Reddit discussions. We explicitly mark any review whose methodology departs from direct hands-on testing.

Fact-check pipeline

Every review and guide passes through these checks before publication:

  1. Source-citation pass. Every numerical claim (pricing, RCT count, FDA designation, KLAS score, settlement amount) is linked to a primary or trusted secondary source. Pricing claims include capture date.
  2. YMYL safety pass. Mental-health, telehealth, and clinical-AI pages are reviewed against a checklist for crisis-routing, "not medical advice" disclosure, and avoidance of treatment recommendations.
  3. Regulatory currency pass. FDA, FTC, DEA, and state-regulator references are checked against the most recent enforcement docket. Stale or superseded references are updated or removed.
  4. Affiliate-disclosure pass. Every page with an affiliate link includes the disclosure block. Mental-health pages with elevated YMYL risk include the heightened disclosure.
  5. Medical Review Board pass (for YMYL pages, once active). A licensed clinician reviews the page before publication. Until reviewer slots are filled, YMYL pages do not carry a "medically reviewed by" byline; the page is published with editorial-only review.
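Some of the passes above are mechanical enough to express as a pre-publication gate. The sketch below is illustrative only: the `Page` record, its field names, and the check labels are hypothetical, and it assumes the heightened disclosure applies to mental-health pages that carry affiliate links. The YMYL safety, regulatory currency, and Medical Review Board passes rely on human judgment and are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical summary of a draft page, filled in by an editor."""
    uncited_numerical_claims: int = 0
    pricing_claims_missing_capture_date: int = 0
    has_affiliate_links: bool = False
    has_disclosure_block: bool = False
    is_mental_health: bool = False
    has_heightened_disclosure: bool = False


def failed_checks(page: Page) -> list[str]:
    """Return the names of the mechanical checks the page fails."""
    failures = []
    if page.uncited_numerical_claims > 0:
        failures.append("source-citation")
    if page.pricing_claims_missing_capture_date > 0:
        failures.append("pricing-capture-date")
    if page.has_affiliate_links and not page.has_disclosure_block:
        failures.append("affiliate-disclosure")
    if (page.has_affiliate_links and page.is_mental_health
            and not page.has_heightened_disclosure):
        failures.append("heightened-disclosure")
    return failures
```

An empty list means only that the page clears these mechanical checks. It still has to pass the remaining reviews before publication.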

Stale-data handling

Health AI moves fast. We handle stale data three ways:

  • Quarterly re-verification. Pricing, KLAS scores, FDA status, and feature claims are re-checked every 90 days.
  • Trigger-based out-of-cycle updates. Major events (FDA enforcement actions, FTC settlements, platform shutdowns like Woebot's June 2025 consumer-app retirement) trigger immediate updates on relevant pages.
  • Removed-platform handling. When a platform exits the market, we do not delete the review. Instead, we update it with the date and reason for retirement and recommend an active alternative, because discontinued products still get searched for.
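The 90-day re-verification rule is plain date arithmetic. Here is a minimal sketch, assuming each page stores the date it was last verified. The function names are hypothetical, and trigger-based updates are outside its scope.

```python
from datetime import date, timedelta

# Quarterly re-verification window stated above.
REVERIFY_INTERVAL = timedelta(days=90)


def next_verification_due(last_verified: date) -> date:
    """Date by which the page's data should be re-checked."""
    return last_verified + REVERIFY_INTERVAL


def is_stale(last_verified: date, today: date) -> bool:
    """True once more than 90 days have passed since the last check."""
    return today - last_verified > REVERIFY_INTERVAL
```

For example, a page last verified on 2025-01-01 would be due again on 2025-04-01 and would count as stale from 2025-04-02 onward.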

Medical Review Board

See our Medical Review Board page for the full reviewer roster and editorial process. Reviewers are paid a flat retainer, hold no equity in the site, and have no relationship with any platform we review. Until each board slot is filled, YMYL pages in that subject area are published with editorial-only review and do not carry a "medically reviewed by" byline.

YMYL Compliance Standards

  • Medical disclaimer on every page (non-negotiable)
  • Crisis resources on every mental health page (988 Lifeline, Crisis Text Line)
  • Clinical disclaimer on clinical documentation pages
  • All health claims cited to published sources (PubMed, FDA.gov, KLAS)
  • No medical advice: we compare tools; we do not prescribe treatments
  • Pricing verification dates on all product pricing
  • "Last Updated" timestamps on every page

Update Schedule

All platform data is re-verified every 90 days. Pricing, scores, and feature lists are checked against official sources. FDA regulatory changes, KLAS score updates, and major platform announcements trigger out-of-cycle updates. The "Last Updated" badge reflects the most recent verification.

Contact

Found an error? Have a platform suggestion? Contact us at [email protected]. We correct confirmed factual errors within 48 hours.
