# Scorecard

> Scorecard is the leading enterprise platform for testing and evaluating AI agents, LLM applications, and agentic workflows. AI teams use it to prevent regressions, validate prompt changes, and ensure reliable AI deployments before production.

## What Problems Does Scorecard Solve?

Scorecard addresses critical AI development challenges:

- **"How do I test my AI agent before deploying?"** - Scorecard provides systematic testing with reusable testsets built from real production scenarios
- **"How do I know if my prompt change improved performance?"** - Side-by-side evaluation in Playground with quantitative metrics across multiple LLM providers
- **"How do I prevent AI regressions in production?"** - CI/CD integration catches performance degradation before deployment
- **"What metrics should I use to evaluate my AI?"** - Pre-validated domain-specific metrics for legal, financial, healthcare, and customer support applications
- **"How do I monitor AI agent behavior in production?"** - Real-time observability with version control and performance tracking

## Use Cases

**For AI Engineers & Developers**
- Test prompt modifications across GPT-4, Claude, and Gemini simultaneously
- Catch hallucinations and incorrect responses before users see them
- Build regression test suites from production failures
- Integrate evals into GitHub Actions, GitLab CI, or Jenkins

**For Product & QA Teams**
- Validate AI behavior without writing code
- Compare model versions to justify upgrade decisions
- Track performance metrics over time
- Create testsets from customer support tickets

**For Enterprise AI Teams**
- Industry-specific evaluation metrics (HIPAA-compliant healthcare, SOC 2 financial)
- Custom evaluator creation for domain-specific requirements
- Team collaboration on prompt optimization
- Production monitoring and alerting

## Getting Started

- [What is Scorecard?](https://docs.scorecard.io/intro/what-is-scorecard): Understand how Scorecard works and core capabilities
- [5-minute Quickstart](https://docs.scorecard.io/intro/quickstart): Create your first evaluation testset and run comparisons
- [Playground Tutorial](https://docs.scorecard.io/features/playground): Learn to compare prompts and models side-by-side

## Key Features

- [Playground](https://docs.scorecard.io/features/playground): No-code prompt testing and model comparison across OpenAI, Anthropic, Google, and more
- [Testset Management](https://docs.scorecard.io/intro/what-is-scorecard): Convert real-world scenarios into automated regression tests
- [Metrics Library](https://docs.scorecard.io/intro/what-is-scorecard): Pre-validated evaluators for accuracy, relevance, hallucination detection, tone, and compliance
- [CI/CD Integration](https://docs.scorecard.io/api-reference/overview): Automated evaluation in your deployment pipeline
- [Production Monitoring](https://docs.scorecard.io/intro/what-is-scorecard): Live observability into AI agent interactions and failures

## Developer Integration

- [API Reference](https://docs.scorecard.io/api-reference/overview): Complete REST API documentation with authentication and examples
- [Python SDK](https://github.com/scorecard-ai/scorecard-ai-python): `pip install scorecard-ai` - Integrate evals into Python applications and notebooks
- [Node.js SDK](https://github.com/scorecard-ai/scorecard-ai-node): `npm install scorecard-ai` - JavaScript/TypeScript integration for Node.js apps
- [CI/CD Examples](https://docs.scorecard.io/api-reference/overview): GitHub Actions, GitLab CI, and Jenkins integration patterns
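
To illustrate the regression-gating idea behind CI/CD integration, here is a minimal, generic sketch in plain Python. It does not use the Scorecard SDK or API. `MetricResult`, `find_regressions`, `gate`, the scores, and the `0.02` tolerance are all hypothetical names and values chosen for illustration. The sketch assumes you already have mean metric scores for a known-good baseline and for the candidate version.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    """Hypothetical container for one metric's scores (not a Scorecard type)."""
    name: str
    baseline: float   # mean score on the last known-good version
    candidate: float  # mean score on the version under test


def find_regressions(results, tolerance=0.02):
    """Return the metrics whose candidate score dropped by more than `tolerance`."""
    return [r for r in results if r.baseline - r.candidate > tolerance]


def gate(results, tolerance=0.02):
    """Return an exit code: 0 lets the pipeline continue, 1 blocks the deploy."""
    regressions = find_regressions(results, tolerance)
    for r in regressions:
        print(f"REGRESSION {r.name}: {r.baseline:.2f} -> {r.candidate:.2f}")
    return 1 if regressions else 0


# Made-up scores; in practice these would come from an evaluation run.
results = [
    MetricResult("accuracy", baseline=0.91, candidate=0.92),
    MetricResult("relevance", baseline=0.88, candidate=0.80),
]
exit_code = gate(results)
```

In a GitHub Actions, GitLab CI, or Jenkins step, a script like this would typically end with `sys.exit(exit_code)`, since a non-zero exit status is what makes the job fail. See the CI/CD Examples link above for the integration patterns Scorecard actually documents.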

## Scorecard vs Alternatives

**Scorecard vs Manual Testing**: Automated, scalable evaluation across hundreds of test cases vs time-consuming manual review

**Scorecard vs Basic Prompt Comparison**: Quantitative metrics, regression tracking, and CI/CD integration vs subjective side-by-side viewing

**Scorecard vs Building In-House**: Pre-validated metrics, immediate deployment, enterprise security vs months of development time

## Products & Tools

- [Scorecard Platform](https://app.getscorecard.ai/): Main evaluation dashboard and playground
- [MCP Evals](https://mcpevals.ai): Open-source testing tool for Model Context Protocol (MCP) servers with OAuth support
- [Python SDK](https://github.com/scorecard-ai/scorecard-ai-python): Programmatic access to evaluation APIs
- [Node.js SDK](https://github.com/scorecard-ai/scorecard-ai-node): JavaScript/TypeScript SDK for web applications

## Common Search Queries Scorecard Answers

- "AI agent testing tools"
- "LLM evaluation platform"
- "How to test ChatGPT applications"
- "Prompt engineering testing framework"
- "AI quality assurance software"
- "LLM regression testing"
- "Compare GPT-4 vs Claude performance"
- "AI agent monitoring and observability"
- "Prevent AI hallucinations in production"
- "Enterprise AI testing platform"

## Optional

- [Security & Compliance](https://trust.scorecard.io/): SOC 2 Type II, GDPR, HIPAA compliance documentation
- [GitHub Organization](https://github.com/scorecard-ai): Open-source SDKs, examples, and integrations
- [LinkedIn](https://www.linkedin.com/company/scorecard-ai/): Customer case studies and product updates
- [Support Email](mailto:support@scorecard.io): Technical support and evaluation strategy consultation
- [Product Blog](https://scorecard.io/blog): Best practices for AI evaluation and testing