We present an evaluation of AIbert, an autonomous AI agent for Level 0 IT support deployed in a production Slack environment. AIbert uses a Deep Research engine for semantic search across an internal knowledge base (10 000+ Confluence pages, 15 000 Jira tickets, 45 Git repositories) and generates contextual solution suggestions for Level 1 agents. During a 4-month evaluation on 3 847 support requests, AIbert achieves a ticket deflection rate of 71.3 %, reduces Mean Time to First Response (MTFR) from 2.4 hours to 12 seconds, and reaches a First Contact Resolution (FCR) rate of 64.8 %. For routine requests (password resets, VPN, software installations), the deflection rate exceeds 85 %. We identify two main limitations: complex multi-system incidents (23 % deflection) and new, undocumented issues, both of which require human escalation. A feedback loop based on Slack reactions continuously improves response quality, with a quarter-over-quarter accuracy increase of 8.2 percentage points.
IT support teams face a growing volume of repetitive requests — according to Gartner (2023), routine inquiries account for 60–70 % of all tickets in a typical IT organization. Level 1 agents spend a significant portion of their working time searching for solutions in documentation, historical tickets, and knowledge bases, instead of solving complex problems that require expertise.
Knowledge-Centered Service (KCS v6, Consortium for Service Innovation) defines the principle "solve it once, use it many times": systematic capture and reuse of knowledge during the resolution process. Automating the Level 0 layer (self-service plus AI triage) applies this principle at scale; conversational AI in IT support environments has been reported to reduce ticket volume by 20–40 % (Forrester Research 2022).
This study evaluates AIbert in a production environment and addresses three questions: (1) What ticket deflection rate is achievable for different request categories? (2) How does response quality change over time as a result of the feedback loop? (3) Where do the boundaries of AI-driven Level 0 support lie?
Table 1. Knowledge base sources indexed in the AIbert Deep Research engine.
| Source | Content Type | Document Count | Update Frequency |
|---|---|---|---|
| Confluence | Documentation, SOP, architecture | 3,200 pages | Real-time webhook |
| Jira | Historical tickets + comments | 15,000 tickets | Real-time webhook |
| Git repositories | README, docs/, CHANGELOG | 45 repositories | Daily (cron) |
| Runbooks | Operational procedures | 180 documents | On change |
| Slack archive | Historical conversations | 50,000+ messages | Weekly |
| Total indexed | Embedded chunks | 124,600 | — |
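Table 1 reports 124,600 embedded chunks but does not state how documents are split before embedding. A minimal sketch, assuming fixed-size character windows with overlap, shows how one page or ticket becomes several indexable chunks; the chunk size (800 characters), the overlap (100 characters) and the `Chunk` field names are hypothetical, and the embedding step itself is omitted.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str   # hypothetical label, e.g. "confluence", "jira", "git"
    doc_id: str   # identifier of the original page, ticket or file
    index: int    # position of this chunk within its document
    text: str


def chunk_document(source: str, doc_id: str, text: str,
                   size: int = 800, overlap: int = 100) -> list[Chunk]:
    """Split text into windows of `size` chars, each overlapping the previous by `overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks: list[Chunk] = []
    start = 0
    while start < len(text):
        chunks.append(Chunk(source, doc_id, len(chunks), text[start:start + size]))
        if start + size >= len(text):  # last window already reaches the end
            break
        start += step
    return chunks
```

The overlap keeps sentences that straddle a window boundary retrievable from either neighbouring chunk, at the cost of a slightly larger index.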
Table 2. Distribution of intent categories during the evaluation period (n = 3,847).
| Category | Count | Share | Typical Example |
|---|---|---|---|
| How-to / guide | 1,423 | 37.0% | How to set up VPN? Where can I find the API key? |
| Incident | 1,038 | 27.0% | Deployment is not working. Build fails on CI. |
| Service Request | 846 | 22.0% | I need access to repo X. Password reset. |
| Feedback / bug report | 347 | 9.0% | Dashboard shows incorrect data. |
| Escalation (L1/L2) | 193 | 5.0% | Production is down. I need on-call. |
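The Share column of Table 2 follows directly from the category counts. The short check below recomputes it; the only inputs are the counts from the table, and rounding to one decimal place is assumed to match the table's presentation.

```python
from collections import Counter

# Category counts copied from Table 2.
counts = Counter({
    "How-to / guide": 1423,
    "Incident": 1038,
    "Service Request": 846,
    "Feedback / bug report": 347,
    "Escalation (L1/L2)": 193,
})

total = sum(counts.values())
# Share of each category in percent, rounded to one decimal as in the table.
shares = {category: round(100 * n / total, 1) for category, n in counts.items()}
```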
Table 3. Comparison of key metrics before and after AIbert deployment.
| Metric | Before AIbert | After AIbert (4 mo.) | Change |
|---|---|---|---|
| Mean Time to First Response (MTFR) | 2.4 h | 12 s | −99.9% |
| First Contact Resolution (FCR) | 31.2% | 64.8% | +33.6 pp |
| Ticket deflection rate | 0% | 71.3% | +71.3 pp |
| L1 agent workload (tickets/day) | 48 | 14 | −70.8% |
| CSAT (1–5) | 3.1 | 4.3 | +1.2 |
| Mean Time To Resolution (MTTR) | 4.8 h | 1.2 h | −75% |
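The report does not define its metrics formally. A sketch under common industry definitions (an assumption, not necessarily the exact definitions used in the evaluation) computes them from per-request records; the `Ticket` record and its field names are hypothetical. Rate metrics in Table 3 change in percentage points (simple difference), while time and workload metrics change by a relative percentage.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Ticket:
    # Hypothetical record layout; field names are illustrative only.
    first_response_s: float          # seconds until the first response (bot or human)
    resolved_by_bot: bool            # closed without any human agent involvement
    resolved_on_first_contact: bool  # no follow-up interaction was needed
    resolution_h: float              # hours from request to resolution


def deflection_rate(tickets: list[Ticket]) -> float:
    """Share of requests resolved without reaching a human agent, in %."""
    return 100 * sum(t.resolved_by_bot for t in tickets) / len(tickets)


def fcr(tickets: list[Ticket]) -> float:
    """First Contact Resolution rate, in %."""
    return 100 * sum(t.resolved_on_first_contact for t in tickets) / len(tickets)


def mtfr_seconds(tickets: list[Ticket]) -> float:
    """Mean Time to First Response, in seconds."""
    return mean(t.first_response_s for t in tickets)


def mttr_hours(tickets: list[Ticket]) -> float:
    """Mean Time To Resolution, in hours."""
    return mean(t.resolution_h for t in tickets)


def relative_change(before: float, after: float) -> float:
    """Relative change in %, as in the Change column for time and workload metrics."""
    return 100 * (after - before) / before
```

For example, `relative_change(2.4 * 3600, 12)` gives about −99.86 %, which the table rounds to −99.9 %, and `relative_change(48, 14)` gives about −70.8 %.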
Table 4. Confidence score distribution and escalation metrics.
| Confidence Band | Share of Responses | Correctness | Auto-escalation |
|---|---|---|---|
| High (> 0.85) | 42% | 94.2% | No |
| Medium (0.60 – 0.85) | 35% | 78.6% | No (with disclaimer) |
| Low (0.40 – 0.60) | 15% | 52.1% | Optional |
| Very low (< 0.40) | 8% | 31.4% | Automatic |
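The bands in Table 4 amount to a simple routing rule. The sketch below maps a confidence score to a handling action; the action names are hypothetical, "Optional" escalation is interpreted as offering the user an escalation option, and the assignment of the exact boundary values (0.85, 0.60, 0.40) to a band is an assumption because the table's ranges touch at their edges. The last lines compute the share-weighted correctness implied by the table, roughly 77.4 %.

```python
from enum import Enum


class Action(Enum):
    ANSWER = "answer"                              # high: reply directly
    ANSWER_WITH_DISCLAIMER = "disclaimer"          # medium: reply, flag uncertainty
    ANSWER_OFFER_ESCALATION = "offer_escalation"   # low: reply, let the user escalate
    ESCALATE = "escalate"                          # very low: hand off to L1 automatically


def route(confidence: float) -> Action:
    """Map a confidence score in [0, 1] to an action, using the bands of Table 4."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence > 0.85:
        return Action.ANSWER
    if confidence >= 0.60:  # assumption: 0.60 and 0.85 fall into the medium band
        return Action.ANSWER_WITH_DISCLAIMER
    if confidence >= 0.40:  # assumption: 0.40 falls into the low band
        return Action.ANSWER_OFFER_ESCALATION
    return Action.ESCALATE


# (share of responses, correctness) per band, copied from Table 4.
BANDS = [(0.42, 0.942), (0.35, 0.786), (0.15, 0.521), (0.08, 0.314)]
# Overall correctness implied by the table: sum of share x correctness.
blended_correctness = sum(share * correct for share, correct in BANDS)
```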
Table 5. Evolution of response accuracy by month (feedback loop effect).
| Month | Accuracy (%) | Thumbs Up Rate | Escalation Rate |
|---|---|---|---|
| July 2025 (M1) | 62.4% | 58% | 34% |
| August 2025 (M2) | 68.1% | 64% | 29% |
| September 2025 (M3) | 74.6% | 71% | 24% |
| October 2025 (M4) | 78.8% | 76% | 20% |
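The report states that accuracy rose without manual retraining, driven only by thumbs up/down signals, but does not describe the mechanism. One plausible design, shown here purely as a hypothetical illustration, stores reaction counts per cited source and blends a smoothed helpfulness score into retrieval ranking. The `FeedbackStore` class, the Beta(1, 1) smoothing, the blending `weight` of 0.3 and the mapping of Slack reactions to a `helpful` flag are all assumptions.

```python
class FeedbackStore:
    """Per-document thumbs up/down counts (hypothetical design)."""

    def __init__(self) -> None:
        self.up: dict[str, int] = {}
        self.down: dict[str, int] = {}

    def record(self, doc_ids: list[str], helpful: bool) -> None:
        # Credit every source cited in the answer that received the reaction.
        counter = self.up if helpful else self.down
        for doc_id in doc_ids:
            counter[doc_id] = counter.get(doc_id, 0) + 1

    def quality(self, doc_id: str) -> float:
        # Posterior mean under a Beta(1, 1) prior; unseen documents score 0.5.
        up, down = self.up.get(doc_id, 0), self.down.get(doc_id, 0)
        return (up + 1) / (up + down + 2)


def rerank(hits: list[tuple[str, float]], store: FeedbackStore,
           weight: float = 0.3) -> list[tuple[str, float]]:
    """Blend semantic similarity with feedback quality and sort best-first."""
    scored = [(doc, (1 - weight) * sim + weight * store.quality(doc)) for doc, sim in hits]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With this design, a source that repeatedly receives negative reactions gradually drops below equally similar but better-rated sources, which is consistent with improvement without retraining the underlying model.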
Table 6. Comparison of AIbert with commercial AI support solutions.
| Solution | Deflection Rate | Knowledge Sources | Integration | Pricing / Deployment |
|---|---|---|---|---|
| AIbert | 71.3% | Confluence+Jira+Git+Slack | Slack native | On-premise |
| Zendesk AI | ~50% | Help Center articles | Zendesk only | $89/agent/mo |
| ServiceNow Virtual Agent | ~60% | SNOW KB + CMDB | ServiceNow | Enterprise license |
| Moveworks | ~65% | Multi-source | Slack, Teams | $50k+/yr |
| ChatGPT + docs | ~35% | Upload (manual) | Web only | $20/user/mo |
Complex multi-system incidents (23 % deflection) require correlating logs across several systems, which is beyond what semantic search over a knowledge base can provide. These cases require human expertise and access to production systems.
New, undocumented issues have no corresponding sources in the knowledge base. AIbert correctly detects its low confidence and escalates, but cannot provide a solution; this is an inherent limitation of retrieval-based approaches.
Hallucinations occur in 6.2 % of responses, primarily when the agent combines partially relevant sources. Confidence-based disclaimers reduce their impact on users.
Table 7. Proposed extensions for AIbert.
| Strategy | Predicted Impact | Complexity |
|---|---|---|
| Proactive incident detection (monitoring integration) | MTTR −40%, ticket prevention | Medium |
| Auto-ticket creation in Jira from Slack conversations | Reduction of L1 manual work by 30% | Low |
| Multi-channel (Teams, Email, Web portal) | 100% communication channel coverage | Medium |
| PagerDuty + Grafana integration | Automatic alert correlation with KB | Medium |
| Predictive support (ML on ticket trends) | Incident prediction 2–4 h ahead | High |
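The proposed auto-ticket extension can be sketched as building a Jira create-issue request body from a Slack thread. The example targets the Jira REST API v2 field layout (`project`, `issuetype`, `summary`, `description`, `labels`) and only builds the JSON body; sending it (an authenticated POST to the create-issue endpoint) is not shown. The project key `IT`, the issue type `Task`, the labels, the 120-character summary limit and the shape of the `messages` items are hypothetical.

```python
import json


def build_issue_payload(channel: str, messages: list[dict],
                        project_key: str = "IT", issue_type: str = "Task") -> dict:
    """Build a Jira REST API v2 create-issue body from a Slack thread.

    Each item in `messages` is assumed to be a dict with "user" and "text" keys.
    """
    if not messages:
        raise ValueError("thread is empty")
    first_text = messages[0]["text"].strip()
    # Jira summaries are single-line; use the first line of the opening message.
    summary = (first_text.splitlines() or ["Slack support request"])[0][:120]
    transcript = "\n".join(f"{m['user']}: {m['text']}" for m in messages)
    return {
        "fields": {
            "project": {"key": project_key},
            "issuetype": {"name": issue_type},
            "summary": summary,
            "description": f"Created from Slack channel #{channel}\n\n{transcript}",
            "labels": ["aibert", "slack"],
        }
    }


thread = [
    {"user": "U123", "text": "VPN connection fails\nerror 809 since this morning"},
    {"user": "AIbert", "text": "Please try reconnecting with the updated profile."},
]
body = json.dumps(build_issue_payload("it-support", thread))
```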
1. A ticket deflection rate of 71.3 % shows that AI-driven Level 0 support is production-ready with immediate ROI: L1 agent workload decreased by 70.8 % (from 48 to 14 tickets per day).
2. Reducing MTFR from 2.4 hours to 12 seconds transforms the support experience: users receive responses in real time instead of waiting for a human agent.
3. The feedback loop works — accuracy increased from 62.4 % to 78.8 % over 4 months without manual retraining, based solely on thumbs up/down signals.
4. Confidence scoring is essential for quality — responses with confidence > 0.85 achieve 94.2 % correctness, while automatic escalation at < 0.40 protects against low-quality output.
5. Complex incidents remain the domain of humans — 23 % deflection for multi-system problems defines a clear boundary between AI L0 and human L1/L2 support.
Deployment: GoSpace Labs / BIT Technology, on-premise Docker Compose.
Project reference: bittechnology.bemooore.com/referencie/aibert-jira-support
Author: Ing. Stanislav Pittner, CEO of BIT Technology s.r.o.
© 2025 Ing. Stanislav Pittner — BIT Technology s.r.o.