AIbert — Technical Report TR-2025-005
BIT Technology Research Series • AI Service Management

AIbert: Autonomous AI Agent for Level 0 IT Support — Evaluating the Effectiveness of Deep Research and Knowledge Retrieval in a Production Slack Environment

Ing. Stanislav Pittner
BIT Technology s.r.o., Trstínska cesta 9, 917 01 Trnava, Slovakia
Published: November 22, 2025 • Evaluation period: July – October 2025 • Version 1.0
DOI: 10.5281/bittechnology.2025.tr005 (preprint)

Abstract

We present an evaluation of AIbert, an autonomous AI agent for Level 0 IT support deployed in a production Slack environment. AIbert uses a Deep Research engine for semantic search across an internal knowledge base (10,000+ Confluence pages, 15,000 Jira tickets, 45 Git repositories) and generates contextual solution suggestions for Level 1 agents. During a 4-month evaluation on 3,847 support requests, AIbert achieves a ticket deflection rate of 71.3 %, reduces Mean Time to First Response (MTFR) from 2.4 hours to 12 seconds, and reaches a First Contact Resolution (FCR) rate of 64.8 %. For routine requests (password resets, VPN, software installations), the deflection rate exceeds 85 %. We identify two main limitations: complex multi-system incidents (23 % deflection) and new, undocumented issues still require human escalation. A feedback loop based on Slack reactions continuously improves response quality, raising accuracy from 62.4 % to 78.8 % (+16.4 percentage points) over the four evaluation months.

Keywords: AI support, chatbot, ITSM, Slack, knowledge retrieval, ticket deflection, Deep Research, NLP, conversational AI

1. Introduction

IT support teams face a growing volume of repetitive requests — according to Gartner (2023), routine inquiries account for 60–70 % of all tickets in a typical IT organization. Level 1 agents spend a significant portion of their working time searching for solutions in documentation, historical tickets, and knowledge bases, instead of solving complex problems that require expertise.

Knowledge-Centered Service (KCS v6, Consortium for Service Innovation) defines the principle "solve it once, use it many times": knowledge is captured and reused systematically as part of the resolution process. Automating the Level 0 layer (self-service + AI triage) applies this principle at scale, and conversational AI in IT support environments has been shown to reduce ticket volume by 20–40 % (Forrester Research, 2022).

This study evaluates AIbert in a production environment and addresses three questions: (1) What ticket deflection rate is achievable for different request categories? (2) How does response quality change over time as a result of the feedback loop? (3) Where do the boundaries of AI-driven Level 0 support lie?

2. System Architecture

AIbert Level 0 AI Support Pipeline:
Slack Event (Socket Mode) → Intent Classifier (NLP / zero-shot) → Knowledge Retrieval (semantic search) → LLM Synthesis (citation + confidence) → Slack Reply (+ feedback buttons)
Knowledge sources: Jira API • Confluence API • Git Search • Runbooks • Slack History (50k+ messages)
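The final pipeline stage (Slack Reply + feedback buttons) can be illustrated with a minimal sketch. It builds a Slack Block Kit message containing three parts: the synthesized answer, its cited sources with the confidence score, and thumbs-up/down buttons whose clicks would feed the feedback loop. The function name `build_reply_blocks`, the `action_id` values and the citation layout are hypothetical, because the report does not publish AIbert's payload. Posting the message and receiving button clicks over Socket Mode are left out.

```python
from typing import Any


def build_reply_blocks(answer: str, sources: list[str], confidence: float,
                       request_id: str) -> list[dict[str, Any]]:
    """Build Slack Block Kit blocks for an answer with citations and feedback buttons.

    Hypothetical sketch: the action_id values and layout are illustrative only.
    """
    citation_text = "\n".join(f"• {src}" for src in sources) or "• (no sources)"
    return [
        # Main answer rendered as Slack mrkdwn.
        {"type": "section", "text": {"type": "mrkdwn", "text": answer}},
        # Cited knowledge-base sources and the confidence score as a small context line.
        {"type": "context", "elements": [
            {"type": "mrkdwn", "text": f"Sources:\n{citation_text}"},
            {"type": "mrkdwn", "text": f"Confidence: {confidence:.2f}"},
        ]},
        # Feedback buttons; the request_id value links a click back to this answer.
        {"type": "actions", "elements": [
            {"type": "button", "text": {"type": "plain_text", "text": "👍 Helpful"},
             "action_id": "feedback_up", "value": request_id},
            {"type": "button", "text": {"type": "plain_text", "text": "👎 Not helpful"},
             "action_id": "feedback_down", "value": request_id},
        ]},
    ]


if __name__ == "__main__":
    blocks = build_reply_blocks(
        "Open the VPN client and sign in with your SSO account.",
        ["Confluence: VPN setup guide"], 0.91, "req-42")
    print([b["type"] for b in blocks])
```

Carrying the request identifier in the button `value` lets a handler attribute each vote to a specific answer and its sources without storing extra state in the message.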

3. Data and Methods

3.1 Indexed Knowledge Sources

Table 1. Knowledge base sources indexed in the AIbert Deep Research engine.

Source | Content Type | Document Count | Update Frequency
Confluence | Documentation, SOP, architecture | 3,200 pages | Real-time webhook
Jira | Historical tickets + comments | 15,000 tickets | Real-time webhook
Git repositories | README, docs/, CHANGELOG | 45 repositories | Daily (cron)
Runbooks | Operational procedures | 180 documents | On change
Slack archive | Historical conversations | 50,000+ messages | Weekly
Total indexed | Embedded chunks | 124,600 |
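Table 1 gives the indexed volume (124,600 embedded chunks) but not the chunking strategy. One common approach splits each document into overlapping word windows before embedding, as in the sketch below. The window size (200 words) and overlap (40 words) are assumptions chosen for illustration and are not AIbert's reported parameters.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping windows of `size` words.

    Hypothetical parameters: consecutive windows share `overlap` words, so text
    near a window boundary appears in both neighbouring chunks.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once this window already reaches the end of the document.
        if start + size >= len(words):
            break
    return chunks
```

For a 500-word page these defaults yield three chunks starting at words 0, 160 and 320. Each chunk would then be embedded and stored with a reference to its source page for citation.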

3.2 Request Classification

Table 2. Distribution of intent categories during the evaluation period (n = 3,847).

Category | Count | Share | Typical Example
How-to / guide | 1,423 | 37.0% | How to set up VPN? Where can I find the API key?
Incident | 1,038 | 27.0% | Deployment is not working. Build fails on CI.
Service Request | 846 | 22.0% | I need access to repo X. Password reset.
Feedback / bug report | 347 | 9.0% | Dashboard shows incorrect data.
Escalation (L1/L2) | 193 | 5.0% | Production is down. I need on-call.
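The report describes the intent classifier only as "NLP / zero-shot". One way to classify without labelled training data is to compare each request with a short natural-language description of every Table 2 category and pick the closest match. The sketch below uses bag-of-words cosine similarity as a stand-in for a real sentence-embedding model or zero-shot NLI classifier. The category descriptions are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical category descriptions; a production system would embed these
# with a sentence-embedding model instead of counting words.
CATEGORIES = {
    "How-to / guide": "how to set up configure find where guide instructions",
    "Incident": "not working fails failed error broken build deployment crash",
    "Service Request": "need access request password reset permission install",
    "Feedback / bug report": "shows incorrect wrong data bug report dashboard",
    "Escalation (L1/L2)": "production down outage urgent on-call critical",
}


def bag_of_words(text: str) -> Counter:
    """Lower-case word counts; a toy substitute for a dense embedding."""
    return Counter(re.findall(r"[a-z0-9\-]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(request: str) -> tuple[str, float]:
    """Return the best-matching category and its similarity score."""
    query = bag_of_words(request)
    scores = {cat: cosine(query, bag_of_words(desc)) for cat, desc in CATEGORIES.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Under these toy descriptions, "Production is down. I need on-call." maps to the escalation category. The returned score can also serve as a rough classifier confidence.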

4. Results

4.1 Ticket Deflection Rate

Figure 1. Ticket deflection rate by request category (n = 3,847). Routine requests (password reset, VPN) achieve > 85 % deflection, while complex incidents reach only 23 %.
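The report does not define precisely how a request is counted as deflected. Assume a request counts as deflected when AIbert resolves it without a human ticket or escalation. Under that assumption, the per-category rates in Figure 1 could be computed as below. The record fields `category` and `escalated` are hypothetical.

```python
from collections import defaultdict


def deflection_by_category(records: list[dict]) -> dict[str, float]:
    """Percentage of requests per category resolved without human escalation.

    Assumed definition: a request is deflected if it was not escalated to a human agent.
    """
    totals: dict[str, int] = defaultdict(int)
    deflected: dict[str, int] = defaultdict(int)
    for rec in records:
        totals[rec["category"]] += 1
        if not rec["escalated"]:
            deflected[rec["category"]] += 1
    return {cat: 100.0 * deflected[cat] / totals[cat] for cat in totals}
```

The overall rate (71.3 % in Table 3) follows the same definition applied to all 3,847 requests at once.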

Table 3. Comparison of key metrics before and after AIbert deployment.

Metric | Before AIbert | After AIbert (4 mo.) | Change
Mean Time to First Response (MTFR) | 2.4 h | 12 s | −99.9%
First Contact Resolution (FCR) | 31.2% | 64.8% | +33.6 pp
Ticket deflection rate | 0% | 71.3% | +71.3 pp
L1 agent workload (tickets/day) | 48 | 14 | −70.8%
CSAT (1–5) | 3.1 | 4.3 | +1.2
Mean Time to Resolution (MTTR) | 4.8 h | 1.2 h | −75%
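The Change column in Table 3 uses two conventions. Quantities such as times and workload are reported as relative change. Metrics that are already percentages (FCR, deflection) are reported as a percentage-point (pp) difference, and the CSAT score as a plain difference. The short check below reproduces the column from the Before/After values, with times converted to seconds for MTFR.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change in percent, e.g. 48 -> 14 tickets/day."""
    return 100.0 * (after - before) / before


def point_change(before: float, after: float) -> float:
    """Absolute difference, used for percentages (pp) and the CSAT score."""
    return after - before


if __name__ == "__main__":
    print(f"MTFR: {relative_change(2.4 * 3600, 12):.1f} %")  # 2.4 h vs 12 s
    print(f"FCR: {point_change(31.2, 64.8):+.1f} pp")
    print(f"L1 workload: {relative_change(48, 14):.1f} %")
    print(f"CSAT: {point_change(3.1, 4.3):+.1f}")
    print(f"MTTR: {relative_change(4.8, 1.2):.1f} %")
```

Running it gives −99.9 %, +33.6 pp, −70.8 %, +1.2 and −75.0 %, which matches the table.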

4.2 Accuracy by Knowledge Source

Figure 2. AIbert response accuracy by primary knowledge source (% of responses rated as correct). Runbooks provide the highest accuracy due to their structured format.

4.3 Confidence Score and Escalation

Table 4. Confidence score distribution and escalation metrics.

Confidence Band | Share of Responses | Correctness | Auto-escalation
High (> 0.85) | 42% | 94.2% | No
Medium (0.60 – 0.85) | 35% | 78.6% | No (with disclaimer)
Low (0.40 – 0.60) | 15% | 52.1% | Optional
Very low (< 0.40) | 8% | 31.4% | Automatic
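Table 4 implies a routing policy keyed on the confidence score, which the sketch below encodes. Because the table's Medium and Low ranges share the endpoint 0.60, the sketch assumes the following boundaries: High is > 0.85, Medium is [0.60, 0.85], Low is [0.40, 0.60) and Very low is < 0.40. "Optional" escalation is interpreted, as an assumption, as offering the user an escalation choice. The snippet also computes the share-weighted correctness implied by the table.

```python
from enum import Enum


class Action(Enum):
    ANSWER = "answer"                        # reply normally
    ANSWER_WITH_DISCLAIMER = "disclaimer"    # reply, flagged as possibly incomplete
    OFFER_ESCALATION = "offer_escalation"    # reply and let the user choose escalation
    ESCALATE = "escalate"                    # hand off to a human agent automatically


def route(confidence: float) -> tuple[str, Action]:
    """Map a confidence score to a Table 4 band and an action (assumed boundaries)."""
    if confidence > 0.85:
        return "High", Action.ANSWER
    if confidence >= 0.60:
        return "Medium", Action.ANSWER_WITH_DISCLAIMER
    if confidence >= 0.40:
        return "Low", Action.OFFER_ESCALATION
    return "Very low", Action.ESCALATE


# (share of responses %, correctness %) per band, copied from Table 4.
BANDS = {"High": (42, 94.2), "Medium": (35, 78.6), "Low": (15, 52.1), "Very low": (8, 31.4)}

# Share-weighted correctness across all responses.
overall = sum(share * corr for share, corr in BANDS.values()) / sum(s for s, _ in BANDS.values())
```

With the Table 4 figures, `overall` evaluates to about 77.4 %.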

4.4 Feedback Loop — Improvement Over Time

Table 5. Evolution of response accuracy by month (feedback loop effect).

Month | Accuracy (%) | Thumbs Up Rate | Escalation Rate
July 2025 (M1) | 62.4% | 58% | 34%
August 2025 (M2) | 68.1% | 64% | 29%
September 2025 (M3) | 74.6% | 71% | 24%
October 2025 (M4) | 78.8% | 76% | 20%
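The report attributes the monthly gains (+16.4 pp accuracy from M1 to M4) to thumbs-up/down signals without manual retraining, but it does not describe the mechanism. The sketch below is purely hypothetical and shows one plausible design. It keeps per-chunk feedback counts and blends a smoothed helpfulness score into the retrieval ranking. The blend weight `alpha` and the Laplace (add-one) smoothing are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SourceStats:
    """Thumbs-up/down counts collected for one knowledge-base chunk."""
    up: int = 0
    down: int = 0

    def helpfulness(self) -> float:
        # Laplace-smoothed share of positive votes; unseen chunks start at 0.5.
        return (self.up + 1) / (self.up + self.down + 2)


def rerank(candidates: dict[str, float], stats: dict[str, SourceStats],
           alpha: float = 0.3) -> list[tuple[str, float]]:
    """Blend semantic similarity with feedback-based helpfulness, sorted descending.

    candidates maps chunk id -> similarity in [0, 1]; alpha is a hypothetical blend weight.
    """
    scored = {
        cid: (1 - alpha) * sim + alpha * stats.get(cid, SourceStats()).helpfulness()
        for cid, sim in candidates.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


def record_feedback(stats: dict[str, SourceStats], cid: str, thumbs_up: bool) -> None:
    """Update counts when a user votes on an answer that cited chunk `cid`."""
    entry = stats.setdefault(cid, SourceStats())
    if thumbs_up:
        entry.up += 1
    else:
        entry.down += 1
```

In this design, a chunk that repeatedly earns thumbs-down votes gradually drops below slightly less similar but better-rated chunks, without any model retraining.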

5. Comparison with Alternatives

Table 6. Comparison of AIbert with commercial AI support solutions.

Solution | Deflection Rate | Knowledge Sources | Integration | Price
AIbert | 71.3% | Confluence + Jira + Git + Slack | Slack native | On-premise
Zendesk AI | ~50% | Help Center articles | Zendesk only | $89/agent/mo
ServiceNow Virtual Agent | ~60% | SNOW KB + CMDB | ServiceNow | Enterprise license
Moveworks | ~65% | Multi-source | Slack, Teams | $50k+/yr
ChatGPT + docs | ~35% | Upload (manual) | Web only | $20/user/mo

6. Limitations and Error Analysis

Complex multi-system incidents (23 % deflection) require log correlation across multiple systems, which exceeds the capacity of semantic search within a knowledge base. These cases require human expertise and access to production systems.

New, undocumented issues have no corresponding sources in the knowledge base. AIbert correctly identifies low confidence and escalates, but cannot provide a solution — this is an inherent limitation of retrieval-based approaches.

Hallucinations occur in 6.2 % of responses, primarily when AIbert combines partially relevant sources. Confidence-based disclaimers reduce their impact on users.

7. Recommendations for Further Development

Table 7. Proposed extensions for AIbert.

Strategy | Predicted Impact | Complexity
Proactive incident detection (monitoring integration) | MTTR −40%, ticket prevention | Medium
Auto-ticket creation in Jira from Slack conversations | Reduction of L1 manual work by 30% | Low
Multi-channel (Teams, Email, Web portal) | 100% communication channel coverage | Medium
PagerDuty + Grafana integration | Automatic alert correlation with KB | Medium
Predictive support (ML on ticket trends) | Incident prediction 2–4 h ahead | High
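For the proposed auto-ticket creation, a Slack thread could be converted into a request body for the Jira REST API v3 create-issue endpoint (`POST /rest/api/3/issue`). In v3, the `description` field uses Atlassian Document Format (ADF). The project key `SUP`, the issue type `Task` and the helper name are hypothetical. Authentication and the HTTP call itself are omitted.

```python
import json


def build_jira_issue(summary: str, thread_messages: list[str],
                     project_key: str = "SUP", issue_type: str = "Task") -> dict:
    """Build a Jira REST API v3 create-issue body from a Slack thread.

    project_key and issue_type are hypothetical defaults; adjust them to the target project.
    """
    # One ADF paragraph per Slack message; blank messages are skipped because
    # ADF text nodes must not be empty.
    paragraphs = [
        {"type": "paragraph", "content": [{"type": "text", "text": msg}]}
        for msg in thread_messages if msg.strip()
    ]
    return {
        "fields": {
            "project": {"key": project_key},
            "summary": summary,
            "issuetype": {"name": issue_type},
            "description": {"type": "doc", "version": 1, "content": paragraphs},
        }
    }


if __name__ == "__main__":
    body = build_jira_issue("Build fails on CI",
                            ["Build fails on CI.", "Started after the last merge."])
    print(json.dumps(body, indent=2))
```

Keeping payload construction separate from the HTTP client makes it possible to review or unit-test the generated ticket before anything is sent to Jira.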

8. Conclusions

1. Ticket deflection of 71.3 % confirms that AI-driven Level 0 support is production-ready with immediate ROI — L1 agent workload decreased by 71 %.

2. Reducing MTFR from 2.4 hours to 12 seconds transforms the support experience: users receive responses in real time instead of waiting for a human agent.

3. The feedback loop works — accuracy increased from 62.4 % to 78.8 % over 4 months without manual retraining, based solely on thumbs up/down signals.

4. Confidence scoring is essential for quality — responses with confidence > 0.85 achieve 94.2 % correctness, while automatic escalation at < 0.40 protects against low-quality output.

5. Complex incidents remain the domain of humans — 23 % deflection for multi-system problems defines a clear boundary between AI L0 and human L1/L2 support.

References

  1. Gartner (2023). Market Guide for AIOps Platforms. Gartner Research.
  2. Forrester Research (2022). The State of Chatbots in IT Service Management. Forrester.
  3. Consortium for Service Innovation (2023). KCS v6 Practices Guide. serviceinnovation.org.
  4. Paramesh, S.P. & Shreedhara, K.S. (2019). Automated IT ticket classification using machine learning. IJCSIT, 10(2), 21–35.
  5. Lewis, P., et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Proc. NeurIPS 2020.
  6. Brown, T., et al. (2020). Language models are few-shot learners. Proc. NeurIPS 2020.
  7. Ram, O., et al. (2023). In-context retrieval-augmented language models. TACL, 11, 1316–1331.
  8. HDI (2023). Technical Support Practices & Salary Report 2023. HDI/ITSM Academy.
  9. ServiceNow (2024). Global Impact Report: AI in IT Service Management. ServiceNow Research.
  10. Anthropic (2024). Model Context Protocol specification v1.0. github.com/modelcontextprotocol.
  11. Slack Technologies (2024). Slack Socket Mode API Reference. api.slack.com.
  12. Atlassian (2024). Jira REST API v3 Documentation. developer.atlassian.com.

Deployment: GoSpace Labs / BIT Technology, on-premise Docker Compose.

Project reference: bittechnology.bemooore.com/referencie/aibert-jira-support

Author: Ing. Stanislav Pittner, CEO of BIT Technology s.r.o.

© 2025 Ing. Stanislav Pittner — BIT Technology s.r.o.