BIT Technology Research Series • AI Service Management

AIbert: Autonomous AI Agent for Level 0 IT Support — Evaluating the Effectiveness of Deep Research and Knowledge Retrieval in a Production Slack Environment

Ing. Stanislav Pittner

BIT Technology s.r.o., Trstínska cesta 9, 917 01 Trnava, Slovakia

Published: November 22, 2025 • Evaluation period: July – October 2025 • Version 1.0

DOI: 10.5281/bittechnology.2025.tr005 (preprint)

Abstract

We present an evaluation of AIbert, an autonomous AI agent for Level 0 IT support deployed in a production Slack environment. AIbert uses a Deep Research engine for semantic search across an internal knowledge base (10,000+ Confluence pages, 15,000 Jira tickets, 45 Git repositories) and generates contextual solution suggestions for Level 1 agents. Over a 4-month evaluation on 3,847 support requests, AIbert achieves a ticket deflection rate of 71.3 %, reduces the average Mean Time to First Response (MTFR) from 2.4 hours to 12 seconds, and raises the First Contact Resolution (FCR) rate from 31.2 % to 64.8 %. For routine requests (password resets, VPN, software installations), the deflection rate exceeds 85 %. We identify two main limitations, both of which require human escalation: complex multi-system incidents (23 % deflection) and new, undocumented issues. A feedback loop based on Slack reactions continuously improves response quality, with a quarter-over-quarter accuracy increase of 8.2 percentage points.

Keywords: AI support, chatbot, ITSM, Slack, knowledge retrieval, ticket deflection, Deep Research, NLP, conversational AI

1. Introduction

IT support teams face a growing volume of repetitive requests: according to Gartner (2023), routine inquiries account for 60–70 % of all tickets in a typical IT organization. Level 1 agents spend a significant share of their working time searching documentation, historical tickets, and knowledge bases for solutions instead of solving complex problems that require expertise.

Knowledge-Centered Service (KCS v6, Consortium for Service Innovation) defines the principle "solve it once, use it many times": knowledge is systematically captured and reused during the resolution process. Automating the Level 0 layer (self-service plus AI triage) applies this principle at scale; Forrester Research (2022) reports that conversational AI in IT support environments reduces ticket volume by 20–40 %.

This study evaluates AIbert in a production environment and addresses three questions: (1) What ticket deflection rate is achievable for different request categories? (2) How does response quality change over time as a result of the feedback loop? (3) Where are the limits of AI-driven Level 0 support?

2. System Architecture

AIbert: Level 0 AI Support Pipeline

Slack Event (Socket Mode) → Intent Classifier (NLP / zero-shot) → Knowledge Retrieval (semantic search) → LLM Synthesis (citation + confidence) → Slack Reply (+ feedback buttons)

Knowledge sources: Jira API • Confluence API • Git Search • Runbooks • Slack History (50k+ messages)
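The pipeline above can be sketched as a chain of four pluggable stages. This is a minimal illustration under stated assumptions, not AIbert's implementation: the `Reply` structure, the stage signatures, and the `run_pipeline` name are hypothetical. Each stage is passed in as a function so that a real intent classifier, retriever, and LLM call could be substituted.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage interfaces; AIbert's real interfaces are not published.
Source = tuple[str, str]  # (document id, chunk text)


@dataclass
class Reply:
    """Message posted back to the Slack thread."""
    text: str
    citations: list[str]
    confidence: float
    feedback_buttons: tuple[str, ...] = ("thumbs_up", "thumbs_down")


def run_pipeline(
    message: str,
    classify: Callable[[str], str],
    retrieve: Callable[[str, str], list[Source]],
    synthesize: Callable[[str, list[Source]], tuple[str, float]],
) -> Reply:
    """Pass one Slack message through the stages shown in the diagram."""
    intent = classify(message)                         # Intent Classifier
    sources = retrieve(message, intent)                # Knowledge Retrieval
    answer, confidence = synthesize(message, sources)  # LLM Synthesis
    citations = [doc_id for doc_id, _ in sources]      # cite every retrieved source
    return Reply(answer, citations, confidence)        # Slack Reply + feedback buttons


# Usage with stub stages; the document id is illustrative only.
reply = run_pipeline(
    "How do I set up the VPN?",
    classify=lambda msg: "how_to",
    retrieve=lambda msg, intent: [("confluence/vpn-setup", "Install the VPN client ...")],
    synthesize=lambda msg, sources: ("Follow the VPN setup guide.", 0.91),
)
print(reply.citations, reply.confidence)
```

Injecting the stages as functions keeps each component (for example, the classifier) replaceable and testable in isolation.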

3. Data and Methods

3.1 Indexed Knowledge Sources

Table 1. Knowledge base sources indexed in the AIbert Deep Research engine.

Source	Content Type	Document Count	Update Frequency
Confluence	Documentation, SOP, architecture	3,200 pages	Real-time webhook
Jira	Historical tickets + comments	15,000 tickets	Real-time webhook
Git repositories	README, docs/, CHANGELOG	45 repositories	Daily (cron)
Runbooks	Operational procedures	180 documents	On change
Slack archive	Historical conversations	50,000+ messages	Weekly
Total indexed	Embedded chunks	124,600	—
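Table 1 reports 124,600 embedded chunks but does not state how documents are split or embedded. The sketch below shows the general technique (overlapping chunks, vector embeddings, cosine-similarity top-k search) under explicit assumptions: the chunk size of 50 words and overlap of 10 are hypothetical, and a term-frequency vector stands in for the neural embedding model a production engine would use.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reaches the end
            break
    return chunks


def embed(text: str) -> Counter:
    """Stand-in for a neural embedding model: a term-frequency vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(documents: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """Chunk and embed every document, keeping the source id for citations."""
    return [(doc_id, c, embed(c)) for doc_id, text in documents.items() for c in chunk(text)]


def search(query: str, index: list[tuple[str, str, Counter]], k: int = 3) -> list[tuple[float, str, str]]:
    """Return the top-k (score, source id, chunk) triples by cosine similarity."""
    q = embed(query)
    scored = [(cosine(q, vec), doc_id, c) for doc_id, c, vec in index]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:k]


docs = {  # illustrative documents, not real AIbert content
    "confluence/vpn-setup": "To set up the VPN, install the client and import the profile.",
    "jira/IT-42": "Password reset is available through the self-service portal.",
}
index = build_index(docs)
print(search("How do I set up the VPN?", index, k=1))
```

Keeping the source id next to each chunk is what allows the synthesis step to cite where an answer came from.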

3.2 Request Classification

Table 2. Distribution of intent categories during the evaluation period (n = 3,847).

Category	Count	Share	Typical Example
How-to / guide	1,423	37.0%	How to set up VPN? Where can I find the API key?
Incident	1,038	27.0%	Deployment is not working. Build fails on CI.
Service Request	846	22.0%	I need access to repo X. Password reset.
Feedback / bug report	347	9.0%	Dashboard shows incorrect data.
Escalation (L1/L2)	193	5.0%	Production is down. I need on-call.
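The architecture lists a zero-shot intent classifier, meaning categories are assigned from label descriptions rather than from labelled training examples. Common zero-shot approaches use a natural-language-inference model or embedding similarity; the sketch below illustrates the embedding-similarity variant only. The category descriptions are hypothetical (derived from the Table 2 examples), and a term-frequency vector again stands in for a sentence-embedding model.

```python
import math
import re
from collections import Counter

# Hypothetical natural-language descriptions of the Table 2 categories.
LABELS = {
    "How-to / guide": "how to set up configure where find guide instructions",
    "Incident": "not working fails error broken build deployment",
    "Service Request": "need access request password reset permission",
    "Feedback / bug report": "shows incorrect wrong data bug report dashboard",
    "Escalation (L1/L2)": "production down outage urgent on call",
}


def embed(text: str) -> Counter:
    """Term-frequency vector; stands in for a sentence-embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


LABEL_VECTORS = {label: embed(desc) for label, desc in LABELS.items()}


def classify(message: str) -> tuple[str, float]:
    """Return the category whose description is most similar to the message, with its score."""
    q = embed(message)
    return max(((label, cosine(q, vec)) for label, vec in LABEL_VECTORS.items()),
               key=lambda pair: pair[1])


for msg in ["Build fails on CI.", "Production is down. I need on-call.", "Password reset."]:
    print(msg, "->", classify(msg))
```

A score near zero means no description matched; in a real deployment such messages would be better treated as low-confidence than forced into a category.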

4. Results

4.1 Ticket Deflection Rate

Figure 1. Ticket deflection rate by request category (n = 3,847). Routine requests (password reset, VPN) achieve > 85 % deflection, while complex incidents reach only 23 %.
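The report does not give a formal definition of deflection. Assuming a request counts as deflected when it is closed without a human agent, per-category and overall rates can be computed as below. The record format is hypothetical and the data are toy values, not the evaluation set. Note that the overall rate is a count-weighted mean of the category rates, so the aggregate 71.3 % depends on the category mix in Table 2.

```python
from collections import defaultdict


def deflection_rates(requests):
    """Per-category and overall deflection rates.

    `requests` is an iterable of (category, deflected) pairs, where `deflected`
    is True if the request was closed without a human agent (assumed definition).
    """
    totals = defaultdict(int)
    deflected = defaultdict(int)
    for category, was_deflected in requests:
        totals[category] += 1
        deflected[category] += int(was_deflected)
    per_category = {c: deflected[c] / totals[c] for c in totals}
    total = sum(totals.values())
    overall = sum(deflected.values()) / total if total else 0.0
    return per_category, overall


# Toy data, not the evaluation set.
sample = ([("Service Request", True)] * 9 + [("Service Request", False)]
          + [("Incident", True)] + [("Incident", False)] * 3)
per_category, overall = deflection_rates(sample)
print(per_category, round(overall, 3))
```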

Table 3. Comparison of key metrics before and after AIbert deployment.

Metric	Before AIbert	After AIbert (4 mo.)	Change
Mean Time to First Response (MTFR)	2.4 h	12 s	−99.9%
First Contact Resolution (FCR)	31.2%	64.8%	+33.6 pp
Ticket deflection rate	0%	71.3%	+71.3 pp
L1 agent workload (tickets/day)	48	14	−70.8%
CSAT (1–5)	3.1	4.3	+1.2
Mean Time To Resolution (MTTR)	4.8 h	1.2 h	−75%
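The Change column of Table 3 mixes two kinds of comparison: relative change in percent for times and counts (MTFR, workload, MTTR), and absolute change in percentage points for rates (FCR, deflection); CSAT is an absolute difference on the 1–5 scale. The short sketch below reproduces the reported values from the Before and After columns.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change in percent, used for times and counts."""
    return (after - before) / before * 100


def point_change(before_pct: float, after_pct: float) -> float:
    """Absolute difference in percentage points, used for rates."""
    return after_pct - before_pct


# Values from Table 3; MTFR converted to seconds (2.4 h = 8,640 s).
print(f"MTFR      {relative_change(2.4 * 3600, 12):+.1f} %")
print(f"Workload  {relative_change(48, 14):+.1f} %")
print(f"MTTR      {relative_change(4.8, 1.2):+.1f} %")
print(f"FCR       {point_change(31.2, 64.8):+.1f} pp")
```

Keeping the two functions separate avoids the common error of reporting a rate increase from 31.2 % to 64.8 % as "+33.6 %" when it is a relative increase of more than 100 %.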

4.2 Accuracy by Knowledge Source

Figure 2. AIbert response accuracy by primary knowledge source (% of responses rated as correct). Runbooks provide the highest accuracy due to their structured format.

4.3 Confidence Score and Escalation

Table 4. Confidence score distribution and escalation metrics.

Confidence Band	Share of Responses	Correctness	Auto-escalation
High (> 0.85)	42%	94.2%	No
Medium (0.60 – 0.85)	35%	78.6%	No (with disclaimer)
Low (0.40 – 0.60)	15%	52.1%	Optional
Very low (< 0.40)	8%	31.4%	Automatic
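The banding in Table 4 translates directly into a routing rule. In the sketch below, the boundary handling is an assumption (the table lists 0.60 in both the Medium and Low bands), and the action names are hypothetical labels for the table's escalation column. Weighting each band's correctness by its share of responses gives an implied overall correctness of about 77.4 %, assuming both columns refer to the same response population.

```python
def route(confidence: float) -> tuple[str, str]:
    """Map a confidence score to (band, action) following Table 4.

    Assumed boundaries: 0.85 counts as Medium; 0.60 and 0.40 are the
    inclusive lower bounds of Medium and Low respectively.
    """
    if confidence > 0.85:
        return "high", "answer"
    if confidence >= 0.60:
        return "medium", "answer_with_disclaimer"
    if confidence >= 0.40:
        return "low", "answer_and_offer_escalation"
    return "very_low", "escalate_to_human"


# (share of responses, correctness) per band, from Table 4.
BANDS = {"high": (0.42, 0.942), "medium": (0.35, 0.786),
         "low": (0.15, 0.521), "very_low": (0.08, 0.314)}
overall = sum(share * correct for share, correct in BANDS.values())
print(route(0.9), route(0.5), f"share-weighted correctness: {overall:.1%}")
```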

4.4 Feedback Loop — Improvement Over Time

Table 5. Evolution of response accuracy by month (feedback loop effect).

Month	Accuracy (%)	Thumbs Up Rate	Escalation Rate
July 2025 (M1)	62.4%	58%	34%
August 2025 (M2)	68.1%	64%	29%
September 2025 (M3)	74.6%	71%	24%
October 2025 (M4)	78.8%	76%	20%
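Table 5 shows month-over-month accuracy gains of 5.7, 6.5, and 4.2 pp. The report does not describe how Slack reactions feed back into answers, only that no manual retraining is involved. One plausible retraining-free mechanism, sketched below as a hypothetical illustration rather than AIbert's method, is to keep thumbs-up/down counts per cited source and blend a smoothed helpfulness score into retrieval ranking. The blend weight of 0.5 and the rule of crediting every cited source equally are assumptions.

```python
from dataclasses import dataclass

UP = {"+1", "thumbsup"}      # Slack reaction names for thumbs up
DOWN = {"-1", "thumbsdown"}  # and for thumbs down


@dataclass
class Feedback:
    up: int = 0
    down: int = 0

    def helpfulness(self) -> float:
        """Posterior mean under a Beta(1, 1) prior: 0.5 before any feedback."""
        return (self.up + 1) / (self.up + self.down + 2)


def record_reaction(stats: dict[str, Feedback], cited_ids: list[str], reaction: str) -> None:
    """Credit a reaction on a reply to every source that reply cited."""
    for doc_id in cited_ids:
        fb = stats.setdefault(doc_id, Feedback())
        if reaction in UP:
            fb.up += 1
        elif reaction in DOWN:
            fb.down += 1


def rerank(candidates: list[tuple[str, float]], stats: dict[str, Feedback],
           weight: float = 0.5) -> list[tuple[str, float]]:
    """Order (doc_id, similarity) pairs by a blend of similarity and helpfulness."""
    def blended(item: tuple[str, float]) -> float:
        doc_id, similarity = item
        return (1 - weight) * similarity + weight * stats.get(doc_id, Feedback()).helpfulness()
    return sorted(candidates, key=blended, reverse=True)


stats: dict[str, Feedback] = {}
for _ in range(3):  # illustrative reactions on two hypothetical sources
    record_reaction(stats, ["runbook/vpn"], "+1")
    record_reaction(stats, ["slack/old-thread"], "-1")
candidates = [("slack/old-thread", 0.80), ("runbook/vpn", 0.75)]
print(rerank(candidates, stats))
```

The Beta(1, 1) smoothing keeps a single early reaction from dominating the ranking of a rarely cited source.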

5. Comparison with Alternatives

Table 6. Comparison of AIbert with commercial AI support solutions.

Solution	Deflection Rate	Knowledge Sources	Integration	Price / Deployment
AIbert	71.3%	Confluence+Jira+Git+Slack	Slack native	On-premise
Zendesk AI	~50%	Help Center articles	Zendesk only	$89/agent/mo
ServiceNow Virtual Agent	~60%	SNOW KB + CMDB	ServiceNow	Enterprise license
Moveworks	~65%	Multi-source	Slack, Teams	$50k+/yr
ChatGPT + docs	~35%	Upload (manual)	Web only	$20/user/mo

6. Limitations and Error Analysis

Complex multi-system incidents (23 % deflection) require log correlation across multiple systems, which exceeds the capacity of semantic search within a knowledge base. These cases require human expertise and access to production systems.

New, undocumented issues have no corresponding sources in the knowledge base. AIbert correctly identifies low confidence and escalates, but cannot provide a solution — this is an inherent limitation of retrieval-based approaches.

Hallucinations occur in 6.2 % of responses, mainly when AIbert combines partially relevant sources. Confidence-based disclaimers (Table 4) reduce their impact on users.

7. Recommendations for Further Development

Table 7. Proposed extensions for AIbert.

Strategy	Predicted Impact	Complexity
Proactive incident detection (monitoring integration)	MTTR −40%, ticket prevention	Medium
Auto-ticket creation in Jira from Slack conversations	Reduction of L1 manual work by 30%	Low
Multi-channel (Teams, Email, Web portal)	100% communication channel coverage	Medium
PagerDuty + Grafana integration	Automatic alert correlation with KB	Medium
Predictive support (ML on ticket trends)	Incident prediction 2–4h ahead	High
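Auto-ticket creation is listed as low complexity. As a sketch of the data mapping involved, the function below builds a request body for the Jira Cloud REST API v3 create-issue endpoint (`POST /rest/api/3/issue`), which expects the description as an Atlassian Document Format (ADF) document. The function name, the project key `IT`, and the default issue type `Task` are hypothetical; available issue types depend on the Jira project configuration.

```python
import json


def build_issue_payload(project_key: str, summary: str, thread: list[str],
                        issue_type: str = "Task") -> dict:
    """Build a Jira REST API v3 create-issue body from a Slack thread.

    Each non-empty Slack message becomes one ADF paragraph.
    """
    paragraphs = [
        {"type": "paragraph", "content": [{"type": "text", "text": msg}]}
        for msg in thread
        if msg.strip()  # ADF text nodes must not be empty
    ]
    return {
        "fields": {
            "project": {"key": project_key},
            "summary": summary[:250],  # stay safely below Jira's ~255-character summary limit
            "issuetype": {"name": issue_type},
            "description": {"type": "doc", "version": 1, "content": paragraphs},
        }
    }


payload = build_issue_payload(
    "IT",  # hypothetical project key
    "VPN client fails to connect on macOS",
    ["VPN client fails after the latest update.", "", "Reinstalling did not help."],
)
body = json.dumps(payload)  # JSON body for POST /rest/api/3/issue
print(body)
```

Sending the request additionally requires authentication; for Jira Cloud this is typically basic auth with an account email and API token.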

8. Conclusions

1. A ticket deflection rate of 71.3 % indicates that AI-driven Level 0 support is production-ready and delivers an immediate operational return: L1 agent workload decreased by 70.8 % (from 48 to 14 tickets per day).

2. Reducing MTFR from 2.4 hours to 12 seconds transforms the support experience: users receive responses in real time instead of waiting for a human agent.

3. The feedback loop works — accuracy increased from 62.4 % to 78.8 % over 4 months without manual retraining, based solely on thumbs up/down signals.

4. Confidence scoring is essential for quality — responses with confidence > 0.85 achieve 94.2 % correctness, while automatic escalation at < 0.40 protects against low-quality output.

5. Complex incidents remain the domain of humans — 23 % deflection for multi-system problems defines a clear boundary between AI L0 and human L1/L2 support.

References

Gartner (2023). Market Guide for AIOps Platforms. Gartner Research.
Forrester Research (2022). The State of Chatbots in IT Service Management. Forrester.
Consortium for Service Innovation (2023). KCS v6 Practices Guide. serviceinnovation.org.
Paramesh, S.P. & Shreedhara, K.S. (2019). Automated IT ticket classification using machine learning. IJCSIT, 10(2), 21–35.
Lewis, P., et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Proc. NeurIPS 2020.
Brown, T., et al. (2020). Language models are few-shot learners. Proc. NeurIPS 2020.
Ram, O., et al. (2023). In-context retrieval-augmented language models. TACL, 11, 1316–1331.
HDI (2023). Technical Support Practices & Salary Report 2023. HDI/ITSM Academy.
ServiceNow (2024). Global Impact Report: AI in IT Service Management. ServiceNow Research.
Anthropic (2024). Model Context Protocol specification v1.0. github.com/modelcontextprotocol.
Slack Technologies (2024). Slack Socket Mode API Reference. api.slack.com.
Atlassian (2024). Jira REST API v3 Documentation. developer.atlassian.com.

Deployment: GoSpace Labs / BIT Technology, on-premise Docker Compose.

Project reference: bittechnology.bemooore.com/referencie/aibert-jira-support

Author: Ing. Stanislav Pittner, CEO of BIT Technology s.r.o.