Automating Network Anomaly Detection with AI: Simplifying RF Engineering

I often get messages that go something like this:
“It looks like there was a throughput degradation last [insert day] in this area—can you check what happened?”

As someone focused on optimization, I usually know when a test didn’t go as planned. But when it comes to issues outside my immediate area of work, it’s challenging to pinpoint degradations that happened several days ago—especially when they’re not clearly tied to O&M incidents. Unless you’re monitoring the network 24/7 for unusual behavior, spotting these anomalies can feel like searching for a needle in a haystack.

Keeping an eye on KPIs like downlink throughput, uplink SINR, or ERAB drops is essential—but let’s be honest, digging through massive datasets from large databases like ENIQ or OSS manually is both time-consuming and prone to errors.

So how do we tackle this in a smarter way today?

The answer: AI agents and intelligent tools.
They allow us to automate anomaly detection and drastically improve efficiency.

In this post, I’ll walk you through how I built a Python-based AI agent using the Isolation Forest model to detect network anomalies. I’ll explain why this model is a great fit for the job and how integrating AI into daily RF engineering tasks can completely change the game.

The Challenge: Spotting Network Anomalies in Real Time

With over 15 years of experience optimizing multi-technology networks—from 2G to 5G—I’ve seen firsthand how quickly performance can take a hit due to issues like faulty antennas or misconfigured parameters. Fortunately, the network often gives us early warning signs: a sudden spike in ERAB drop rates or a dip in downlink throughput can signal that something’s starting to go wrong. And of course, these issues don’t just affect KPIs—they impact the user experience.

Traditionally, RF engineers have had to rely on manual analysis of drive test data or KPI reports. But in multi-vendor, multi-technology environments, generating, parsing, and analyzing all that data can take hours—and even then, it’s easy to miss the subtle signs of trouble.

To tackle this challenge, I built an AI agent designed to monitor key network KPIs. When selecting which KPIs to include, I didn’t just consider their individual importance—I also looked at how they correlate (or don’t) with each other. This kind of relationship matters when training certain machine learning models, as redundant or weakly related metrics can impact accuracy.

I narrowed it down to a focused list: downlink and uplink throughput, downlink and uplink efficiency, RRC and eRAB accessibility, eRAB drop rate, SINR, average number of RRC connected users, and of course, payload.
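A quick way to check how candidate KPIs relate to each other is a pairwise correlation screen. The sketch below is only an illustration on synthetic data: the helper name `highly_correlated_pairs`, the 0.9 threshold, and the three example columns are my assumptions, not the selection procedure actually used for the agent.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.9):
    """Return (kpi_a, kpi_b, r) for KPI pairs with |Pearson r| >= threshold."""
    corr = df.select_dtypes("number").corr()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = corr.loc[a, b]
            if abs(r) >= threshold:
                pairs.append((a, b, round(float(r), 3)))
    return pairs


# Synthetic KPIs (assumed values, not network data)
rng = np.random.default_rng(0)
n = 500
users = rng.normal(200, 30, n)
kpis = pd.DataFrame({
    "rrc_connected_users": users,
    "payload": users * 1.5 + rng.normal(0, 2, n),  # almost a rescaled copy of users
    "ul_sinr": rng.normal(10, 3, n),               # generated independently
})

redundant = highly_correlated_pairs(kpis)
print(redundant)
```

Pairs reported here are candidates for a closer look, not automatic removals: two KPIs can be strongly correlated on most days and still diverge exactly when something breaks.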

The AI agent pulls this data from a CSV file—whether sourced from a single database or multiple ones—then detects anomalies and produces actionable reports and visualizations. What used to take hours can now be done in just minutes.

Choosing the Model: Why Isolation Forest?

I chose the Isolation Forest model for anomaly detection due to its effectiveness and efficiency in telecom applications. Here’s why it’s a great fit:

Handles High-Dimensional Data: With 10 KPIs to monitor, Isolation Forest excels at detecting anomalies in multivariate datasets without requiring extensive feature engineering. This is key because, as I mentioned in previous posts, feature engineering with network KPIs is very time-consuming.
No Assumption of Data Distribution: Unlike other models (e.g., Gaussian-based), Isolation Forest doesn’t assume KPIs follow a specific distribution, which is critical for telecom data with varying patterns across sites and cells.
Fast and Scalable: The model’s tree-based approach is computationally efficient, making it ideal for processing large datasets from tools like XCAL or OSS in near real-time.
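Robust to Noise: Telecom data often includes outliers (e.g., temporary KPI fluctuations). Isolation Forest isolates anomalies by randomly partitioning the data: points that can be separated from the rest in only a few splits score as anomalous, so small fluctuations that stay within the usual range are less likely to be flagged, which helps keep false positives down.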
Ease of Integration: Built using Scikit-learn in Python, the model integrates seamlessly with my existing workflow, leveraging my skills in Pandas and SQL.
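To make the points above concrete, here is a minimal sketch of Isolation Forest with Scikit-learn on two synthetic KPIs. The data, the injected degradation, and the parameter values (`n_estimators=100`, `contamination=0.01`) are assumptions chosen for illustration, not the settings of the actual agent.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# 300 "normal" samples of two synthetic KPIs: dl_throughput (Mbps) and erab_drop (%)
normal = np.column_stack([
    rng.normal(80, 5, 300),
    rng.normal(0.5, 0.1, 300),
])
# One injected degradation: low throughput together with a high drop rate
degraded = np.array([[20.0, 5.0]])
X = np.vstack([normal, degraded])

model = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
labels = model.fit_predict(X)        # -1 = anomaly, 1 = normal
scores = model.decision_function(X)  # lower = more anomalous, negative = flagged

worst = int(np.argmin(scores))
print("most anomalous row:", worst, X[worst])
```

Note that no scaling or distribution fitting is needed before calling `fit_predict`: the trees split each KPI on random thresholds within its own range.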

In the end, this choice aligns with my experience developing machine learning models for other use cases, like traffic forecasting and antenna failure prediction. There are many ML models to choose from, but when it comes to fine-tuning hyperparameters, it is better to understand a handful of models well than to try every model at your disposal.

How the AI Agent Works

The agent, written in Python, processes a CSV file containing timestamped KPI data for nodes (e.g., site1_cella) plus the 10 major KPIs. It:

Loads and Cleans Data: Handles missing values by replacing them with column means, ensuring robust preprocessing. I use DB Visualizer to query ENIQ (the Ericsson database). Although you can choose how nulls are handled when exporting the query report, I have assumed that the default “(null)” is used.
Detects Anomalies: Uses Isolation Forest to identify outliers across all KPIs, assigning an anomaly score for ranking.
Generates Outputs: Produces a detailed report and a visualization of dl_throughput with anomalies marked.
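The load, clean, and detect steps above can be sketched roughly as follows. This is a simplified illustration, not the code from the repository: the column names `timestamp` and `node`, the three-KPI subset, the function name `detect_anomalies`, and the `contamination` value are all assumptions, and the CSV is generated in memory from synthetic data.

```python
import io

import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

KPI_COLS = ["dl_throughput", "erab_drop", "ul_sinr"]  # subset of the 10 KPIs, for brevity


def detect_anomalies(csv_source, contamination=0.01, random_state=42):
    """Load a KPI export, mean-impute missing values, and score every row."""
    # Treat the literal "(null)" export marker as a missing value
    df = pd.read_csv(csv_source, na_values=["(null)"], parse_dates=["timestamp"])
    df[KPI_COLS] = df[KPI_COLS].fillna(df[KPI_COLS].mean())

    model = IsolationForest(contamination=contamination, random_state=random_state)
    df["anomaly"] = model.fit_predict(df[KPI_COLS])             # -1 = anomaly
    df["anomaly_score"] = model.decision_function(df[KPI_COLS])  # lower = worse
    return df.sort_values("anomaly_score")


# Build a synthetic export (fake data) with one missing value and one degradation
rng = np.random.default_rng(1)
n = 200
demo = pd.DataFrame({
    "timestamp": pd.date_range("2025-07-24 00:00", periods=n, freq="15min"),
    "node": "site1_cella",
    "dl_throughput": rng.normal(80, 5, n).round(2),
    "erab_drop": rng.normal(0.5, 0.1, n).round(3),
    "ul_sinr": rng.normal(10, 1, n).round(2),
})
demo.loc[50, KPI_COLS] = [15.0, 6.0, -2.0]  # injected degradation
demo.loc[10, "ul_sinr"] = np.nan            # exported as "(null)" below
csv_text = demo.to_csv(index=False, na_rep="(null)")

result = detect_anomalies(io.StringIO(csv_text))
top = result.iloc[0]
print(top[["timestamp", "node", "anomaly_score"] + KPI_COLS])
```

Mean imputation keeps the sketch short, but the mean is itself pulled by the anomalies in the file, so for long outages a median or a per-node fill may be worth comparing.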

Here’s a sample output using entirely fake data (I cannot share network data due to confidentiality agreements with our client), just to give a glimpse of how it looks:

Anomaly Detection Report - 20250724_121300
Total Records: 1000
Anomalies Detected: 10

Anomalous Records:
Timestamp: 2025-07-24 08:02:00
Node: site1_cella
Anomaly Score: -0.1234
KPI Values:
  dl_throughput: 80.00
  ul_throughput: 20.00
  dl_efficiency: 60.00
  ul_efficiency: 55.00
  rrc_accessibility: 95.00
  erab_accessibility: 94.00
  erab_drop: 2.00
  ul_sinr: 5.00
  rrc_connected_users: 200.00
  payload: 300.00
--------------------------------------------------
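A report in this layout can be produced with plain string formatting. The sketch below reproduces the structure of the sample above. The function name `format_report` and the dictionary keys (`timestamp`, `node`, `score`, `kpis`) are hypothetical, and the values are the same fake ones shown in the sample.

```python
from datetime import datetime


def format_report(records, total_records, generated_at=None):
    """Render anomalous records as text, most anomalous (lowest score) first."""
    generated_at = generated_at or datetime.now()
    lines = [
        f"Anomaly Detection Report - {generated_at:%Y%m%d_%H%M%S}",
        f"Total Records: {total_records}",
        f"Anomalies Detected: {len(records)}",
        "",
        "Anomalous Records:",
    ]
    for rec in sorted(records, key=lambda r: r["score"]):
        lines.append(f"Timestamp: {rec['timestamp']}")
        lines.append(f"Node: {rec['node']}")
        lines.append(f"Anomaly Score: {rec['score']:.4f}")
        lines.append("KPI Values:")
        for name, value in rec["kpis"].items():
            lines.append(f"  {name}: {value:.2f}")
        lines.append("-" * 50)
    return "\n".join(lines)


# Fake record, matching the sample values above
records = [{
    "timestamp": "2025-07-24 08:02:00",
    "node": "site1_cella",
    "score": -0.1234,
    "kpis": {"dl_throughput": 80.0, "ul_throughput": 20.0, "erab_drop": 2.0},
}]
report = format_report(records, total_records=1000,
                       generated_at=datetime(2025, 7, 24, 12, 13, 0))
print(report)
```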

It also generates an image to visualize the anomalies. If we have a snapshot of the network, an anomaly will look like this:

Here we can easily spot that something happened on 2025-07-03: the sharp drop in DL throughput indicates a potential issue, such as interference or a misconfigured parameter. In the script, we can check every KPI to see in more detail, ROP by ROP, how the anomalies are spotted and whether they make sense.
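A plot of this kind takes only a few lines of Matplotlib. The following is a minimal sketch on fake daily data: the `anomaly` column is assumed to come from the detector (-1 marks an anomaly), and the function name `plot_anomalies`, the output file name, and the values are illustrative assumptions.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_anomalies(df, kpi="dl_throughput", out_path="anomalies.png"):
    """Plot one KPI over time, mark flagged rows in red, and save to out_path."""
    flagged = df[df["anomaly"] == -1]
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(df["timestamp"], df[kpi], label=kpi)
    ax.scatter(flagged["timestamp"], flagged[kpi],
               color="red", zorder=3, label="anomaly")
    ax.set_xlabel("timestamp")
    ax.set_ylabel(kpi)
    ax.legend()
    fig.autofmt_xdate()
    fig.savefig(out_path, dpi=120)
    plt.close(fig)
    return len(flagged)


# Fake daily data with a sharp drop on 2025-07-03
demo = pd.DataFrame({
    "timestamp": pd.date_range("2025-07-01", periods=7, freq="D"),
    "dl_throughput": [82.0, 79.5, 31.0, 80.2, 81.1, 78.9, 80.4],
    "anomaly": [1, 1, -1, 1, 1, 1, 1],  # assumed detector output
})
out_file = os.path.join(tempfile.gettempdir(), "dl_throughput_anomalies.png")
n_flagged = plot_anomalies(demo, out_path=out_file)
print(f"{n_flagged} anomaly marked, figure saved to {out_file}")
```

Calling the same function in a loop over the KPI columns gives one figure per KPI, which is a simple way to review the flagged ROPs one metric at a time.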

Simplifying Daily RF Engineering Work

This AI agent transforms how RF engineers tackle network optimization:

Time Savings: Automating anomaly detection cuts analysis time by 40-60%, freeing engineers (like myself) to focus on root cause analysis and parameter tuning. For example, identifying a throughput drop that took hours of manual analysis now takes minutes.
Proactive Issue Resolution: The agent flags issues like high ERAB drop rates or low SINR in real-time, enabling faster fixes. I’ve improved mobility and DL efficiency through timely interventions—automation makes this scalable.
Scalability Across Multivendor Systems: The agent works with data from any vendor (Nokia, Ericsson, Huawei, etc.); it just needs database access. It integrates with OSS tools like ENM, streamlining workflows.
Actionable Insights: The report provides detailed KPI values and anomaly scores, guiding engineers to prioritize critical issues (e.g., nodes with high ul_sinr or extremely low rrc_connected_users).
Enhanced Reporting: The visualization is perfect for team discussions or even my blog :), making complex data accessible to engineers and stakeholders.
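On the prioritization point under Actionable Insights, one simple option is to rank nodes by their worst anomaly score and by how many flagged records they have. This is only a sketch with made-up node names and scores. The ranking rule itself is an assumption, not something the agent is confirmed to do.

```python
import pandas as pd

# Flagged records only (fake data); lower score = more anomalous
flagged = pd.DataFrame({
    "node": ["site1_cella", "site1_cella", "site2_cellb", "site3_cellc"],
    "anomaly_score": [-0.12, -0.08, -0.03, -0.20],
})

priority = (
    flagged.groupby("node")
    .agg(anomalies=("anomaly_score", "size"),
         worst_score=("anomaly_score", "min"))
    .sort_values(["worst_score", "anomalies"], ascending=[True, False])
)
print(priority)
```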

If it were that simple…

Artificial Intelligence is, at its core, just a tool. And while tools can be powerful, they don’t drive change on their own—processes do. In my experience, especially within the telecommunications industry, processes tend to be deeply rooted and slow to evolve.

Introducing AI agents into our daily work has shown clear benefits—faster analysis, better anomaly detection, and more efficient operations. But despite these results, I’ve found it challenging to shift the way things are done. Both internally, within my own company, and externally with our clients, there’s often hesitation—even resistance—to adopting new approaches, no matter how promising they are.

It’s a reminder that innovation isn’t just about building something new—it’s also about helping others see the value in changing how we work.

Next Steps

I’m enhancing the agent to fetch real-time data from OSS via APIs. Why? The idea is to identify any parameter change in the network that might explain the degradation spotted. It would also be great if it could send alerts by email when a major disruption is found, but I’m still a little behind on that. Future iterations could incorporate reinforcement learning for automated parameter tuning, building on my 5G NSA optimization experience.
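For the email alert idea, a possible starting point is to build the message with Python's standard library and send it only when the anomaly score crosses a threshold. This is a sketch of something not yet implemented in the agent: the function name `build_alert`, the -0.1 threshold, and the addresses are placeholders I made up, and the actual sending step is left as a comment because it depends on your SMTP setup.

```python
from email.message import EmailMessage


def build_alert(node, timestamp, score, kpis, sender, recipient,
                score_threshold=-0.1):
    """Return an EmailMessage for severe anomalies, or None below the threshold."""
    if score > score_threshold:
        return None  # not severe enough to alert on
    msg = EmailMessage()
    msg["Subject"] = f"[KPI anomaly] {node} at {timestamp} (score {score:.4f})"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(f"{name}: {value:.2f}" for name, value in kpis.items()))
    return msg


alert = build_alert(
    node="site1_cella",
    timestamp="2025-07-24 08:02:00",
    score=-0.1234,
    kpis={"dl_throughput": 80.0, "erab_drop": 2.0},
    sender="agent@example.com",        # placeholder address
    recipient="rf-team@example.com",   # placeholder address
)
minor = build_alert("site2_cellb", "2025-07-24 08:02:00", 0.05, {},
                    "agent@example.com", "rf-team@example.com")

# Sending would look roughly like this, with a host you control:
#   import smtplib
#   with smtplib.SMTP("smtp.example.com") as server:
#       server.send_message(alert)
print(alert["Subject"])
```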

For fellow RF engineers, I encourage experimenting with AI agents to automate repetitive tasks. The code for this agent is available on my GitHub (Telecom AI Agents), and I’d love to hear your thoughts or use cases in the comments!

Cheers!

Diego Gonçalves Kovadloff is a Senior RF Engineer with 15+ years of experience in 5G, LTE, and UMTS optimization. He’s currently pursuing a Master’s in Applied Artificial Intelligence at Tecnologico de Monterrey.

References:

GeeksforGeeks. (2025, July 23). What is Isolation Forest? GeeksforGeeks. https://www.geeksforgeeks.org/machine-learning/what-is-isolation-forest/

Grok.com also helped me review and troubleshoot my script.
