COMPARISON OF OPTIMIZATION STRATEGIES AND ESTIMATION TECHNIQUES FOR RADIO NETWORK PLANNING AND OPTIMIZATION PROBLEMS

Background. Radio network planning is one of the main phases of the cellular network lifecycle, as it determines capital and operating costs and allows system performance evaluation at any given time. An accurate and comprehensive analysis of existing network statistics is necessary for proper cell planning during network expansion. These statistics are collected throughout the life cycle of the cellular network and usually have certain imperfections (heterogeneity of statistics, which have different densities in different parts of the search space, up to the presence of significant voids, etc.). The system describing the functioning of the radio network can be represented as a black box because its internal processes are too complex to be defined by mathematical functions. This determines the need to use appropriate tools.
Objective. The purpose of the paper is to create a toolkit that finds the proper relationships between network parameters and defines target values that help to build a network plan that is effective in terms of performance and of the costs of its creation and operation. The tools should work efficiently with the minimum set of available statistical data and take its imperfections into account.
Methods. Mathematical estimation and optimization methods are used, namely Ordinary Least Squares, Ridge Regression, Lasso, Elastic-net, LARS lasso, Bayesian Ridge Regression, Automatic Relevance Determination, Stochastic gradient descent, Theil-Sen estimator, Huber Regression, Quantile regression, and Polynomial regression. We consider 12 estimation methods in combination with two optimization strategies. Additionally, the method of partial analysis of the search space with different numbers of configurations is considered.
Results. A software package written in the Python programming language has been created. It contains a practical implementation of all the considered estimation and optimization methods, as well as tools for evaluating arbitrary configurations of the software package (benchmark) and visualizing the results. The best estimation method for finding the optimal configuration of the statistical parameters of the 4G radio network to maximize the download speed is Ordinary Least Squares. To obtain satisfactory results, it is enough to consider 25 initial and 250 estimated points: a larger number of points does not significantly increase prediction accuracy.
Conclusions. The results indicate that the created software package can be used for radio network planning tasks. Further research is aimed at expanding the functionality of the software package and considering additional estimation methods and optimization strategies.


Introduction
An accurate and comprehensive analysis of existing network performance statistics is necessary for proper cell planning during network expansion. These statistics are collected during the whole life cycle of a cellular network. Hence, a tool is needed to find the correct relationships between network parameters and propose target values that will help build an optimal network plan. There are many comparative studies on optimization and estimation methods for different domains. However, such a study is still needed for cellular network planning with modern approaches.
Another point is the volume of statistical data. For the case of a radio network consisting of 5000 base stations, daily statistics aggregated by the hour will contain approximately 360,000 rows with dozens of parameters. As a result, we can consider these statistics as big data. So, another topic for research is algorithm complexity and execution time.
The objectives of this research are as follows:
- In general, this work aims to perform a comprehensive study of optimization strategies and estimation techniques for radio network planning.
- The research subject is 2G, 3G, 4G, and 5G networks with their parameters. This paper presents only an initial analysis of 4G network parameters.
- The optimization strategies considered in this paper are Random Search and Grid Search.
- The estimation techniques used in this paper are 13 linear models.
The paper addresses the following issues:
- Optimization strategies: ways to find the optimal network configuration.
- Estimation techniques: determination of dependencies between radio network parameters.
- Case study: description of input data.
- Conclusions and future work.

Optimization strategies: ways to find the optimal network configuration
The radio network planning process also includes optimization of the existing network. Optimization is the problem of finding an extremum (minimum or maximum) of an objective function in a particular region of a finite vector space bounded by a set of linear and nonlinear equations and inequalities [1]. The optimization problem consists of the following:
- The objective function to be optimized.
- Parameters: variable characteristics that define a particular system.
- Search space: the list of all allowed parameter values.
Optimization methods strongly depend on the degree of openness of the system to be optimized, which determines how far the dependencies and processes inside the system can be studied. Systems can be represented as:
- White Box: a system whose internal elements can all be explored without restrictions. An example of such a system is open-source software.
- Black Box: the complete antithesis of the white box. An example of a black box system is any proprietary software. None of the internal processes or dependencies can be investigated. The only thing allowed when working with such a system is observing the system's reaction to specific input data.
- Gray Box: a system that lies between the white and black boxes. Some internal elements and processes of the system can be studied, while others remain closed. A gray box system can be interpreted as a combination of black-box and white-box subsystems. An example of such a system is any open-source software containing proprietary modules.
Any dependencies between statistical parameters of a radio network can be represented as a black box. These dependencies are affected by factors that are insufficiently studied or involve random processes: radio propagation physics, user profile, etc.
A typical optimization process for a black box system includes the following steps:
- Identify a list of candidate configurations.
- Evaluate such configurations.
- Repeat the above steps until certain conditions are met (e.g., the best configuration is not updated for some time, etc.).
However, the nature of radio networks and the peculiarities of statistics collection limit the use of the evaluation phase. If any changes occur in the network, their impact becomes visible only after some time (up to several days). Accordingly, the evaluation stage has to be changed: the main idea is to predict a value and use it in place of the real value without waiting for real measurements. Combining estimation with evaluation is one of the topics for further work. The new network planning optimization process contains the following steps:
- Define a list of candidate configurations. In this study, the way to create such a set is determined by the optimization strategy.
- Evaluate the candidate configurations. In this study, this is done using various estimation techniques.
- Determine the best candidate configuration. Until real statistics are available, the predicted value is treated as the real one.
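The three steps above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation used in the study: the names `fit_nearest` and `optimize_with_estimator` are hypothetical, the sample points are invented, and the nearest-neighbour predictor is a simple stand-in for the estimation techniques described later.

```python
from typing import Callable, Sequence, Tuple

Config = Tuple[float, ...]


def fit_nearest(points: Sequence[Config],
                values: Sequence[float]) -> Callable[[Config], float]:
    """Placeholder estimator (an assumption, not one of the paper's models):
    predict the value of the closest already-known point."""
    def predict(config: Config) -> float:
        def squared_distance(i: int) -> float:
            return sum((a - b) ** 2 for a, b in zip(points[i], config))
        nearest_index = min(range(len(points)), key=squared_distance)
        return values[nearest_index]
    return predict


def optimize_with_estimator(candidates: Sequence[Config],
                            initial_points: Sequence[Config],
                            initial_values: Sequence[float],
                            fit=fit_nearest) -> Tuple[Config, float]:
    # Step 1 is done by the caller: `candidates` comes from the
    # optimization strategy (random search, grid search, ...).
    # Step 2: estimate each candidate instead of waiting for real statistics.
    predict = fit(initial_points, initial_values)
    # Step 3: the candidate with the highest predicted value is taken as best.
    best = max(candidates, key=predict)
    return best, predict(best)


# Invented one-parameter example: three known points, three candidates.
known_points = [(1.0,), (5.0,), (9.0,)]
known_values = [10.0, 50.0, 30.0]
candidate_configs = [(2.0,), (4.5,), (8.0,)]
best_config, predicted_value = optimize_with_estimator(
    candidate_configs, known_points, known_values)
print(best_config, predicted_value)
```

Passing the estimator as the `fit` argument keeps the optimization strategy and the estimation technique independent, which is what makes it possible to compare arbitrary combinations of the two.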
This article evaluates two of the most straightforward strategies for optimizing black-box systems, random search and grid search:
- Random search. This approach selects the next configuration to be evaluated at random. In [2], the authors claim that in most cases, random search gave better results and took less computation time than grid search.
- Grid search. This approach creates a subset of the search space by discretizing each parameter. The discretization step is set separately for each parameter. The total number of configurations to be evaluated is:
$$N = \prod_{i=1}^{n} \frac{D_i}{s_i}$$
where $D_i$ is the size of the domain space of parameter $i$, $s_i$ is the sampling step of parameter $i$, and $n$ is the number of parameters.
The main disadvantage of grid search is the exponential growth of the number of configurations as the number of parameters increases or the sampling step decreases. In addition, not all parameters are equally valuable for determining the optimal configuration. The result is an increase in the time spent finding the optimal solution. This disadvantage can be mitigated by increasing the sampling step. However, in this case, there is a risk of missing a zone of potentially better parameter values. This situation is shown in Fig. 2. When using the grid search algorithm, it is therefore necessary to find a balance between time costs and the quality of the result. In general, both approaches have low efficiency and require significant computational and, accordingly, time resources.
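The difference between the two strategies, and the growth of the grid as the sampling step shrinks, can be illustrated with a short sketch. The function names and the parameter ranges below (a CQI-like parameter on [0, 15] and a traffic-like parameter on [0, 1000]) are assumptions made for illustration only. The grid here includes both endpoints of each range, so an axis of size d sampled with step s contributes floor(d / s) + 1 values.

```python
import itertools
import math
import random


def grid_configurations(bounds, steps):
    """Discretize every parameter with its own step and combine all axes."""
    axes = []
    for (low, high), step in zip(bounds, steps):
        n_values = int(math.floor((high - low) / step)) + 1
        axes.append([low + k * step for k in range(n_values)])
    return list(itertools.product(*axes))


def random_configurations(bounds, n_points, seed=0):
    """Draw each parameter uniformly and independently from its range."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(low, high) for low, high in bounds)
            for _ in range(n_points)]


# Hypothetical search space: (CQI-like range, traffic-like range).
bounds = [(0, 15), (0, 1000)]

coarse_grid = grid_configurations(bounds, steps=[1, 100])
fine_grid = grid_configurations(bounds, steps=[0.5, 50])
random_points = random_configurations(bounds, n_points=250)

# Halving both steps roughly quadruples the grid for two parameters;
# the size of a random sample is chosen directly.
print(len(coarse_grid), len(fine_grid), len(random_points))
```

With random search, the budget of configurations is fixed in advance and does not depend on the number of parameters, whereas the grid size is a product over all parameters.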

Estimation techniques: determination of dependencies between radio network parameters
This article studies 13 simple linear models that can be used to predict the value of a parameter y that depends on a variable (or set of variables) x. In general, prediction is the search for dependencies that can be used to determine values even if the input data is incomplete [3]. In this study, the estimation techniques are used to obtain a value for each candidate configuration identified by the optimization strategy.
The Scikit-learn library of the Python programming language contains implementations of the linear models mentioned in this article [4]. These models are listed below in the text.

Ordinary Least Squares [5] fits a linear model with coefficients $w = (w_1, \dots, w_p)$ to minimize the residual sum of squares between the observed targets in the dataset and the targets predicted by the linear approximation. Mathematically, it solves a problem of the form:
$$\min_{w} \|Xw - y\|_2^2$$
Ridge Regression [6] addresses some of the problems of Ordinary Least Squares by imposing a penalty on the size of the coefficients. The ridge coefficients minimize a penalized residual sum of squares:
$$\min_{w} \|Xw - y\|_2^2 + \alpha \|w\|_2^2$$
The complexity parameter $\alpha \geq 0$ controls the amount of shrinkage: the larger the value of $\alpha$, the greater the amount of shrinkage, and thus the coefficients become more robust to collinearity.
Lasso [7] is a linear model that estimates sparse coefficients. Mathematically, it consists of a linear model with an added regularization term. The objective function to minimize is:
$$\min_{w} \frac{1}{2 n_{\text{samples}}} \|Xw - y\|_2^2 + \alpha \|w\|_1$$
The lasso estimate thus solves the minimization of the least-squares penalty with $\alpha \|w\|_1$ added, where $\alpha$ is a constant and $\|w\|_1$ is the $\ell_1$-norm of the coefficient vector.
Elastic-net [8] is a linear regression model trained with both $\ell_1$- and $\ell_2$-norm regularization of the coefficients. The objective function to minimize in this case is:
$$\min_{w} \frac{1}{2 n_{\text{samples}}} \|Xw - y\|_2^2 + \alpha \rho \|w\|_1 + \frac{\alpha (1 - \rho)}{2} \|w\|_2^2$$
where $\rho \in [0, 1]$ sets the balance between the two penalties.
LARS lasso [9][10] is a lasso model implemented using the LARS (Least Angle Regression) algorithm, which yields an exact solution that is piecewise linear as a function of the norm of its coefficients.
Bayesian Ridge Regression [11] estimates a probabilistic model of the regression problem by introducing uninformative priors over the model's hyperparameters. The prior for the coefficient vector $w$ is given by a spherical Gaussian:
$$p(w \mid \lambda) = \mathcal{N}(w \mid 0, \lambda^{-1} I_p)$$
Automatic Relevance Determination [12] is similar to Bayesian Ridge Regression but can lead to sparser coefficients $w$. Each coefficient $w_i$ is drawn from a Gaussian distribution centered on zero with precision $\lambda_i$:
$$p(w_i \mid \lambda_i) = \mathcal{N}(w_i \mid 0, \lambda_i^{-1})$$
Stochastic gradient descent [13] fits a linear model by stochastic gradient descent and supports different (convex) loss functions and different penalties.
Theil-Sen estimator [14] uses a generalization of the median in multiple dimensions.
Huber Regression [15] differs from Ridge in that it applies a linear loss to samples that are classified as outliers. A sample is classified as an inlier if its absolute error is less than a certain threshold. It differs from the Theil-Sen estimator in that it does not ignore the effect of the outliers but gives them a lesser weight.
Quantile regression [16] estimates the median or other quantiles of $y$ conditional on $X$, while ordinary least squares (OLS) estimates the conditional mean. The Quantile Regressor gives linear predictions $\hat{y}(w, X) = Xw$ for the $q$-th quantile, $q \in (0, 1)$. The weights or coefficients $w$ are then found by the following minimization problem:
$$\min_{w} \frac{1}{n_{\text{samples}}} \sum_i PB_q(y_i - x_i w) + \alpha \|w\|_1$$
where $PB_q(t) = q \max(t, 0) + (1 - q) \max(-t, 0)$ is the pinball loss.
Polynomial regression [17]. A simple linear regression can be extended by constructing polynomial features from the input variables.

Case study: description of input data
The dataset used to evaluate the performance of the optimization strategies and estimation techniques is a sample of 4G radio network statistics consisting of 50,000 rows of data. This set contains 40 parameters, but we use only a few of them:
- CQI (channel quality indicator), $x_1$.
- Total traffic volume per cell, $x_2$.
- Average channel speed towards the user (downlink), $v$.
Accordingly, the black-box system will be optimized using some combination of optimization strategy and estimation technique. Determining the effectiveness of each combination is the subject of this paper. The input data set is characterized by a different distribution for each parameter (Fig. 3) and by a significant number of empty combinations (Fig. 4), which complicates the prediction task.

Experimental studies
To conduct experimental studies, a program was created in the Python programming language. It contains the following modules:
- Optimization, which contains the implementation of the optimization strategies.
- Estimation, which contains all estimation techniques.
- Search area, which describes the types and limits of all parameters. Additionally, this module contains functions for normalizing and denormalizing data regardless of type. If a point is received that does not exist in the given set, the module finds the nearest point from the set. This is done using the Python NGT library (Neighbourhood Graph and Tree for Indexing High-dimensional Data), which makes it possible to operate quickly on data consisting of millions of records with hundreds of parameters.
- CSV (Comma Separated Values) data processing. This module contains functions for processing files with the .csv extension and the Python NGT implementation. As input data, one parameter to be optimized and an arbitrary number of input parameters can be selected. As further work, reading and storing data in other formats can be implemented.
- Visualization, which implements several types of graphs (histograms to determine the distribution of a one-dimensional value, dot diagrams and heat maps for three-dimensional space, and parallel coordinates for a space with an arbitrary number of dimensions).
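The behaviour of the search area module (normalization, denormalization, and snapping a proposed point to the nearest existing record) can be sketched as follows. This is a simplified illustration: it uses a brute-force linear scan instead of the NGT index used in the program, and the function names, parameter ranges, and sample records are hypothetical.

```python
import math


def normalize(point, bounds):
    """Min-max scale every coordinate to [0, 1] using its parameter limits."""
    return tuple((value - low) / (high - low)
                 for value, (low, high) in zip(point, bounds))


def denormalize(point, bounds):
    """Inverse of normalize: map [0, 1] coordinates back to original units."""
    return tuple(low + value * (high - low)
                 for value, (low, high) in zip(point, bounds))


def nearest_existing(point, dataset, bounds):
    """Return the dataset record closest to `point` in normalized space.

    Brute-force stand-in for an approximate nearest-neighbour index;
    fine for a sketch, too slow for millions of records.
    """
    target = normalize(point, bounds)
    return min(dataset,
               key=lambda row: math.dist(normalize(row, bounds), target))


# Hypothetical limits and records: (CQI-like value, traffic-like value).
bounds = [(0, 15), (0, 1000)]
records = [(3, 100.0), (10, 800.0), (14, 400.0)]

proposed = (9, 700.0)  # a configuration that is absent from the records
snapped = nearest_existing(proposed, records, bounds)
print(snapped)
```

Distances are computed after normalization so that a parameter with a wide numeric range, such as traffic volume, does not dominate one with a narrow range, such as CQI.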
The procedure of the program is as follows: The task was to find the optimal pair of the channel quality indicator (CQI) and the total volume of cell traffic that maximizes the download speed:
$$v = f(x_1, x_2) \to \max$$
where $v$ is the download speed, and $x_1$, $x_2$ are the channel quality indicator and the total traffic volume, respectively.
In total, 38,400 measurements were performed: 384 combinations of input parameters with 100 repetitions each to reduce the impact of outliers and improve the accuracy of measurements. Each combination of optimization strategy and estimation technique was evaluated for several variations of the number of initial and total configurations. The following indicators were recorded:
- Prediction (P), kbps: the download speed of the optimal configuration obtained as a result of optimization.
- Real (R), kbps: the download speed for the optimal configuration obtained from the statistical data set.
- Difference between P and R, %: the delta between the predicted and real values of the download speed. Since 100 repetitions were performed, this indicator also reflects the variance of the predicted value.
- Difference between R and the best, %: the delta between the real value of the download speed for the optimal configuration and the best value present in the data set.
- Percent of points worse than predicted: the percentage of points with a worse download speed than the point specified in the prediction.
Table 1 shows that almost all estimation techniques find the same zone of optimal values (column of real values). However, the estimation techniques differ in forecasting accuracy and variance. The most accurate methods are Elastic Net, Lasso, Lasso Lars, and Ridge Regression with cross-validation. Still, all of them have a significant variance of the predicted value and, accordingly, require a considerable number of repeated measurements. Ordinary Least Squares and Ridge Regression, in addition to high forecasting accuracy, have a smaller variance and, accordingly, require fewer measurements. The best method by the leading indicators is Ordinary Least Squares. The main disadvantage of the considered methods is the need to repeat measurements to obtain accurate results.
From Table 2 we can see that, on average, grid search over the space of parameters $x_1$, $x_2$ finds the better configuration of these parameters, that is, the one with the higher download speed (see the indicators Real (R), kbps and Percent of points worse than predicted).
Tables 3 and 4 show that as the number of points considered increases, the accuracy of finding the optimal configuration of parameters $x_1$, $x_2$ also increases. However, it is possible to determine the minimum number of points sufficient for the desired accuracy: further increasing the number of points does not give a corresponding advantage. This is important for large datasets because it saves computational and time resources. The optimal number of initial points is 25, and the optimal number of evaluation points is 250.
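The percentage indicators used in the tables can be computed along the following lines. The exact formulas are not given in the text, so the definitions below are assumptions: the P-R difference is taken relative to the real value, the R-best difference relative to the best value in the data set, and a point counts as "worse" when its download speed is lower than the real value of the selected configuration. The function name and the sample values are hypothetical.

```python
def evaluate_run(predicted, real, dataset_speeds):
    """Compute the percentage indicators for one optimization run.

    predicted      -- P, speed predicted for the selected configuration, kbps
    real           -- R, speed of that configuration in the data set, kbps
    dataset_speeds -- download speeds of all points in the data set, kbps
    """
    best = max(dataset_speeds)
    n_worse = sum(1 for speed in dataset_speeds if speed < real)
    return {
        # Assumed definition: relative gap between prediction and reality.
        "diff_p_r_pct": abs(predicted - real) / real * 100.0,
        # Assumed definition: how far the chosen point is from the best one.
        "diff_r_best_pct": (best - real) / best * 100.0,
        # Share of data set points that are slower than the chosen point.
        "pct_worse": n_worse / len(dataset_speeds) * 100.0,
    }


# Invented example values, kbps.
speeds = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
indicators = evaluate_run(predicted=4400.0, real=4000.0,
                          dataset_speeds=speeds)
print(indicators)
```

Averaging these indicators over the repeated runs of one strategy and estimator combination gives one row of a results table.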
The execution time of such a program is about a hundredth of a second thanks to the Python NGT library, which is designed for fast lookup both of values in the data set itself and of the nearest value from the data set. Such an execution time makes it possible to analyse statistical data and make appropriate changes to the network configuration in real time. At the same time, the differences in execution time between estimation methods, as well as between optimization strategies, are within the measurement error.

Conclusions and future work
In this study, the best estimation technique for finding the optimal configuration of the statistical parameters of the 4G radio network to maximize the download speed is Ordinary Least Squares. The methods Ridge Regression, Elastic Net, Lasso, Lasso Lars, and Ridge Regression with cross-validation are worse in prediction accuracy and variance but are also quite effective when the number of measurements is increased. Random search works slightly worse than grid search on a sparse statistical data space containing gaps. However, neither of these two optimization methods guarantees finding the optimal point $(x_1, \dots, x_n)$, either among the existing points of the statistical data set or as close as possible to one of such points.
It can also be concluded that 25 initial and 250 evaluation points are sufficient to obtain satisfactory results: a larger number of points does not significantly increase forecasting accuracy.
Future work of this research will include the following directions:
- Implement in software code and evaluate additional, more sophisticated optimization strategies and estimation techniques.
- Carry out measurements for other dependencies between the statistical parameters of 4G radio network functioning and for additional parameters of 2G, 3G, and 5G networks.
- Add configuration parameters of base stations to the statistical parameters of radio network functioning, which will allow these parameters to be optimized to ensure better quality of operation.
- Add marketing parameters, which will allow the future statistical parameters of the radio network to be forecast. This will make early reconfiguration of base stations possible.
- Evaluate the effectiveness of changes in the network topology (commissioning of additional base stations, multi-beam antennas, etc.) and their impact on the statistical parameters. This will create a bank of solutions with their characteristics and limitations. In turn, using such a bank will ensure the selection of the optimal solution (changing the parameters of the base station or changing the network topology) to improve the quality of the network.