Regression and log-linear models.

Regression and log-linear models can be used to approximate the given data. In (simple) linear regression, the data are modeled to fit a straight line. For example, a random variable, y (called a response variable), can be modeled as a linear function of another random variable, x (called a predictor variable), with the equation

                                                          y = wx + b

where the variance of y is assumed to be constant. In data mining, x and y are numerical database attributes. The coefficients w and b are called regression coefficients; they specify the slope of the line and the y-intercept, respectively. These coefficients can be solved for by the method of least squares, which minimizes the sum of squared errors between the actual data values and the values estimated by the line. Multiple linear regression is an extension of (simple) linear regression that allows a response variable, y, to be modeled as a linear function of two or more predictor variables.
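
The least-squares solution for the simple case has a closed form: w is the covariance of x and y divided by the variance of x, and b follows from the means. The sketch below is a minimal illustration in plain Python, not a reference implementation; the function name fit_line and the sample values are assumptions made up for this example.

```python
def fit_line(xs, ys):
    """Return (w, b) minimizing the sum of squared errors of y = w*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    w = sxy / sxx            # slope
    b = y_mean - w * x_mean  # y-intercept
    return w, b


# Hypothetical data lying close to the line y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8]
w, b = fit_line(xs, ys)
print(f"y = {w:.3f}x + {b:.3f}")
```

Once w and b are known, the original (x, y) pairs can be replaced by just these two numbers, which is why regression serves as a data reduction technique.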

Log-linear models approximate discrete multidimensional probability distributions. Given a set of tuples in n dimensions (e.g., described by n attributes), we can consider each tuple as a point in an n-dimensional space. Log-linear models can be used to estimate the probability of each point in a multidimensional space for a set of discretized attributes, based on a smaller subset of dimensional combinations. This allows a higher-dimensional data space to be constructed from lower-dimensional spaces. Log-linear models are therefore also useful for dimensionality reduction (the lower-dimensional points together usually take up less space than the original data points) and for data smoothing (aggregate estimates in the lower-dimensional space are less subject to sampling variation than estimates in the higher-dimensional space).
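
As a rough illustration, the simplest log-linear model is the independence model, in which the log of each joint probability is the sum of one term per attribute. The sketch below assumes two discretized attributes and rebuilds the two-dimensional distribution from the two one-dimensional marginals only. The function name fit_independence_model and the sample tuples are hypothetical; practical log-linear models usually also include interaction terms and are fitted iteratively.

```python
import math
from collections import Counter


def fit_independence_model(tuples):
    """Estimate joint probabilities of 2-attribute tuples from 1-D marginals only."""
    n = len(tuples)
    if n == 0:
        raise ValueError("need at least one tuple")
    count_a = Counter(a for a, _ in tuples)
    count_b = Counter(b for _, b in tuples)
    # Independence model: log p(a, b) = log p(a) + log p(b),
    # i.e. the model is linear in the log domain.
    log_p = {
        (a, b): math.log(ca / n) + math.log(cb / n)
        for a, ca in count_a.items()
        for b, cb in count_b.items()
    }
    return {cell: math.exp(v) for cell, v in log_p.items()}


# Hypothetical discretized data: (age group, income level)
data = (
    [("young", "low")] * 3
    + [("young", "high")] * 1
    + [("old", "high")] * 4
)
probs = fit_independence_model(data)
for cell in sorted(probs):
    print(cell, round(probs[cell], 4))
```

Note that the cell ("old", "low") never occurs in the sample data yet still receives a nonzero estimate from the marginals. This is the smoothing effect described above, and only the marginal counts need to be stored rather than the full table.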

Regression and log-linear models can both be used on sparse data, although their application may be limited. While both methods can handle skewed data, regression does exceptionally well. Regression can be computationally intensive when applied to high-dimensional data, whereas log-linear models show good scalability for up to 10 or so dimensions.
