Open Access
Three methods of coding nominal variables in regression analysis
Author(s) - С. М. Лапач (S. M. Lapach)
Publication year - 2021
Publication title - Matematičeskie mašiny i sistemy (Mathematical Machines and Systems)
Language(s) - English
Resource type - Journals
ISSN - 1028-9763
DOI - 10.34121/1028-9763-2021-4-35-45
Subject(s) - coding (social sciences) , numbering , regression analysis , variable length code , binary number , mathematics , statistics , variable (mathematics) , algorithm , regression , variables , computer science , arithmetic , mathematical analysis
The paper compares three methods of coding nominal variables in regression analysis: coding each level as a separate variable, binary coding, and numbering the factor levels. Although these methods have existed for a long time and, except for binary coding, even have a theoretical justification, no recommendations for their practical application or comparisons between them had been available. The features and limitations of each method are analyzed, and two examples provide a detailed comparison of all three. The comparison covers restrictions on use; statistical properties of the experimental designs; the labour intensity and difficulty of building the mathematical models, and the final result obtained; and convenience of semantic analysis and use. Models based on Chebyshev orthogonal polynomials are also included in the comparison. It is established that, when used correctly, the different coding methods lead to regression models with approximately identical properties. At the same time, coding each level as a separate variable is possible only if the data include experiments in which the nominal variable has no influence. Binary coding is inconvenient when the nominal variable has many levels of variation, and its results are inconvenient to analyze. When coding by level numbering, the numbers must be assigned so that the mean response values of the levels, as read from the factor's scatter diagram, are sorted by value in the order of the assigned numbers. This coding method preserves the natural number of factors. Combined with Chebyshev orthogonal polynomials, it gives markedly the best results, ensuring the highest accuracy and uniformity of approximation.
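The three coding schemes named in the abstract can be sketched as follows. This is a minimal illustration, not the author's implementation: all function names are hypothetical, levels are assumed to be hashable labels, binary coding is assumed to write the level's zero-based index in base 2 (most significant bit first), and the Chebyshev polynomials are the standard discrete ones for equally spaced level numbers 1..k, built with their usual three-term recurrence.

```python
from statistics import mean


def one_hot(levels, value):
    """Code each level as a separate 0/1 variable (one column per level)."""
    return [1 if value == level else 0 for level in levels]


def binary_code(levels, value):
    """Code a level by its zero-based index written in base 2, one 0/1 column per bit.

    Assumption (hypothetical convention): most significant bit first; a factor
    with k levels needs max(1, bit_length(k - 1)) columns, i.e. ceil(log2 k) for k >= 2.
    """
    width = max(1, (len(levels) - 1).bit_length())
    index = levels.index(value)
    return [(index >> bit) & 1 for bit in reversed(range(width))]


def number_levels_by_mean(observations):
    """Number the levels 1..k so that the mean response increases with the number.

    `observations` is an iterable of (level, response) pairs.
    """
    groups = {}
    for level, response in observations:
        groups.setdefault(level, []).append(response)
    ordered = sorted(groups, key=lambda level: mean(groups[level]))
    return {level: rank for rank, level in enumerate(ordered, start=1)}


def chebyshev_polynomials(n, degree):
    """Discrete Chebyshev orthogonal polynomials P_0..P_degree evaluated at x = 1..n.

    Three-term recurrence for unit-spaced points:
        P_{k+1} = (x - xbar) * P_k - k^2 (n^2 - k^2) / (4 (4k^2 - 1)) * P_{k-1}
    The returned columns are mutually orthogonal over the n points.
    """
    xs = range(1, n + 1)
    xbar = (n + 1) / 2
    prev = [1.0] * n                  # P_0
    curr = [x - xbar for x in xs]     # P_1
    polys = [prev, curr]
    for k in range(1, degree):
        c = k * k * (n * n - k * k) / (4 * (4 * k * k - 1))
        nxt = [(x - xbar) * p - c * q for x, p, q in zip(xs, curr, prev)]
        polys.append(nxt)
        prev, curr = curr, nxt
    return polys[: degree + 1]


if __name__ == "__main__":
    levels = ["A", "B", "C", "D", "E"]
    print(one_hot(levels, "C"))       # [0, 0, 1, 0, 0]
    print(binary_code(levels, "E"))   # [1, 0, 0]
    data = [("A", 5), ("A", 7), ("B", 1), ("B", 3), ("C", 10)]
    print(number_levels_by_mean(data))  # {'B': 1, 'A': 2, 'C': 3}
    for p in chebyshev_polynomials(5, 2):
        print(p)
```

For a factor with k levels, these schemes give k indicator columns, ceil(log2 k) binary columns, or a single numbered column that can then be expanded into Chebyshev polynomials of degree up to k - 1. With an intercept in the model, the k indicator columns sum to 1 in every run and are therefore collinear with it unless some runs involve none of the levels, which is consistent with the restriction on per-level coding stated in the abstract.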