Optimizing big data distributed processing: Algebraic foundations and the concept of information

P. V. Golubtsov

Memoirs of the Faculty of Physics 2023. N 5.

Article


Abstract

An algebraic formalization of distributed big data processing is considered. For a given data processing procedure, the concept of an information space is defined and a criterion for its minimality is established. The existence of a minimal information space is proved; it provides the most compact representation of the information contained in the data and allows the most efficient parallelization of data processing. An element of this space consistently describes the information contained in the corresponding data set. It is shown that the concepts of information addition and information quality are naturally expressed in terms of the information space, reflecting the intuitive notion of information. The advantages of using the minimal information space in the MapReduce distributed data processing model are also considered. In this model, Map transforms the original data sets into elements of the information space, and Reduce combines these pieces of partial information into a single element representing all the original data. By way of illustration, several data processing procedures are analyzed and the corresponding minimal information spaces are presented.
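The Map and Reduce roles described above can be illustrated with a minimal sketch. It assumes a processing procedure that is not taken from the paper: computing the sample mean of numerical data. The pair (count, sum) serves here as a hypothetical information element, and the names `Info`, `EMPTY`, `to_info`, `combine` and `mean_from_info` are illustrative only. No claim is made that this matches the examples analyzed in the article.

```python
from functools import reduce
from typing import Iterable, List, Tuple

# Hypothetical information element for the sample mean: (count, sum).
Info = Tuple[int, float]

# Information carried by an empty data set; the identity for `combine`.
EMPTY: Info = (0, 0.0)


def to_info(chunk: Iterable[float]) -> Info:
    """Map step: summarize one data chunk as an information element."""
    values = list(chunk)
    return (len(values), float(sum(values)))


def combine(a: Info, b: Info) -> Info:
    """Information addition: componentwise sum of counts and totals.

    It is associative and commutative up to floating-point rounding,
    which is what lets partial results be merged in any order.
    """
    return (a[0] + b[0], a[1] + b[1])


def mean_from_info(info: Info) -> float:
    """Final processing: recover the mean from the accumulated information."""
    count, total = info
    if count == 0:
        raise ValueError("cannot compute the mean of an empty data set")
    return total / count


def distributed_mean(chunks: Iterable[Iterable[float]]) -> float:
    """Map every chunk to an information element, then Reduce them to one."""
    return mean_from_info(reduce(combine, map(to_info, chunks), EMPTY))


# Three chunks that could reside on different workers (made-up data).
chunks: List[List[float]] = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
result = distributed_mean(chunks)
```

In this sketch each chunk is reduced to two numbers regardless of its size, so only the compact elements, not the raw data, need to be exchanged between workers.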

Received: 2023 July 20

Approved: 2023 September 28

PACS:

89.70.-a Information and communication theory
07.05.Kf Data analysis: algorithms and implementation; data management

Authors

P. V. Golubtsov
$^1$Department of Mathematics, Faculty of Physics, M. V. Lomonosov Moscow State University
