Studia Scientiarium Mathematicarum Hungarica 34. (1998)

1-3. szám - Demetrovics J.-Katona G. O. H.-Miklós D.-Seleznjev O.-Thalheim B.: Functional dependencies inrandom databases

Studia Scientiarum Mathernaticarum Hungáriái 34 (1998), 127-140 FUNCTIONAL DEPENDENCIES IN RANDOM DATABASES J. DEMETROVICS, G. O. H. KATONA, D. MIKLÓS O. SELEZNJEV and 13. THALHEIM To the memory of Alfréd Rényi 1. Introduction A database can be considered as a matrix, where the rows contain the data of one individual (object, etc.) and the columns contain the data of the same type: last name, first, name, date of birth, etc. The types of data are called attributes. These data are sometimes logically dependent. Consider the following example, where the attributes are the last name (denoted by a), the first name (6), the year of the birth (c), the month of the birth (d), the day of the birth (e), the age in years (/), the age in months (g) and the age in days (/?,). It is obvious that c determines /. On the other hand, the pair {c, d} determines both / and g, finally the set {c, d, a} determines all of /, g and h. This is formalized in the following way. Let R be an m x n matrix with different rows and ÍI denote the set of its columns, that is, |0| — n. Suppose that A C 0, b € il. We say that h functionally depends on A and write A-Ab if R contains no two rows containing equal entries in the columns lielonging to A and different entries in b. In most of the database theory it is supposed that the functional depen­dencies A -A b are a priori known by the logic of the data, as in the above example. Our way of looking at, the situation is different. We suppose that we have to find the functional dependencies in a lai'ge database (both in and n are large). If nothing is known about R, it, is natural to assume that the entries are independently chosen. The question is: what the typical size of the minimal sets A such that A-Ab is. Thus the first, mathematical question is the following. Choose the entries of the matrix R totally independently, following the probability distribution (qi,... ,q,i)- What is the minimum size / of A such that A-)b holds with * 081 Supported by the Hungarian Foundation for Scientific Research (OTKA) Grant, T 016524, T 016389 and the German Natural Science Research Council Contract BD II-Bl-3141-211(94). 1991 Mathematics Subject Classification. Primary G8P15, G8R05, 15A52. Key words and phrases. Functional dependency, random database, random matrix, sieve method. 0081 -6006/98/$ 5.00 ©1908 Akadémiai Kiadó, Budapest

Next