# A Procedure of Conversion of Relational into Multidimensional Database Schema

## Relational Model

A relational database consists of a set of relations. A relational schema, denoted by $R(A_1, A_2, \ldots, A_n)$, describes a relation $r$ and is made up of a relation name $R$ and a list of attributes $A_1, A_2, \ldots, A_n$. Each attribute $A_i$ is the name of a role played by some domain $D$ in the relation $R$. A relation $r$ of the relational schema $R(A_1, A_2, \ldots, A_n)$ is a set of tuples $r = \{t_1, t_2, \ldots, t_m\}$. A relational database is described by a relational database schema that consists of a set of relational schemas.

If for any two distinct tuples $t_1$ and $t_2$ in the relation $r$ of $R$ there exists an attribute (or set of attributes) $K$ such that $t_1[K] \neq t_2[K]$, then that attribute (set of attributes) is called a key. It is common to designate one of the possible keys as the primary key, which is used to identify tuples in the relation. A set of attributes $FK$ in the relational schema $R_1$ is a foreign key of $R_1$ if two rules are satisfied:

1. The attributes in $FK$ have the same domain as the primary key $PK$ of another relational schema $R_2$; the attributes $FK$ are said to reference the relation $R_2$.
2. A value of $FK$ in a tuple $t_1$ of $R_1$ either occurs as a value of $PK$ for some tuple $t_2$ in $R_2$ or is null.

## Dimensional Model

Let us suppose that a hospital database contains admission data (such as number of days and value) of the patient, the date of admission, and the diagnosis. The admission data is determined by three attributes: Patient, Diagnosis and Time. These are referred to as dimensions, while the admission attributes they determine are referred to as measures, facts or fact attributes. The facts are mostly numerical, preferably continuously valued and additive. They vary over time.

In the statistical database field (Shoshani, 1997) the dimension corresponds to the category attribute, and the measure to the summary attribute. The cube model, shown in Fig. 1, is very useful in presenting the concept of the multidimensional space. There is no a priori distinction between dimensions and measures, since any attribute can play either role (Senses, 1996). There is no formal way to decide which attributes are dimensions and which attributes are measures; this decision has to be made during database design.

Fig. 1. Cube model.

The structure of the dimensional model can be represented by the star join schema (Kimball, 1996) shown in Fig. 2. The center of the schema is the fact table, which is the only table in the schema with multiple joins connecting it to other tables. The fact table is where the facts of the business are stored (Kimball, 1996), such as the number of days and Value in the hospital database.

Fig. 2. Star join schema.

The other tables are the dimension tables. Dimension attributes describe the item in the dimension, and are virtually constant over time. The primary key of the fact table is a composite (concatenated) key, which is the combination of as many foreign keys as there are dimensions in the schema. Each component of the composite key is a foreign key referencing the primary key of a dimension table. In other words, every fact table represents a many-to-many relationship, that is, it contains as many foreign keys as there are dimensions in the schema.

A multidimensional database may consist of any number of star join schemas with some dimension tables overlapping. Following (Lehner, 1998), we introduce some definitions that will help in posing and solving the problem of converting a relational database schema into a multidimensional database schema. The notion of functional dependency is of fundamental importance: if $A$ and $B$ are (sets of) attributes in a database, $B$ is functionally dependent on $A$, or $A$ functionally determines $B$, denoted by $A \to B$, if and only if for each specific value $a \in A$ there exists at most one value $b \in B$.

Dimensions are usually organized into hierarchies that specify the aggregation level and hence the granularity of viewing data. A dimension is defined over a dimensional schema. A dimensional schema is a set of dimensional attributes $D = \{D_1, \ldots, D_k\}$ where for each $D_i \in D$ there exists a $D_j \in D \setminus \{D_i\}$ such that either $D_i \to D_j$ or $D_j \to D_i$. A categorization $C$ of a dimensional schema $D$ is a sequence of functional dependencies $D_1 \to D_2, \ldots, D_{k-1} \to D_k$ where $D_i \in D$ for each $i = 1, \ldots, k$. The attribute $D_i$ is the $i$-th categorization level. A dimensional attribute $D_t \in D$ is called terminal if there is no $D' \in D \setminus \{D_t\}$ such that $D' \to D_t$.

Example: a simple dimensional schema of the product dimension consists of the dimension attributes Product = {Product, Production, Productive}. The hierarchy on the product dimension, Product $\to$ Production $\to$ Productive, is a set of functionally interrelated dimensional attributes Product, Production and Productive. The attribute Product is terminal and belongs to the 1st category level, Production to the 2nd and Productive to the 3rd category level.

A multidimensional database is defined over a set of multidimensional schemas. A multidimensional schema $M = (\{D_1, \ldots, D_n\}, S)$ consists of a set of dimensional schemas $\{D_1, \ldots, D_n\}$, measures or facts $S$, and a functional dependency $\{D_1, \ldots, D_n\} \to S$. Dimension tables are described by dimensional schemas and a fact table by the dependency $\{D_1, \ldots, D_n\} \to S$.
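The definition of functional dependency and of a terminal attribute can be checked mechanically on sample data. The following is a minimal sketch, assuming rows are held as Python dictionaries; the function names and the sample rows are illustrative and do not come from the source. Note that a dependency observed in sample rows is only evidence, since a functional dependency is a statement about all admissible data.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows: each lhs value maps to at most one rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards
        if seen.setdefault(key, val) != val:
            return False
    return True


def terminal_attributes(rows, attributes):
    """Attributes that no other single attribute functionally determines."""
    return [
        t for t in attributes
        if not any(holds_fd(rows, [d], [t]) for d in attributes if d != t)
    ]


# Hypothetical rows for the product dimension used in the example above.
products = [
    {"Product": "p1", "Production": "g1", "Productive": "a1"},
    {"Product": "p2", "Production": "g1", "Productive": "a1"},
    {"Product": "p3", "Production": "g2", "Productive": "a1"},
    {"Product": "p4", "Production": "g3", "Productive": "a2"},
]

product_to_group = holds_fd(products, ["Product"], ["Production"])
group_to_product = holds_fd(products, ["Production"], ["Product"])
terminals = terminal_attributes(products, ["Product", "Production", "Productive"])
```

On these rows the dependency Product $\to$ Production holds, the reverse does not, and Product is the only terminal attribute.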

A dimension defined by a dimensional schema $D$ is in dimensional normal form (DNF) if the following conditions are satisfied:

- There exists exactly one terminal dimension attribute $D_t \in D$.
- The values of $D_t$ are complete, i.e. each object has to be assigned to some value of $D_t$ (see the explanation of the completeness condition in the section on summarizability).

A multidimensional schema $M = (\{D_1, \ldots, D_n\}, S)$ is in multidimensional normal form if the following conditions are satisfied:

- Each $D_i$ is in dimensional normal form.
- The dimensions are orthogonal to each other, i.e. there exists no functional dependency from an attribute of one dimension $D_i$ to an attribute of another dimension $D_j$, $i \neq j$.
- The fact attribute $S$ is fully functionally determined by the set of the terminal dimension attributes of the dimensions.

## Summarizability of Fact Attributes

Multidimensional databases are mostly used to perform statistical analysis. Summarizability of fact attributes is an extremely important property of a multidimensional schema. In general, summarizability (Lehner, 1998) is needed to ensure correct automatic aggregation, which is essential in OLAP drill-down and roll-up operations. An incorrect summarization that is not obvious can lead to erroneous conclusions.

The following example (Lenz, Shoshani, 1997) illustrates the problem.

Fig. 3. Summarizability example.

The data in Fig. 3 provide a count of the number of university students organized by department and year. Suppose we wish to find out the number of students that attended each department over the whole period of 3 years. If students attended the department for more than one year, then adding the counts for each year is incorrect: the total for the math department, 14 + 20 + 18 = 52, is wrong. If the math department started in 1997, there were 14 students in 1997 who also attended it in 1998.

Consequently, there were only 6 new students in 1998. These 6 students continued in 1999, and 12 new students joined for a total of 18. Thus, the total number of students who attended the math department over the three-year period is 14 + 6 + 12 = 32. Likewise, for the chemistry department it is 10 + 2 + 7 = 19. The problem is more complicated and, in general, it is not possible to compute summaries for a non-summarizable fact because the semantic rule may not be known. In the example above, it is assumed that all students attend a 2-year program, but in reality this is not true.
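The arithmetic for the math department can be reproduced from base data. The sketch below builds hypothetical student identifiers that match the yearly counts in the text under the same 2-year-program assumption; the identifiers themselves are invented for illustration.

```python
# Hypothetical base data: one identifier per student, grouped by intake year.
intake_1997 = {f"s{i}" for i in range(14)}       # 14 students, attend 1997 and 1998
intake_1998 = {f"s{i}" for i in range(14, 20)}   # 6 new students, attend 1998 and 1999
intake_1999 = {f"s{i}" for i in range(20, 32)}   # 12 new students, attend 1999

math_by_year = {
    1997: intake_1997,
    1998: intake_1997 | intake_1998,
    1999: intake_1998 | intake_1999,
}

yearly_counts = {year: len(students) for year, students in math_by_year.items()}

# Adding the yearly counts counts every two-year student twice.
naive_total = sum(yearly_counts.values())

# Counting distinct students in the base data gives the correct total.
distinct_total = len(set().union(*math_by_year.values()))
```

Here `yearly_counts` is 14, 20 and 18, `naive_total` is 52 and `distinct_total` is 32, which matches the figures in the text.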

Some students attend for more than 2 years and some drop out after the first year. Thus, for a non-summarizable fact the totals can be extracted only from the base data, which include the information about each individual student. This non-summarizability of the number of students does not hold for both dimensions: the number of students is summarizable along the department dimension. In the year 1997 there were 24 students, 14 in the math department and 10 in the chemistry department.

The summarizability problem is more complicated than shown in the previous example. (Lenz, Shoshani, 1997) discuss the problem more thoroughly and argue that three conditions are sufficient for solving the summarizability problem. The necessary conditions for summarizability are:

- Disjointness of dimensions (dimension attributes)
- Completeness
- Type compatibility of function and fact attribute

Disjointness of dimension attributes states that the dimension attributes must form disjoint subsets. This condition is satisfied if the categorization of a dimensional schema $D$ is a sequence of functional dependencies $D_1 \to D_2, \ldots, D_{k-1} \to D_k$ where $D_i \in D$ for each $i = 1, \ldots, k$. An example is the hierarchy on the Time dimension, Day $\to$ Month $\to$ Year. The year 1999 consists of a disjoint set of month values: Jan 1999, Feb 1999, ..., Dec 1999. The month Jan 1999 consists of a disjoint set of day values: 1st day of Jan 1999, 2nd day of Jan 1999, etc.

Once the disjointness condition is satisfied, it is necessary to test whether the grouping of objects into clusters is complete. First, all elements of the cluster must exist, i.e. the union of all clusters must constitute the entire set of objects. There must be no missing object, i.e. no object that does not belong to a cluster.

In other words, completeness states that each object must be assigned to some category. For example, if some product is not assigned to a product group, the condition is violated. Second, all members of subset values must exist and their union must constitute the entire set. Fig. 4 shows examples of violation of the disjointness and the completeness conditions. The type compatibility condition depends on the type of fact attribute as well as on the statistical function applied. For every fact attribute and for every dimension it must be determined whether summarization is allowed or not.
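The disjointness and completeness conditions amount to requiring that the groups partition the set of objects. A minimal sketch of such a check follows; the function name, the result layout and the sample product groups are assumptions made for illustration.

```python
from collections import Counter


def check_grouping(objects, groups):
    """Check whether groups (name -> set of members) partition the given objects."""
    membership = Counter(m for members in groups.values() for m in members)
    overlapping = {m for m, n in membership.items() if n > 1}  # in more than one group
    missing = set(objects) - set(membership)                   # in no group at all
    return {
        "disjoint": not overlapping,
        "complete": not missing,
        "overlapping": overlapping,
        "missing": missing,
    }


# Hypothetical products and product groups.
all_products = {"p1", "p2", "p3", "p4"}
bad_groups = {"g1": {"p1", "p2"}, "g2": {"p2", "p3"}}
good_groups = {"g1": {"p1", "p2"}, "g2": {"p3", "p4"}}

bad = check_grouping(all_products, bad_groups)
good = check_grouping(all_products, good_groups)
```

In `bad_groups` the product `p2` belongs to two groups, which violates disjointness, and `p4` belongs to none, which violates completeness.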

Fact attributes (Kimball, 1996; Lenz, Shoshani, 1997) may be classified as:

- additive, if the attribute is additive along all dimensions (such as flow or rate): annual income, weekly number of rainy days
- semi-additive, if it is not additive along one or more dimensions, usually the temporal dimension (such as level or stock): number of students in a class, inventory of goods
- non-additive, if it is additive along no dimension (such as value-per-unit): item price

Flow refers to periods and is recorded at the end of each period. It records some cumulative effect over a period.

Level is measured and recorded at a particular point in time; it records the state at that specific point. Value-per-unit is usually the current value of the measured unit. The most important function is SUM. SUM is prohibited for value-per-unit data in general, and likewise for level data along temporal dimensions. Therefore, summarization is performed differently along temporal dimensions than along non-temporal dimensions. The table in Fig. 5 (Lenz, Shoshani, 1997) shows how the type of attribute affects the SUM function along temporal and non-temporal dimensions.
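The rule for SUM stated above can be written as a small lookup. This is a sketch of the textual rule only: the table of Fig. 5 is not reproduced here, and the type and function names are assumptions.

```python
from enum import Enum


class FactType(Enum):
    FLOW = "flow"                      # additive, e.g. annual income
    LEVEL = "level"                    # semi-additive, e.g. inventory of goods
    VALUE_PER_UNIT = "value-per-unit"  # non-additive, e.g. item price


def sum_allowed(fact_type, temporal_dimension):
    """Return True if SUM may be applied to this fact type along the given kind of dimension."""
    if fact_type is FactType.FLOW:
        return True
    if fact_type is FactType.LEVEL:
        # Level data may be summed only along non-temporal dimensions.
        return not temporal_dimension
    # Value-per-unit data may not be summed along any dimension.
    return False


students_over_years = sum_allowed(FactType.LEVEL, temporal_dimension=True)
students_over_departments = sum_allowed(FactType.LEVEL, temporal_dimension=False)
```

For the student counts of the earlier example, summing over years is rejected while summing over departments is accepted.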

Fig. 4. Examples of violation of the disjointness and the completeness conditions.

Fig. 5. SUM function along temporal and non-temporal dimensions.

In the design phase of a multidimensional schema, the type of each dimension (temporal or non-temporal) and the type of each fact attribute (additive: flow or rate; semi-additive: level or stock; non-additive: value-per-unit) have to be determined. (Lehner et al., 1998) suggest that in the graphical notation every temporal dimension is marked by a dot.

If we look at the definition of the multidimensional normal form, we can see that it ensures disjointness of dimension attributes and the completeness property. However, it does not automatically ensure type compatibility, because that is a dynamic property. Therefore, type compatibility cannot be checked in the static conditions of the multidimensional normal form, but only in the dynamic conditions when statistical operations are performed.

## Procedure of Conversion

The conversion of a relational into a multidimensional database schema has the following goals:

- Find all dimensions that exist but are hidden in a relational database schema.
- Get a multidimensional database schema that is as expressive as the initial relational schema, i.e. the multidimensional schema must reflect the same application knowledge.
- The resulting multidimensional schema must be in the multidimensional normal form to ensure summarizability (Lehner et al., 1998).

The input relational database schema must be normalized. Since this procedure is based on the notion of functional dependencies, the necessary normal form is Boyce-Codd normal form (BCNF). A relation $R$ is in BCNF if, whenever a nontrivial functional dependency $X \to A$ holds in $R$, $X$ is a superkey of $R$.
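The BCNF condition can be tested with the standard attribute-closure computation. The following is a minimal sketch, assuming the functional dependencies of a relation are supplied explicitly as pairs of attribute sets; the relation and its dependencies are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the nontrivial dependencies whose left side is not a superkey of schema."""
    schema = set(schema)
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not rhs <= lhs and not schema <= closure(lhs, fds)
    ]


# Hypothetical relation R(Date, Month, Year) with Date -> Month and Month -> Year.
schema = {"Date", "Month", "Year"}
fds = [
    (frozenset({"Date"}), frozenset({"Month"})),
    (frozenset({"Month"}), frozenset({"Year"})),
]

violations = bcnf_violations(schema, fds)
```

Here `Date` is a superkey, but `Month` is not, so the dependency Month $\to$ Year is reported as a violation. A relation with an empty result satisfies the BCNF condition with respect to the supplied dependencies.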

For the sake of clarity, in the following procedure we shall use the term relation in the relational model and the term table for the same entity in the dimensional model. We assume that all primary and foreign keys of the relational schema are known. The procedure of conversion of a relational into a multidimensional database schema consists of these steps:

- A. Discovering potential dimensions
- B. Defining fact and dimension attributes
- C. Defining fact and dimension tables
- D. Defining attributes in fact tables
- E. Defining attributes in dimension tables
- F. Checking multidimensional schema(s)

### A. Discovering Potential Dimensions

Each row in a relation represents an object instance, which is an instance of a simple object (such as a PATIENT object) or an instance of a composition of simple objects (such as RESERVATION, which is the relationship between PERSON, HOTEL and DATE object instances). The object instance's properties are represented by the object's attributes. The primary key uniquely identifies the object instance, while the other attributes refer to mapping instances that the object instance has with instances of the other objects (Martin, Odell, 1995).

Foreign keys refer to mapping instances that the object instance has with instances of existing objects. Nonkey attributes refer to mapping instances that the object instance has with nonexistent object instances, and play the role of property attributes of the object. For example, the Date attribute in the OPERATION relation represents the date of the operation performed on the patient. It represents the mapping of the OPERATION object instance to the DATE object instance.

Usually, in operational (transactional) databases the DATE object does not exist, so the DATE relation does not exist either. Therefore, the Date attribute becomes an attribute of the OPERATION object. In data warehouses the Date is an existing object, called a dimension, having property attributes such as Week, Weekending, Holiday, Season, Month and Year in the dimension relation.

By looking for nonkey attributes in a relational schema, i.e. attributes that are not part of primary or foreign keys, we may discover potential foreign keys to hidden dimensions, i.e. dimensions that have no dimension relation. We may open such a hidden dimension by creating a new dimension relation. The potential foreign key becomes a real foreign key referencing the primary key of the newly created dimension relation.

Let us assume a relation described by its relational schema

ADMISSION(..., *Ward#* (WARD.Ward#), *Doctor#* (DOCTOR.Doctor#), Date, NumberOfDays, Value, Result)

where the primary key is underlined and foreign keys are italicized and followed by the referenced relation and its primary key (referential integrity of a foreign key to its primary key).

In the ADMISSION relation we have discovered a potential foreign key Date to the time dimension. We may open this dimension by creating the new TIME relation with properties (attributes) Date, Month, Quarter and Year. The attribute Date in the ADMISSION relation becomes a foreign key to the new TIME relation. The result is

ADMISSION(..., *Ward#* (WARD.Ward#), *Doctor#* (DOCTOR.Doctor#), *Date* (TIME.Date), NumberOfDays, Value, Result)

TIME(Date, Month, Quarter, Year)

At the same time we may define a hierarchy on the time dimension by the functional dependencies Date $\to$ Month $\to$ Quarter $\to$ Year.

### B. Defining Fact and Dimension Attributes

By inspecting all nonkey attributes in every relational schema, i.e. the attributes that are neither primary nor foreign keys, we shall define which nonkey attributes are fact attributes and which are dimension attributes. This is the key decision, and there is no formal way to decide which attributes are fact attributes. If an attribute value shows some fact of the business, varies over time, and is preferably continuously valued, it is a strong candidate for a fact attribute.
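Steps A and B both start from the nonkey attributes of a relation. The sketch below lists them for the ADMISSION example and opens the hidden time dimension. The dictionary layout, the function names, the primary key `Admission#` and the attribute name `NumberOfDays` are assumptions made for illustration; the source does not give the primary key of ADMISSION in recoverable form.

```python
def nonkey_attributes(relation):
    """Attributes that are part of neither the primary key nor a foreign key."""
    keyed = set(relation["primary_key"]) | set(relation["foreign_keys"])
    return [a for a in relation["attributes"] if a not in keyed]


def open_dimension(relation, attribute, dim_name, dim_attributes):
    """Turn a nonkey attribute into a foreign key to a newly created dimension relation."""
    if attribute not in nonkey_attributes(relation):
        raise ValueError(f"{attribute} is not a nonkey attribute of {relation['name']}")
    foreign_keys = {**relation["foreign_keys"], attribute: (dim_name, attribute)}
    updated = {**relation, "foreign_keys": foreign_keys}
    dimension = {
        "name": dim_name,
        "attributes": [attribute, *dim_attributes],
        "primary_key": [attribute],
        "foreign_keys": {},
    }
    return updated, dimension


admission = {
    "name": "ADMISSION",
    # "Admission#" is a hypothetical primary key, not taken from the source.
    "attributes": ["Admission#", "Ward#", "Doctor#", "Date",
                   "NumberOfDays", "Value", "Result"],
    "primary_key": ["Admission#"],
    "foreign_keys": {"Ward#": ("WARD", "Ward#"), "Doctor#": ("DOCTOR", "Doctor#")},
}

candidates = nonkey_attributes(admission)
admission_star, time_dim = open_dimension(
    admission, "Date", "TIME", ["Month", "Quarter", "Year"]
)
remaining = nonkey_attributes(admission_star)
```

Before the dimension is opened, the candidates are Date, NumberOfDays, Value and Result. Afterwards Date is a foreign key to TIME, and the remaining nonkey attributes are the ones to classify as fact or dimension attributes in step B.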