Dissertations

Data Processing in Very Large Databases

Author: Ing. Ľuboš Takáč
Supervisor: Doc. Ing. Michal Zábovský, PhD.
Date of defense: 20 August 2013
Study program: 9.2.9 Applied Informatics
Reviewer 1: prof. Ing. Karel Šotek, CSc.
Reviewer 2: doc. Ing. Ján Genči, PhD.
Reviewer 3: doc. Ing. Karol Grondžák, PhD.

Slovak abstract:

This dissertation describes current knowledge about data processing in very large databases, mainly relational ones, together with their use and their shortcomings. Over the time that relational databases have been developed in parallel with other information technologies and hardware, they have reached a state where the problem is no longer collecting data, performing operations on it, or making it quickly available anywhere in the world, but understanding it and finding one's way around in it. Technology has overtaken us in this respect, and without a change in how we approach this data and information we will become increasingly lost in it. Today's large data volumes require new methods for processing and using them. Unless we can orient ourselves in the data and know the relationships within it, the data is useless to us, and we will never fully exploit its potential or the information hidden in it. In this work I focus mainly on the possibilities for processing large volumes of data and on how effectively a person can then understand the data, given a particular form of processing and presentation. The same problem arises when trying to understand a large relational database. Hundreds of tables and the relationships between them cannot be grasped at once, and it is hard to judge what is essential and where to look for relevant data and information. The work addresses the semantics of relational databases and ways to make relational databases easier to use, especially large ones in which information is hard to find and orientation is difficult. Three approaches to this problem are developed. The first is a suitable visualization of the tables and relationships (the ER diagram) of a large relational database, which allows better orientation and helps locate the key parts of the database. The second is the study of the characteristics, and the visualization, of a graph (network) built from the schema and data of a relational database. This approach shows that relationship networks in large relational databases are scale-free and that this property can be used to study them. The methods of both approaches are implemented in the software tool RDBAnalyzer, created as part of the work. The third approach uses semantic metadata from a relational database to automatically generate an ontology that makes it possible to search for information and navigate the database. All of these approaches were verified on a real large database. The approaches and methods can also be applied to non-relational databases, or more generally to any database in which relationships exist among the data.
Keywords: VLDB, relational databases, database metrics, large databases, data visualization, database visualization, mapping relational databases to ontologies, large-scale data processing, scale-free networks.

English abstract:

This dissertation describes the current state of knowledge about data processing in very large databases, especially relational databases, their use, and their shortcomings. As relational databases developed in parallel with other information technologies and hardware, we reached a state where the problem is no longer to collect data, perform operations on it, or access it quickly from anywhere in the world, but to understand the data and find our way around in it. Technology has overtaken us in this respect, and without changing how we approach and process this data and information we will become increasingly lost in it. Today's large volumes of data require new methods for processing and using them. If we cannot orient ourselves in the data and do not understand the relationships within it, we can never use its full potential or discover the information hidden in it. In this thesis I focus mainly on the possibilities for processing large volumes of data and on making the results effectively understandable to a human. The same problem occurs when people try to understand a large relational database and find information in it. Hundreds of tables and the relationships between them cannot be grasped at once, and it is hard to estimate what is important and where to find relevant data and information. The thesis also deals with the semantics of relational databases and with ways to make relational databases easier to use, especially large ones in which information is difficult to find and orientation is hard. Three approaches to these problems are developed and documented. The first approach is a suitable visualization of the tables and relationships (the ER diagram) of a large relational database, which allows better orientation and helps find the key parts of the database. The second approach examines the characteristics of, and visualizes, a graph (network) generated from the schema and data of a relational database. This approach shows that the network of relationships in large relational databases is scale-free and that this property can be used to examine it. The methods of both approaches are implemented in the software tool RDBAnalyzer, developed as part of the thesis. The third approach uses semantic metadata from a relational database to automatically generate an ontology, which makes it possible to search for information and navigate the database. All of these approaches have been tested on a real large relational database. The approaches and methods can be applied not only to relational databases but to any database in which relationships exist among the data.
Keywords: VLDB, relational databases, database metrics, very large databases, data visualization, database visualization, mapping relational databases into ontologies, large data processing, scale-free networks.

Dissertation summary (autoreferát)

