1. Introduction to multidimensional databases
A multidimensional database (MDD) can be understood simply as a database that stores data in an n-dimensional array rather than in relational tables. As a result, it contains many sparse matrices, and users observe the data through multidimensional views. A multidimensional database adds a time dimension. Compared with a relational database, it offers faster data processing, shorter response times, and more efficient queries. Because MDD information is stored as an array, data can be updated without affecting the index, so an MDD is well suited to read-write applications.
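The array-based storage described above can be illustrated with a minimal sketch. It assumes a sparse cube held as a dictionary keyed by coordinate tuples; the class name `SparseCube`, the dimension names, and all values are hypothetical and not taken from any product. Because the coordinate itself addresses the cell, an update overwrites the value in place and there is no separate index to maintain.

```python
from typing import Dict, Tuple

Coord = Tuple[str, ...]


class SparseCube:
    """Sparse n-dimensional array: only non-empty cells are stored."""

    def __init__(self, dimensions: Tuple[str, ...]) -> None:
        self.dimensions = dimensions
        self.cells: Dict[Coord, float] = {}

    def set(self, coord: Coord, value: float) -> None:
        # A coordinate needs exactly one member per dimension.
        if len(coord) != len(self.dimensions):
            raise ValueError("coordinate must have one member per dimension")
        self.cells[coord] = value

    def get(self, coord: Coord) -> float:
        # Cells that were never written are treated as empty (0.0).
        return self.cells.get(coord, 0.0)


cube = SparseCube(("time", "city", "product"))
cube.set(("2000-01", "Shanghai", "laptop"), 120.0)
cube.set(("2000-02", "Beijing", "desktop"), 80.0)
# Updating an existing cell overwrites it in place.
cube.set(("2000-01", "Shanghai", "laptop"), 150.0)

stored = len(cube.cells)  # number of cells actually held in memory
```

Only the two written cells occupy storage, no matter how many members each dimension could have.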
1.1. Problems with relational databases
Limitations of querying a relational database with SQL:
1) Queries become cumbersome because multiple tables must be joined, and the SQL statements are difficult to write well;
2) Data-processing overhead tends to be large because the relational database must access complex data.
The limitations of the relational database management system itself:
1) Limitations on the data model
The two-dimensional table model used by relational databases cannot effectively handle the multidimensional data that is typical of most transaction processing applications. The inevitable result is that the number of interrelated tables proliferates in complex ways, while the model still fails to represent real-world data relationships well. Because the relational model needs so many tables, storage space may grow massively, with much of it wasted, and the system's response performance steadily declines. Moreover, real data includes many types that relational databases do not handle well.
2) Performance limitations
Relational database management systems were designed for static applications such as report generation and were not optimized for efficient transaction processing. As a result, some relational database products fall short of expectations in GUI and Web transaction processing. Adding more hardware does not solve the problem fundamentally.
The two-dimensional table model of a relational database can represent the multidimensional data typical of most transaction processing applications, but usually only by creating and using a large number of tables, and it remains difficult to build a data model that mirrors the real world. When the data must be output as a report, the set of two-dimensional tables has to be traversed in reverse and the tables joined using indexes and other techniques before all the required data can be found, which inevitably slows the response of the application system.
3) Limitations on scalability
The ability of relational database technology to support application and data complexity is limited. The normalized design method on which relational databases originally relied is inadequate for the design and performance optimization of complex transaction processing database systems. In addition, the high development and maintenance costs are difficult for businesses to bear.
4) Retrieval strategies used by relational databases, such as composite indexing and concurrent locking, add complexity and restrict how the database can be used.
1.2. Related definitions of multidimensional databases
Dimension: a specific angle from which people observe data. It is a class of attributes considered when examining a problem, and the set of those attributes constitutes a dimension (time dimension, geographic dimension, etc.).
Level of a dimension: a given angle of observation (i.e., a given dimension) can be described at several levels of detail (time dimension: date, month, quarter, year).
Member of a dimension: a value of a dimension, describing the position of a data item within that dimension ("a certain day of a certain month" describes a position in the time dimension).
Measure: the value stored in a cell of the multidimensional array, for example (January 2000, Shanghai, laptop, 0000).
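The definitions above can be made concrete with a short sketch. It assumes the time dimension with the four levels named in the text; the function name `time_member`, the member label formats, and the measure value are hypothetical choices made for illustration.

```python
from datetime import date

# Levels of the time dimension, from finest to coarsest.
TIME_LEVELS = ("date", "month", "quarter", "year")


def time_member(day: date, level: str) -> str:
    """Return the member of the time dimension that contains `day` at `level`."""
    if level == "date":
        return day.isoformat()
    if level == "month":
        return f"{day.year}-{day.month:02d}"
    if level == "quarter":
        return f"{day.year}-Q{(day.month - 1) // 3 + 1}"
    if level == "year":
        return str(day.year)
    raise ValueError(f"unknown level: {level}")


day = date(2000, 1, 15)
members = {level: time_member(day, level) for level in TIME_LEVELS}

# One cell of the array: one member per dimension (time, city, product)
# plus a measure. The measure value 10000.0 is a hypothetical placeholder.
cell = ((members["month"], "Shanghai", "laptop"), 10000.0)
```

The same day maps to a different member at each level, which is what lets an analysis change granularity without changing the underlying data.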
The basic multidimensional analysis operations of OLAP are Drill-up and Drill-down, Slice and Dice, and Pivot.
Drilling: changing the level of a dimension and therefore the granularity of the analysis. It includes drill-down and drill-up (roll-up). Drill-up summarizes low-level detail data into high-level summary data along a dimension, or reduces the number of dimensions. Drill-down does the opposite: it moves from summary data down to detail data, or adds new dimensions.
Slice and dice: after values have been selected on some of the dimensions, the distribution of the measure over the remaining dimensions. If two dimensions remain, the result is a slice; if three or more remain, it is a dice.
Rotation (pivot): changing the orientation of the dimensions, that is, rearranging where dimensions are placed in the table (for example, interchanging rows and columns).
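The operations above can be sketched on a small cube held as a dictionary of coordinate tuples. This is an illustrative sketch under assumed names (`roll_up`, `select`, `pivot`, `to_quarter`) and hypothetical data, not the interface of any OLAP product. The `select` helper follows the definition given above: fix members on some dimensions and keep the measure over the rest.

```python
from collections import defaultdict

# Axes: 0 = month, 1 = city, 2 = product. All values are hypothetical.
cells = {
    ("2000-01", "Shanghai", "laptop"): 10,
    ("2000-02", "Shanghai", "laptop"): 12,
    ("2000-01", "Beijing", "laptop"): 7,
    ("2000-04", "Shanghai", "desktop"): 5,
}


def roll_up(cells, axis, mapper):
    """Drill-up: replace members on one axis by their parent and sum."""
    out = defaultdict(int)
    for coord, value in cells.items():
        new = list(coord)
        new[axis] = mapper(coord[axis])
        out[tuple(new)] += value
    return dict(out)


def select(cells, fixed):
    """Slice/dice: keep cells matching {axis: member}, drop the fixed axes."""
    out = {}
    for coord, value in cells.items():
        if all(coord[a] == m for a, m in fixed.items()):
            out[tuple(c for a, c in enumerate(coord) if a not in fixed)] = value
    return out


def pivot(cells, order):
    """Rotation: reorder the axes of every coordinate."""
    return {tuple(coord[a] for a in order): v for coord, v in cells.items()}


def to_quarter(month):
    year, m = month.split("-")
    return f"{year}-Q{(int(m) - 1) // 3 + 1}"


by_quarter = roll_up(cells, 0, to_quarter)     # month -> quarter
laptop_slice = select(cells, {2: "laptop"})    # two axes remain: a slice
swapped = pivot(laptop_slice, (1, 0))          # (month, city) -> (city, month)
```

Drill-down is the inverse of `roll_up`: it requires going back to the finer-grained cells, which is why the detail data has to be retained.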
1.3. Characteristics of multidimensional databases
The main feature of post-relational databases is that they integrate multidimensional processing and object-oriented techniques into relational databases. Such a database combines the speed and adaptability of a multidimensional data model with powerful, flexible object technology. Because of this compatibility, post-relational databases are well suited to developing high-performance transaction processing applications. A post-relational database management system adopts a more modern multidimensional model as its database engine. This multidimensional architecture based on sparse arrays is inherited and developed from a database language that has become an international standard, and it is a mature, reliable technology backed by accumulated practical experience.
Multidimensional data models make data modeling easier: developers can describe complex real-world structures directly, without ignoring real-world problems or forcing them into technically manageable forms. The multidimensional data model also greatly reduces the time required for complex processing. For example, when developing an information management system for a clothing chain with a relational database, you need to create a number of tables: one for each color and size of each style, another to map garments to suppliers and indicate whether each item has been sold, and further tables to record price changes, store inventory, and so on. Every transaction requires all of these tables to be modified, and the relational database soon becomes cumbersome and slow. In the multidimensional data model, you can think of the data as sitting in a "cube" with enough "faces" to classify it fully, such as style, color, price, and inventory. The data items can be related to one another immediately and data access is extremely fast. The multidimensional database is also simple: it is easy to use and, because redundant data is eliminated, more economical.
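The clothing-chain example can be sketched as a single cube rather than a set of tables. The sketch below assumes four "faces" (style, color, size, store) and an inventory measure; the helper names `record_sale` and `stock_by` and all data are hypothetical. A sale touches exactly one cell, and totals along any face are obtained by summing over the others.

```python
from collections import defaultdict

# (style, color, size, store) -> units in stock; all values are hypothetical.
inventory = {
    ("shirt", "white", "M", "store-1"): 20,
    ("shirt", "white", "L", "store-1"): 15,
    ("shirt", "blue", "M", "store-2"): 8,
    ("jacket", "black", "L", "store-1"): 4,
}


def record_sale(cube, style, color, size, store, units=1):
    """A transaction updates one cell instead of several tables."""
    key = (style, color, size, store)
    if cube.get(key, 0) < units:
        raise ValueError("not enough stock")
    cube[key] -= units


def stock_by(cube, axis):
    """Total stock per member of one face, summed over all other faces."""
    totals = defaultdict(int)
    for coord, units in cube.items():
        totals[coord[axis]] += units
    return dict(totals)


record_sale(inventory, "shirt", "white", "M", "store-1", units=2)
per_style = stock_by(inventory, 0)
per_store = stock_by(inventory, 3)
```

A relational design would typically answer the same questions by joining the style, stock, and store tables; here the coordinate already carries every classification.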
2. Classification of existing multidimensional databases
Existing multidimensional databases fall mainly into "pure" and "quasi" multidimensional databases. The former, represented mainly by Caché, do not depend on relational databases. The latter rely on relational databases and extract data from them to generate multidimensional tables that facilitate statistics and analysis.
Most existing quasi-multidimensional databases are built on relational databases: the multidimensional data is constructed from data supplied by the relational database so that it is easy to query and analyze. They mainly include the Oracle-based Oracle Express Server, the SQL Server-based Microsoft SQL Server Analysis Services, the DB2-based DB2 OLAP Server, and the Hyperion Essbase database.
2.1. Caché database
Caché is an object-oriented, multidimensional database that supports SQL access. In database taxonomy, databases that go beyond the relational model are called third-generation, or post-relational, databases. Caché has the following characteristics:
1. Speed. Under the same conditions, Caché queries the same data faster than ordinary databases such as Oracle. It improves on common relational databases such as Oracle, SQL Server, and Sybase. Its performance is comparable to that of in-memory databases, with tens of thousands of insertions per second possible on a single laptop. Caché's dynamic bitmap indexing technology allows the database to be queried and analyzed while it is being updated, without compromising performance.
2. Ease of use. Caché supports standard SQL statements, so users who are not familiar with the M language can still manipulate the data easily.
3. Easy interfacing. Caché supports the standard ODBC interface, so exchanging data with other systems is straightforward. Caché can also export data as text files for other systems to read.
4. True three-tier structure. Caché can implement a true three-tier structure and a truly distributed service, which makes it easy to upgrade and expand. Because of this distributed three-tier structure, when a hospital adds client PCs or expands in scale, it does not need to re-purchase or upgrade the primary server; it only needs to add secondary servers as appropriate. Since secondary servers are much cheaper than the primary server, the hospital saves money and avoids duplicate investment.
5. Object-based development. Caché is a true object database. During development, users can define the objects they need directly in the database and then call those objects' methods and properties from other development tools, which is very convenient. Caché also supports remote mapping and mirroring. For example, across different cities or different districts of the same city, Caché can mirror databases to keep them synchronized, so that users in different locations work as if they were sharing a single database.
6. Flexibility. Applications built on Caché run without modification on multiple operating system platforms (such as Windows 98/NT and various UNIX and Linux environments). They can be deployed at will in a two- or three-tier C/S (client/server) structure or in a B/S (browser/server) structure, and application servers and database servers can be added at runtime without affecting operation.
7. Web development support. Caché provides its own web development tools, which are easy to use and maintain and in line with current trends in the software industry.
8. Low price. Caché is much cheaper than Oracle.
This kind of database goes beyond the limitations of traditional relational databases. Under mission-critical and sudden heavy loads in Internet or client/server environments, Caché offers a high response rate, flexible scalability, and robust online processing capabilities.
Of InterSystems' global business, 50% is in the medical industry, 30% in the financial industry, and 20% in shipping, hotel management, and conference systems. These industries have one thing in common: they all need very large databases whose data must be updated quickly. There are many successful cases abroad: the top ten hospitals in the United States, the three major medical and health laboratory institutions, the world's largest online securities trading company, the Merrill Lynch Investment Group, and the US Department of Defense all use Caché. There are also some applications in China, but they are mainly limited to the medical industry, for example Beijing Anzhen Hospital, Fuzhou Military Region General Hospital, and the First Affiliated Hospital of Harbin Medical University. Among HIS systems in the United States and Europe, Caché has the largest share and is recognized by the medical community as the preferred database.
2.2. Microsoft SQL Server Analysis Services
SQL Server 2008 Analysis Services builds multidimensional analysis databases and provides management tools and user access software. It uses a technique called "Block Computation," which allows the aggregations of partitions and copies of their source data to be stored in a multidimensional structure on the Analysis Services server. It takes advantage of the sparsity of cubes and processes only non-NULL data to improve query efficiency. This approach is best suited to partitions of frequently used cubes and to workloads that need fast query responses. Because query efficiency improves greatly, analysis can be carried out at a finer granularity. Data mining algorithms in Analysis Services provide predictive analytics, and SQL Server 2008 Analysis Services improves those algorithms to support more comprehensive analysis.
SQL Server 2008 Analysis Services introduces a set of Best Practice Design Alerts that flag potential design issues early in the development process, reducing time lost to design errors and enabling faster development. It further improves developer productivity with new and improved cube, dimension, and attribute designers. Analysis Services scales to support many databases of terabyte size serving thousands of users. SQL Server 2008 Analysis Services provides Dynamic Management Views similar to those of the database engine, which supply real-time system information for monitoring, analysis, and performance tuning. With the new backup storage subsystem in SQL Server 2008 Analysis Services, backup time grows linearly with database size.
The Analysis Services 10.0 OLE DB provider (msolap100.dll) is the interface through which an application interacts with Microsoft Analysis Services. ADOMD.NET is the Microsoft .NET Framework data access interface for communicating with Microsoft SQL Server Analysis Services.
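The benefit of processing only non-NULL cells in a sparse cube can be illustrated with a small sketch. It does not reflect how Analysis Services is implemented internally; the dimension members, fact values, and function names are hypothetical. The sketch compares an aggregation that visits every possible coordinate with one that visits only the stored cells.

```python
from itertools import product

months = [f"2008-{m:02d}" for m in range(1, 13)]
cities = ["Shanghai", "Beijing", "Fuzhou", "Harbin"]
products = ["laptop", "desktop", "printer"]

# Only three of the 12 * 4 * 3 possible cells hold data (hypothetical values).
facts = {
    ("2008-01", "Shanghai", "laptop"): 10,
    ("2008-03", "Beijing", "printer"): 4,
    ("2008-07", "Harbin", "desktop"): 6,
}


def dense_total(facts):
    """Visit every coordinate of the cube, including empty (NULL) cells."""
    visited, total = 0, 0
    for coord in product(months, cities, products):
        visited += 1
        value = facts.get(coord)  # None stands in for NULL
        if value is not None:
            total += value
    return total, visited


def sparse_total(facts):
    """Visit only the cells that actually hold a value."""
    return sum(facts.values()), len(facts)


dense_result = dense_total(facts)
sparse_result = sparse_total(facts)
```

Both approaches return the same total, but the sparse version touches 3 cells instead of 144, and the gap widens as dimensions are added.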
2.3. Oracle Express Server
Oracle Express Server is an advanced calculation engine and data cache. It uses a multidimensional model that closely reflects the way users think about their business, extending the rows and columns of a spreadsheet to three or more dimensions. Dimensions can be time, product, product line, or region, and the object of analysis can be aggregate data such as unit sales. Queries against a multidimensional model are very fast because they are arithmetic calculations over part of an array. This array model can therefore support the largest and most complex OLAP applications.
Express Server can store and manage multidimensional arrays itself, or provide direct analysis through a sophisticated multidimensional caching scheme with little or no indexing. Besides supporting multidimensional models, Oracle Express Server can analyze, forecast, model, and perform what-if analysis. It has built-in functions for mathematics, finance, statistics, and time-series management. Its scalability, robustness, and application-oriented features support multiple users and provide integrity control for large databases. Data organization is flexible: data can be stored in Express Server or directly in a relational database, with built-in analysis functions and user-customized queries written in a 4GL.
Oracle Express Server is an advanced multi-dimensional computing engine that is the foundation for OLAP analysis. The latest version of Express Server is 6.3, which has significant improvements in processing power, analysis capabilities, and more.
1. Improved processing power: Express Server 6.3 demonstrates the fastest computing power and query performance of OLAP servers. Express Server 6.3 introduces a number of new features that greatly enhance Express Server's support for large data volumes and large concurrent users.
2. Faster summary calculations: Express Server 6.3 introduces a new summary calculation management mechanism. The new rollup mechanism allows for custom summary methods and can significantly reduce the time to load and roll up calculations.
3. Improved analysis capabilities: The newly introduced statistical analysis function will significantly improve the analytical capabilities of Express Server 6.3.
4. Improved predictive capabilities: The new forecasting system will provide data sampling and the ability to recommend best predictive methods based on data patterns.
5. Web-based management tools: Express Server is managed through the new Express Instance Manager, a Java-based application that integrates with Oracle Enterprise Manager. This allows DBAs to manage multidimensional databases on NT or UNIX through Oracle Enterprise Manager's Java window or a browser.
6. Oracle Express support for Web technology: An important development strategy for Express Server is support for Internet computing, an area in which Express products are ahead of their competitors. Since version 6.0, Express Server has offered the Express Web Agent option, which gives Express Server-based OLAP applications Web publishing capabilities.
7. Support for integration with various relational database systems.
2.4. DB2 OLAP Server
IBM offers a suite of business intelligence (BI) solutions based on a visual data warehouse, including Visual Warehouse (VW), Essbase/DB2 OLAP Server 5.0, IBM DB2 UDB, and third-party front-end data presentation tools (e.g., BO) and data mining tools (e.g., SAS). VW is a powerful integrated environment for data warehouse modeling and metadata management, as well as for data extraction, transformation, loading, and scheduling. Essbase/DB2 OLAP Server supports the definition of "dimensions" and data loading. Essbase/DB2 OLAP Server is not a pure ROLAP (Relational OLAP) server but a hybrid (HOLAP) server that combines ROLAP and MOLAP. After Essbase completes data loading, the data is stored in the DB2 UDB database specified by the system.
Strictly speaking, IBM does not provide a complete data warehousing solution itself; the company follows a partner strategy. For example, the front-end data presentation tool can be Business Objects' BO, Lotus's Approach, Cognos's Impromptu, or IBM's Query Management Facility; multidimensional analysis is supported by Arbor Software's Essbase and by IBM's DB2 OLAP Server (jointly developed with Arbor); and statistical analysis uses the SAS system.
IBM DB2 OLAP Server integrates Hyperion Essbase's OLAP engine with the DB2 relational database. It is fully compatible with the Essbase API and stores its data in DB2 using a star schema.
2.5. Hyperion Essbase
Hyperion Essbase is an online analytical processing (OLAP) server that uses a multidimensional model to extract data from a range of data sources, consolidate it after calculation, and provide quick access to the results. It is a multidimensional database server that can create either "block storage" databases, for small, dense data sets that require read/write access, or "aggregate storage" databases, for sparse data sets with many dimensions and read-only access, such as sales analysis applications.
Essbase is the multidimensional database of the Hyperion BI software and has been updated to version 11. It differs from a relational database in the usual sense. Essbase divides data into "blocks", and each data block is defined over different dimensions. Essbase has 7 default dimensions and allows 13 user-defined dimensions. The seven default dimensions are Account, Period, Year, Scenario, Currency, Version, and Entity.
Features of Essbase:
1, high performance: fast query response
2, calculation / analysis capabilities:
Aggregation
Unrestricted cross-dimensional calculation
What-if scenario analysis
Allocation
Trend analysis / regression analysis
Decision tree / neural network / association analysis
Financial intelligence / currency conversion
Mathematical functions
Forecasting
Hyperion Essbase status: a server-centric distributed architecture with over 100 applications; more than 300 developers using Essbase as a platform; hundreds of calculation formulas supporting multiple kinds of calculation; the ability for users to build their own complex queries; fast response times with support for simultaneous reading and writing by multiple users; more than 30 front-end tools to choose from; support for multiple financial standards; integration with ERP and other data sources; and more than 1,500 users worldwide.
3. Brief comparison
The five multidimensional databases above compare as follows:
Caché, recognized in the medical industry as the preferred database, has many advantages, such as faster queries, simpler use, and flexibility. It is suitable for direct adoption at the development stage. However, because data is inserted directly, it is not well suited to existing products that do not already use Caché, and data portability is poor.
As Microsoft's flagship product for multidimensional data, Analysis Services offers good query and analysis performance, and because SQL Server is widely used, Analysis Services is widely deployed as well. SQL Server 2008 improved Analysis Services to further raise query efficiency and analysis capabilities. It works with a variety of data sources, but it runs only on Windows and cannot be used on Linux systems.
Oracle supports both relational and multidimensional data storage and uses Oracle Express Server for the latter. Before the release of Express Server 6.3, Oracle Express Server's technology was updated too slowly and it consumed a lot of memory, which limited its adoption. Express Server 6.3 improves processing power and analysis capabilities, but the degree of improvement remains to be verified.
DB2 OLAP Server is a hybrid (HOLAP) server that combines ROLAP and MOLAP. After Essbase completes data loading, the data is stored in the DB2 UDB database specified by the system. It is fully compatible with the Essbase API and is used mainly with DB2 databases.
Hyperion Essbase is a multidimensional database server that supports extracting data from a wide range of data sources. Unlike Oracle OLAP, it stores data outside the relational database engine, typically on its own dedicated server, for faster query response and stronger calculation and analysis capabilities.
The database currently in use is an Oracle database. Although Caché and DB2 OLAP Server offer higher performance, neither is likely to be adopted in the short term: the existing data cannot be migrated quickly and smoothly to Caché, and DB2 OLAP Server depends on the Hyperion Essbase database, which would be cumbersome to use with the current Oracle database. The following three strategies are therefore considered for now.
Strategy 1: Oracle + Oracle Express Server
Strategy 2: Oracle + Hyperion Essbase
Strategy 3: Oracle + SQL Server Analysis Services
A comprehensive comparison of the above three strategies is as follows:
Table 1: Comparison of the three strategies