| Open Sky Home | Get this document in PDF format (40K) |
Information processing, not data processing, has become the dominant strategic requirement in computing. Open Sky Technologies has found that the basic architectural design of traditional RDBMS products is not particularly well suited to today's enterprise application, decision support, and network computing environments.
To avoid the limitations of RDBMS and post-relational DBMS products, Open Sky Technologies moved beyond and outside of RDBMS design and completely reinvented database architecture. The MPbase architecture uses several new, interrelated design features: a Content-Addressable-Memory Schema, Resource-Centric Processing, Network-Centric Architecture, and Multi-dimensional Data Intelligent Run Length Encoding. Each feature multiplies the effectiveness of the others. The resulting capabilities are extraordinary, and they vastly simplify the operating environment.
Each MPbase database is designed from the ground up to avoid or resolve long-standing RDBMS limitations, including:
All current DBMSs, even sophisticated products such as Teradata, Model 204, Oracle 8i, and Sybase IQ, are saddled with these same basic limitations.
In resource-centric processing, the resources are in control, not the tasks. In the current, almost universally used task-centric model, the tasks are in control of the system: the system's resources are queued up and shared by all of the tasks currently running. Queuing theory shows that as a queued, shared resource approaches 100% utilization, waiting times grow without bound, so in practice each such resource can only be run at about 50% of its total capacity before response times become unacceptable. In short, the price of task-centric multi-processing can be 50% or more of the system's total resources.
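The queuing effect described above can be illustrated with the textbook M/M/1 model (Poisson arrivals, exponential service times, one server). This model is an assumption chosen for illustration; the document does not say which queuing model it relies on. The function name `mm1_stats` is hypothetical.

```python
def mm1_stats(arrival_rate, service_rate):
    """Steady-state M/M/1 metrics for one queued, shared resource.

    arrival_rate -- requests per second arriving at the resource (lambda)
    service_rate -- requests per second the resource can complete (mu)
    """
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable at 100% utilization or more")
    rho = arrival_rate / service_rate                    # utilization
    in_system = rho / (1 - rho)                          # mean requests waiting or in service
    time_in_system = 1 / (service_rate - arrival_rate)   # mean response time, seconds
    return rho, in_system, time_in_system


for lam in (50, 80, 90, 99):
    rho, n, t = mm1_stats(lam, 100)
    print(f"utilization {rho:.0%}: {n:5.1f} requests in system, {t * 1000:7.1f} ms response")
```

In this model, a resource at 50% utilization holds one request on average, while at 90% it holds nine, which is why task-centric systems are usually kept well below full utilization.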
In the resource-centric processing model, it is the tasks that are split up and queued for the system's resources to share. This allows each resource to be used at 100% of its capacity. Suddenly, 100% CPU utilization is a good thing: each resource can run at 100% any time it has work to do.
Imagine going to the library to check out many different books, each from a different category. In the task-centric model, you would need to wait in a separate line for each category and ask a librarian there to retrieve each book. In the resource-centric model, you would present your list of books once, and all of the librarians would see the whole request at the same time. The librarian best qualified to retrieve each book would take that part of the assignment. MPbase is the first and only database management system design to use the resource-centric model.
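The library analogy can be sketched in a few lines: one request (the task) is split into parts, each part is queued for the resource that handles it, and every resource drains its own queue for as long as it has work. This is a minimal illustration under stated assumptions, not MPbase's scheduler; `resource_centric_fetch`, the `(category, title)` request format, and the `librarians` mapping are all hypothetical.

```python
import queue
import threading


def resource_centric_fetch(request, librarians):
    """Split one task into parts and let each resource pull its own work.

    request    -- list of (category, title) pairs (hypothetical format)
    librarians -- dict mapping category -> callable that retrieves a title
    """
    # One work queue per resource: the task is split up and queued,
    # instead of the task waiting in line at each resource in turn.
    work = {category: queue.Queue() for category in librarians}
    for category, title in request:
        work[category].put(title)

    results = {}
    lock = threading.Lock()

    def librarian_loop(category, retrieve):
        # Each resource stays busy until its own queue is empty.
        q = work[category]
        while True:
            try:
                title = q.get_nowait()
            except queue.Empty:
                return
            book = retrieve(title)
            with lock:
                results[title] = book

    threads = [threading.Thread(target=librarian_loop, args=(cat, fn))
               for cat, fn in librarians.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


librarians = {
    "history": lambda title: f"history shelf: {title}",
    "science": lambda title: f"science shelf: {title}",
}
request = [("history", "SPQR"), ("science", "Cosmos"), ("history", "The Guns of August")]
print(resource_centric_fetch(request, librarians))
```

All parts are queued before the workers start, so a worker that finds its queue empty can safely stop.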
MPbase uses a "Content Addressable Memory" (CAM) schema to store data. With CAM, each data element describes its own storage location, so like data is always found close together in the logical data structure. This bunching of data would be detrimental to an RDBMS, which requires data to be normalized and spread uniformly across the physical media. MPbase avoids the systemic problems associated with the need for normalized data and uses a physical storage schema optimized to take advantage of CAM data bunching. The data in a CAM database sorts itself out, based on like data, during the load process, so the update/insert function is as efficient and fast as any batch load process. MPbase has no batch load function; none is needed. The need for reorganization (unload/reload) cycles is also dramatically reduced, and when this normally painful process is required, MPbase can perform it in parallel and painlessly.
MPbase does not use a traditional index: the database and the index are one and the same, and the entire database is the index. This is a natural by-product of a CAM schema. It allows a single-table logical view of the database and eliminates index meta-data.
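The CAM idea in the two paragraphs above (a record's content decides where it lives, like data ends up adjacent, and the stored data doubles as its own index) can be approximated with a sorted list. This is a toy sketch under stated assumptions: "location" is simply a record's position in sorted order, and `ContentAddressedStore` is a hypothetical name; MPbase's actual physical layout is not described in this document.

```python
import bisect


class ContentAddressedStore:
    """Toy store in which each record's own content determines its position.

    Assumption: records are tuples, and "storage location" means the
    record's position in sorted order. This is not MPbase's real layout.
    """

    def __init__(self):
        self._rows = []

    def insert(self, row):
        # The row's value alone decides its slot, so a single insert and a
        # bulk load are the same operation: no separate batch-load path.
        bisect.insort(self._rows, row)

    def find_prefix(self, prefix):
        # Like data is adjacent, so the data itself serves as the index:
        # one binary search to the first match, then a short scan.
        n = len(prefix)
        start = bisect.bisect_left(self._rows, prefix)
        matches = []
        for row in self._rows[start:]:
            if row[:n] != prefix:
                break
            matches.append(row)
        return matches


store = ContentAddressedStore()
for row in [("TX", "Austin", 1), ("CA", "Los Angeles", 2), ("TX", "Dallas", 3), ("CA", "San Jose", 4)]:
    store.insert(row)
print(store.find_prefix(("TX",)))
```

A shorter tuple sorts before any longer tuple that starts with it, which is why `bisect_left` on the prefix lands on the first matching row.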
MPbase also supports virtual meta-data, such as accumulations, for decision support and data mining. Virtual meta-data is created on the fly, so it is never out of sync with the data, and it can be created automatically based on the needs of the user. In an RDBMS, meta-data can be larger than the data itself and typically requires a separate meta-data database. MPbase can eliminate separate meta-data repositories and their associated storage requirements.
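The difference between stored and virtual meta-data can be shown with a simple accumulation computed at query time: because the totals are derived from the current rows on every request, they cannot drift out of sync. This is a minimal sketch; `accumulate` and the sample `sales` rows are hypothetical, and MPbase's own mechanism for generating virtual meta-data is not specified here.

```python
from collections import defaultdict


def accumulate(rows, group_field, value_field):
    """Compute per-group totals on the fly instead of storing them.

    Nothing is materialized, so the result always reflects the current rows.
    """
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_field]] += row[value_field]
    return dict(totals)


sales = [{"region": "east", "amount": 120}, {"region": "west", "amount": 95}]
print(accumulate(sales, "region", "amount"))
sales.append({"region": "east", "amount": 80})  # new data arrives
print(accumulate(sales, "region", "amount"))    # totals reflect it immediately
```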
The MPbase architecture lays the foundation for an innovative network-attached, database-resident I/O subsystem design. An MPbase database can be network-attached via TCP/IP and is massively parallel by design: it is composed of one or more loosely coupled, independent processors that communicate with each other over TCP/IP. MPbase operates as the thinnest possible veneer on top of the UNIX operating system. RDBMSs, by contrast, are designed as a separate operating system on top of an operating system and therefore require a far more tightly coupled architecture to function.
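A loosely coupled cluster of this kind can be sketched with plain TCP sockets: each node owns some rows and answers queries, and a client sends the same query to every node in parallel and merges the replies. This is an illustration under stated assumptions: the JSON-lines request format, the scatter-gather pattern, and the names `start_node` and `query_cluster` are hypothetical, not MPbase's wire protocol.

```python
import json
import socket
import socketserver
import threading


def start_node(rows):
    """Start a toy storage node on localhost that answers JSON queries over TCP.

    rows -- the records this node owns (a list of dicts)
    Returns the running server; its address is server.server_address.
    """
    class Handler(socketserver.StreamRequestHandler):
        def handle(self):
            # One request per connection: a JSON line {"field": ..., "value": ...}.
            query = json.loads(self.rfile.readline())
            hits = [r for r in rows if r.get(query["field"]) == query["value"]]
            self.wfile.write((json.dumps(hits) + "\n").encode())

    # Port 0 lets the operating system pick a free port.
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query_cluster(addresses, field, value):
    """Send the same query to every node concurrently and merge the replies."""
    results, lock = [], threading.Lock()

    def ask(address):
        with socket.create_connection(address) as sock:
            sock.sendall((json.dumps({"field": field, "value": value}) + "\n").encode())
            with sock.makefile("r") as reply:
                hits = json.loads(reply.readline())
        with lock:
            results.extend(hits)

    threads = [threading.Thread(target=ask, args=(addr,)) for addr in addresses]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


nodes = [start_node([{"id": 1, "city": "Austin"}, {"id": 2, "city": "Boston"}]),
         start_node([{"id": 3, "city": "Austin"}])]
hits = query_cluster([n.server_address for n in nodes], "city", "Austin")
print(sorted(row["id"] for row in hits))
for n in nodes:
    n.shutdown()
    n.server_close()
```

Nothing in the client depends on where a node runs, only on its TCP address, which is the property the paragraph above relies on.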
An innovative and unique aspect of MPbase is the way it stores and searches data in a lossless, highly compressed common format using Multi-dimensional Data Intelligent Run Length Encoding (MDIRLE). MDIRLE is a computer-readable shorthand that makes MPbase the first database to search and process data while it is still compressed. The compression ratio ranges from 2:1 to 100:1, depending on the type of data. All other databases must first decompress the data to search and process it. MDIRLE both improves performance and reduces the physical storage requirements of the database.
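The general idea of searching run-length-encoded data without expanding it can be shown with ordinary one-dimensional run-length encoding. This is a stand-in, not MDIRLE itself: the multi-dimensional, "intelligent" encoding is not described in this document, and the function names below are hypothetical.

```python
def rle_encode(values):
    """Run-length encode a sequence into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def rle_count(runs, target):
    """Count occurrences of target by scanning runs, never expanding them."""
    return sum(n for v, n in runs if v == target)


def rle_positions(runs, target):
    """Return half-open (start, end) index ranges where target occurs."""
    ranges, pos = [], 0
    for v, n in runs:
        if v == target:
            ranges.append((pos, pos + n))
        pos += n
    return ranges


data = ["A"] * 5 + ["B"] * 3 + ["A"] * 2
runs = rle_encode(data)
print(runs, rle_count(runs, "A"), rle_positions(runs, "A"))
```

Both searches touch one entry per run rather than one per value, so the work shrinks in proportion to the compression ratio.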
MPbase's design adds up to an incredibly efficient, versatile, powerful, and fast way of managing and storing your data. Performance comparisons with the best-known RDBMSs have demonstrated improvements that often approach two orders of magnitude (100x). Tests have shown that MPbase's internal processing rate, running on multiple nodes connected by 100 Mbps Ethernet, can be significantly faster than an RDBMS reading data directly from disk over a SCSI bus.
MPbase performance scales at a linear rate or better. This means a 10-processor I/O cluster will run at least 10 times faster than a single-processor cluster.
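The linear-scaling claim can be checked against measured run times with the standard definitions of speedup and parallel efficiency: speedup is the single-processor time divided by the n-processor time, and linear (or better) scaling means an efficiency of at least 1.0. The function names and the 600-second timing below are hypothetical example values, not measurements from the document.

```python
def speedup(t_one, t_n):
    """How many times faster the n-processor run is than the 1-processor run."""
    return t_one / t_n


def efficiency(t_one, t_n, n):
    """Parallel efficiency; 1.0 or more means linear (or better) scaling."""
    return speedup(t_one, t_n) / n


# Hypothetical timings: a job taking 600 s on 1 processor and 60 s on 10.
print(speedup(600, 60), efficiency(600, 60, 10))
```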
The MPbase architecture has NO upper limit on scaling size or performance, so you are never in a position where you cannot grow the system.
Loosely coupled resources provide other benefits. Nodes in the MPbase I/O cluster need not be in the same room or even at the same physical site. The result is a single, uncomplicated, easy-to-use, easy-to-manage, consistent database image distributed across hardware throughout a TCP/IP enterprise network. An RDBMS requires complex hardware, software, and significant ongoing technical effort to do the same, because it must solve issues of real-time synchronization, updates, backups, and so on. MPbase solves these issues internally. An RDBMS must bridge to the Internet or an intranet, whereas MPbase uses this same environment as its basic infrastructure.
The higher the volume, the more resources (disks, processors) are used, and the more resources are used, the more efficient and faster MPbase becomes. This makes MPbase's parallel architecture ideal for applications requiring incredibly high volumes of updates or inserts, such as telemetry, seismic processing, satellite imagery, robotic exploration, credit card validation, and other high-demand applications.
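One common way to let additional nodes absorb a growing insert volume is to route each incoming row to a node chosen from its own key, so every node receives a share of the load. The sketch below uses hash partitioning with CRC-32 as a swapped-in, well-known technique; the document does not say how MPbase distributes inserts, and `partition_inserts` is a hypothetical name.

```python
import zlib
from collections import defaultdict


def partition_inserts(rows, key, n_nodes):
    """Group incoming rows into per-node batches by a stable hash of their key.

    Assumption: hash partitioning stands in for MPbase's real placement rule.
    CRC-32 is used because it gives the same node for the same key on every run.
    """
    batches = defaultdict(list)
    for row in rows:
        node = zlib.crc32(str(row[key]).encode()) % n_nodes
        batches[node].append(row)
    return dict(batches)


rows = [{"sensor": f"s{i}", "reading": i} for i in range(1000)]
batches = partition_inserts(rows, "sensor", 4)
print({node: len(batch) for node, batch in sorted(batches.items())})
```

Adding nodes shrinks each node's share of the stream, which is the effect the paragraph above describes.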
Because of its CAM schema, the MPbase architecture dramatically reduces the need for complex joins. In an RDBMS, a complex join links several tables to complete a single query, which can dramatically decrease database performance. MPbase almost never requires a complex join, so it can truly simplify and speed up the acquisition of information (the promise of data warehouse, decision support, and data mining initiatives), giving users what they want when they want it.
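The cost difference can be seen in a small SQLite example: in a normalized schema, the question "what did Ada buy?" needs a three-table join, while a single-table view of the same facts answers it with a filtered scan. The denormalized `order_facts` table is an assumption standing in for MPbase's single-table logical view; it is not how MPbase stores data, and all table names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products  (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer_id INTEGER, product_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO products  VALUES (10, 'Disk'), (11, 'Tape');
    INSERT INTO orders    VALUES (100, 1, 10), (101, 2, 11), (102, 1, 11);

    -- Single-table logical view: every fact the query needs is on one row.
    CREATE TABLE order_facts (order_id INTEGER, customer TEXT, product TEXT);
    INSERT INTO order_facts VALUES (100, 'Ada', 'Disk'), (101, 'Grace', 'Tape'), (102, 'Ada', 'Tape');
""")

# Normalized (relational) form: a three-way join.
joined = con.execute("""
    SELECT p.title FROM orders o
    JOIN customers c ON c.id = o.customer_id
    JOIN products  p ON p.id = o.product_id
    WHERE c.name = 'Ada' ORDER BY o.id
""").fetchall()

# Single-table form: the same question is a plain filtered scan.
flat = con.execute(
    "SELECT product FROM order_facts WHERE customer = 'Ada' ORDER BY order_id"
).fetchall()
print(joined, flat)
```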
Because a physically inconsistent image of the data is impossible with MPbase, rolling backup/restore processing can be done in segments without quiescing the database. Combined with mirroring, this same capability provides true fault-tolerant, 7 by 24 operation without expensive, complex fault-tolerant system designs, software, and hardware.
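A segmented, rolling backup can be sketched as copying one independently locked segment at a time, so writers are blocked only on the segment being copied and never on the whole database. This is a hypothetical illustration (`SegmentedStore` and its hash-based segment choice are assumptions). Each copied segment is internally consistent, but the backup as a whole is not a single point-in-time snapshot; the paragraph above relies on MPbase's own guarantees for that part.

```python
import copy
import threading


class SegmentedStore:
    """Hypothetical key-value store split into independently locked segments."""

    def __init__(self, n_segments):
        self.segments = [dict() for _ in range(n_segments)]
        self.locks = [threading.Lock() for _ in range(n_segments)]

    def _segment(self, key):
        return hash(key) % len(self.segments)

    def put(self, key, value):
        i = self._segment(key)
        with self.locks[i]:
            self.segments[i][key] = value

    def rolling_backup(self):
        """Copy segments one at a time; only the segment being copied is locked."""
        backup = []
        for segment, lock in zip(self.segments, self.locks):
            with lock:
                backup.append(copy.deepcopy(segment))
        return backup


store = SegmentedStore(4)
for i in range(20):
    store.put(i, i * i)
backup = store.rolling_backup()
print([len(segment) for segment in backup])
```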
Multi-dimensional Data Intelligent Run Length Encoding, in combination with CAM, allows MPbase to identify data errors and interesting data by isolating data that does not conform. Data scrubbing is usually a Herculean task for an RDBMS; in MPbase it is a natural by-product.
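Run-length encoding makes some non-conforming values easy to spot: a short run that interrupts two long runs of the same value stands out without any value-by-value comparison. The heuristic below, its `min_run` threshold, and the name `find_nonconforming` are assumptions for illustration; the document does not describe the rule MDIRLE actually applies.

```python
def find_nonconforming(values, min_run=3):
    """Flag short runs that interrupt two long runs of the same value.

    Returns (start_index, value) for each suspect run.
    """
    runs = []  # (value, start, length)
    for i, v in enumerate(values):
        if runs and runs[-1][0] == v:
            value, start, length = runs[-1]
            runs[-1] = (value, start, length + 1)
        else:
            runs.append((v, i, 1))

    suspects = []
    for prev, cur, nxt in zip(runs, runs[1:], runs[2:]):
        if (cur[2] < min_run and prev[0] == nxt[0]
                and prev[2] >= min_run and nxt[2] >= min_run):
            suspects.append((cur[1], cur[0]))
    return suspects


readings = [20, 20, 20, 20, -999, 20, 20, 20, 21, 21, 21]
print(find_nonconforming(readings))
```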
MPbase sets a new standard for accessibility: any program or system that uses TCP/IP can access an MPbase database. MPbase can replace any relational, flat-file, or hierarchical database engine by providing whatever view of the same data applications and users require. Because it can provide different views of the same data in parallel, MPbase can serve as the database engine for multiple concurrent, diverse applications.
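Serving relational, flat-file, and hierarchical views from one stored copy of the data can be sketched as three read-only projections over the same rows. The sample `ROWS` and the view function names are hypothetical; the point is only that no view requires a second copy of the data.

```python
import csv
import io
from itertools import groupby

ROWS = [  # one stored copy of the data (hypothetical sample)
    {"region": "east", "store": "E1", "sales": 120},
    {"region": "east", "store": "E2", "sales": 80},
    {"region": "west", "store": "W1", "sales": 95},
]


def relational_view(rows):
    """Rows as tuples in fixed column order, like a SQL result set."""
    return [(r["region"], r["store"], r["sales"]) for r in rows]


def flat_file_view(rows):
    """The same rows rendered as a CSV flat file."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["region", "store", "sales"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def hierarchical_view(rows):
    """The same rows nested as region -> store -> sales."""
    ordered = sorted(rows, key=lambda r: r["region"])
    return {region: {r["store"]: r["sales"] for r in group}
            for region, group in groupby(ordered, key=lambda r: r["region"])}


print(relational_view(ROWS))
print(flat_file_view(ROWS))
print(hierarchical_view(ROWS))
```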
MPbase can be accessed directly or cross-platform from SQL, HTML, XML, IP sockets, Java, RPCs, shell scripts, the command line, and custom programs.
Data of any type and in any quantity can be stored in MPbase. Data that would normally require an external binary large object (BLOB) can be stored and accessed in MPbase at the individual data point level. For example, an image stored in MPbase can be accessed at the individual pixel and color level. This allows a single stored image to be viewed at any needed resolution, and analyzed at the pixel level while still in the database.
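Storing an image as individual data points rather than as one opaque BLOB can be sketched by keeping one record per pixel and color channel; any single value is then directly addressable, and a lower-resolution view is just an aggregate over blocks of records. The `(x, y, channel)` record key and the block-averaging rule are assumptions for illustration, not MPbase's storage format.

```python
def store_image(pixels):
    """Store a 2-D grid of (r, g, b) pixels as one record per pixel and channel."""
    records = {}
    for y, row in enumerate(pixels):
        for x, rgb in enumerate(row):
            for channel, value in zip("rgb", rgb):
                records[(x, y, channel)] = value
    return records


def downsample(records, width, height, factor):
    """View the stored image at a coarser resolution by averaging pixel blocks."""
    out = []
    for by in range(0, height, factor):
        row = []
        for bx in range(0, width, factor):
            pixel = []
            for channel in "rgb":
                block = [records[(x, y, channel)]
                         for y in range(by, min(by + factor, height))
                         for x in range(bx, min(bx + factor, width))]
                pixel.append(sum(block) // len(block))
            row.append(tuple(pixel))
        out.append(row)
    return out


image = [[(0, 0, 0), (4, 4, 4)],
         [(8, 8, 8), (12, 12, 12)]]
records = store_image(image)
print(records[(1, 0, "g")], downsample(records, 2, 2, 2))
```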
A database of 50 billion 84-byte rows, designed to grow to 150 billion rows, was built using 30 Sparc 5 workstations connected over 100 Mbps TCP/IP-based Ethernet. The access rate for selected, sorted, and returned rows was over 90,000 rows/sec with a 25% key hit rate, meaning the database was handling 360,000 keys/sec. This extract rate was maintained concurrently with an insert rate of 150 rows/sec and an update rate of 50 rows/sec. The database ran 7 by 24 with no downtime. Data compression achieved on a single copy was 95%; on the combined primary and mirrored copies it was 90%. This implementation had no single point of failure. Next-generation applications could access this database at up to 5,000,000 rows/sec, or 18,000,000,000 (18 billion) rows/hour.
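The derived figures in the paragraph above follow from its stated inputs, as this short calculation shows. One assumption is made explicit: a compression figure of 95% is read as a 95% reduction in bytes relative to one uncompressed copy. Under that reading, two mirrored copies at 5% each occupy 10% of one raw copy, which is the 90% figure quoted for primary plus mirror.

```python
# Inputs stated in the text.
ROWS = 50_000_000_000            # rows in the database
ROW_BYTES = 84                   # bytes per row
EXTRACT_ROWS_PER_SEC = 90_000    # selected, sorted, and returned rows/sec
KEY_HIT_PERCENT = 25             # percentage of examined keys that matched
SINGLE_COPY_COMPRESSION = 95     # percent reduction for one copy (assumed meaning)
PEAK_ROWS_PER_SEC = 5_000_000    # next-generation access rate

keys_per_sec = EXTRACT_ROWS_PER_SEC * 100 // KEY_HIT_PERCENT
raw_bytes = ROWS * ROW_BYTES
single_copy_bytes = raw_bytes * (100 - SINGLE_COPY_COMPRESSION) // 100
mirrored_bytes = 2 * single_copy_bytes
mirrored_compression = 100 - mirrored_bytes * 100 // raw_bytes
rows_per_hour = PEAK_ROWS_PER_SEC * 3600

print(keys_per_sec, raw_bytes, single_copy_bytes, mirrored_compression, rows_per_hour)
```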
| | MPbase | RDBMS |
|---|---|---|
| Node/Server * | (15) 450 MHz Intel PII nodes | (1) UltraSPARC 6000 SMP, (30) CPUs |
| Operating System | Linux | Solaris |
| Disk Storage | JBOD, 9 GB SCSI disks | RAID-5 |
| Storage Requirements | Raw: 325 GB; meta-data: not needed | Raw: 1.3 TB; meta-data: 1.3 TB |

| Hardware | MPbase Qty | MPbase Cost | RDBMS Qty | RDBMS Cost |
|---|---|---|---|---|
| Servers | 0 | $0 | 1 | $770,000 |
| Database Storage | *15/36 GB | $300,000 | 1.3 TB | $325,000 |
| Meta-data Storage | 0 GB | $0 | 1.3 TB | $325,000 |
| Total Hardware Costs | | $300,000 | | $1,420,000 |