Social Networking
Open Source
Software
Free Open Source Software, in both Linux-based operating systems and Web 3.0 application software, has become the way forward for all innovative peer-to-peer developmental code design.
Executive summary
Digital assets are a significant percentage of the world’s data today and are predicted to grow at a rapid pace over the next three years. The introduction of a new paradigm for storing and distributing digital assets allows IT management to eliminate the use of copy-based and parity-based storage solutions. This approach offers a significantly more cost-effective and reliable way of storing and securing digital assets. DBOWMAN Networks can streamline workflow while ensuring that data is never corrupted or lost. The network can be situated anywhere in the world as long as it is on a true broadband symmetrical link, giving companies and individuals alike the flexibility to use cost-effective facilities and least-cost power and communications options.
Introduction
As recently as five years ago, mission-critical data for businesses tended to fall into narrowly defined camps: specific financial records, transactional data or customer records in table format on relational databases. If such information was somehow corrupted or rendered irretrievable, the enterprise would suffer untold losses and potentially discontinue business practices.
Emerging technologies change landscape
However, a shift in offices away from voice communications and toward a reliance on electronic business communications, such as e-mail and IM, has changed the very definition of mission-critical data. In fact, according to Forrester Research, 80 percent of the world’s data is so-called unstructured data: file formats such as Microsoft Word documents, PowerPoint presentations and PDFs, and non-textual content like images, graphics, video and audio files. So, only 20 percent of data now represents what five years ago was considered core data for a business.
And those data needs are growing. Forrester estimates that the world’s data volume doubles every three years. In 2003, the world’s information production was five exabytes (5,000,000,000 gigabytes), equivalent to 37,000 Libraries of Congress. But, at a growth rate of 30 percent per year, by 2010 that data will reach zettabyte (1,000 exabytes) sizes. What’s more, 92 percent of that data in have-not societies is currently stored on magnetic media, which is heavily taxing systems and servers. Because the Digital Divide is far from resolved, the 87:10 ratio below applies and increases the already huge numbers, making the total data requirements many more zettabytes in size.
Research from IDC echoes Forrester’s. According to a white paper published in March 2008, IDC estimates that the digital universe is growing tenfold every five years. Additionally, IDC estimates that 70 percent of all current digital information is created and compiled by individuals instead of enterprises, which strains storage solutions, since disparate individuals will continue to create with abandon.
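The growth figures above are quoted over different periods, so converting each to a yearly rate makes them easier to compare. The short sketch below assumes steady compound growth, which is a simplification; the function name is illustrative and does not come from either research firm.

```python
def annual_growth_rate(factor: float, years: float) -> float:
    """Yearly compound growth rate that multiplies a quantity by `factor` over `years`."""
    return factor ** (1.0 / years) - 1.0


if __name__ == "__main__":
    doubling = annual_growth_rate(2, 3)   # "doubles every three years"
    tenfold = annual_growth_rate(10, 5)   # "ten times in size every five years"
    print(f"Doubling every 3 years is about {doubling:.1%} per year")
    print(f"Tenfold every 5 years is about {tenfold:.1%} per year")
```

Under that assumption, doubling every three years works out to roughly 26 percent per year, and tenfold growth every five years to roughly 58 percent per year.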
Creating additional strain is the emergence of new industries and business models that rely heavily on data interaction with consumers. For example, streaming media, Web 2.0 and new media entertainment options require significant infrastructure investments from media publishers—to house content, provide adequate backup storage and distribute the content efficiently to millions of users worldwide.
Digital assets creating hardship
Take, for example, an online film rental company, one that offers delivery of DVD-quality video to client desktops. One digital asset such as a feature-length film outstrips the size constraints of even the largest files created just a few years ago. And that one file is just part of a larger library, multiplying storage and distribution needs for the company.
Using current state-of-the-art copy-based storage methods, this company would take its server contents and replicate them on an offsite server or set of servers. For security, that storage would be mirrored with another server or set of servers, and ultimately, the contents would be backed up to tape drives. Each mirror would then multiply the company’s storage requirements. Then, in order to distribute the content efficiently, the company would engage the services of a content distribution service provider or build an edge caching system. With a single file, the storage scheme of replicating the file across servers places a 200 percent storage overhead on the company. If the company chooses to use a RAID scheme, in which data is segmented over a series of servers and then mirrored, the storage requirements are still high: the duplicated segments can total up to five times as many bytes as the original information. This can drive up costs considerably, because the company must pay for all of that storage while distribution is still not served effectively.
Security issues
Additionally, the company is at risk, because its digital assets can be compromised. Given media piracy concerns and an underground community that prides itself on illegally obtaining films, TV shows, music, etc., the risk of security breaches is high. Aside from those issues, the potential for server failure and a loss of digital assets (or significant downtime for consumers who have become accustomed to almost instant access to media) can wreak havoc on the company’s fortunes. That scenario can easily play out across a wide variety of new media companies that distribute high-bandwidth digitized content or unstructured data.
A paradigm shift in digital asset storage has an impact on traditional industries
This can also impact more traditional companies, like healthcare systems, consumer retail companies, the manufacturing sector and more. For example, imagine the server constraints of a single hospital within a healthcare system, backing up vital patient records, billing information, digitized diagnostics such as X-rays and MRI scan data, and ancillary material. HIPAA compliance requires the hospital to maintain secured patient records, some of which need to be kept for the life of the patient, something not achievable through conventional storage systems. In addition, expedited transfer of patient files and test results, such as MRIs, places a strain on IT management using traditional storage systems based on RAID and replication. Again, that is just one example from only one industry.
A new way
However, a new technology can effectively change the data storage landscape, offering every industry the potential to:
• significantly reduce storage costs
• reduce bandwidth necessary to protect and distribute critical information
• streamline IT management processes
• eliminate inefficient redundancies
• increase data security, privacy and reliability.
The key to this new storage paradigm involves slicing and dispersing data via Digital Information Dispersal Algorithms (DIDAs). These DIDAs separate data into unrecognizable slices of information, which are then distributed or dispersed to disparate storage locations. These locations can be situated in the same city, the same region, the same country or around the world. By way of comparison, this scheme is akin to Internet packet-switching methodologies for data communications, where data is spread out and reconstituted on demand, limiting bottlenecks and bandwidth problems. The dispersed data is part of an overall Dispersed Storage Network (dsNet), or DBOWMAN. The data is inherently more secure, since no complete copy of the file resides in any one location. No individual slice contains enough information to understand the original data, and only a subset of the slices from the dispersed nodes is needed to fully retrieve all of the data on the DBOWMAN. Data is ingested into the system, transformed and distributed across a network of servers linked by free-space optics.
In the following example, digital assets are dispersed using a DBOWMAN Access™ slice router. This DBOWMAN has a width of eight and a threshold of five, meaning that eight slices are created from the original data and dispersed across eight Slice servers, which store the dispersed slices of data. The threshold refers to the number of slices needed to reconstitute a file in its entirety; in this example, three outages can occur simultaneously, leaving five of the eight Slice servers to perfectly reassemble the file. In addition, an out-of-band DBOWMAN Manager continually monitors the Access slice router and Slice servers, reporting back to IT management on the status of the DBOWMAN and the critical digital assets. When a user wants to retrieve the file, he or she simply uses the Access slice router, which polls the Slice servers and reassembles the file from the active servers in the DBOWMAN. The software that runs in the Access slice router has the intelligence to put the data back together 100 percent bit-perfectly with access to only a threshold number of slices. Future versions of the system will support this client software in a variety of computing environments, simplifying the distribution of digital content.
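The width and threshold behaviour described above can be illustrated with a small sketch. This is not DBOWMAN's actual DIDA, which the text does not specify; it is a textbook polynomial dispersal scheme (in the style of Rabin's information dispersal) over the prime field of size 257, chosen here as an assumption because it fits in a few lines. The names `disperse` and `reassemble` are hypothetical, and the sketch makes no claim about the security of the real system.

```python
from itertools import combinations

PRIME = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial through `points`, arithmetic mod PRIME."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % PRIME
                den = den * (xi - xj) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total


def disperse(data: bytes, width: int = 8, threshold: int = 5):
    """Split data into `width` slices; any `threshold` of them can rebuild it."""
    padded = data + bytes(-len(data) % threshold)  # zero-pad to a whole chunk
    slices = [[] for _ in range(width)]
    for start in range(0, len(padded), threshold):
        chunk = padded[start:start + threshold]
        # The chunk's bytes define a polynomial by its values at x = 0..threshold-1.
        points = list(enumerate(chunk))
        for s in range(width):
            # Slice s stores the polynomial's value at x = threshold + s.
            slices[s].append(_interpolate(points, threshold + s))
    return slices


def reassemble(available: dict, length: int, threshold: int = 5) -> bytes:
    """Rebuild the original bytes from any `threshold` slices, keyed by slice index."""
    if len(available) < threshold:
        raise ValueError("not enough slices to reach the threshold")
    chosen = sorted(available.items())[:threshold]
    out = bytearray()
    for pos in range(len(chosen[0][1])):
        points = [(threshold + idx, values[pos]) for idx, values in chosen]
        out.extend(_interpolate(points, x) for x in range(threshold))
    return bytes(out[:length])  # drop the padding


if __name__ == "__main__":
    original = b"feature-length film, reel one"
    slices = disperse(original)
    # Simulate three simultaneous outages in every possible way.
    for survivors in combinations(range(8), 5):
        subset = {i: slices[i] for i in survivors}
        assert reassemble(subset, len(original)) == original
    print("rebuilt from every five-server subset")
```

In this sketch each slice is as long as one fifth of the padded input (counted in field elements), so eight slices together hold 8/5 of the original size, and any five of them determine the data exactly.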
Because the data is dispersed across devices, it is resilient against natural disasters or technological failures, like system crashes and network failures. That is because only a subset of slices, from any five of the eight Slice servers, is needed to reconstitute the entire file. There can be multiple simultaneous failures across a string of hosting devices, servers or networks, and the file can still be accessed in real time. That is a sharp contrast to copy-based storage systems, where a single server or site failure could result in the complete loss of a file on that server. This is why many organizations replicate their critical digital assets to multiple locations for protection. A comparison of DIDAs and copy-based systems shows the dramatic difference DIDAs offer in terms of security and efficiency.
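To put a rough number on that resilience, the sketch below computes the chance that at least a threshold number of Slice servers is reachable at a given moment. It assumes servers fail independently and share the same availability, which real deployments may not satisfy, and the 0.99 figure is a made-up input rather than a value from this paper.

```python
from math import comb


def availability(width: int, threshold: int, server_up: float) -> float:
    """Probability that at least `threshold` of `width` independent servers are up."""
    return sum(
        comb(width, k) * server_up**k * (1 - server_up) ** (width - k)
        for k in range(threshold, width + 1)
    )


if __name__ == "__main__":
    p_up = 0.99  # hypothetical per-server availability, not a figure from the text
    print(f"one copy on one server:   {p_up:.6f}")
    print(f"width 8, threshold 5:     {availability(8, 5, p_up):.10f}")
```

The function is a plain binomial tail sum, so it can also be used to compare other width and threshold choices under the same independence assumption.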
Storage comparisons
By dispersing information via DIDAs, the reliability, security and efficiency of data storage can be vastly improved over copy-based and parity-based systems. Take, for example, the reliability of copy-based systems. When data is mirrored on n storage devices, up to n-1 devices can fail without the company experiencing any data loss. However, using and maintaining such a system is extremely expensive, because data is copied in its entirety and every copy needs the same space as the original, so each subsequent copy is as costly as the last. Reliability is also a concern with parity-based systems, which typically allow at most two storage devices to fail without incurring data loss. So, while parity-based systems do not use as much storage as a full-copy scheme, they are also not as reliable. And while RAID systems can be configured to allow for the simultaneous failure of two drives, there is a high cost associated with doing so.
Consider the resources consumed by RAID systems compared with a dispersal system from DBOWMAN, using 30 terabytes of usable storage as an example.
With RAID, total raw storage after copying and parity is 117 TB, while the same 30 TB accounts for only 48 TB of raw storage using dispersal. In addition, the DBOWMAN solution needs significantly fewer servers, translating to lower overhead costs, fewer management resources and less time allotted to data management; this results in a more streamlined overall storage scheme for IT managers. In this example, 39 servers are required for the replication solution, while only 16 servers are needed with DBOWMAN.
Security constraints
Dispersal using DBOWMAN also provides unique security advantages over copy-based systems. Since DIDAs never store all the data at any single location, the potential for theft or data compromise is greatly reduced. A backup system based on DIDAs can be configured to disperse data to any number of Slice servers, identified by p, and can sustain up to m simultaneous failures without data loss. Even if an attacker gains access to multiple Slice servers, the data cannot be read until at least p-m of the devices are compromised. In a copy-based system, attacking a single server would yield the entire data. Some organizations encrypt data to secure it, but even encrypted customer data that is lost or stolen requires the company to make a disclosure announcement. That is not the case with dispersed data.
The efficiency of dispersal
One of the biggest benefits of dispersed storage and the DBOWMAN is its efficiency: the storage overhead is equal to p/(p-m). Based on this, securely storing 1 GB of data would typically require just 1.6 GB of total storage across all devices in a 16-wide system (16 Slice servers with a threshold of 10), which could simultaneously lose 6 servers without losing data. Supporting just four simultaneous failures under a copy-based system would require making five separate copies, which would use 5 GB of storage or more.
So, DBOWMAN offers the best of both worlds: more reliability than parity-based systems and more efficiency than copy-based storage. Another advantage of DBOWMAN systems is their self-healing nature. If a dispersed storage slice somehow becomes corrupted or destroyed, an automated data integrity checking process identifies the corrupted data and can rebuild all data contained in the missing slice by using the data in available slices.
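The overhead formula is easy to check by hand, and a small sketch makes the comparison with copy-based storage explicit. The function names are illustrative. The 16-wide, threshold-10 configuration for the 30 TB case is an inference from the 48 TB and 16-server figures quoted earlier, not something the text states outright.

```python
def dispersal_overhead(p: int, m: int) -> float:
    """Raw-to-usable storage ratio p / (p - m) for p Slice servers tolerating m failures."""
    if not 0 <= m < p:
        raise ValueError("need 0 <= m < p")
    return p / (p - m)


def raw_storage(usable: float, p: int, m: int) -> float:
    """Raw capacity needed to hold `usable` units of data under dispersal."""
    if not 0 <= m < p:
        raise ValueError("need 0 <= m < p")
    return usable * p / (p - m)


def copies_needed(m: int) -> int:
    """Full copies required so that m of them can be lost without losing data."""
    return m + 1


if __name__ == "__main__":
    print(dispersal_overhead(16, 6))  # 1.6: raw GB per GB stored, 16 wide, threshold 10
    print(raw_storage(30, 16, 6))     # 48.0: raw TB for 30 TB usable
    print(copies_needed(4))           # 5: copies needed to survive four failures
    print(16 - 6)                     # 10: slices needed to read the data (p - m)
```

The same p - m value serves two roles in the text: it is the threshold needed to rebuild the data, and it is the number of Slice servers an attacker would have to compromise before anything could be read.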
DBOWMAN features
As a new storage paradigm, dispersed storage offers many unique and much needed features.
• Configurable and flexible storage. With unlimited Slice servers allowed per DBOWMAN, DBOWMAN’s grid architecture has extreme scalability. Companies can add capacity based on unique storage requirements with an unlimited upper end to storage.
• Geographic Dispersal. Slice servers can be deployed in a single rack or geographically distributed around the world, allowing the utmost in flexibility and reliability.
• Unique rebuilding and data integrity checking technologies. A Cyclic Redundancy Check (CRC) for the initial data and a signature-based algorithm for sliced data ensure 100% data integrity throughout the system.
• No data redundancy. With DBOWMAN’s proven dispersed storage technology, there is no duplication of data stored within the dsNet, reducing overall storage and bandwidth costs.
• iSCSI protocol. iSCSI allows DBOWMAN clients to send commands to a dsNet using a standard storage protocol. For example, in Windows with an iSCSI initiator, a dsNet can appear as a local hard drive to any application.
• dsNet monitoring calculations and statistics. The out-of-band dsNet Manager combines with the leading open source monitoring appliance, Zenoss, to provide dsNet calculations and statistics, like authentication attempts, load, memory, stress level, memory buffers, CPU usage, disk space availability, transaction, read and write counts.
• Pluggable authentication. DBOWMAN uses a JAAS authentication API, so any authentication system can be plugged in, catering to even the most sophisticated security requirements. Internally, the Linux PAM and Always Authenticate mechanism is the default for a dsNet.
• Data scrambling. DBOWMAN's dsNet transforms all data being ingested from the source, as opposed to slicing data by sequential format, providing an extra measure of security and privacy.
• Web-Based dsNet configuration. The dsNet configuration and installation is completed using a web-based interface, speeding time-to-serve and reducing man-hours dedicated to digital asset management.
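The integrity checking feature in the list above pairs a CRC on the incoming data with a signature on each slice. The text does not name the signature algorithm, so the sketch below uses SHA-256 digests as a stand-in; that choice, and every function name here, is an assumption made for illustration.

```python
import hashlib
import zlib


def source_checksum(data: bytes) -> int:
    """CRC-32 of the data as it is ingested."""
    return zlib.crc32(data)


def sign_slices(slices: list[bytes]) -> list[str]:
    """Record one digest per slice at write time (SHA-256 is an assumed stand-in)."""
    return [hashlib.sha256(s).hexdigest() for s in slices]


def find_corrupted(slices: list[bytes], signatures: list[str]) -> list[int]:
    """Return the indexes of slices whose current digest no longer matches."""
    return [
        i
        for i, (s, sig) in enumerate(zip(slices, signatures))
        if hashlib.sha256(s).hexdigest() != sig
    ]


if __name__ == "__main__":
    data = b"patient record 0001"
    crc_at_ingest = source_checksum(data)

    # Hypothetical slices; a real system would produce them with its DIDA.
    slices = [b"slice-a", b"slice-b", b"slice-c"]
    signatures = sign_slices(slices)

    slices[1] = b"slice-X"  # simulate corruption on one Slice server
    print(find_corrupted(slices, signatures))  # [1]

    # After reassembly, the CRC confirms the file matches what was ingested.
    print(source_checksum(data) == crc_at_ingest)  # True
```

A slice flagged this way is the kind of candidate a self-healing process would rebuild from the remaining healthy slices.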
DBOWMAN Benefits
The DBOWMAN system offers significant advantages over traditional storage systems, such as:
Scalability
DBOWMAN’s dsNets have been designed from the ground up to be massively scalable and address today’s challenges of growing digital assets. And since DBOWMAN's grid-based architecture has no centralized servers, capacity and performance can be scaled independently to meet specific business requirements.
Security and reliability
DBOWMAN’s dsNets can withstand multiple hardware or storage location failures, and still keep data secure and easily accessible. With a dsNet, data transmission and storage are private and secure. No single server holds a complete copy of the data, and only a subset of the slices is required to perfectly retrieve and reassemble stored data.
Longevity
DBOWMAN’s unique rebuilding and data integrity checking technologies allow expansion and upgrade of networks without taking them down. dsNets are fault tolerant, meaning seamless access to data, even when some systems are down.
Cost-effectiveness
Overhead in a dsNet can be one-third to one-fifth that of comparable copy-based architectures, translating into significant cost reductions in hardware, bandwidth, management, power and space. What’s more, DBOWMAN uses standard, off-the-shelf components and energy-efficient high-capacity disk drives to develop dsNets that are not locked into a specific technology.
Summary
DBOWMAN offers a superior alternative to traditional storage systems based on replication and parity, one that provides more security, less overhead and more flexibility by using DIDAs to slice and disperse data across networks that can be situated on a single rack or throughout the world. With the explosion of digital data driven by individuals creating, storing and sharing content, organizations will be looking to new storage methods for efficiently managing growth, protecting critical digital assets and efficiently distributing data to the end clients.
Those organizations will do well to look at DBOWMAN.
Hardware
Distributed Broadband Optical Wireless Mesh Area Networking (DBOWMAN) is part of a novel, inexpensive hybrid Free Space Optical Networking architecture that supports symmetrical bandwidth of 10 megabits, 100 megabits and 1 gigabit per second over a number of meshed infrared through-the-air links over fairly short distances (>1km). Ad hoc links are made along secure, line-of-sight, redundant optical paths through the air, supported by a fibre optic cable backbone pulled through existing public sewer systems. The DIDAs described above and DAVIDs (Digital Asset Virtual Information Devices) are a key part of this DBOWMAN Node design.