Big Data

Big Data And Big Payoff

Big data is a broad term for structured and unstructured data sets so large or complex that they exceed the capabilities of traditional data management and processing technologies. Data at this scale raises many challenges and questions. How do you capture, analyze, search, sort, share, visualize, store, and transfer the data? How long should you keep it?

Some examples of real-world big data applications:

Retailers such as Target and Walmart analyze customer information and transaction data for marketing, promotion, and inventory management.
Manufacturers collect machine operation data for diagnostics and maintenance.
E-commerce websites analyze customer shopping behavior to make recommendations to online shoppers.

Quadbase Systems supports the Hadoop big data platform, with Hive and Spark, as well as the NoSQL DBMS MongoDB. You can now use Quadbase’s reporting tools to visualize, report on, and analyze virtually unlimited volumes of data with high performance and scalability.

Hadoop platform

Apache Hadoop is an open-source software framework that enables the distributed processing of large data sets across clusters of commodity servers. It provides massive storage for any kind of data, enormous processing power, and the ability to handle a virtually limitless number of concurrent tasks or jobs. It is also highly scalable and offers a very high degree of fault tolerance.

Hive is the original SQL-on-Hadoop solution, which tries to emulate the behavior, syntax, and interface(s) of MySQL, including a command-line client. It also includes a Java API and JDBC drivers for those with an existing investment in Java applications that do MySQL-style querying. Despite its relative simplicity and ease of use, Hive is slow and read-only.

Apache Spark is an open-source cluster computing framework. It is a fast and general engine for large-scale data processing. In contrast to Hadoop's two-stage disk-based MapReduce paradigm, Spark holds intermediate results in memory. Its in-memory primitives provide performance up to 100 times faster for certain applications. Spark runs on Hadoop, Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Cassandra, HBase, and S3.

It is easy to set up a connection to a Hadoop data source. A configuration that includes Hive and Spark is required, and a JDBC driver that supports this configuration is bundled with the products. Choose the Hadoop data source in the data registry and specify the driver and URL in the following format:

JDBC Driver class name: com.cloudera.hive.jdbc4.HS2Driver
URL format: jdbc:hive2://:/

Once this is set up, users can run queries and build reports, charts, and dashboards in the same way as with any other supported database source. A dashboard example accessing data from Hadoop can be found here.
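Outside the reporting tools, the same driver and URL can be checked from plain JDBC. The following is only a minimal sketch, not the products' own connection code. It assumes the URL takes the usual HiveServer2 form of host, port, and database; the host name `hive-host.example.com`, the port `10000`, the database `default`, and the class and method names are hypothetical placeholders, and the Cloudera driver JAR is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class HiveJdbcSketch {
    // Driver class name as given in the text above.
    static final String DRIVER = "com.cloudera.hive.jdbc4.HS2Driver";

    // Assumption: the URL is built from host, port, and database,
    // in the usual HiveServer2 order.
    static String buildUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // Hypothetical connection details; replace with your own.
        String url = buildUrl("hive-host.example.com", 10000, "default");

        try {
            Class.forName(DRIVER); // fails if the driver JAR is not on the classpath
        } catch (ClassNotFoundException e) {
            System.err.println("Driver not found on classpath: " + DRIVER);
            return;
        }

        // List the tables visible through Hive as a simple connectivity check.
        try (Connection con = DriverManager.getConnection(url);
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        } catch (SQLException e) {
            System.err.println("Could not query Hive: " + e.getMessage());
        }
    }
}
```

If the connection succeeds, the program prints one table name per line; otherwise it reports the driver or connection error instead of throwing.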

MongoDB

MongoDB is an open-source document database, and the leading NoSQL database. It provides high performance, high availability, and easy scalability.

Our reporting tools support MongoDB data sources by using UnityJDBC as the MongoDB JDBC driver. Its SQL support includes functions, expressions, aggregation, and joins.

We include the trial version of UnityJDBC, which has no expiration date and is fully functional except that it returns at most 100 results per query. If your query produces more than 100 results, upgrade your UnityJDBC license to access the complete result set.

1. JDBC Driver class name: mongodb.jdbc.MongoDriver
2. URL format: jdbc:mongo:///

URL example: jdbc:mongo://ds029847.mongolab.com:29847/tpch
tpch is a MongoDB instance hosted at MongoLab; the username and password are both dbuser.
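The example URL and credentials above can be tried from plain JDBC along the following lines. This is a minimal sketch rather than the products' own code: it assumes the UnityJDBC JAR is on the classpath and that the sample instance is still reachable. The table name `nation` is an assumption based on the standard TPC-H schema, and the class name and the `mayBeTruncated` helper are hypothetical. The helper only flags a result that has hit the 100-row trial limit described above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class MongoJdbcSketch {
    // Driver class name and example URL as given in the text above.
    static final String DRIVER = "mongodb.jdbc.MongoDriver";
    static final String URL = "jdbc:mongo://ds029847.mongolab.com:29847/tpch";

    // The bundled trial version returns at most this many results.
    static final int TRIAL_ROW_LIMIT = 100;

    // Hypothetical helper: a result that reaches the trial limit may be incomplete.
    static boolean mayBeTruncated(int rowCount) {
        return rowCount >= TRIAL_ROW_LIMIT;
    }

    public static void main(String[] args) {
        try {
            Class.forName(DRIVER); // fails if the UnityJDBC JAR is not on the classpath
        } catch (ClassNotFoundException e) {
            System.err.println("Driver not found on classpath: " + DRIVER);
            return;
        }

        // Assumption: the tpch instance contains a collection named "nation".
        try (Connection con = DriverManager.getConnection(URL, "dbuser", "dbuser");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM nation")) {
            int rows = 0;
            while (rs.next()) {
                rows++;
            }
            System.out.println("Rows returned: " + rows);
            if (mayBeTruncated(rows)) {
                System.out.println("Result may be cut off by the trial license limit.");
            }
        } catch (SQLException e) {
            System.err.println("Could not query MongoDB: " + e.getMessage());
        }
    }
}
```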

A dashboard example accessing data from MongoDB can be found here.
