
Nature of Data

What is Data?

Data are the quantities, characters, or symbols on which a computer performs operations. They may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.

What is Big Data?

Big Data is a collection of data that is huge in volume and keeps growing exponentially over time. Its size and complexity are so great that no traditional data management tool can store or process it efficiently. In short, Big Data is still data, only at a very large scale.

The following are some examples of Big Data:

  • The New York Stock Exchange generates about one terabyte of new trade data per day.
  • Social media: Statistics show that 500+ terabytes of new data are ingested into the databases of the social media site Facebook every day. This data comes mainly from photo and video uploads, message exchanges, comments, etc.
  • A single jet engine can generate 10+ terabytes of data in 30 minutes of flight time. With many thousands of flights per day, the data generated reaches many petabytes.
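
To make the jet engine figure concrete, the arithmetic can be sketched in a few lines of Python. Only the rate of 10 terabytes per 30 minutes comes from the example above; the number of flights per day, the flight duration, and the number of engines per aircraft are hypothetical values chosen purely for illustration.

```python
# Figure from the text: one engine produces about 10 TB per 30 minutes of flight.
TB_PER_ENGINE_PER_HALF_HOUR = 10

# Hypothetical assumptions, chosen only to illustrate the arithmetic.
FLIGHTS_PER_DAY = 5_000
FLIGHT_HOURS = 2
ENGINES_PER_AIRCRAFT = 2

TB_PER_PB = 1_000  # decimal units: 1 petabyte = 1,000 terabytes


def daily_engine_data_pb(flights, hours, engines):
    """Return the estimated engine data generated per day, in petabytes."""
    half_hours = hours * 2
    tb_per_flight = TB_PER_ENGINE_PER_HALF_HOUR * half_hours * engines
    return flights * tb_per_flight / TB_PER_PB


estimate = daily_engine_data_pb(FLIGHTS_PER_DAY, FLIGHT_HOURS, ENGINES_PER_AIRCRAFT)
print(f"Estimated engine data per day: {estimate:,.0f} PB")
```

Under these assumed values the estimate comes to 400 petabytes per day, which is consistent with the "many petabytes" in the example. Different assumptions will of course give different totals.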

Types Of Big Data

  • Structured: Any data that can be stored, accessed, and processed in a fixed format is termed ‘structured’ data. Over time, computer science has developed increasingly successful techniques for working with this kind of data (where the format is well known in advance) and for deriving value from it. However, issues now arise when the size of such data grows very large, with typical sizes in the range of multiple zettabytes. Data stored in a relational database management system is one example of ‘structured’ data; an ‘Employee’ table in a database is a typical case.
  • Unstructured: Any data whose form or structure is unknown is classified as unstructured data. In addition to its huge size, unstructured data poses multiple challenges when it is processed to derive value from it. A typical example is a heterogeneous data source containing a combination of simple text files, images, videos, etc. Nowadays organizations have a wealth of data available to them, but unfortunately they often don’t know how to derive value from it because the data is in its raw, unstructured form. An example of unstructured data is the output returned by ‘Google Search’.
  • Semi-structured: Semi-structured data can contain both forms of data. It looks structured in form, but it is not actually defined by, for example, a table definition in a relational DBMS. An example of semi-structured data is data represented in an XML file.
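
The three types can be contrasted with a small Python sketch that uses only the standard library. The table columns, the XML tags, and the sample values are all hypothetical; the point is only to show where the structure lives in each case.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Structured: a fixed schema is declared before any row is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee (id, name, dept) VALUES (?, ?, ?)",
    [(1, "Asha", "Finance"), (2, "Ravi", "Admin")],
)
structured_names = [
    row[0] for row in conn.execute("SELECT name FROM employee ORDER BY id")
]

# Semi-structured: tags describe each value, but no table definition
# forces every record to carry the same fields.
xml_text = """
<employees>
  <employee><name>Asha</name><dept>Finance</dept></employee>
  <employee><name>Ravi</name><phone>555-0100</phone></employee>
</employees>
"""
root = ET.fromstring(xml_text)
semi_structured = [{child.tag: child.text for child in emp} for emp in root]

# Unstructured: raw text with no declared fields at all; any structure
# has to be inferred by the program that processes it.
raw_text = "Asha from Finance emailed Ravi about the quarterly report."
word_count = len(raw_text.split())

print(structured_names)
print(semi_structured)
print(word_count)
```

Note that the second XML record has a phone field and no dept field, something the relational table would not accept without a schema change.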

Types of Data Based on Other Factors

  • Data with reference to time factor: Based on the time factor, data can be classified into the following two types:
    • Time-independent data - Data that can be measured repeatedly, e.g., data in geosciences and astronomy such as geological structures, rocks, fixed stars, etc.
    • Time-dependent data - These can be measured only once, e.g., certain geophysical or cosmological phenomena like volcanic eruptions and solar flares. Likewise, data pertaining to rare fossils are time-dependent data.
  • Data with reference to location factor: Based on the location factor, data can be categorized as follows:
    • Location-independent data - These are independent of the location of the objects measured, e.g., data in pure physics and chemistry.
    • Location-dependent data - These are dependent on the location of the objects measured. Data in earth sciences and astronomy normally belong to this category. Data on rocks are also location-dependent.
  • Data with reference to mode of generation: There are three types of data under this category. These are:
    • Primary data - Data are primary when obtained by experiment or observation designed for the measurement, e.g., values of velocity derived by measuring length and time.
    • Derived (reformatted) data - These data are derived by combining several primary data with the aid of a theoretical model.
    • Theoretical (predicted) data - These are derived by theoretical calculations. Basic data such as fundamental constants are used in theoretical calculations, e.g., data concerning solar eclipses are predicted with the use of celestial mechanics.
  • Data with reference to nature of quantitative values: These are categorised into the following two classes:
    • Determinable data - Data on a quantity that can be assumed to take a definite value under a given condition are known as determinable data. Time-dependent data are usually determinable data, if the given condition is understood to include the specification of time.
    • Stochastic data - Data relating to a quantity that takes fluctuating values from one sample to another, or from one measurement to another, under a given condition are referred to as stochastic. In geosciences most data are stochastic.
  • Data with reference to terms of expression: The categorization in this case yields three classes of data:
    • Quantitative data - These are measures of quantities expressed in terms of well-defined units, changing the magnitude of a quality to a numerical value. Most data in physical sciences are quantitative data.
    • Semi-quantitative data - These data consist of affirmative or negative answers to posed questions concerning different characteristics of the objects involved, e.g., in biology, classification of organisms is based upon a set of 'Yes' and 'No' responses to questions concerning morphological, biochemical and other characteristics of species. Such data are regarded as semi-quantitative. 'Yes' and 'No' can be coded as '1' and '0' (zero) for obtaining numerical data.
    • Qualitative data - The data expressed in terms of definitive statements concerning scientific objects are qualitative in nature. Qualitative data in this sense are almost equivalent to established knowledge.
  • Data with reference to mode of presentation: These are categorized as numerical, graphic and symbolic data.
    • Numerical data - These data are presented as numerical values. Most quantitative data fall in this category.
    • Graphic data - Here data are presented in graphic form or as models. In some cases, graphs are constructed for the sake of helping users grasp a mass of data by visual perception. Charts and maps also belong to this category.
    • Symbolic data - These are presented in symbolic form, e.g., symbolic presentation of weather data.

These are the six basic types of scientific data based on the nature of data. Within these six types, there exist fifteen different classes of data.
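
The coding of 'Yes' and 'No' as '1' and '0' described under semi-quantitative data can be sketched in Python as follows. The questions and specimens are hypothetical and are not taken from any real classification scheme; the sketch only shows how such answers become numerical data that can be compared.

```python
# Hypothetical yes/no questions about characteristics of a specimen.
QUESTIONS = ["has_backbone", "has_feathers", "lays_eggs"]

# Hypothetical answers recorded for two specimens.
answers = {
    "specimen_a": {"has_backbone": "Yes", "has_feathers": "Yes", "lays_eggs": "Yes"},
    "specimen_b": {"has_backbone": "Yes", "has_feathers": "No", "lays_eggs": "No"},
}


def encode(responses):
    """Code each 'Yes' as 1 and each 'No' as 0, in the order of QUESTIONS."""
    return [1 if responses[question] == "Yes" else 0 for question in QUESTIONS]


encoded = {name: encode(responses) for name, responses in answers.items()}

# Once coded, the answers can be compared numerically, for example by
# counting the characteristics on which two specimens agree.
matches = sum(
    a == b for a, b in zip(encoded["specimen_a"], encoded["specimen_b"])
)

print(encoded)
print(matches)
```

Here the two specimens agree on only one of the three characteristics, which is the kind of numerical summary that the coding makes possible.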
