Tag Archives: big data

Where does Big Data come from?

Searching for all green technical images, I found this one.

Where does big data come from?
How are they getting all this data?

Some of the many sources of online data collection, which goes on every second of every day:

  1. Surfed several websites gathering info for a presentation – My connectivity provider (Sprint) and Google added more info about me to their already significant repositories.
  2. Paid a few bills online – Credit card companies not only have my original transactions and buying habits, but now they know my payment tendencies as well.
  3. Downloaded a movie from iTunes for my trip – Apple adds this to the profile I have been populating for almost a decade.
  4. Called my wife from my mobile phone to her car phone – Sprint and OnStar just got some more info about usage time and calling locations.
  5. Printed boarding passes for a flight – Southwest knows where I have been, where I’m going and my general flight patterns.
  6. Drove to the airport – Sprint knows roughly the path I took from home to the airport via my mobile phone switching between cell towers.
  7. Browsed email and some apps on my iPad – Apple and who knows which app developers just gathered access and some location data.
  8. Landed in Los Angeles, El Paso & Dallas – My phone was turned on at every stop and tracked some level of information on my location and calls.
  9. Got into a rented car and drove to the meeting site – Hertz captures the location, type of car, duration etc. Plus Sprint knows roughly where I am too.
  10. Checked into my room and had dinner – The hotel and credit card companies have that covered and logged.
  11. Browsed a few websites, checked email and wrote an article – The hotel network and connectivity provider logged every one of them, I promise.

Read more here: http://www.tjcrawford.com/2012/02/17/big-data-is-everyday-data/

Big Data Conversion Chart & Functional Relevance

A Petabyte is a lot of data, not a generic bite from your pets
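
To put a petabyte in perspective, here is a minimal sketch of the arithmetic behind a conversion chart like this one. It assumes decimal (SI) prefixes by default, where each step up is a factor of 1000, and lets you pass 1024 for the binary convention. The names `UNITS`, `to_bytes` and `humanize` are made up for illustration.

```python
# Unit ladder from bytes up to yottabytes.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def to_bytes(value, unit, base=1000):
    """Convert a value in the given unit to bytes (base 1000 = SI, 1024 = binary)."""
    return value * base ** UNITS.index(unit)

def humanize(n_bytes, base=1000):
    """Render a byte count using the largest unit that keeps the number below `base`."""
    value = float(n_bytes)
    for unit in UNITS:
        if value < base or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= base

one_pb = to_bytes(1, "PB")
print(one_pb)                         # bytes in one decimal petabyte
print(one_pb // to_bytes(1, "GB"))    # how many gigabytes fit in a petabyte
print(humanize(one_pb))
```

Under the decimal convention one petabyte works out to a million gigabytes, so a 500 GB laptop drive would have to be filled two thousand times over to hold it.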

My favorite quotes from this post are:

“Data has no inherent value. To be useful, data must flow to agents who will ultimately process, analyze, and synthesize it to produce information that drives decisions. The recent conversation in DoD has focused on what is referred to as the “big data problem,” that is, since we don’t know what’s important in the data being collected, everything must be saved. But this is much harder than it sounds.”

“Data is not important. It’s the information that can be gleaned from the data that matters. The old data paradigm emphasizes precision: save only what you consider to be relevant at the time the data is collected. This approach works only so long as you are dealing with a more or less static context where “relevance” can be readily established.”

Relevance is not an attribute. It’s a relationship, or a complex mapping that has sources, targets, and attribute values on the link. Consider this abstract function:

Relevance = F(source, content, context; me, my role, my situation, my company; the environment and set of competitors and space of potential actions).

Because of these considerations, you can see why relevance is elusive and non-comparable across markets, uses, and situations. Therefore, aggregate sums and statistics on relevance are even more problematic.
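
One way to make the "relationship, not attribute" point concrete is a toy sketch in Python. Everything here is an assumption for illustration: the class names, the field groupings (which just mirror the three groups in the abstract function above), and especially the scoring rule, which is invented and deliberately crude. The only thing it is meant to show is that the same item gets a different score once the consumer and environment change.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:                 # source, content, context
    source: str
    content: str
    context: str

@dataclass(frozen=True)
class Consumer:             # me, my role, my situation, my company
    me: str
    role: str
    situation: str
    company: str

@dataclass(frozen=True)
class Environment:          # competitors and space of potential actions
    competitors: frozenset
    potential_actions: frozenset

def relevance(item, consumer, env):
    """Hypothetical scoring rule; it ignores several fields on purpose."""
    if not env.potential_actions:
        return 0.0
    words = set(item.content.lower().split())
    # Share of the consumer's possible actions that the content speaks to.
    actionable = len(words & env.potential_actions) / len(env.potential_actions)
    # Content from a matching context counts in full, otherwise it is discounted.
    situational = 1.0 if item.context == consumer.situation else 0.25
    return actionable * situational

item = Item("hotel network log", "reroute flight rebook hotel", "travel")

traveler = Consumer("me", "consultant", "travel", "Acme")
travel_env = Environment(frozenset(), frozenset({"reroute", "rebook"}))

analyst = Consumer("someone else", "auditor", "billing", "Acme")
audit_env = Environment(frozenset(), frozenset({"audit", "rebook"}))

print(relevance(item, traveler, travel_env))
print(relevance(item, analyst, audit_env))
```

The item never changes, yet the two calls disagree, which is why there is no single relevance value to store on the data itself, and why summing or averaging such scores across different consumers mixes numbers that were never on the same scale.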

Source: http://edgefighter.com/2012/01/16/choke-points-in-the-data-supply-chain/