Data is the New Oil

The examples from the last section highlight a critical point about automation that is worth repeating: it requires boatloads of data. A company like Google can use ML successfully because it has access to massive datasets about what its users are looking for when they use its search engine.

Data is behind the success of many other large companies as well. Here are just a few examples:

  • Netflix uses a massive trove of user data and ML to determine what movies and TV shows to recommend to its users.

  • Amazon uses its large datasets about buying behavior to figure out how to entice people to add more items to their shopping carts.

  • Facebook and Instagram deliver highly targeted ads based on each user's individual "liking" behavior.

These companies all depend on access to huge amounts of data to make ML work. Competing with them is very difficult because a rival will have a very, very hard time collecting anywhere near as much data.

It's reached a point where the underlying algorithms are no longer the most important component of these technologies. They're still proprietary, of course, but they'd be useless in the hands of a company that didn't have data to use them on.

For example, Google could give you its entire ML system for optimizing search and it would be largely useless to you. Sure, you'd have a great piece of technology, but without Google's data you could never switch on the ML features and have a chance at competing. This is a tremendous advantage for Google, because anyone who wants to start a competing search engine has to contend with the fact that Google has been gathering data at scale for years.

We're already starting to see this trend in action. Google and other tech companies have open-sourced some of the software they use to build and run their ML systems. They clearly understand that their advantage no longer lies in technology alone: data is far more valuable than code.

The general idea is this: if a company has access to a large body of data, it can use ML to spot patterns, improve its business and solidify its dominance over competitors. It's a feedback loop that rewards whoever has the most data, and it's what is driving the formation of tech monopolies.

This is why some people refer to data as "the new oil." Just as combustion engines are useless without oil, the most valuable technology (that is, ML-driven technology) is useless without data. And much as Standard Oil reigned supreme over the American economy through its monopoly control of the nation's oil supply, big tech companies are using data and machine learning to take over entire markets.

But thanks to widely available information on how to build ML systems, cheap processing power and an abundance of programming talent, ML is no longer exclusively the domain of tech giants. The giants get the most attention because of their size, but many smaller companies are using this new wave of technology to automate markets that the giants currently ignore.

It's these less visible automation efforts that the average person should be more concerned about.