Thank You for Being Late
Okay, it wasn’t that simple. He wanted to talk about “the connected cow.”
The story Sirosh tells goes like this: Dairy farmers in Japan approached the Japanese computer giant Fujitsu with a question. Could they improve the odds for successfully breeding cows in large dairy farms? It turns out that cows go into heat, or estrus—their period of sexual receptivity and fertility when they can be successfully artificially inseminated—only for a very short window: twelve to eighteen hours roughly every twenty-one days, and often primarily at night. This can make it enormously difficult for a small farmer with a large herd to monitor all his cows and identify the ideal time to artificially inseminate each one. If this can be done well, dairy farmers can ensure uninterrupted milk production from each cow throughout the year, maximizing the per capita output of the farm.
The solution Fujitsu came up with, explained Sirosh, was to fit the cows with pedometers connected by radio signal to the farm. The data was transmitted to a machine-learning software system called GYUHO SaaS running on Microsoft Azure, the Microsoft cloud. Fujitsu’s research had established that a big increase in the number of steps per hour was a 95 percent accurate signal for the onset of estrus in dairy cows. When the GYUHO system detected a cow in heat, it would send a text alert to the farmers on their mobile phones, enabling them to administer artificial insemination at exactly the right times.
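Fujitsu's actual model is not described here. Its core idea, though, can be sketched in a few lines of Python: watch for a sudden jump in a cow's hourly step count relative to her own recent baseline. The 24-hour baseline, the 2x alert threshold, and the `detect_estrus` and `send_alert` helpers are all hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical settings: flag an hour whose step count is at least
# ALERT_FACTOR times the cow's average over the preceding BASELINE_HOURS.
ALERT_FACTOR = 2.0
BASELINE_HOURS = 24


def detect_estrus(hourly_steps: list[int]) -> list[int]:
    """Return the indices of hours whose step count spikes above the rolling baseline."""
    alerts = []
    for hour in range(BASELINE_HOURS, len(hourly_steps)):
        baseline = mean(hourly_steps[hour - BASELINE_HOURS:hour])
        if baseline > 0 and hourly_steps[hour] >= ALERT_FACTOR * baseline:
            alerts.append(hour)
    return alerts


def send_alert(cow_id: str, hour: int) -> str:
    # Hypothetical stand-in for the text message sent to the farmer's phone.
    return f"Cow {cow_id}: step spike at hour {hour}, check for estrus"


if __name__ == "__main__":
    # One day of ordinary activity, followed by a burst of walking.
    steps = [100] * 24 + [110, 320, 350]
    for h in detect_estrus(steps):
        print(send_alert("A-17", h))
```

Each cow is compared against her own baseline rather than a single herd-wide number. This is one simple way to allow for animals that are naturally more or less active. A working system would need its thresholds tuned against real breeding outcomes.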
“It turns out that there is a simple secret of when the cow is in heat—the number of steps she takes picks up,” said Sirosh. “That is when AI [artificial intelligence] meets AI [artificial insemination].” Having this system at their fingertips made the farmers more productive not only in expanding their herds—“you get a huge improvement in conception rates,” said Sirosh—but also in saving time: it liberated them from having to rely on their own eyes, instincts, expensive farm labor, or the Farmers’ Almanac to identify cows in heat. They could use the labor savings for other productive endeavors.
All the data being generated from the cows’ sensors revealed another, even more important insight, said Sirosh: Fujitsu researchers found that within the sixteen-hour ideal window for artificial insemination, if you performed that function in the first four hours, there was a “seventy percent probability you got a female calf, and if you did it in the second four hours there was a higher probability that you got a male.” So this could enable a farmer “to shape the mix of cows and bulls in his herd according to his needs.”
The data just kept spitting out more insights, said Sirosh. By studying the pattern of footsteps, the farmers were able to gain early detection of eight different cow diseases, enabling early treatment and improving the overall health and longevity of the herd. “A little ingenuity can transform even the oldest of industries like farming,” concluded Sirosh.
If a cow with a sensor makes a dairy farmer into a genius, a locomotive enabled with sensors is no longer a dumb train but an IT system on wheels. It can suddenly sense and broadcast the quality of the tracks every one hundred feet. It can sense the slope and how much energy it needs to go over each mile of terrain, putting on the gas a little less when it goes downhill, and generally maximizing fuel efficiency or velocity to get from point A to point B. And now all GE locomotives are being equipped with cameras to better monitor how the engineers are operating the engines at every curve. GE now also knows that if you have to run your engine at 120 percent on a hot day, certain parts will need to have their predictive maintenance moved up.
“We are constantly enriching and training our nervous system, and everyone benefits from the data,” said Ruh. But it’s not only the learning you can do with sensors and software; it’s also the transforming you can do with sensors and software together. Today, explained Ruh, “we no longer need to build physical changes into every product to improve their performance, we just do it with software. I take a dumb locomotive and throw sensors and software into it, and suddenly I can do predictive maintenance, I can make it operate up and down the tracks at the optimal speeds to save gasoline, I schedule all the trains more efficiently and even park them more efficiently.” Suddenly a dumb locomotive gets faster, cheaper, and smarter—without replacing a screw, a bolt, or an engine. “I can use sensor data and software to make the machine act more efficiently as though we [manufactured] a whole new generation,” added Ruh.
In a plant, he added, “you can get tunnel vision into the job you are doing. But what if the machine is watching out for you, thanks to the fact that we will have a camera on everything—everything will have eyes and ears? We talk about the five senses. What people don’t realize yet is that I am going to give the five senses to machines to interact with humans in the same way we interact with colleagues today.”
And there’s money in them thar hills—lots of it, explained GE’s CEO, Jeff Immelt, in an interview with McKinsey & Company in October 2015:
Every CEO of a railroad could tell you their [fleet] velocity. The velocity tends to be, let’s say, between twenty and twenty-five miles per hour. This tends to be the average miles per hour that a locomotive travels in a day—twenty-two. Doesn’t seem very good. And the difference between twenty-three and twenty-two for, let’s say, Norfolk Southern, is worth two hundred fifty million dollars in annual profit. That’s huge for a company like that. That’s one mile [per hour]. So that’s all about scheduling better. It’s all about less downtime. It’s all about not having broken wheels, being able to get through Chicago faster. That’s all analytics.
With every passing day, explained John Donovan, AT&T’s chief strategy officer, we are turning more and more “digital exhaust into digital fuel” and generating and applying the insights faster and faster. The American department store owner John Wanamaker was an early twentieth-century pioneer in both retailing and advertising. He once famously observed: “Half the money I spend on advertising is wasted; the trouble is I don’t know which half.” That needn’t be the case today.
Latanya Sweeney, the then chief technology officer for the Federal Trade Commission, explained on National Public Radio on June 16, 2014, how sensing and software are transforming retail: “What a lot of people may not realize is that, in order for your phone to make a connection on the Internet, it’s constantly sending out a unique number that’s embedded in that phone, called the MAC address, to say, ‘Hey, any Wi-Fis out there?’ … And by using these constant probe requests by the phone looking for Wi-Fis, you could actually track where that phone has been, how often that phone comes there, down to a few feet.” Retailers now use this information to see what displays you lingered over in their stores and which ones tempted you to make a purchase, leading them to adjust displays regularly during the day. But that’s not the half of it—big data now allows retailers to track who drove by which billboard and then shopped in one of their stores.
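To make Sweeney's point concrete, here is a rough sketch of how a log of Wi-Fi probe requests could be turned into a count of visits per phone. The sensor log, the MAC addresses, the display zones, and the two-hour gap that separates one visit from the next are all invented for illustration. Real retail analytics systems will differ.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical log of probe requests picked up by in-store sensors:
# (MAC address, time seen, sensor zone). All values are made up.
sightings = [
    ("a4:5e:60:11:22:33", datetime(2014, 6, 2, 10, 0), "entrance"),
    ("a4:5e:60:11:22:33", datetime(2014, 6, 2, 10, 4), "shoe display"),
    ("a4:5e:60:11:22:33", datetime(2014, 6, 9, 18, 30), "entrance"),
    ("f0:99:bf:44:55:66", datetime(2014, 6, 2, 10, 1), "entrance"),
]

VISIT_GAP = timedelta(hours=2)  # assumed: a gap longer than this starts a new visit


def visits_per_device(log):
    """Count separate store visits per MAC address by splitting on long gaps."""
    times = defaultdict(list)
    for mac, seen_at, _zone in log:
        times[mac].append(seen_at)
    counts = {}
    for mac, stamps in times.items():
        stamps.sort()
        visits = 1
        for earlier, later in zip(stamps, stamps[1:]):
            if later - earlier > VISIT_GAP:
                visits += 1
        counts[mac] = visits
    return counts


print(visits_per_device(sightings))
```

The same grouping by zone, rather than by time gap, would show which displays a given phone lingered near.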
As The Boston Globe reported on May 19, 2016:
Now the nation’s largest billboard company, Clear Channel Outdoor Inc., is bringing customized pop-up ads to the interstate. Its Radar program, up and running in Boston and 10 other US cities, uses data AT&T Inc. collects on 130 million cellular subscribers, and from two other companies, PlaceIQ Inc. and Placed Inc., which use phone apps to track the comings and goings of millions more.
Clear Channel knows what kinds of people are driving past one of their billboards at 6:30 p.m. on a Friday—how many are Dunkin’ Donuts regulars, for example, or have been to three Red Sox games so far this year.
It can then precisely target ads to them.
Sorry, Mr. Wanamaker. You lived in the wrong era. Guessing is so twentieth century. Guessing is officially over.
But so might be privacy. When you think of all the data that is being vacuumed up by giant firms—Facebook, Google, Amazon, Apple, Alibaba, Tencent, Microsoft, IBM, Netflix, Salesforce, General Electric, Cisco, and all the telephone companies—and how efficiently they can now mine that data for insights, you have to wonder how anyone will be able to compete with them. No one else will have that much digital exhaust as raw material to analyze and fuel better and better predictions. And digital exhaust is now power. We need to keep a close eye on the monopoly power that big data can create for big companies. It is not just how they can dominate a market with their products now, but how they can reinforce that domination with all the data they can collect.
Storage/Memory
As we’ve seen, sensors hold great power. But all those sensors gathering all that data would have been useless without parallel breakthroughs in storage. These breakthroughs have given us chips that can store more data and software that can virtually interconnect millions of computers and make them store and process data as if they were a single desktop.
Just how big did that storage have to get and how sophisticated did the software have to become? Consider this May 11, 2014, talk by Randy Stashick, the then president of engineering at UPS, who spoke at the Production and Operations Management Society Conference on the importance of big data. He began by showing a number 199 digits long.
“Any idea what that number represents?” he asked the audience.
“Let me tell you a couple of things it does not represent,” Stashick continued.
It’s not the number of hot dogs the famous Varsity restaurant, just up the street from us, has sold since opening in 1928. Nor is it the number of cars on Atlanta’s infamous interstates at five o’clock on a Friday afternoon. Actually, that number, 199 digits in all, represents the number of discrete routes a UPS driver could conceivably take while making an average of one hundred twenty daily stops. Now, if you really want to get crazy, take that number and multiply it by fifty-five thousand. That’s the number of U.S. routes our drivers are covering each business day. To display that number, we’d probably need that high-definition screen at AT&T Stadium in Dallas, where the Cowboys play. But somehow UPS drivers find their way to more than nine million customers every day, to deliver nearly seventeen million packages filled with everything from a new iPad for a high school graduate in Des Moines, to insulin for a diabetic in Denver, to two giant pandas relocating from Beijing to the Atlanta Zoo. How do they do it? The answer is operations research.
More than two hundred sensors in the vehicle tell us if the driver is wearing a seat belt, how fast the vehicle is traveling, when the brakes are applied, if the bulkhead door is open, if the package car is going forward or backing up, the name of the street it’s traveling on, even how much time the vehicle has spent idling versus its time in motion. Unfortunately, we don’t know if the dog sitting innocently by the front door is going to bite.
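Stashick did not say how his 199-digit figure was calculated. One plausible reading, offered here only as an assumption, is that it counts every possible order in which a driver could visit 120 stops. That is 120 factorial (120 × 119 × ... × 1). A quick Python check shows that this number does run to 199 digits:

```python
import math

STOPS = 120          # average daily stops per driver, from the talk
US_ROUTES = 55_000   # U.S. routes covered each business day, from the talk

# Assumption: each possible route is one ordering of the stops, so the count is 120!.
routes = math.factorial(STOPS)
print(len(str(routes)))              # → 199 (decimal digits)

# Stashick's "get crazy" step: multiply by the number of daily routes.
print(len(str(routes * US_ROUTES)))  # → 204
```

Python's integers have arbitrary precision, so the exact 199-digit value is computed rather than estimated.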
To work through a number of routing options that is 199 digits long and also take into account data fed from two hundred sensors in each UPS truck requires a lot of storage, computing, and software capacity—more than anything available, even imaginable, to the average company as recently as fifteen years ago. Now it is available to any company. And therein lies a really important story about how a combination of storage chips hitting the second half of the chessboard and a software breakthrough named after a toy elephant put the “big” into “big data” analytics.
Microchips, as we have noted, are simply collections of more and more transistors. You can program those transistors for computation or for transmission or for memory. Memory chips come in two basic forms—DRAM, or dynamic random access memory, which does the temporary shoving of bits of data around as they are being processed, or “flash” memory, which permanently stores data when you press “save.” Moore’s law also applies to memory chips—we have been steadily packing more transistors storing more bits of memory on each chip for less money and using less energy. Today’s average cell phone might have sixteen gigabytes of memory, meaning it can store sixteen billion bytes of information (a byte is eight bits) on a flash memory chip. Ten years ago flash memory density was not advanced enough to store a single photo on a phone—that is how fast all of this has accelerated, thereby making so many other things faster.
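The arithmetic behind the sixteen-gigabyte figure is easy to check. This sketch uses decimal gigabytes (one billion bytes each), matching the text. It also assumes a four-megabyte photo, a size chosen purely to give a sense of scale.

```python
GIGABYTE = 10**9       # decimal gigabyte, as in "sixteen billion bytes"
BITS_PER_BYTE = 8

capacity_bytes = 16 * GIGABYTE
capacity_bits = capacity_bytes * BITS_PER_BYTE
print(f"{capacity_bytes:,} bytes = {capacity_bits:,} bits")

# Hypothetical photo size, for scale only.
PHOTO_BYTES = 4 * 10**6  # assume a 4 MB photo
print(capacity_bytes // PHOTO_BYTES, "photos")
```

Run as written, this reports 128 billion bits, which is room for about four thousand such photos.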
“Big data would not be here without Moore’s law,” said Intel’s senior fellow Mark Bohr. “It gave us the bigger memory, more intensive computing, and the power, efficiency, and reliability that large server farms require to handle all that processing power. If those servers were made out of vacuum tubes, it would take one Hoover Dam to operate just one server farm.”
But it wasn’t just hardware that put the “big” in big data. It was also a software innovation—perhaps the most important to emerge in the last decade that you’ve never heard about. That software allowed millions of computers strung together to act like one computer, and it also made all that data searchable down to the level of finding those needles in the haystack. It was made by a company whose founder named it Hadoop—after his two-year-old son’s favorite toy elephant, so that the name would be easy to remember. Remember that name: Hadoop. It has helped to change the world—but with a huge assist from Google.
The father of that little boy and the founder of Hadoop is Doug Cutting, who describes himself as a “catalyst” for software innovation. Cutting grew up in rural Napa County in California—and had not seen a computer until he entered Stanford in 1981, a school he had to borrow money to attend. There, he studied linguistics but also took courses in computer science, learned how to program, “and found it fun.” He also found that programming would be the best way to pay off his student loans. So instead of going to graduate school, he got a job at the legendary Xerox PARC research center, where he was directed to join the linguistics team working on artificial intelligence and a relatively new field at the time called “search.”
People forget that “search” as a field of inquiry existed before Google. Xerox had missed the personal computer business market, even though it had many great tech ideas, said Cutting, so the company was “trying to figure out how to transition from copy paper and toner to the digital world. It came up with the idea that copiers would replace filing cabinets. You would just scan everything and then search it. Xerox had this paper-oriented view of the world. It was the classic example of a company that could not move away from its cash cow—paper was its lifeblood—and it was trying to figure out how to move paper into the digital world. That was its rationale for looking into search. This is before the Web happened.”
When the Web emerged, companies, led by Yahoo, started to organize it for consumers. Yahoo began as a directory of directories. Anytime someone put up a new website, Yahoo would add it to its directory, and then it started breaking websites down into groups—finance, news, sports, business, entertainment, et cetera. “And then search came along,” said Cutting, “and Web search engines, like AltaVista, started cropping up. [AltaVista] had cataloged twenty million Web pages. That was a lot—and for a while it leapfrogged everyone. That was happening around 1995 to ’96. Google showed up shortly thereafter [in 1997] with a small search engine, but claiming much better methods. And gradually it proved itself.”
As Google took off, Cutting explained, he wrote an open-source search program in his spare time to compete with Google’s proprietary system. The program was called Lucene. A few years later he and some colleagues started Nutch, which was the first big open-source Web search engine competitor to Google.
Open source is a model for developing software where anyone in the community can contribute to its ongoing improvement and freely use the collective product, usually under license, as long as they share their improvements with the wider community. It takes advantage of the commons and the notion that all of us are smarter than one of us; if everyone works on a program or product and then shares their improvements, that product will get smarter faster and then drive more change even faster.
Cutting’s desire to create an open-source search program had to overcome a very basic problem: “When you have one computer—and you can store as much data on that computer as its hard drive can hold and you can process data as far and fast as the processor in that computer can process—that naturally limits the size and rate of the computation you can perform,” Cutting explained.
But with the emergence of Yahoo and AOL, billions and billions of bits and bytes of data were piling up on the Web, requiring steadily increasing amounts of storage and computation power to navigate them. So people just started combining computers. If you could combine two computers, you could store twice as much and process twice as fast. With computer memory drives and processors getting cheaper, thanks to Moore’s law, businesses started realizing that they could create football-field-sized buildings stocked with processors and drives from floor to ceiling, known as server farms.
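Combining computers pays off when one job can be split into pieces that each machine handles on its own, with the partial answers merged at the end. The toy word count below simulates that pattern inside a single Python process. Each "machine" is reduced to a function call, and the page text and the two-machine split are made up. It is a sketch in the spirit of the split-and-combine approach that Hadoop later popularized, not code from Hadoop or Google.

```python
from collections import Counter


def split_into_shards(records, num_machines):
    """Deal records round-robin onto one shard per (simulated) machine."""
    return [records[i::num_machines] for i in range(num_machines)]


def count_words(shard):
    """The work one machine does on its own shard: a local word count."""
    counts = Counter()
    for line in shard:
        counts.update(line.lower().split())
    return counts


def combine(partial_counts):
    """Merge the per-machine partial results into one answer."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


pages = ["the cow walks", "the train runs", "the cow rests", "search the web"]
shards = split_into_shards(pages, num_machines=2)
result = combine(count_words(s) for s in shards)
print(result["the"], result["cow"])  # → 4 2
```

Because each shard is counted independently, adding machines shrinks each one's share of the work. The merge step stays cheap because it only adds up small summaries.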
But what was missing, said Cutting, was the ability to hook those drives and processors together so they could all work in a coordinated manner to store lots of data and also run computations across the whole body of that data, with all the processors running together in parallel. The really hard part was reliability. If you have one computer, it might crash once a week, but if you have one thousand, crashes will happen one thousand times as often. So, for all of this to work, you needed a software program that could run the computers together seamlessly and another program to make the giant ocean of data that was created searchable for patterns and insights. Engineers in Silicon Valley like to wryly refer to a problem like this as a SMOP—as in, “We had all the hardware we needed—there was just this Small Matter Of Programming [SMOP] we had to overcome.”
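Cutting's reliability arithmetic can be spelled out. Suppose each machine crashes about once a week, independently of the others. The sketch below computes the expected crash rate for one machine versus one thousand, plus the chance that a whole day passes with no crash at all. The helper names and the one-in-seven daily crash probability are illustrative assumptions.

```python
def expected_crashes(machines, crashes_per_machine_per_week=1.0):
    """Expected crashes per week and per day, assuming machines fail independently."""
    per_week = machines * crashes_per_machine_per_week
    return per_week, per_week / 7


def chance_of_quiet_day(machines, daily_crash_prob=1 / 7):
    """Probability that no machine crashes on a given day (independence assumed)."""
    return (1 - daily_crash_prob) ** machines


for n in (1, 1000):
    week, day = expected_crashes(n)
    print(f"{n} machine(s): {week:g} crashes/week, {day:.1f}/day, "
          f"quiet-day chance {chance_of_quiet_day(n):.3g}")
```

With a thousand machines, a crash-free day becomes vanishingly unlikely. Failure therefore has to be absorbed by the software rather than avoided.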
We can all thank Google for coming up with both of those programs in order to scale its search business. Google’s true genius, said Cutting, was “to describe a storage system that made one thousand drives look like one drive, so if any single one failed you didn’t notice,” along with a software package for processing all that data they were storing in order to make it useful. Google had to develop these itself, because at the time there was no commercial technology capable of addressing its ambitions to store, process, and search all the world’s information. In other words, Google had to innovate in order to build the search engine it felt the world wanted. But it used these programs exclusively to operate its own business and did not license them for anyone else.
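The trick Cutting describes is making many drives look like one, so that a single failure goes unnoticed. Replication illustrates it: keep several copies of every block of data on different drives, and read from whichever copy survives. The `ToyCluster` class below is a hypothetical sketch of that idea, not Google's or Hadoop's code. Its choice of three copies mirrors the default replication factor in HDFS, Hadoop's file system.

```python
import random

REPLICAS = 3  # copies kept of each block; HDFS's default replication factor is also 3


class ToyCluster:
    """Toy model: many drives presented as one store by keeping replicas."""

    def __init__(self, num_drives, seed=0):
        self.drives = [dict() for _ in range(num_drives)]  # each drive: {block_id: data}
        self.failed = set()                                 # indices of dead drives
        self.placement = {}                                 # block_id -> drives holding a copy
        self.rng = random.Random(seed)

    def put(self, block_id, data):
        # Write copies to REPLICAS distinct, randomly chosen drives.
        targets = self.rng.sample(range(len(self.drives)), REPLICAS)
        for d in targets:
            self.drives[d][block_id] = data
        self.placement[block_id] = targets

    def fail_drive(self, d):
        self.failed.add(d)

    def get(self, block_id):
        # Read from the first copy that still lives on a healthy drive.
        for d in self.placement[block_id]:
            if d not in self.failed:
                return self.drives[d][block_id]
        raise OSError(f"all {REPLICAS} copies of {block_id} lost")


cluster = ToyCluster(num_drives=1000)
cluster.put("page-42", b"cached web page")
cluster.fail_drive(cluster.placement["page-42"][0])  # lose one copy
print(cluster.get("page-42"))  # still readable from a surviving replica
```

A real system would also notice the lost copy and make a new one on a healthy drive, restoring the replica count. That bookkeeping is left out here for brevity.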