How Trulia is using Hadoop and big data to power one of the largest global real estate sites


In one form or another, businesses have always employed data analysis to inform decision making. But today's state-of-the-art big data analytics platforms, such as Hadoop, have taken things to a whole new level.

One approach behind this shift, called converged infrastructure, optimises business processes across an entire enterprise by grouping multiple IT components into a single computing package.

For example, converged infrastructure may include components like servers, storage, networking equipment, and software for IT infrastructure management, automation, and orchestration. That's a lot of moving parts—a lot of data storage and processing power. And data analytics platforms like Hadoop route all that power and data through a central user interface, allowing massive amounts of data to be crunched faster, more easily, and more cheaply. That may sound pretty incredible, but that neat little description falls short of demonstrating how big a deal converged infrastructure really is. So let's take a look at how the online real estate giant, Trulia, is doing it.

One of the largest online residential real estate marketplaces around, Trulia claims more than 55 million unique site visits per month. Its service model centres on providing relevant and unique insights about properties, neighbourhoods, commute times, and school districts, and those insights serve everyone in the real estate market, including home buyers, sellers, and renters. But Trulia's core strength is data.

Back in 2012, Trulia unveiled a new service to help real estate agents identify the best leads by tapping a trove of data the company had started analysing the year before, according to Reuters. Trulia Insight, as the service is called, shows real estate agents which potential buyers have pre-qualified for a mortgage and whether they are looking to buy a home in the next six months. And with the help of Hadoop, Trulia updates its website with new insights every night. A recent article on CIO.com explains that "every night, Trulia crunches more than a terabyte of new data and cross-references it with about two petabytes of existing data to deliver the most up-to-date real estate information to its users."

In that same CIO.com article, Zane Williamson, one of Trulia's senior DevOps engineers, explains how the company's daily terabyte-scale data processing includes information from public records, real estate listings, and user activity. "We process this data across multiple Hadoop clusters and use the information to send out email and push notifications to our users," Williamson said. "That's the lead driver to get users back to the site and interacting. It's very important that it gets done in a daily fashion. Reliability and uptime for the workflows is essential."
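To make the shape of such a workflow concrete, here is a minimal sketch of a nightly MapReduce-style job in the spirit Williamson describes. Everything here is an assumption for illustration: the record format, field names, and notification logic are hypothetical, not Trulia's actual pipeline. The mapper keys saved-search matches by user, and the reducer rolls each user's matches into a single notification record; the shuffle step that Hadoop performs between the two phases is simulated locally.

```python
from itertools import groupby

def mapper(line):
    """Map one tab-separated activity record to (user_id, listing_id) pairs.

    Record format (hypothetical): user_id \t event_type \t listing_id
    """
    user_id, event_type, listing_id = line.rstrip("\n").split("\t")
    if event_type == "saved_search_match":
        yield user_id, listing_id

def reducer(user_id, listing_ids):
    """Reduce all of one user's matches into a single notification record."""
    unique = sorted(set(listing_ids))
    if unique:
        yield user_id, {"channel": "email", "listings": unique}

def run_job(lines):
    """Simulate map -> shuffle/sort -> reduce, as Hadoop would on a cluster."""
    pairs = sorted(kv for line in lines for kv in mapper(line))
    return [out
            for uid, group in groupby(pairs, key=lambda kv: kv[0])
            for out in reducer(uid, [lid for _, lid in group])]
```

On a real cluster the same mapper and reducer could run under Hadoop Streaming over the full nightly data set, with the framework handling partitioning and the sort between phases.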

Two years after Trulia converged its IT infrastructure through the Hadoop platform, the company was acquired by its largest competitor, Zillow, for $3.5 billion. By the fourth quarter of 2015, Zillow announced that it had finished integrating the Trulia platform. In the wake of the merger, Geekwire.com reported that Zillow posted 2015 earnings of seven cents per share, after analysts had projected a three-cent loss.

Now Trulia and Zillow are moving forward with the same Hadoop-based data analytics platform, and according to Trulia's vice president of data engineering, Deep Varma, they're moving toward providing real-time data to users while enhancing the emotional aspects of the user experience. At the same time, according to siliconangle.com, Trulia and Zillow are working toward leveraging California's Open Data Movement to further enhance their real estate insights with crime scores and public transit information.