Writing on Airbnb’s ‘Nerds’ hub, Riley Newman, head of data science, says: “A datum is a record of an action or event, which in most cases reflects a decision made by a person. If you can recreate the sequence of events leading up to that decision, you can learn from it; it’s an indirect way of the person telling you what they like and don’t like – this property is more attractive than that one, I find these features useful but those… not so much. This sort of feedback can be a goldmine for decisions about community growth, product development and resource prioritization… we translate the customer’s ‘voice’ into a language more suitable for decision-making.”
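Newman's idea of recreating "the sequence of events leading up to that decision" can be illustrated with a small sketch. It assumes a hypothetical event log of (user, timestamp, action, listing) records. Nothing here is Airbnb's schema or code. The sketch groups events by user, orders them in time, and collects the listings each user viewed before each booking.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical event record; field names are illustrative assumptions.
    user_id: str
    timestamp: int   # seconds since epoch
    action: str      # e.g. "view" or "book"
    listing_id: str


def events_before_booking(events):
    """For each (user, booked listing), return the listings that user viewed first."""
    by_user = defaultdict(list)
    for e in events:
        by_user[e.user_id].append(e)

    result = {}
    for user_id, user_events in by_user.items():
        user_events.sort(key=lambda e: e.timestamp)  # recreate the order of events
        viewed = []
        for e in user_events:
            if e.action == "view":
                viewed.append(e.listing_id)
            elif e.action == "book":
                result[(user_id, e.listing_id)] = list(viewed)
                viewed = []  # start a fresh sequence after each booking
    return result
```

Take a user who viewed listings L1 and L2 and then booked L2. The function returns `{("u1", "L2"): ["L1", "L2"]}`. The listing that was viewed but not booked (L1) can then be read as a weak "not so much" signal, in the spirit of Newman's quote.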
The insight gained from this feedback enables Airbnb to concentrate their efforts on signing up landlords in popular destinations at peak times, and to structure pricing so that use of their global network of properties is optimized. For example, data is used to determine the appropriate price of a room or apartment based on a number of variables, such as location, time of year, type of accommodation and transport links. Airbnb use an algorithm to help their hosts determine the right price for their offering. This is particularly challenging given the sheer range of accommodation available: these are real homes, not bog-standard hotel rooms that can easily be rated on a star system. After all, what is desirable in a city apartment (Wi-Fi, good transport links, etc.) may be less important in a quaint cottage, where guests may prefer peace and romantic decor over Wi-Fi and subway connections.
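The point that the same amenity is worth different amounts in different kinds of home can be shown with a toy pricing function. This is a minimal sketch, not Airbnb's algorithm. The base prices, feature weights and season multipliers are all made-up assumptions. The key design choice is that feature weights depend on the property type, so Wi-Fi adds more to a city apartment than to a cottage.

```python
# All numbers below are hypothetical, chosen only to illustrate the idea.
BASE_PRICE = {"city_apartment": 90.0, "country_cottage": 110.0}

# Per-type weights: the same feature can be worth more in one kind of home
# than another (Wi-Fi and transit in the city, quiet in the country).
FEATURE_WEIGHTS = {
    "city_apartment": {"wifi": 15.0, "near_transit": 20.0, "quiet_location": 5.0},
    "country_cottage": {"wifi": 3.0, "near_transit": 2.0, "quiet_location": 25.0},
}

SEASON_MULTIPLIER = {"low": 0.85, "shoulder": 1.0, "peak": 1.3}


def suggest_price(property_type, features, season):
    """Return a suggested nightly price from property type, features and season."""
    price = BASE_PRICE[property_type]
    weights = FEATURE_WEIGHTS[property_type]
    for feature in features:
        price += weights.get(feature, 0.0)  # unknown features add nothing
    return round(price * SEASON_MULTIPLIER[season], 2)
```

With these assumed numbers, `suggest_price("city_apartment", {"wifi", "near_transit"}, "shoulder")` gives 125.0. The same two amenities add only 5 to a cottage's base of 110, giving 115.0. A real system would learn such weights from booking data rather than hard-coding them.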
To help hosts set their prices, Airbnb released a machine-learning platform called Aerosolve. The platform analyses hosts' photos (listings with photos of cosy bedrooms are more successful than those with stylish living rooms!) and automatically divides cities into micro-neighbourhoods. The platform also incorporates dynamic pricing tips that mimic hotel and airline pricing models.
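One simple way to divide a city into micro-neighbourhoods is to keep splitting the listings' coordinates at the median until every cell is small enough. This is a k-d tree style partition. It is a swapped-in illustration, not a description of how Aerosolve actually draws its boundaries. The `max_per_cell` parameter is a hypothetical tuning knob. Dense areas end up split into many small cells, while sparse areas stay as a few large ones.

```python
def split_into_cells(points, max_per_cell, depth=0):
    """Recursively split (lat, lon) points at the median until each cell
    holds at most max_per_cell points (a k-d tree style partition)."""
    if max_per_cell < 1:
        raise ValueError("max_per_cell must be at least 1")
    if len(points) <= max_per_cell:
        return [points]
    axis = depth % 2  # alternate between latitude (0) and longitude (1)
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    # Both halves are non-empty because len(points) >= 2 here.
    return (split_into_cells(ordered[:mid], max_per_cell, depth + 1)
            + split_into_cells(ordered[mid:], max_per_cell, depth + 1))
```

Prices could then be compared within each cell rather than across a whole city. This captures the idea that two streets apart can be a different market.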
Airbnb have also just unveiled Airpal: a user-friendly data analysis platform designed to give all of their employees, not just those trained in data science, access to all of the company’s information, along with tools to query it.
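Airpal lets people type SQL queries without being data scientists. The sketch below uses Python's built-in `sqlite3` as a local stand-in for the real query engine. The `bookings` table, its columns and its rows are hypothetical. It shows the kind of ad hoc business question a non-specialist might ask once they have such a tool.

```python
import sqlite3

# sqlite3 stands in for the production query engine; schema and data are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (city TEXT, nights INTEGER, price_per_night REAL)")
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?)",
    [("Edinburgh", 3, 120.0), ("Edinburgh", 2, 100.0), ("Paris", 4, 150.0)],
)

# An ad hoc question: which city brings in the most booking revenue?
QUERY = """
SELECT city, SUM(nights * price_per_night) AS revenue
FROM bookings
GROUP BY city
ORDER BY revenue DESC
"""
rows = conn.execute(QUERY).fetchall()
```

On this sample data, `rows` is `[("Paris", 600.0), ("Edinburgh", 560.0)]`.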
What were the results?
As Newman says: “Measuring the impact of a data science team is ironically difficult, but one signal is that there’s now a unanimous desire to consult data for decisions that need to be made by technical and non-technical people alike.” The Airpal system bears this out: launched in 2014, it has already been used by more than one-third of Airbnb employees to issue queries. This impressive statistic shows how central data has become to Airbnb’s decision making.
The growth of Airbnb is another indication that their clever use of data is paying off.
What data was used?
Data is primarily internal, in a mixture of structured and unstructured formats: image data from host photos, location data, accommodation features (number of rooms/beds, Wi-Fi, hot tub, etc.), customer feedback and ratings, transaction data, etc. Some external data is analysed too. For example, because of the popular Edinburgh Festival, accommodation in Edinburgh during the festival will be priced higher than the same accommodation in a different month.
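The Edinburgh Festival example amounts to joining listing prices with an external calendar of local events. This is a minimal sketch under loud assumptions. The event table, its date range and the 1.5 multiplier are all invented for illustration, and the dates are not the festival's actual schedule.

```python
from datetime import date

# Hypothetical external event calendar: (name, city, start, end, price multiplier).
# The dates and multiplier are illustrative assumptions, not real data.
EVENTS = [
    ("Edinburgh Festival", "Edinburgh", date(2015, 8, 7), date(2015, 8, 31), 1.5),
]


def event_multiplier(city, night):
    """Return the largest multiplier of any event in `city` covering `night`."""
    multiplier = 1.0
    for _name, event_city, start, end, factor in EVENTS:
        if event_city == city and start <= night <= end:
            multiplier = max(multiplier, factor)
    return multiplier


def price_for_night(base_price, city, night):
    """Apply any event-driven uplift to a listing's base nightly price."""
    return round(base_price * event_multiplier(city, night), 2)
```

Under these assumptions, a listing with a base price of 100 in Edinburgh costs 150.0 on a night in mid-August and stays at 100.0 in October.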
What are the technical details?
Airbnb hold their approximately 1.5 petabytes of data as Hive managed tables in Hadoop Distributed File System (HDFS) clusters, hosted on Amazon’s Elastic Compute Cloud (EC2) web service. For querying data, Airbnb previously used Amazon Redshift, but they have since switched to Presto, the open-source distributed SQL query engine developed at Facebook. Because Presto is open source, Airbnb have been able to debug issues early on and share their patches upstream, something they couldn’t do with Redshift.
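Hive keeps each partition of a managed table in its own `key=value` subdirectory under the table's warehouse directory. Query engines such as Hive and Presto can use a filter on the partition column to read only the matching directories. This is known as partition pruning. The sketch lists the directories a date-range query would touch. The warehouse path is Hive's default, but the database name `analytics`, table name `bookings` and partition key `ds` are hypothetical, not Airbnb's actual layout.

```python
from datetime import date, timedelta

# Hive's default warehouse root; database, table and partition key are hypothetical.
WAREHOUSE = "/user/hive/warehouse"
DATABASE = "analytics"
TABLE = "bookings"
PARTITION_KEY = "ds"


def partition_paths(start, end):
    """List the daily partition directories a query over [start, end] would read."""
    paths = []
    day = start
    while day <= end:
        paths.append(
            f"{WAREHOUSE}/{DATABASE}.db/{TABLE}/{PARTITION_KEY}={day.isoformat()}"
        )
        day += timedelta(days=1)
    return paths
```

For a three-day filter such as `ds BETWEEN '2015-08-01' AND '2015-08-03'`, only three directories are read, however many years of data the table holds. This is one reason date-partitioned tables stay queryable at petabyte scale.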
Going forward, Airbnb are hoping to move from batch processing to real-time processing, which should improve the detection of anomalies in payments and allow more sophisticated matching and personalization.
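The difference between batch and real-time anomaly detection is that a real-time system must judge each payment as it arrives, without rescanning history. The sketch does this with running statistics maintained by Welford's online algorithm. A payment is flagged when it sits more than a chosen number of standard deviations from the running mean. This is a generic technique chosen for illustration, not Airbnb's method. The `threshold` and `warmup` defaults are assumptions.

```python
import math


class StreamingAnomalyDetector:
    """Flag payment amounts far from the running mean, one event at a time."""

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold  # standard deviations that count as anomalous
        self.warmup = warmup        # events to observe before flagging anything
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations (Welford)

    def observe(self, amount):
        """Return True if `amount` looks anomalous, then fold it into the stats."""
        is_anomaly = False
        if self.count >= max(self.warmup, 2):
            std = math.sqrt(self.m2 / (self.count - 1))  # sample standard deviation
            if std > 0 and abs(amount - self.mean) > self.threshold * std:
                is_anomaly = True
        # Welford's online update of mean and variance, O(1) per event.
        self.count += 1
        delta = amount - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (amount - self.mean)
        return is_anomaly
```

Each call costs constant time and memory, which is what makes the approach suitable for streams. A production system might exclude flagged payments from the running statistics, or track statistics per user or per market, rather than folding everything into one global baseline as this sketch does.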
Any challenges that had to be overcome?
One big challenge for the Airbnb data science team was keeping up with the company’s dramatic growth. Early in 2011, the team consisted of just three data scientists, but because the company was still quite small, the three could meet with practically every employee and fulfil their data needs. By the end of the year, Airbnb had 10 international offices and hugely expanded teams, meaning the data team could no longer hope to partner with everyone across the company.
As Newman puts it: “We needed to find a way to democratize our work, broadening from individual interactions, to empowering teams, the company, and even our community.” This was achieved by investing in faster and more reliable technologies to cope with the expanding volume of data. The data science team also moved basic data exploration and queries to teams throughout the company, with the help of dashboards and the Airpal query tool. This empowered Airbnb teams and freed the data scientists from ad hoc requests so they could focus on more impactful work. Educating the teams on how to use these tools has been key to helping them gain insights from the data.
What are the key learning points and takeaways?
Airbnb are a perfect example of a fast-growing company with ever-expanding Big Data needs. The ability to shift and adapt as the company have grown has, I think, been at the heart of their success. This highlights the non-static nature of Big Data and how your data strategy may need to change over time to cope with new demands.
It’s also great to see a data science team so well integrated with all parts of the organization (even if they can no longer meet with every employee!). This not only ensures the data scientists have an excellent understanding of the business’s goals but also emphasizes the importance of data-based decision making for employees right across the company. After all, it doesn’t matter how much data you have if no one acts upon it.
This is an edited extract from Big Data in Practice: How 45 Successful Companies Used Big Data Analytics to Deliver Extraordinary Results by Bernard Marr (published by Wiley)