Zillow (Zestimate): Data Science in Real Estate with AI and Analytics (#234)

Welcome to Episode #234 of CxOTalk. I’m Michael Krigsman, and CxOTalk brings to
you truly the most innovative people in the world, talking about topics relating to digital
disruption, and machine learning, and all kinds of good stuff. Before we begin, I want to say a hearty “Thank
you” to our live streaming video platform, which is Livestream. Those guys are great! And if you go to Livestream.com/CxOTalk, they
will even give you a discount. So, today, we are speaking with somebody who
is a pioneer in the use of data and analytics in consumer real estate. And we’re speaking with Stan Humphries, who
is the Chief Analytics Officer, and also the Chief Economist of the Zillow Group. And I think that everybody knows the Zillow
Group as Zillow and the Zestimate. Stan Humphries, how are you? Hey, Michael! How are you doing? It's good to be with you today! I am great! So, Stan, thanks for taking some time with
us, and please, tell us about the Zillow Group and what does a Chief Analytics Officer and
Chief Economist do? Yeah! You bet! So, I’ve been with Zillow since the very
beginning back in 2005, when what became Zillow was just a glimmer in our eye. Back then, I worked a lot on algorithms and some product development pieces; kind of a lot of the data pieces within the organization. We launched Zillow in February of 2006, and
back then, I think people familiar with Zillow now may not remember that during our first couple of years, between 2006 and 2008, all you could find on Zillow was really the public record information about homes, displayed on a map. And then a Zestimate, which is an estimate of the home value of every single home, and then a bunch of housing indices to help people
understand what was happening to prices in their local markets. But, we really grew the portfolio of offerings
to help consumers from there and added in ultimately For Sale listings, mortgage listings,
a mortgage marketplace, a home improvement marketplace, and then, along the way, also
brought in other brands. So now, Zillow Group includes not only Zillow
brand itself, Zillow.com but also Trulia, as well as StreetEasy in New York, Naked Apartments,
which is a rental website in New York, HotPads, and a few other brands as well. So it’s really kind of grown over the years
and last month, all those brands combined got about 171 million unique users to them
online. So, it’s been a lot of fun kind of seeing
it evolve over the years. So, Stan, […] you started with the Zestimate. You started aggregating data together, and
then you came up with the Zestimate. What was the genesis of that Zestimate and
maybe you can explain what that is? Yeah. Sure! So, in the early days, we were looking at different concepts where there seemed to be a lot of interest from consumers about real estate, and I think there was a lot of angst about what, as an economist, I think of as information asymmetry. That is, the fact that certain participants in
the marketplace of real estate have a lot of information, and other people don’t have
any information. And, I think a lot of the leadership team that founded Zillow … You know, our reference point was that we were very passionate about this social progress of transparency in various marketplaces, which you had seen in the 80s and 90s in stock markets. We had been part of that ourselves, prior to Zillow, at Expedia, eliminating information asymmetries in the travel agency space. You had seen it in insurance and a lot of
different sectors. We were very interested in kind of creating
information transparency in the real estate sector, so that got us very interested in
where was the information people wanted, and how could we get it; and how could we make
it available for free to consumers? As it turns out, a lot of that information is squirreled away in county tax assessor and county recorder offices around the country. The way our country is organized, there are more than 3,000 different counties, and each office has a different file format, so it became
our job to go to all those different places and get all that data and put it online in
a standardized way. And, you asked about the Zestimate. The way that came about was, once we had done that, we would bring people in, in the early days, and we'd show them a UI of what we were trying to do. We showed them these maps of recently sold homes, and then you could click on any house and see the public facts and when it was last sold. We noticed that people had what we thought was a really focused interest in recently sold homes, and they would jot them down on napkins
when we brought them into the offices to look at the user interface for focus groups. And we were like, “What are you doing there?” It became clear that they were very interested
in looking at recently sold homes in order to understand the value of a home they might
be looking to either buy or sell in the future. And that was kind of an a-ha moment where
we were like, “Wow! Okay, if you’re trying to figure out an estimated
price for a home, then maybe we can help you do that better than just napkin math.” So that was the genesis of the Zestimate and
today, we do a whole lot more than napkin math. It is a very substantial, computationally intensive process. How has the Zestimate changed since you began
it? Yeah. So, back in, if you look at when we first
rolled out in 2006, the Zestimate was a valuation that we placed on every single home that we
had in our database at that time, which was 43 million homes. And, in order to create that valuation on 43 million homes, it ran about once a month, and we pushed a couple of terabytes of data through about 34 thousand statistical models, which we thought was, and was, compared to what had been done previously, an enormously more computationally sophisticated process. Before I flash forward to today, I should give you some context on what our accuracy was back then. Back in 2006, when we launched, we were at about a 14% median absolute percent error on those 43 million homes.
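To make that accuracy metric concrete, here is a minimal, hypothetical sketch of how a median absolute percent error might be computed over a set of homes. The function, the sample numbers, and the use of NumPy are illustrative assumptions, not Zillow's evaluation code.

```python
import numpy as np

def median_absolute_percent_error(predicted, actual):
    """Median of |predicted - actual| / actual, expressed as a percentage.

    Illustrative only: the inputs below are made-up numbers,
    not Zillow's data or pipeline.
    """
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return 100.0 * np.median(np.abs(predicted - actual) / actual)

# Hypothetical estimates vs. eventual sale prices for four homes
estimates = [310_000, 455_000, 198_000, 720_000]
sale_prices = [300_000, 470_000, 205_000, 690_000]
print(f"{median_absolute_percent_error(estimates, sale_prices):.1f}% median error")
```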
Since then, we've gone from 43 million homes to 110 million homes today, and we put valuations on all 110 million
homes. And, we’ve driven our accuracy down to about
5% today which, we think, from a machine learning perspective, is actually quite impressive
because those 43 million homes that we started with in 2006 tended to be in the largest metropolitan
areas where there was a lot of transactional velocity. There were a lot of sales and price signals
with which to train the models. As we went from 43 million to 110 million, you're now getting out into places like Idaho and Arkansas where there are just fewer sales to look at. And, it would have been impressive if we had simply kept our error rate at 14% while getting out to places that are harder to estimate. But not only did we more than double our coverage from 43 to 110 million homes, we also nearly tripled our accuracy, cutting the error rate from 14% down to 5%. Now, the hidden story of how we're able
to achieve that was basically by throwing enormously more data at the problem, collecting more data, and getting a lot more sophisticated algorithmically in what we are doing, which requires us to
use more computers. Just to give a context, I said that back when
we launched, we built 34 thousand statistical models every single month. Today, we update the Zestimate every single
night and in order to do that, we generate somewhere between 7 and 11 million statistical
models every single night, and then when we’re done with that process, we throw them away,
and we repeat again the next night. So, it’s a big data problem. How did your, shall we say, algorithmic thinking,
change and become more sophisticated from the time you began … What was the evolution
of that? That must be very interesting. Yeah. It certainly has been. There have been, you know, there have been
a few major changes to the algorithm. We launched in 2006. We did a major change to the algorithm in
2008. Another major change in 2011, and we are now
rolling out another major change right now. It started in December and we’ll be fully
deployed with that new algorithm in June. Now, that's not to say that in between those major releases we aren't doing work every single day, changing bits and pieces of the framework. But the times I described are the major changes to the overall modeling approach. And, as is probably suggested
by the fact of how many statistical and machine learning models are being generated right
now in the process, what has changed a lot is the granularity with which these models
are being run; meaning, a lot finer geographic granularity and, also, the number of models
that are being generated. When we launched, we were generally looking at a county, and in some cases, for very sparse data, maybe a state, in order
to generate a model. And, there were, like I said, 34 thousand
of those different models. Today, we are generally looking at … We
never go above a county level for the modeling system, and for large counties with a lot of transactions, we break that down into smaller regions within the county, where the algorithms try to find homogeneous sets of homes at the sub-county level in order to train a modeling framework. And that modeling framework itself contains an enormous number of models. Basically, the framework incorporates a bunch of different ways to think about the values of homes, combined with statistical classifiers. So maybe it's a decision tree, thinking
about it from what you may call a “hedonic” or housing characteristics approach, or maybe
it’s a support vector machine looking at prior sale prices. The combination of the valuation approach
and the classifier together create a model, and there are a bunch of these models generated
at that sub-county geography. And then there are a bunch of models which
become meta-models, whose job is to put together these sub-models into a final consensus opinion, which is the Zestimate.
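To illustrate the "sub-models plus meta-model" structure described here, below is a toy stacking sketch in Python. The features, model choices, synthetic data, and use of scikit-learn are assumptions for illustration, not Zillow's actual modeling framework.

```python
# Toy illustration of the "sub-models combined by a meta-model" idea (stacking).
import numpy as np
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))   # stand-ins for home features (e.g., size, age, prior price)
y = 200_000 + 50_000 * X[:, 0] + 10_000 * X[:, 1] + rng.normal(0, 5_000, size=500)

sub_models = [
    ("hedonic_tree", DecisionTreeRegressor(max_depth=6)),  # housing-characteristics view
    ("prior_sale_svm", SVR(kernel="rbf", C=100.0)),         # prior-sale-price view
]
# The meta-model learns how to blend the sub-models into one consensus estimate.
consensus = StackingRegressor(estimators=sub_models, final_estimator=Ridge())
consensus.fit(X, y)
print(consensus.predict(X[:3]))  # consensus valuations for three toy homes
```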
This is very interesting, and I want to remind people that we're talking with Stan Humphries, who is the Chief Analytics Officer and also
the Chief Economist at the Zillow Group. And I think most people probably know the
Zestimate, which automatically estimates a value for any piece of real estate. Stan, so you've been talking about your
use of data and the development of these models. But, real estate has been a data-intensive
business, right? The industry shares real estate data, but it's static data. And so, again, what were you doing, and how
did this change the nature of the real estate market? So if you can go from the technology into
the disruptive business dimension? Sure. Yeah. But you know, I think you’re right Michael,
in the sense that there's always been a lot of data floating around real estate. I would say, though, that a lot of that data had been largely untapped, and so it kind of had a lot of unrealized potential. And that's a space that, as a data person,
you love to find. And, honestly, travel, which a lot of us were in before, was a similar space, one dripping with data where a lot of people had not done
very much with that data, and it just meant that really a day wouldn’t go by where you
wouldn't come up with "Holy crap! Let's do this with the data!" And, you know, real estate was one of those spaces. We certainly had multiple listing services, which had arisen for the very purpose of facilitating the exchange of real estate between unrelated brokers, and that was a very important purpose. But it was a system between different agents and brokers on the real estate side, covering homes that were for sale. You had, though, a public record system which
was completely independent of that, and actually two public records systems: one about deeds
and liens on real property, and then another which was tax roll. And, all of that was kind of disparate information
and … What we were trying to solve was the fact that all of this was offline, and we
really just had the sense that it was like, from a consumer’s perspective, like the
Wizard of Oz, where it was all behind this curtain, and you couldn’t really look…You
weren’t allowed behind the curtain and you really just wanted to know, “Well, I’d
really like to see all the sales myself and figure out what's going on." And, you'd like the website to show you both the for-sale listings and the for-rent listings. But of course, the people who were selling you the homes didn't want you to see the rentals alongside them, because maybe they would like you to buy a home, not rent a home. And we're like, "We should put everything together, everything online," and we had faith that that type of transparency was going to benefit the consumer, and I think it has.
You know, what's been interesting in the evolution is that you still find agent representation to be very important, and I think the reason that's been true is that it's a very expensive
transaction. For most Americans, it will generally be the most expensive transaction and the most expensive financial asset they will ever own. And so, there has been, and continues to be, a reliance, I think a reasonable reliance, on an agent to hold a consumer's hand as they either buy or sell real estate. But what has changed is that now consumers
have access to the same information that the representation has either on the buy or sell
side. And I think that has really enriched the dialogue
and facilitated the agents and brokers who are helping the people, where now a consumer
comes to the agent with a lot more awareness and knowledge, and is a smarter consumer,
and is really working with the agent as a partner where they’ve got a lot of data and
the agent has a lot of insight and experience; and together, we think they make better decisions
than they did before. I want to tell everybody that there’s a
problem with Twitter at the moment, and so if you’re trying to tweet about the show
and your tweet is not going through, try doing it a second time and sometimes that seems
to be making it work. I am so glad to hear that you said it, Michael,
because I just tried to retweet right before I got on and I couldn’t do it and I thought
it was my Twitter app. Sounds like it’s Twitter overall. Yes, it seems like we’re back to the days
of Twitter having some technical issues. Anyway, Stan, in a way, by the act of trying
to increase this transparency across the broad real estate market, you need to be a, shall
we say, a neutral observer. And so, how do you ensure that in your models,
you’re as free from bias as you can be? And maybe would you also mind explaining the
issue of bias a little bit just briefly? I mean, we could spend an hour on this, but
briefly. So, what is the bias issue in machine learning
that you have to face, and how do you address it in your situation? Okay. Yeah. May I ask you for a few more sentences on
the bias issue in machine learning? Because as a data person, I'm thinking about
it from a statistical sense, but I guess that's probably not how you mean it. In terms of the business model itself, and how that interacts with machine learning and what we're trying to do, our North Star for all of our brands is the consumer, you know, full stop. So, we want to surprise and delight and best serve our consumers, because we think that by doing that… You know, advertising dollars follow consumers,
is our belief. And we want to help consumers the best we
can. And, what we're trying to construct, and have constructed, is, in economic language, a two-sided marketplace where we've got consumers
coming in who want to access inventory and get in touch with professionals. And then on the other side of that marketplace,
we’ve got professionals, be it real estate brokers or agents, mortgage lenders, or home
improvers, who want to help those consumers do things. And what we’re trying to do is provide a marketplace
where consumers can find inventory and can find professionals to help them get things
done. So, from the perspective of a market-maker
versus a market-participant, you want to be completely neutral and unbiased in that, where
you’re not trying to … All you’re trying to do is get a consumer the right professional
and vice-versa, and that’s very important to us. And that means that when it comes to machine
learning applications, for example, the valuations that we do, our intent is to try to come up
with the best estimate for what a home is going to sell for; which, again, thinking back from an economic perspective, is different from the asking price, or the offer price. In a commodities context, you call that gap a bid-ask spread, the spread between what someone asks and what someone bids; in the real estate context, we call that the offer price and the asking price. And so, what someone's going to offer to
sell you their house for is different than when a buyer’s going to come in and say,
“Hey, would you take this for it?” There’s always a gap between that. What we’re trying to do with Zestimate is
to better inform those pricing decisions such that the bid-ask spread is smaller, such that
we don’t have buyers who end up buying a home and getting taken advantage of when the
home was worth a lot less. And, we don’t have sellers who end up selling
a house for a lot less than they could have got because they just don’t know. So, we think that having great, competent
representation of both sides is one way to mitigate that, and one way that we think is
fantastic. Having more information around the pricing decision, to help you understand what that offer-ask ratio, what that offer-ask spread
looks like is very important as well. So, from a data collection standpoint, and
then a data analysis standpoint, how do you make sure that you are collecting the right
data and then analyzing it in the right way so that you’re not influenced […] wrongly
or over-influenced in one direction, or under-influenced in another direction, which would, of course,
lead to distortions in the price estimates. Yeah. Let’s see, I’m trying to think of vices that
we watch for in the evaluation process. I mean, one obvious one is that the valuation
that we’re trying to produce is a valuation of an arms-linked bear market exchange of
a home which, those words are important because it means that there are a lot of transactions
which are not full value at arms-length. So, if you look in the public record and you
start to build models off the public record, you’ve got a lot of homes that are a lot of
deeds that are […] claimed due to the works. And you know, they are ten dollar exchanges
of real property, which is not a fair value. And, you have some that are arms-length, where
parents are selling homes to their children for pennies on the dollar, and those aren’t
fair value either. And then, of course, the most common example
from the past housing cycle is a foreclosure or short-sale, where, you know, we’re not
trying to… We do provide a foreclosure estimate, for
foreclosures, but the Zestimate itself is designed to tell you what that home would
transact for as a non-distressed piece of inventory in the open market; which means
that we’ve got to be really diligent about identifying foreclosure transactions and filtering
those out so that the model is not downwardly biased and becomes really a […] between
a non-distressed and distressed property. So that’s one area that we kind of have to
watch for quite a bit. And we have a question from Twitter. I’m glad this one went through. I’m having trouble getting my tweets out
there. And, this is an interesting one from Fred McKlymans, who asks: he's wondering, with the Zestimate use case, how much has the Zestimate helped define, rather than just reflect, real estate value? So, what impact has Zillow itself had on the
market that you're looking at? Yeah. That's a question we get a lot, and particularly as our traffic has grown, people want to know, "Do you reflect the marketplace? Do you drive the marketplace?" And my answer to that is that our models are trained such that half of the errors will be positive and half will be negative; meaning that on any given day, half of [all]
homes are going to transact above the Zestimate value and half are going to transact below. […] I think [this] reflects what we have said since launching the Zestimate, which is that we want this to be a starting point
for a conversation about home values. It’s not an ending point. You know, there was a reason why the name
“Zestimate” came from the internal working name of a Zillow Estimate. We got tired of calling it a Zillow Estimate
so we started to call it a Zestimate. And then when it came time to ship the product,
we're like, "Why don't we just call it that?" You know, but it was called the Zillow Estimate, the Zestimate, and not the price, because it is an estimate. And, it's meant to be a starting point for
a conversation about value. And that conversation, ultimately, needs to involve other opinions of value, including real estate professionals like an agent or broker, or an appraiser; people who have expert insight into local areas and have actually seen the inside of a home and can compare that interior, and the home itself, to other comparable
homes. So, you know, that’s kind of designed to
be a starting point, and I think the fact that half of homes sell above the Zestimate
and half below, I think reflects the fact that people are … I think that’s an influential
data point and hopefully, it’s useful to people. But it’s not the only data point people
are using, because another way to think about that stat I just gave you is that on any given
day, half of sellers sell their homes for less than the Zestimate, and half of buyers
buy a home for more than the Zestimate. So, clearly, they’re looking at something
other than the Zestimate, although hopefully, it’s been helpful to them at some point
in that process. Mhmm. And, we have another question from Twitter. And again, I’m glad that this one went through;
it’s an interesting question: “Have you thought about taking data such as AirBnB data,
for example, to reflect or to talk about the earning potential of a house?” That is an interesting … I’m noodling on
that. We've done some partnerships with Airbnb on economic research, kind of understanding the impact of Airbnb using housing data that we have. We do a lot of work on that. I think probably the direct answer about using Airbnb data is "no," but when you say the earning potential, I guess what I'm hearing is the potential to buy that home and convert it into a cashflow-positive rental property, and things like what's the cap rate, or capitalization rate, or the price-to-rent ratio. And that we do a lot of, because we also
have the largest rental marketplace in the US as well. So, we have a ton of rental listings, and
then we use those rental listings for a variety of purposes, among them being to help understand
price-to-rent ratios and what we compute as, call it, a "break-even horizon," which is how long you have to live in a house to make buying it more worthwhile than renting it.
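As a rough illustration of those two ideas, here is a simplified, hypothetical calculation of a price-to-rent ratio and a break-even horizon. The actual break-even horizon calculation incorporates many more inputs (forecast appreciation, taxes, maintenance, selling costs, and so on); the rates and prices below are made up.

```python
def price_to_rent_ratio(price: float, monthly_rent: float) -> float:
    """Home price divided by a year of rent (higher tends to favor renting)."""
    return price / (12 * monthly_rent)

def break_even_years(price, monthly_rent, transaction_cost_rate=0.06,
                     annual_ownership_cost_rate=0.03, annual_appreciation=0.03,
                     annual_rent_growth=0.03, max_years=30):
    """First year in which cumulative renting costs exceed the net cost of owning.

    Deliberately simplified: ownership cost = one-time transaction costs plus a
    flat yearly carrying cost, offset by appreciation treated as equity gained.
    """
    own_cost = price * transaction_cost_rate        # buying/selling costs up front
    rent_cost = 0.0
    for year in range(1, max_years + 1):
        own_cost += price * annual_ownership_cost_rate
        rent_cost += 12 * monthly_rent * (1 + annual_rent_growth) ** (year - 1)
        equity_gain = price * ((1 + annual_appreciation) ** year - 1)
        if rent_cost > own_cost - equity_gain:      # renting has cost more by now
            return year
    return None

print(price_to_rent_ratio(420_000, 2_100))   # ~16.7: price equals roughly 16.7 years of rent
print(break_even_years(420_000, 2_100))      # break-even year under these toy inputs
```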
And I guess the other thing that would directly help with that question is the fact that on any home page, on a page that lists
a home, which we call internally a home details page, we show both the Zestimate, so what we think that home would sell for, and also a Rent Zestimate, what we think it would rent for. And, that hopefully allows the homeowner to have some notion of, if they decided to rent it out, what they could get for it. Now, the question from Twitter is an interesting new one, which is: our rental estimate is on the rent of that entire home. What if you just want to rent out a room or
part of that home? What’s your potential on that? And that is a very interesting question, which
we have thought about some. We don't have a product that does that directly … A cool product that seems directly related to the question would be an estimate on Zillow that would tell you, if you did want to rent out a room or two in that house, what would
you fetch? And, that’s a very interesting […]. Duly
noted! Let’s go back to the discussion of machine
learning. Machine learning has become one of the great
buzzwords of our time. But, you’ve been working with enormous,
enormous datasets for many years now. And, when did you start? Did you start using machine learning right
from the start? Have your … We spoke a little bit about
this earlier, but how have your techniques become more sophisticated over time? Yeah. I would say I’ve been involved in machine
learning for a while, from, I guess, when I started in academia as a researcher in a university setting, and then at Expedia, where I was very heavily involved in machine learning, and then here. So, you know, there has been … Biggest change … Well, it's hard to parse it. I was going to say the biggest change has really been in the tech stack over that period of time, but I shouldn't minimize the change in the actual algorithms themselves over those years. Algorithmically, you see the evolution from, at Expedia, personalization, where we worked on things that were relatively sophisticated but were more statistical, parametric models for doing recommendations; things like conditional probabilities and item-to-item correlations. And now, most of your recommender systems,
they’re using things like collaborative filtering for algorithms that are optimized
more for high-volume data and streaming data. And in a predictive context, we've moved from things like decision trees and support vector machines to forests of trees; simpler trees, but with much larger numbers of them… And then, more exotic […] decision trees that have regression components in their leaf nodes, which are very helpful in some
contexts. In terms of the tech stack, you know, it’s
been transformed. You know, back in the day, you were doing stuff with C code; we were using … Maybe we were doing prototyping in S-PLUS. You were usually coding in FORTRAN or C, but you were doing it all from scratch […] on a single machine and trying to get as much as you could into memory. And, you know, from that, it has gone through to more proprietary systems, maybe you were using SAS at scale; to then maybe using a database, maybe MySQL; to using Hadoop. And then today, generally, our firm and other
firms that are on the cutting edge here are using something like Spark, probably. Maybe coding directly in Scala, or maybe
using Spark to plug into Python or R. And then, generally, those frameworks are
now running in the cloud, and are using streaming systems like Kinesis or Kafka for real-time triggering
of events. And so, all the infrastructure has changed,
and I would say for the better. As a data scientist now, you can get up and
start working on a problem on, you know, AWS, in the Cloud, and have an assortment of models
to quickly deploy, much more easily than you could twenty years ago, when you were having to code a bunch of stuff yourself: starting out in MATLAB and porting it to C, and doing it all by hand.
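For a flavor of the kind of Spark workflow described here, below is a minimal PySpark sketch that groups toy sales records by a region column and computes a per-region statistic, standing in for the idea of fitting one model per sub-county area. The column names, data, and aggregation are assumptions for illustration, not Zillow's schema or pipeline.

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("per-region-sketch").getOrCreate()

# Hypothetical sales records: (region, square footage, sale price)
sales = spark.createDataFrame(
    [("king_wa", 2000, 750_000), ("king_wa", 1500, 560_000),
     ("ada_id", 1800, 320_000), ("ada_id", 2200, 410_000)],
    ["region", "sqft", "sale_price"],
)

# Stand-in for "a model per region": average price per square foot by region.
per_region = (sales.groupBy("region")
                   .agg(F.avg(F.col("sale_price") / F.col("sqft"))
                         .alias("avg_price_per_sqft")))
per_region.show()

spark.stop()
```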
Are you looking at the … Are you making predictions about the future value of the home, or only from the past to the present
moment? The Zestimate itself is, I guess, what some people would call a "nowcast," so it's a prediction for what the home would sell for today if it were on the market. We do also forecast the Zestimate forward in time. Right now, we project forward about a year. And, that model is a combination of the machine learning models I described before, the point estimate of what we're estimating today, and then it moves that forward. It combines that with a forecasting framework which we developed for the purposes of forecasting our housing index, the Zillow Home Value Index, which tells you basically what home values have done over the past twenty years, and what they will do over the next year. That forecasting framework is itself a combination of some straightforward univariate time-series models, and some more complex structural models that
are taking as inputs economic variables, and trying to predict what those economic variables
are going to do to home prices in your local market over the next year. We take those forecasts from the index and
then apply them to the individual level, with some nuances where it’s not just the forecast
for your area. We then break that forecast down by housing
segments so that maybe, high-end homes are going to appreciate more quickly than low-end
homes, or vice-versa. That nuance affects the forecast that is then applied at the property level to create the forecast for the Zestimate.
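As a hedged sketch of how such a forecast could combine a current point estimate with a regional index forecast and a segment adjustment, consider the toy function below. The formula, parameter names, and numbers are illustrative assumptions, not Zillow's actual forecasting framework.

```python
def forecast_value(current_estimate: float,
                   regional_one_year_forecast: float,
                   segment_adjustment: float = 0.0) -> float:
    """Project today's point estimate forward one year (illustrative only).

    regional_one_year_forecast: expected local-market appreciation (0.04 = +4%),
        e.g., from an index-level forecasting model.
    segment_adjustment: extra appreciation (+/-) for this home's price tier,
        e.g., if high-end homes are expected to appreciate faster or slower.
    """
    return current_estimate * (1 + regional_one_year_forecast + segment_adjustment)

# Hypothetical: $500,000 home, +4% regional forecast, -1% for its price segment
print(round(forecast_value(500_000, 0.04, segment_adjustment=-0.01)))  # 515000
```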
I want to remind everybody that we're talking with Stan Humphries, who is the Chief Analytics Officer and Chief Economist at Zillow. And, if you're trying to ask a question
by Twitter, just keep trying and some of those tweets are actually getting through. Stan, what about the data privacy aspects
of all of that? […] I know you’re aggregating public data,
but still, you’re making public information about the value of people’s homes and there’s
a privacy aspect to this. So, how do you think about that? Yeah. That’s a, you know… We’ve been fortunate in most of our business
operations… Really, almost all of our business operations involve domains that are a matter of public record. And, a lot of the value-add that we've done is to bring that public record data in, collating it together into one spot and standardizing it, so there's kind of a standard way to look at it regardless of how they collect data in Idaho versus Florida. We standardize it so that on Zillow, you're looking at it all the same way. But, at its core, that's all public record
information, which is beneficial when it comes to privacy because all of that data is, at
this point, generally accessible. It’s all available if you were to walk into
a county tax assessor or county recorder’s office. And at this point, most of those offices are
now online. So, if you knew where to look on the web,
then you could find that information online because it is a matter of public record, because
of the fact that real estate is based on property taxation. And there is a longstanding history of why things involving real property, liens, and factual information about real property are public-domain information. In all states, most of that information
is public, except there are some states where the actual transaction price itself is not
a matter of public record, and those are called “non-disclosure” states; states like Utah
and Texas. Everything else is public record. And what we’re doing is then providing estimates
and derivative data on top of that. So, we're creating housing indices out of that data, or valuations. And those valuations are, theoretically, no different than if you were to go to the county tax assessor's website or into their office: there is already a market value assessment they're putting on your home, which is a matter of public record. Ours is, you know… We're applying probably a lot more […] hours and algorithmic sophistication than a lot of tax assessors are able to. But, in principle, it's exactly the same concept
as that. But somehow it feels different when that data is
aggregated and then presented in such a succinct form. And it’s also easily accessible. Somehow, it feels different. Yes. That’s true. I would say it feels different. I would say it feels different in a lot of
different ways. For most consumer applications, it feels different
because it feels really good. Now, there are some individuals who would like that information to be private. They would like no facts about their home to be public, so they would probably prefer that the county assessor not make it public. They would prefer the transactions were not
a matter of public record, and they would prefer if companies weren’t able to put a
derivative product on top of that. I certainly get that. But it becomes a collective action problem, where individually we would each prefer to take all of our own information offline, but collectively we would like the ability to look at other people's information so we can make better decisions. And, collectively, as a society, we have decided
that this information should be public. And, because of that, properties like … companies
like Zillow are able to make that information public as well, and we think the consumer benefits far outweigh the individual concern that they would prefer the facts about their
homes not be public. You know, there are also, I think, real social equity issues here, and there's a lot of research on them. When you look at disclosure versus non-disclosure states, for example, you will find that … There's been some fantastic academic research on this issue, but there's more inequality in property tax in non-disclosure states than in disclosure states, because in disclosure states people are able to look at those transactions and figure out how their tax relates to what their home is really worth. Where that's not possible, disputes are less likely to come from the lower half of the price spectrum, but wealthy people will always go dispute
and try to get a lower assessment on their home. And that leads to more inequality in the assessment
of tax than would exist otherwise, which we think is a harm to the overall public benefit. And you just raised some public policy issues. And so, in our last five minutes, I’ll ask
you to put on your economist hat and share your thoughts on how this data economy; and
in a way, you’re right in the middle of the data economy. How is … How do you see that changing the
workforce and the public policy issues around that? Yeah. I think, you know, we … You know, I do a
lot of writing now on policy […] to real estate and housing, and also some broader economic discussion. In that broader economic discussion, one of the themes I touch on somewhat often is the need for us to get ahead of the changes that are coming due to machine learning and the data era. I think there are two parts of our societal framework that were really established with the last transformation; well, not with the last transformation, probably the one prior to that, where basically we
moved from an agrarian society to a manufacturing society, it was around that time when we started
mandatory, compulsory public education. We also started to set up, with the progressive
era in the early 1900’s, social security systems and unemployment systems that allowed for
people who may be thrown out of work from a manufacturing job to have a little bit of a safety net while they found their next job. You know, I am concerned in the current era … This
is less real-estate related and more the impact machine learning is going to have full-bore
on our economy, thinking about the impact of driverless cars, for example, on people
who drive trucks and cars. You know, that’s five to eight million people,
and you know, they’re going to come under pressure as self-driving car technology becomes
more ubiquitous. And, I am concerned that one, we need to up
our educational game, where we need to think about college education as being the equivalent of what high school education was in the late 1800s, and we need to be doing a better job of training our college graduates for the jobs that exist. And then I would say that on the unemployment
side, that system I described is set up for a world where you lose a job, and your next
job is likely to be in the same town you’re in, and in the same field. We’re going to go through […] in the next
thirty years, a lot of unemployment where the job you need to get is probably not in
the area you live in and it’s probably not in the field you’re in, so it’s going to require
some re-tooling. And that's more than, like, six weeks to three months of unemployment. We need to think hard about people who are
moving from a manufacturing job, and maybe their next job needs to be a computer-assisted
machine operator, which is a non-trivial job that needs to be trained for. And you’re not going to learn it in four weeks. So, I’m definitely interested in public policy
trying to address those issues in a better way. And in the last two minutes, what advice do
you have for public policy-makers on these topics? You mentioned education as being one thing. Any other thoughts on this? Yeah. I would just encourage us to … We seem to
be in particularly ideologically-charged times. You know, I would encourage us to think broadly,
you know, like we did when we came up with compulsory public education for children, and to recognize that there are a lot of these ideas that have been suggested from both the left and the right. For example, one possibility is replacing, you know, short-term unemployment insurance with something more like a robust negative income tax, of which we have a form in this country called the earned income tax credit, where, for low-wage workers, we supplement their income through the tax system. You know, Milton Friedman was a champion of
a very robust negative income tax on the right. We’ve got a lot of liberal thinkers who have
championed it on the left. That type of system, where people can kind
of step out of the day-to-day work, and be assured that they’re going to make a base-level
income for a longer period of time, and that income’s going to allow them to get another
job. Those ideas have come from the Left and the
Right, and I would hope that we’re going to be able to fashion a system that’s going to
work better for the next thirty years than what we've got now; and that we don't
get hung up on rigid ideology on it. Okay. And, I’m afraid that about wraps up our
show. We have been speaking with Stan Humphries,
who is the Chief Analytics Officer, and also Chief Economist of the Zillow Group. And Stan, thank you so much for taking your
time and sharing your thoughts with us! Michael, thanks for the interview! It’s a broad range of topics we got to cover. So, it’s quite unusual, but it’s been
fun! Yeah. That’s great! Forty-five minutes is enough time to dive
in. Everybody, thank you so much for watching,
and go to CxOTalk.com/Episodes, and be sure to subscribe on YouTube. And also, “like” us on Facebook. You should do that! “Like” us on Facebook as well. Thanks so much, everybody! Have a great day! Bye-bye.
