Property Analysis

1 Introduction

It is a truth universally acknowledged that a married man in possession of a good fortune must be in want of a house.

I've taken the liberty of updating good old Jane's statement for more contemporary times. This article describes my process of exploring and trying to understand property prices in Ireland.

Some background for my non-Irish readers:

  • Ireland was mostly poor from the 1930s to the 1990s
  • Banks were very hesitant to lend money to anyone
  • Interest rates were very high

This all changed during the Celtic Tiger period, when a flood of cheap euro-denominated money was dropped in by German and French helicopters 1. This led (predictably) to a massive property bubble, which was sadly interrupted by the advent of financial contagion in 2007. Over the next few years, the economy fell apart, we were dragged into a bailout program, and house prices collapsed.

A few years after that, prices started to recover, and right now we have a massive housing crisis: people can't afford to buy, which pushes more demand into the rental market and drives up rents.

Given that I don't trust the property-porn engaged in by the media and government, I decided to take matters into my own hands, and consult the property price register (PPR for future reference) and figure out whether or not I should buy or continue to slum it in a hell of short-term lets and fear 2.

1.1 Property Price Register

So, the property price register was established in the depths of the economic crisis 3, and as a result has a number of problems. The most obvious is that we only have data from 2010 onwards. With data from the boom it would be much easier to estimate tops and bottoms, but we can only work with what we have. The second problem is that Ireland doesn't have postcodes 4, which makes localising the properties a challenge.

Nonetheless, the nice people at the PPR have collated the data for us, so there is that. They helpfully provide it in CSV format, in a single document containing all of the property data, which can be found here 5.

So, on with the show! 6

1.1.1 Data Cleaning

In data analysis, much as in life, cleaning is important. Unfortunately, while keeping one's house and body clean rarely takes up all of one's time, data cleaning can, and often does, take up almost all of it. So enjoying (or at least not hating) this process is an important career skill.

Anyway, the first thing we do is look at the data to assess what needs to be done.

dat <- readr::read_csv("PPR_2020.csv")
## dat <- readxl::read_excel("PPR-ALL.xlsx", sheet="PPR-ALL")
names(dat)
Date of Sale (dd/mm/yyyy)
Address
Postal Code
County
Price (€)
Not Full Market Price
VAT Exclusive
Description of Property
Property Size Description

OK, so we have the date (nicely, they have specified the format), address, postal code, county, price, and a few flag and description columns.

I have some problems with these names. The spaces will cause trouble, and capitalisation is so passé. I'll fix both with some code.

normalise_names <- function(df) {
    nms <- names(df)
    ## replace runs of spaces/punctuation with a single underscore,
    ## lowercase everything, then strip anything non-ASCII
    normed <- iconv(
        tolower(
            gsub("([[:space:]]|[[:punct:]])+", "_",
                 x=nms)),
        "latin1", "ASCII", sub="")
    ## drop any underscores left dangling at the end of a name
    drop_usc <- gsub("([a-z_])_*$", "\\1",
                     x=normed)
    names(df) <- drop_usc
    df
}

This function does a few things:

  1. it replaces runs of spaces or punctuation with underscores 7
  2. it then converts all text to lowercase
  3. then we exert cultural imperialism and convert everything to ASCII 9
  4. finally, we drop any underscores left trailing at the end of a name (for tidiness, mostly)

dat2 <- normalise_names(dat)
names(dat2)
date_of_sale_dd_mm_yyyy
address
postal_code
county
price
not_full_market_price
vat_exclusive
description_of_property
property_size_description

Much better. Little things like this add up to a better, more productive life.

Next, we need to remove the commas and euro signs from the price column (the euro signs have become hopelessly mangled by the encoding). Again, we abuse iconv, this time on a particular column. My life would be much easier if I took the time to learn how to handle encodings 10.

fix_price <- function(x) {
    ## strip the thousands commas and euro signs from the raw prices
    nopunct <- gsub(",|€", "", x=x)
    ## drop anything else that didn't survive the encoding round-trip
    nums <- iconv(nopunct, "latin1", "ASCII", sub="")
    nums
}
dat2$price <- with(dat2, fix_price(price))

These preliminaries accomplished, we can move towards actually looking at the data.

library(dplyr)    # mutate, sample_frac and the pipe
library(forcats)  # fct_explicit_na

prop_df <- dat2 %>%
    mutate(date_of_sale=lubridate::dmy(date_of_sale_dd_mm_yyyy),
           not_full_market_price=as.factor(not_full_market_price),
           postal_code=as.factor(fct_explicit_na(postal_code)),
           vat_exclusive=as.factor(vat_exclusive),
           county=as.factor(county),
           price=as.numeric(price))
prop_df2 <- mutate(prop_df,
                   year=lubridate::year(date_of_sale),
                   month=lubridate::month(date_of_sale),
                   day=lubridate::day(date_of_sale))

prop_sample <- sample_frac(prop_df2, size=0.1)

We also take a ten percent sample, just to speed up the analysis 11.

1.1.2 Wasting Time with Plots

[Figure: pricedens.png — density of sale prices]

So, when we do the simplest plot ever (a one-way density of price), it looks pretty weird. Most of the density is piled up near zero, and there's a long tail that stretches out to 100mn. Who are these people buying houses for many millions of euros? More importantly, how can I become one of them?
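The post doesn't include the plotting code, but the chart above can be reproduced with something along these lines (prop_sample is the 10% sample from earlier; the log-scale version is my own addition to make the tail easier to see):

library(ggplot2)

## one-way density of price; most of the mass sits near zero
ggplot(prop_sample, aes(x = price)) +
    geom_density() +
    labs(x = "Sale price (EUR)", y = "Density",
         title = "Distribution of sale prices")

## a log10 x-axis spreads out the long right tail
ggplot(prop_sample, aes(x = price)) +
    geom_density() +
    scale_x_log10()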

The first thing to do when you find something weird like this is to take a look at the observations themselves and see whether they really are different 12. Let's sort the data by price and look at the addresses.

arrange(prop_df, desc(price)) %>%
    select(address, county, date_of_sale, price) %>%
    head()
| address | county | date_of_sale | price |
| Blocks A and C, The Point Campus | Dublin | 2019-12-23 | 142256576.05 |
| COMMERCIAL PROPERTY AND, 423 RESI UNITS AT CLANCY QUAY, ISLANDBRIDGE | Dublin | 2018-06-22 | 139165000 |
| 266 Apts & 18 Commercial Units, Heuston South Quarter, St. John's Road West | Dublin | 2019-07-19 | 129500000 |
| APARTMENT 1 - 197 NEPTUNE BLOCK, HONEYPARK, DUN LAOGHAIRE | Dublin | 2020-02-27 | 114440820 |
| 189 APARTMENTS AT ELMFIELD, BALLYOGAN, DUBLIN 18 | Dublin | 2020-02-27 | 99762972.37 |
| Apartments 1-164 Fairwary Court, & 1-50 Fairway Hall, Cualanor Upper Glenageary Road | Dublin | 2019-05-27 | 95242291 |

OK, so the Cannon house thing must be more than one apartment. Otherwise things have gone even madder than I thought. Also Apt 204 the Alliance, and 182 the Elysian 13. Entire apartment blocks going for 76-86 million make more sense, however.

arrange(prop_df, desc(price)) %>%
    select(address, county, date_of_sale, price) %>%
    tail()
| address | county | date_of_sale | price |
| 9 Greenfields, Lanesboro Road, Roscommon | Roscommon | 2015-05-22 | 5179 |
| LAVALLY, BALLINTOGHER | Sligo | 2014-05-01 | 5177 |
| 65 ST JOSEPH'S PARK, NENAGH, CO TIPPERARY | Tipperary | 2018-09-04 | 5080 |
| Loghnabradden, Fintown, Co. Donegal | Donegal | 2012-01-11 | 5079 |
| CLOGHAN, GLENCOLMCILLE, DONEGAL | Donegal | 2014-07-18 | 5079 |
| COULAGHARD, EYERIES, BEARA | Cork | 2019-03-11 | 5030.53 |
There must be something really wrong with these houses.

It does tell us something about the tails 14 of the crash, in that property was worth so little in Roscommon in 2015 that this happened. It may also be an error, as I don't know what kind of checking goes into this data.

[Figure: countyearly.png]

However, it's interesting to note that there weren't that many sales. Even at 60k transactions a year, that's only around 6% of the housing stock 15. That's not a huge amount, but given the lack of data from the boom years, it's very difficult to say how unusual that is.
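For completeness, here's that back-of-the-envelope calculation as code; the 1mn household figure is the assumption from footnote 15, not something that comes out of the PPR data:

library(dplyr)

housing_stock <- 1e6  # assumed 1mn households (footnote 15), not from the PPR

## yearly transaction counts as a share of the assumed housing stock
prop_df2 %>%
    count(year, name = "n_sales") %>%
    mutate(share_of_stock = n_sales / housing_stock)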

[Figure: yearly.png — median sale price by year]

So the overall trend is pretty clear. The chart above shows median prices. For those of you who aren't stats nerds, the median is the middle value: if you laid out all the prices in a line, the median would sit in the middle of that line. Because it doesn't care how extreme the values at either end are, it's a reasonable one-number summary of a skewed distribution like this one.

Really, you should look at the distribution itself, but aggregations such as the median are useful for plots and for explaining results to other people.
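For the curious, the aggregation behind a chart like the one above boils down to a group-and-summarise; the plotting details here are my own sketch rather than the original code:

library(dplyr)
library(ggplot2)

## median sale price per year
yearly_median <- prop_df2 %>%
    group_by(year) %>%
    summarise(median_price = median(price, na.rm = TRUE))

ggplot(yearly_median, aes(x = year, y = median_price)) +
    geom_line() +
    labs(x = "Year", y = "Median sale price (EUR)")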

[Figure: meanyearly.png — mean sale price by year]

The mean, which is the sum divided by the number of observations, is much higher. This is pretty normal in skewed data like this, because huge values at the top pull the mean up far more than they move the median. Again, the same trend is clearly visible. The mean price in Dublin is around 400k which, assuming a 10% deposit and the Central Bank's 3.5 times loan-to-income limit, means a household would need a combined income of approximately 102k to borrow the remaining 360k.
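The arithmetic, spelled out (the 10% deposit and the 3.5x loan-to-income cap are the assumptions doing the work here):

## rough affordability arithmetic for a mean-priced Dublin house
mean_price      <- 400e3
deposit         <- 0.10 * mean_price     # 40k up front
mortgage        <- mean_price - deposit  # 360k borrowed
required_income <- mortgage / 3.5        # roughly 102.9k combined income
required_income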

This puts houses in Dublin far out of reach of ordinary earners, which is perhaps not surprising given the incredible inflation in asset prices we've seen since the financial crisis in 2008.

Note that I've filtered out the properties sold for more than 2mn, as these are mostly apartment blocks and groups of houses. These probably shouldn't be included in this analysis 16.
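The cut-off itself is a one-liner, and easy to change if you disagree with my choice of threshold; the Dublin comparison below is just to illustrate how much the block sales move the mean:

library(dplyr)

## keep only "ordinary" transactions; 2mn is a judgement call (footnote 16)
prop_houses <- filter(prop_df2, price <= 2e6)

## compare the Dublin mean with and without the block sales
prop_df2 %>%
    filter(county == "Dublin") %>%
    summarise(mean_all_sales = mean(price, na.rm = TRUE))

prop_houses %>%
    filter(county == "Dublin") %>%
    summarise(mean_under_2mn = mean(price, na.rm = TRUE))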

Looking closer at Dublin (and breaking down by postal code), we can see some more interesting patterns.
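The chart below was presumably built from an aggregation along these lines; the exact grouping and plotting choices are my guesses rather than the original code:

library(dplyr)
library(ggplot2)

## monthly median prices per Dublin postal code, using the under-2mn subset
dub_monthly <- prop_df2 %>%
    filter(county == "Dublin", price <= 2e6) %>%
    group_by(postal_code, year, month) %>%
    summarise(median_price = median(price, na.rm = TRUE)) %>%
    ungroup()

ggplot(dub_monthly,
       aes(x = lubridate::make_date(year, month), y = median_price)) +
    geom_line() +
    facet_wrap(~ postal_code) +
    labs(x = "Month of sale", y = "Median price (EUR)")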

[Figure: dubmonthly.png — Dublin monthly prices by postal code]

Prices are definitely trending up, but there are some large outliers across the board.

Like this former house owned by an executive in Dublin 20, for example. That place definitely doesn't reflect the average experience in the area. In general, I'm beginning to think that there are multiple housing markets in Ireland, not one. This makes sense, as one can only compete against others in the same price range. This would suggest that I should regard the house-buying denizens of South Dublin as some kind of alien species. I do enjoy it when a data-driven theory agrees with my priors.

[Figure: dubsalesdaily.png — daily sales in Dublin]

A few things stand out from the daily sales chart:

  • Long periods of not much activity, followed by sudden spikes.
  • We should probably dig into some of the spikes to figure out what they mean; one rough way to do that is sketched below.
  • It's possible that they may be one overall transaction recorded as multiple entries (for instance, when an entire apartment block is sold to one developer and recorded unit by unit).
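As a first pass at those spikes, something like the following would surface the busiest days; the "top 10" cut and the focus on Dublin are just illustrative choices, not part of the original analysis:

library(dplyr)

## count Dublin transactions per day and look at the busiest dates
dublin_daily <- prop_df2 %>%
    filter(county == "Dublin") %>%
    count(date_of_sale, name = "n_sales") %>%
    arrange(desc(n_sales))

head(dublin_daily, 10)

## then inspect what was actually sold on the single busiest day
busiest_day <- dublin_daily$date_of_sale[1]
prop_df2 %>%
    filter(county == "Dublin", date_of_sale == busiest_day) %>%
    select(address, price) %>%
    head()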

1.1.3 Conclusions

So, above we've reviewed the property price register and done a bunch of data cleaning, but not really accomplished that much overall. Part Two will cover some other interesting datasets 17, while future parts will deal with geocoding the data and looking at spatial patterns, as well as some Bayesian modelling with Stan.

Hopefully someone makes it thus far, at least ;).

Footnotes:

1

In fact, it was done via banks, but it doesn't really have the same ring to it now, does it?

2

actually I've been in the same house for many years, but I likes a good simile.

3

by the Greens actually, so at least I don't feel my vote for them was entirely wasted

4

we do now, but they are not on any of the addresses

5

they also provide commercial data (which I just noticed), but I am resisting the urge to become distracted

6

for very nerdy, boring values of show

7

You could use any character, but I like underscores. It's totally pointless though, so really we should just standardise on one 8

8

clearly underscores

9

this whole lack of understanding of encodings is deeply embarrassing

10

look at how well I'm resisting distraction!

11

turns out sampling is pretty cool, who knew?

12

or you could just log it and get on with building a CNN, I guess

13

that dude finally made his money back, after opening really expensive apartments just as the crash hit

14

geddit?

15

assuming 1mn households. I should probably check this with the CSO

16

yes, this is really subjective, like almost everything in data analysis. The advantage here is that I provide code so people can change that however they want

17

it's possible that if you've read this article before you'll note that this was once one article. I came to the conclusion that people might be more inclined to read my articles if they weren't more than 10,000 words long.

Author: richie

Created: 2020-05-12 Tue 16:35
