Miracle A/S
Borupvang 2C
2750 Ballerup
Tlf.: +45 4466 8855
CVR: 2567 8990
info@miracle.dk


Astma-Allergi
Danmark

Automatic forecast of the daily pollen count

Miracle built a platform that predicts the local pollen count using machine learning.

Challenge 

Astma-Allergi Danmark wanted to predict pollen counts by combining pollen measurements with weather data. This would make it possible both to predict quite accurately and to predict pollen counts several days ahead. With more measurement stations, there is also potential for better local predictions.

Solution

Miracle has completed a project for Astma-Allergi Danmark. The goal of the project was to build a platform that uses machine learning to predict the pollen count for the following day, based on weather reports.

We used a method consisting of three steps:

  1. Data collection and pre-processing
    Here, data is collected and prepared so that it is ready to be analyzed and interpreted by a training algorithm.
  2. Training of models
    Here, one or more models are trained on the prepared data.
  3. Evaluation
    At this step, the accuracy of the model is tested. If the accuracy is not high enough, either new data is collected or additional pre-processing is carried out, and the process starts over.

By following this method, we achieved an accuracy of 76 %, which is comparable to the accuracy achieved by the experts when manually predicting the following day’s pollen level.
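The evaluation step can be illustrated with a minimal sketch. It assumes that accuracy means the share of days on which the predicted level equals the observed level; the text does not state how the 76 % was calculated, and the function name and example values below are made up for illustration.

```python
def accuracy(predicted, actual):
    """Share of days where the predicted pollen level matches the observed one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)

# Made-up levels for four held-out days, for illustration only.
predicted = ["low", "medium", "high", "medium"]
actual = ["low", "medium", "medium", "medium"]
score = accuracy(predicted, actual)  # 3 of 4 days correct -> 0.75
```

If the score falls below the target, the loop returns to step 1 with more data or different pre-processing.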



Weather and pollen data

Based on weather reports, Miracle has constructed a model that can predict the pollen count for the following day. The weather data comes from the Danish Meteorological Institute (DMI) and contains daily measurements of the minimum, maximum and average temperature, as well as the average wind speed, wind direction, air pressure and precipitation.

The data describing the amount of pollen in the air is collected by Astma-Allergi Danmark, which operates and monitors all pollen traps in Denmark. Astma-Allergi Danmark’s pollen warning has three levels: low, medium and high. The task was therefore to construct a machine learning model from the described weather and pollen data that could, given the weather forecast, predict whether the pollen level the following day would be low, medium or high.

Conclusion

With an accuracy matching the level of the experts’ assessments, the project can be considered a success. In addition to freeing up time for the experts, the model also opens up new possibilities. One can, for instance, imagine the model being used in other parts of the country. How feasible this is depends on how much the pollen level is determined by the weather and the seasons, and how much by the local flora. This can be tested with data from other pollen measuring stations.

Another very interesting implication is the possibility of making predictions further into the future. Since the model is based on weather data, you can simply give it a two- or three-day weather forecast and get a prediction of the pollen level on the given day. The predictions naturally become less accurate the further ahead one attempts to predict.

Technical details can be found in this LinkedIn article.

 

Algorithm

We used the balancing algorithm SMOTE (Synthetic Minority Over-sampling Technique) to transform the data set so that there were approximately equal numbers of observations of each pollen level.

SMOTE works by modeling all the observations as points in a high-dimensional space. Lines are then drawn between neighbouring points that represent the minority observations. Lastly, new points are created at random positions along these lines.
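The idea can be sketched in a few lines of plain Python. This is a simplified illustration of the interpolation step, not the implementation used in the project; the function name, the parameter `k` and the example values are assumptions made for this sketch.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority points.

    minority: list of equal-length numeric feature vectors from one class.
    k: number of nearest minority neighbours considered for each point.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority observations")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Nearest minority neighbours of the chosen point (excluding itself).
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Random position on the line segment between base and neighbour.
        t = rng.random()
        synthetic.append([b + t * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Made-up "high pollen" days: [average temperature in °C, wind speed in m/s].
high_days = [[18.0, 3.0], [20.0, 4.0], [22.0, 2.0]]
new_days = smote(high_days, n_new=5)
```

Every synthetic point lies on a line between two real minority observations, so the new data stays within the region the minority class already occupies.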

Training the model

When training a model, you need pre-classified data. In our case, we wished to predict the level of pollen based on weather data. This is represented in two matrices: x, which is the data you base the prediction on, and y, which is what you wish to predict. In this way, we are continuously able to train and expand the model with new data.
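As a rough illustration, the two matrices could be built as follows. The feature list mirrors the daily weather measurements described earlier, but the column names, the helper `to_matrices` and all values are hypothetical.

```python
# Column names are hypothetical; they follow the weather measurements
# described in the text.
FEATURES = ["temp_min", "temp_max", "temp_avg", "wind_speed",
            "wind_direction", "air_pressure", "precipitation"]

def to_matrices(rows):
    """Split pre-classified rows into x (weather) and y (pollen level)."""
    x = [[row[name] for name in FEATURES] for row in rows]
    y = [row["pollen_level"] for row in rows]
    return x, y

# Made-up example rows, for illustration only.
rows = [
    {"temp_min": 4.0, "temp_max": 11.0, "temp_avg": 7.5, "wind_speed": 6.1,
     "wind_direction": 250.0, "air_pressure": 1004.0, "precipitation": 5.2,
     "pollen_level": "low"},
    {"temp_min": 12.0, "temp_max": 23.0, "temp_avg": 17.5, "wind_speed": 3.0,
     "wind_direction": 120.0, "air_pressure": 1019.0, "precipitation": 0.0,
     "pollen_level": "high"},
]
x, y = to_matrices(rows)

# Expanding the training set later amounts to appending newly classified days.
new_x, new_y = to_matrices([
    {"temp_min": 9.0, "temp_max": 17.0, "temp_avg": 13.0, "wind_speed": 4.5,
     "wind_direction": 200.0, "air_pressure": 1011.0, "precipitation": 0.4,
     "pollen_level": "medium"},
])
x.extend(new_x)
y.extend(new_y)
```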

Model type

Out of the model types tested, random forest proved to be the most accurate. Random forest works by drawing a number of random subsets of the data set. A decision tree is trained on each of these smaller data sets, and the trees’ predictions are combined by majority vote.
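The following is a minimal sketch of the technique in plain Python, not the project's implementation. It assumes the common formulation in which each tree is trained on a bootstrap sample (drawn with replacement) and considers a random subset of features at each split; the text does not state which settings were used, and all function names and data values are made up for illustration.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, rng, depth=0, max_depth=3):
    # Leaf: the node is pure or the depth limit is reached -> majority label.
    if depth == max_depth or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]
    n_features = len(X[0])
    # Consider only a random subset of the features at this split.
    features = rng.sample(range(n_features), max(1, int(n_features ** 0.5)))
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= threshold]
            right = [i for i, row in enumerate(X) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:  # no usable split among the sampled features
        return Counter(y).most_common(1)[0][0]
    _, f, threshold, left, right = best
    return (f, threshold,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       rng, depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       rng, depth + 1, max_depth))

def predict_tree(tree, row):
    # Internal nodes are tuples; leaves are plain labels.
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if row[f] <= threshold else right
    return tree

def train_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(X) rows with replacement.
        idx = [rng.randrange(len(X)) for _ in X]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest

def predict_forest(forest, row):
    """Majority vote among the trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]

# Made-up training days: [average temperature in °C, precipitation in mm].
X = [[5.0, 9.0], [6.0, 8.0], [7.0, 9.0],
     [12.0, 5.0], [13.0, 4.0], [14.0, 5.0],
     [20.0, 0.0], [21.0, 1.0], [22.0, 0.0]]
y = ["low"] * 3 + ["medium"] * 3 + ["high"] * 3
forest = train_forest(X, y)
tomorrow = predict_forest(forest, [25.0, 0.0])
```

Because each tree sees a slightly different data set, individual trees make different mistakes, and the vote tends to be more stable than any single tree.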

About

Astma-Allergi Danmark has a mission to make life easier for everybody affected by asthma, allergies, hay fever and eczema. We do that by providing advice and counseling to those affected, developing tools that help with specific everyday issues (e.g. apps and courses), collecting knowledge about what everyday life looks like for patients and relatives, staying up to date on new research, and taking part in the debate and putting political focus on the topic.

We also work to strengthen prevention. Today, about 1.5 million Danes are affected by asthma, allergies or another type of hypersensitivity disease, and the number is still increasing.

Do you want to learn more? Then contact

 

Simon Møgelvang Bang
CTO