2. Land Cover Validation with LUCAS dataset

This is an example of a land cover product validation using LUCAS points. The process is using the class Validator to perform the main validation steps.

2.1. Install

2.1.1. Note for Google Colab

[ ]:
!git clone https://gitlab.com/geoharmonizer_inea/st_lucas/st_lucas-python-package.git
!(cd st_lucas-python-package/; git pull)
import sys
sys.path.insert(0, './st_lucas-python-package/docs/notebooks/')

2.1.2. Install requirements

[ ]:
!pip3 install geopandas pyyaml==6.0 ipyleaflet
print("INSTALLATION COMPLETED")

Now we have to restart runtime: Runtime -> Restart runtime (on Google Colab) or Kernel -> Restart (on JupyterLab).

[1]:
import os
import yaml

from osgeo import gdal
from osgeo import gdalconst
import geopandas as gpd
import numpy
import urllib

import matplotlib.pyplot as plt
%matplotlib inline

from validator import Validator

2.2. Land Cover validation

2.2.1. Configure validation

Check contents of the config.yaml file.

[2]:
# configuration with sample data
config_file = "sample_land_cover/config.yaml" # On Google Colab: "./st_lucas-python-package/docs/notebooks/sample_land_cover/config.yaml"

with open(config_file, 'r') as file:
    file_contents = file.read()

print(file_contents)
project:
  name: 'Geoharmonizer Land Cover validation'
  abbrev: 'cz_lc_18'
  run_id: '20210907'

# land cover & reference definitions
input:
  # raster map
  path: ./sample_land_cover
  in_ras: cz_land_cover_osm_2018.tif
  ndv: 0
  legend: legend.yaml
  # vector reference
  in_vec: cz_lucas_points_l1_2018.shp
  ref_att: 'label_l1'

# validation report settings
report:
  path: ./sample_land_cover
  dir_name: 'lc_2018_validation'

# validation points for GIS exploration
validation_points:
    file_name: 'validation_points'
    ogr_format: 'ESRI Shapefile'
    epsg: 3035



2.2.2. Initialize the validator

Initilize the validator by passing the config file or a Python dictionary with the same structure

[3]:
validation = Validator(config_file)


Validation project initialized!
Inputs:
cz_land_cover_osm_2018.tif
cz_lucas_points_l1_2018.shp


2.2.3. Check validity of the inputs

[4]:
# Check if you can read the geodata

inputs_valid = validation.check_inputs()
print('Validation data ready: {}'.format(inputs_valid))
Validation data ready: True
[5]:
# Check contents of the raster and vector geodata

with open(config_file) as file:
    cfg = yaml.load(file, Loader=yaml.FullLoader)

# Vector data
vector_fn = os.path.join(cfg['input']['path'], cfg['input']['in_vec'])
gdf = gpd.read_file(vector_fn)

gdf.head()
[5]:
point_id survey_dat gps_altitu gps_lat gps_long nuts0 obs_dist obs_type lc1 lc1_perc label_l1 geometry
0 48262936 2018-05-10 309 49.321800 16.956880 CZ 6.0 1 C33 5 NaN POINT (4826005.575 2935998.024)
1 47162980 2018-08-02 409 49.799533 15.492019 CZ 22.0 1 B75 4 3.0 POINT (4716021.636 2979995.779)
2 46183058 2018-07-19 242 50.557290 14.193572 CZ 0.0 1 B13 3 2.0 POINT (4617999.865 3057999.905)
3 45262962 2018-07-17 545 49.734924 12.844097 CZ 0.0 1 B35 4 2.0 POINT (4526000.963 2961999.910)
4 47463004 2018-07-24 251 49.993666 15.933803 CZ 0.0 1 B11 4 2.0 POINT (4745999.810 3004000.403)
[6]:
# Check the legend

legend_file = "sample_land_cover/legend.yaml" # On Google Colab: "./st_lucas-python-package/docs/notebooks/sample_land_cover/legend.yaml"

with open(legend_file, "r") as file:
    legend = file.read()
print(legend)
legend:
    1: Artificial
    2: Cropland
    3: Perenial
    4: Forest
    5: Shrubland
    6: Grassland
    7: Barren
    8: Wetlands
    9: Water
    10: Glaciers

[7]:
# view distribution of the classes

attribute = cfg['input']['ref_att']
gdf[attribute].value_counts().plot.pie(figsize=(7, 7), autopct='%1.1f%%')
[7]:
<AxesSubplot:ylabel='label_l1'>
../_images/notebooks_02_land_cover_validation_17_1.png
[8]:
# Show a map of the data

from ipyleaflet import Map, GeoData, basemaps, LayersControl

gdf_4326 = gdf.to_crs("EPSG:4326")
center = gdf_4326.dissolve().centroid

m = Map(center=(float(center.y), float(center.x)), zoom = 7, basemap= basemaps.OpenStreetMap.Mapnik)

lucas_gd = GeoData(geo_dataframe = gdf_4326,
                   style={'color': 'black', 'fillColor': '#3366cc', 'opacity':0.05, 'weight':1.9, 'dashArray':'2', 'fillOpacity':0.6},
                   point_style={'radius': 2, 'color': 'red', 'fillOpacity': 0.8, 'fillColor': 'blue', 'weight': 3},
                   name='LUCAS points')

m.add_layer(lucas_gd)
m.add_control(LayersControl())

m
/tmp/ipykernel_17328/1624841052.py:6: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  center = gdf_4326.dissolve().centroid

2.2.4. Run land cover map and reference overlay

[9]:
validation.overlay()


Processed: 100% | 4930 reference points.
[9]:
0

2.2.5. Report the validation results

[10]:
# short report
validation.short_report()
Validation indicators:
---
Overall map accuracy is 84.16 %
No. passed: 3426
No. failed: 645
No. points used in validation: 4071
[10]:
0
[11]:
# full report
validation.report()


Machine learning validation indicators (per class):
---
              precision    recall  f1-score   support

           1       0.44      0.84      0.58       185
           2       0.86      0.97      0.91      1801
           3       0.62      0.13      0.22        61
           4       0.93      0.90      0.92       989
           5       0.15      0.18      0.16        17
           6       0.91      0.63      0.75       952
           7       0.62      0.11      0.19        46
           8       0.25      0.27      0.26        11
           9       0.30      0.33      0.32         9

    accuracy                           0.84      4071
   macro avg       0.56      0.49      0.48      4071
weighted avg       0.86      0.84      0.84      4071

Classical LC validation indicators:
---
overall_accuracy :  0.8416
producers_accuracy :  0.8583
users_accuracy :  0.8416
kappa :  0.7670


[11]:
0
[12]:
# Save the validation report to a text file for a future use
validation.save_report()
Saving validation report:
---
cz_lc_18_validation_report.txt


[12]:
0

2.2.6. Plot confusion matrix

[13]:
validation.show_confusion_matrix()
../_images/notebooks_02_land_cover_validation_26_0.png
[13]:
0
[14]:
# Check the legend again

print(legend)
legend:
    1: Artificial
    2: Cropland
    3: Perenial
    4: Forest
    5: Shrubland
    6: Grassland
    7: Barren
    8: Wetlands
    9: Water
    10: Glaciers

[15]:
# Save the confusion matrix

validation.save_normalized_confusion_matrix()
[15]:
0
../_images/notebooks_02_land_cover_validation_28_1.png
[16]:
# You can also save the plots into the validation directory for later use
validation.save_confusion_matrix()
validation.save_normalized_confusion_matrix()
[16]:
0
../_images/notebooks_02_land_cover_validation_29_1.png
../_images/notebooks_02_land_cover_validation_29_2.png

2.2.7. Save the validation overlay geodata to a vector

You can

[17]:
validation.save_vec()
Saving validation data:
---
Vector data: validation_points.shp created from overlay data
[17]:
0

2.2.8. Classes aggregation

[18]:
config_aggregation = {
    'project':
      {'name': 'Geoharmonizer Land Cover validation',
      'abbrev': 'cz_lc_18',
      'run_id': '20210907'
      },
    'input':
      {'path': './sample_land_cover',
      'in_ras': 'cz_land_cover_osm_2018.tif',
      'ndv': 0,
      'legend': 'legend.yaml',
      'in_vec': 'cz_lucas_points_l1_2018.shp',
      'ref_att': 'label_l1'
      },
    'report':
      {'path': './sample_land_cover',
      'dir_name': 'lc_2018_validation_aggregation'
      },
    'validation_points':
      {'file_name': 'validation_points',
       'ogr_format': 'ESRI Shapefile',
       'epsg': 3035
      }
}
[19]:
validation_lc_aggregated = Validator(config_aggregation)


Validation project initialized!
Inputs:
cz_land_cover_osm_2018.tif
cz_lucas_points_l1_2018.shp


[20]:
# 2: agriculture (arable land & grassland)
aggregartion = {
    2: [2, 6]
}
[21]:
validation_lc_aggregated.overlay(aggregartion)


Processed: 100% | 4930 reference points.
[21]:
0
[22]:
validation_lc_aggregated.report()


Machine learning validation indicators (per class):
---
              precision    recall  f1-score   support

           1       0.65      0.90      0.76       174
           2       1.00      1.00      1.00      3113
           3       1.00      0.18      0.31        44
           4       0.97      0.95      0.96       945
           5       0.21      0.25      0.23        12
           7       0.71      0.25      0.37        20
           8       0.30      0.43      0.35         7
           9       0.33      0.38      0.35         8

    accuracy                           0.97      4323
   macro avg       0.65      0.54      0.54      4323
weighted avg       0.97      0.97      0.97      4323

Classical LC validation indicators:
---
overall_accuracy :  0.9683
producers_accuracy :  0.9735
users_accuracy :  0.9683
kappa :  0.9267


[22]:
0
[23]:
validation_lc_aggregated.show_confusion_matrix()
../_images/notebooks_02_land_cover_validation_39_0.png
[23]:
0