Data Aggregation
Overview
In the previous section (Analyze CSV File), we described an example of uploading the City of New York's publicly available Air Quality (opens in a new tab) dataset published by City of New York and described an example of performing a simple analysis.
Here is an example of converting the resulting Python cell into a table and performing an aggregation.
Preparation
As a preliminary preparation, select the Python cell created in Analyse CSV File and:
-
press the '+' mark (Add cell) button and then Table to convert the Python execution results into a table.
-
The 'Create Table' modal is displayed. here enter the table slug and comments as follows and press the 'Create' button.
-
We have now saved the actual data from the Python cell analysis as a table.
-
To perform data aggregation on this table, create an SQL cell by pressing SQL from the button with the '+' symbol (Add cell) with a cell in the table selected.
Data aggregation
When the ">_ Open Editor" button is pressed with the SQL cell created in the preliminary preparation in the previous section selected, the following editing screen is displayed. As an example, data aggregation using SQL is performed here.
Approach
The table 'manhattan_air_quality' contains the average nitrogen dioxide (NO2) concentration per year, as analyzed in the Python cell. This can be used to perform more specific data aggregation.
In this section, we will assess the variation in NO2 concentrations from year to year and build a query to gain further insight into the trend.
Query description in SQL cells
Having already done the basic pre-processing for the tabulation, we paste the following query into the edit window to calculate the annual change in NO2 concentrations (year-on-year). This will allow you to tabulate the transition of NO2 concentrations in the Manhattan area.
This SQL query uses the LAG
function to obtain the NO2 concentration of the previous year and calculate the difference with the current year. As a result, you can see the increase or decrease in NO2 concentration from year to year.
Run the query
After pasting the query into the edit screen, press the 'Run' button at the top right of the screen to execute it. The results of the run are displayed in a tabular format at the bottom of the edit screen and you can now enumerate the NO2 concentrations for each year, the previous year's concentrations and the annual changes.