By: Nat Sundar | Updated: 2018-01-31 | Comments | Related: > Power BI
Problem
I have read the previous tip about analyzing UK Petition data using Power BI. I understand because of technical limitations, the UK parliamentary system provides only 50 petition details in a JSON feed. So how can I load all open petition details into Power BI?
Solution
UK Petition Data
Because of a technical limitation, the JSON feed can only provide 50 petition details. The next set of 50 petition details are available in the next JSON feed. Now let’s understand the format of the supplied JSON feed.
Understanding the JSON feed format
The JSON feed has two major groups. They are “Links” and “Data”. In the below mentioned example, the link for the first, last and next JSON feed details are available under the navigation “Links". The first 50 petition details are available under the navigation “Data”. It is well understood that there were 65 pages of JSON feed petition details available.
The “first” link has the details about the current URL and the next set of petition details can be extracted from the link "https://petition.parliament.uk/petitions.json?page=2&state=open". By looking at the link carefully, we can come to a conclusion that the page number is getting increased in ascending order till the last page.
Based on this finding, we can modify the Power BI query through the advanced editor to extract petition details from the first page to the last page incrementally. As a first step, we need to extract the final page number from the JSON link.
Creating a query to find the last page number
As a first step, let’s try to extract the final page number from the JSON feed. For this purpose, I am going to extract data from the JSON feed "https://petition.parliament.uk/petitions.json?page=1&state=open".
Follow the below steps to extract data from the 1st page of the JSON feed for petition.
Click on GetData menu and "More" sub menu to open the "Get Data" dialog box as shown below.
Now select on the "Other" sub menu and select "Web" and click on "Connect".
On the dialog box, type the URL (https://petition.parliament.uk/petitions.json?page=1&state=open) as below and click OK.
Once the JSON feed has been loaded, you will be see the query properties.
This highlights the fact that there are two possible ways to navigate. There are "Links" and "Data". Details about the links and the data can be explored by clicking on the header respectively.
By clicking on the "links" header, it is revealed that it has the links for "Self", "First","Last","Next" and "Prev" page details.
Our main objective is to extract the last page number from the "Last" page link. So let’s navigate to the "Links" to see the details.
The above image lists all the available links with the JSON feed. To enable us to apply data transformation rules, we need to convert the result of the query into a table.
This can be done by clicking on the "Convert into Table" ribbon. After converting into a table, the result set will be available as shown below.
Now let’s apply the filter to list only the "Last" page as shown below.
The resultant table will have only one record with the last page URL. The last page number 42 (in this example) is embedded in the URL. We may need to apply few transformations to the page number.
As a first transformation, I would like to delimit the value by the "?" character. This will split the column value into two different columns. Let’s apply this transformation using the ribbon menu "Split Column" by delimiter and supply the delimiter "?" in the given dialog box as shown below.
Once the transformation has been applied, this will split the column "value" into two columns namely "Value.1" and "Value.2".
We need to split the column "Value.2" to get the actual page number. Let’s apply the transformation "Split column by delimiter" by the character "&" this time.
After applying the transformation, we are able to split the Value.2 column into two columns namely "Value2.1" and "Value 2.2". Now we need to apply one more transformation to get the final page number. Let’s split the column further by using the delimiter "=".
Now the final page number can be extracted by referencing the column "value.2.1.2". We are interested only the column which contains the last page number. So the other unwanted columns can be removed from the view.
Also, I have renamed the column as "LastPageNumber". The below picture represents the final result set.
Though we have derived the final page number, it may not be useful as is, because we will not in be in a position to lookup this value to get the final page number every time. So let’s wrap the extraction and transformation logic in a function, so the function can be called from other queries to get the final page number.
Converting the transformation logic to a function
We can easily convert the query to a function using the below steps. Select and right click on the query to open the menu. Click on the "Create function" menu to create a function, as shown in the picture below.
This function doesn’t have any parameters. So let’s create the function without the parameters.
Type the name of the function in the text box as shown below.
Now the function has been created.
Now let’s click on the “Invoke” button to test the function. This will return a table with a column "LastPageNumber", as shown in the picture below.
We are interested only in the value of the column. Hence the transformation logic can be updated as below to return only the value. Click on the “Advanced“ editor to see the transformation logic and update as below.
let Source = () => let Source = Json.Document(Web.Contents("https://petition.parliament.uk/petitions.json?page=1&&state=all")), links = Source[links], #"Converted to Table" = Record.ToTable(links), #"Filtered Rows" = Table.SelectRows(#"Converted to Table", each ([Name] = "last")), #"Split Column by Delimiter" = Table.SplitColumn(#"Filtered Rows", "Value", Splitter.SplitTextByDelimiter("?", QuoteStyle.Csv), {"Value.1", "Value.2"}), #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Value.1", type text}, {"Value.2", type text}}), #"Split Column by Delimiter1" = Table.SplitColumn(#"Changed Type", "Value.2", Splitter.SplitTextByDelimiter("&", QuoteStyle.Csv), {"Value.2.1", "Value.2.2"}), #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Value.2.1", type text}, {"Value.2.2", type text}}), #"Split Column by Delimiter2" = Table.SplitColumn(#"Changed Type1", "Value.2.1", Splitter.SplitTextByDelimiter("=", QuoteStyle.Csv), {"Value.2.1.1", "Value.2.1.2"}), #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter2",{{"Value.2.1.1", type text}, {"Value.2.1.2", Int64.Type}}), #"Removed Columns" = Table.RemoveColumns(#"Changed Type2",{"Name", "Value.1", "Value.2.1.1", "Value.2.2"}), #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Value.2.1.2", "LastPageNumber"}}), LastPageNumber = Table.Column(#"Renamed Columns","LastPageNumber"){0} in LastPageNumber in Source
Once done, click OK and then invoke and now you will see only the actual value.
Iterating through multiple pages
In the previous tip, we have learned to process a single JSON feed which contains only 50 petitions. Now we learned to extract the last page number in this tip. Let’s have a close look at the JSON feed for multiple pages.
- The first JSON feed can be extracted from the link https://petition.parliament.uk/petitions.json?page=1&state=open
- and the second page can be extracted from the link https://petition.parliament.uk/petitions.json?page=2&state=open
- and the third page can be extracted from the link https://petition.parliament.uk/petitions.json?page=3&state=open
- and the fourth page can be extracted from the link https://petition.parliament.uk/petitions.json?page=4&state=open
- and the final page can be extracted from the link https://petition.parliament.uk/petitions.json?page=65&state=open
Almost all the components of the URL are same except the page number. Hence the JSON link for multiple pages can be generated dynamically from the first page to the last page in Power BI. In the below steps, I will help you setup a query to iterate thru all the available JSON web pages to load data into Power BI.
Creating a blank query
Let’s create a blank query and edit the name of the query as "AllPetitions" as below.
Now click on the advanced editor and paste the below transformation logic.
let Source1 = GetLastPageNumber(), Starts = List.Generate(()=>1, each _ < Source1 , each _ + 1), #"Converted to Table" = Table.FromList(Starts, Splitter.SplitByNothing(), null, null, ExtraValues.Error), #"Changed Type" = Table.TransformColumnTypes(#"Converted to Table",{{"Column1", type text}}), #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each Json.Document(Web.Contents("https://petition.parliament.uk/petitions.json?page="&[Column1]&"&state=all"))), #"Expanded Custom" = Table.ExpandRecordColumn(#"Added Custom", "Custom", {"links", "data"}, {"Custom.links", "Custom.data"}), #"Expanded Custom.data" = Table.ExpandListColumn(#"Expanded Custom", "Custom.data"), #"Expanded Custom.data1" = Table.ExpandRecordColumn(#"Expanded Custom.data", "Custom.data", {"type", "id", "links", "attributes"}, {"Custom.data.type", "Custom.data.id", "Custom.data.links", "Custom.data.attributes"}), #"Expanded Custom.data.attributes" = Table.ExpandRecordColumn(#"Expanded Custom.data1", "Custom.data.attributes", {"action", "background", "additional_details", "state", "signature_count", "created_at", "updated_at", "rejected_at", "opened_at", "closed_at", "moderation_threshold_reached_at", "response_threshold_reached_at", "government_response_at", "debate_threshold_reached_at", "scheduled_debate_date", "debate_outcome_at", "creator_name", "rejection", "government_response", "debate"}, {"Custom.data.attributes.action", "Custom.data.attributes.background", "Custom.data.attributes.additional_details", "Custom.data.attributes.state", "Custom.data.attributes.signature_count", "Custom.data.attributes.created_at", "Custom.data.attributes.updated_at", "Custom.data.attributes.rejected_at", "Custom.data.attributes.opened_at", "Custom.data.attributes.closed_at", "Custom.data.attributes.moderation_threshold_reached_at", "Custom.data.attributes.response_threshold_reached_at", "Custom.data.attributes.government_response_at", "Custom.data.attributes.debate_threshold_reached_at", "Custom.data.attributes.scheduled_debate_date", "Custom.data.attributes.debate_outcome_at", "Custom.data.attributes.creator_name", "Custom.data.attributes.rejection", "Custom.data.attributes.government_response", "Custom.data.attributes.debate"}) in #"Expanded Custom.data.attributes"
The GetLastPageNumber function will calculate the "Last Page Number" and will return a variable "Source". As a next step, a list will be generated containing the numbers from 1 to Last Page Number (say 65).
Now for each number the JSON feed will get generated by adding the number to the link. For the first page, it will be https://petition.parliament.uk/petitions.json?page=1&state=open and then the data gets loaded into the query and then it will iterate thru the next page and so on. Like this it will iterate thru all the pages and all petitions details will get loaded into the query.
Once the changes have been made and closed, then the final result set will look like the below.
Summary
In this tip, we learned how to overcome the limitation of the UK petition system by leveraging a custom function in Power BI. Stay tuned to read the next tip to develop a custom UK parliamentary constituency map to display signature counts for each petition.
Next Steps
- Read other Power BI tips here.
Learn more about Power BI in this 3 hour training course.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2018-01-31