- Quality assurance and adjustment of data zone estimates
- Methodology – Creating the 2002 data zone population estimates
- The RAS Principle
- Contact us
Preliminary data zone population estimates were produced in May 2005, and a quality assurance process has been under way since then. Members of the Small Area Population Estimates (SAPE) working group provided a list of contacts in local authorities who could examine our data zone estimates and identify any areas of concern. The local authorities adopted a variety of methods, with many checking their records of demolitions and new build to see whether the estimated population changes were in line with the housing changes in each area. This process identified 222 of the 6,505 data zones as requiring further attention.
A simple analysis (summary statistics and distributions) of the estimated data zone populations was also undertaken as part of the Quality Assurance (QA) process. This analysis identified that several data zones had negative estimated populations for some age groups. Further investigation found these to be mainly student age groups, aged 16-24, and that the negative population was the result of a very large net outflow from that data zone during the period in question.
Further sources used during the quality assurance process were the assessor dwelling counts (produced by the National Records of Scotland (NRS) household estimates branch), the 2003 child benefit data (produced by the Inland Revenue) and the NRS postal address files for the period 2001-04. However, the child benefit data were only available for one year, and the methodology used to produce the assessor dwelling counts changed between the 2003 and 2004 estimates. It was therefore decided that findings from these sources would be used to identify a problem data zone only if that data zone was also identified using both the NRS postal address files and local authority knowledge.
To allow us to identify problem data zones, the following checks were applied:
- Does the data zone contain large negative populations?
- Has the data zone been highlighted as a problem data zone by local authorities?
All data zones identified as being potential problems by local authorities were extracted and compared with the NRS Postal Address File (PAF) change multiplied by the average household size from the 2001 Census to pick out definite problem areas. The methods of adjusting the problem areas are outlined below.
If a data zone has a large negative population the population for that sex and age group is set back to what it was in the previous year. The difference is then re-apportioned amongst other data zones in the same council area, sex and age group to ensure that the data zone estimates aggregate up to the published mid-year local authority estimates.
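The adjustment for large negative populations can be illustrated with a minimal sketch for one council area, sex and age group. It assumes the difference is re-apportioned in proportion to the population of each remaining data zone, which the text does not specify. The function name, the `threshold` parameter and the data zone labels are hypothetical.

```python
def reset_and_reapportion(current, previous, threshold=0.0):
    """Reset flagged cells to the previous year and rescale the rest.

    current, previous: dict of data zone -> population for ONE council
    area, sex and age group. threshold is a hypothetical cut-off for a
    "large negative" population; the source gives no value.
    """
    council_total = sum(current.values())  # total that must be preserved
    adjusted = dict(current)
    flagged = {dz for dz, pop in current.items() if pop < threshold}

    # Step 1: set flagged data zones back to the previous year's value.
    for dz in flagged:
        adjusted[dz] = previous[dz]

    # Step 2: spread the difference over the other data zones.
    # Assumption: proportional to their current populations.
    others_total = sum(pop for dz, pop in adjusted.items() if dz not in flagged)
    remaining = council_total - sum(adjusted[dz] for dz in flagged)
    if others_total <= 0:
        raise ValueError("no positive population left to absorb the difference")
    factor = remaining / others_total
    for dz in adjusted:
        if dz not in flagged:
            adjusted[dz] *= factor
    return adjusted


current = {"DZ1": -10.0, "DZ2": 60.0, "DZ3": 50.0}
previous = {"DZ1": 5.0, "DZ2": 58.0, "DZ3": 49.0}
print(reset_and_reapportion(current, previous, threshold=-5.0))
```

In this example the council total of 100 is unchanged, DZ1 returns to its previous value of 5, and DZ2 and DZ3 are scaled down to absorb the difference.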
For all areas identified as problematic by the consultation with local authorities, the change in the NRS postal address file (PAF) between 2001 and 2002 was compared with our estimated change in the SAPE in these areas. If the expected change in population during this period (obtained by multiplying the PAF change by the average household size from the 2001 Census) was similar to the SAPE change for these areas, they were not adjusted for the period 2001-02. However, if the expected change indicated that there was a problem, these areas were extracted and split into two groups: student areas and other areas. The following adjustments were then made.
If the area was a student area - defined as an area where students made up 20 per cent or more of the population at the 2001 Census - the population was set back to what it was in the previous year. The data were then made consistent with the mid-year estimates for council areas (MYE). An adjustment was then made to all age groups excluding the 16-24 group so that the population in this data zone changed by the amount expected from examining the change in the NRS geography postal address file. A counteracting adjustment was made to data zones within the same council area, sex and age group so that the data zones aggregated up to the MYE Council totals.
A similar process was undertaken if the problem data zone was a non-student area. However, in this case the expected change adjustment was made across all age groups, including the 16-24 group. Again, a counteracting adjustment was made to data zones within the same council area, sex and age group so that the data zones aggregated up to the MYE Council totals.
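The rules used to treat a problem area can be sketched as follows. This is an illustration only: the function names and the age group labels are hypothetical, and the expected change is taken to be the PAF change multiplied by the average household size from the 2001 Census.

```python
STUDENT_SHARE_THRESHOLD = 0.20  # 20 per cent or more students at the 2001 Census


def expected_change(paf_change, avg_household_size):
    """Expected population change implied by the change in addresses."""
    return paf_change * avg_household_size


def is_student_area(students_2001, population_2001):
    """True if students made up 20 per cent or more of the 2001 population."""
    if population_2001 <= 0:
        return False
    return students_2001 / population_2001 >= STUDENT_SHARE_THRESHOLD


def age_groups_to_adjust(age_groups, student_area):
    """Student areas: adjust every group except 16-24. Other areas: all groups."""
    if student_area:
        return [group for group in age_groups if group != "16-24"]
    return list(age_groups)


groups = ["0-15", "16-24", "25-44", "45-64", "65+"]  # hypothetical banding
student = is_student_area(students_2001=250, population_2001=1000)
print(expected_change(paf_change=12, avg_household_size=2.5))
print(age_groups_to_adjust(groups, student))
```

How close the expected change has to be to the SAPE change before an area is left unadjusted is not stated in the text, so no tolerance is coded here.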
Once the adjustments have been carried out, the SAPE are aggregated to council area level and compared with the MYE Council totals to ensure that they match. This is required because data zones nest exactly into council areas, so the data zone estimates should aggregate up to the published Council area MYE.
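A check of this kind could look like the sketch below, which sums data zone estimates by council area and reports any council whose total differs from the MYE. The function name, the tolerance `tol` and the example codes are hypothetical.

```python
from collections import defaultdict


def council_mismatches(dz_estimates, dz_to_council, mye_totals, tol=1e-6):
    """Return council -> (aggregated SAPE minus MYE) where the two differ.

    dz_estimates: data zone -> estimated population
    dz_to_council: data zone -> council area (data zones nest exactly)
    mye_totals: council area -> published mid-year estimate
    """
    totals = defaultdict(float)
    for dz, pop in dz_estimates.items():
        totals[dz_to_council[dz]] += pop
    return {
        council: totals[council] - mye
        for council, mye in mye_totals.items()
        if abs(totals[council] - mye) > tol
    }


estimates = {"DZ1": 40.0, "DZ2": 60.0, "DZ3": 25.0}
lookup = {"DZ1": "Council A", "DZ2": "Council A", "DZ3": "Council B"}
mye = {"Council A": 100.0, "Council B": 30.0}
print(council_mismatches(estimates, lookup, mye))  # {'Council B': -5.0}
```

An empty result means every council total matches its published MYE.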
Before the cohort-component method could be used to produce the data zone SAPE, it was first necessary to create a 2001 data zone SAPE. Births, deaths and migration data for the nine-week period between the Census on 29 April 2001 and 30 June 2001 were obtained at postcode level and the corresponding data zone code for each postcode matched on. The 2001 Census population was then adjusted by adding the births, subtracting the deaths and adjusting for migration over the nine weeks.
The 2001 Armed Forces (AF) population by data zone was created from the 2001 Census AF population by postcode, with the corresponding data zone code matched on using the Scottish Neighbourhood Statistics postcode-data zone geography lookup. The population was then summarised by single year of age, sex and data zone, and removed from the 2001 SAPE before the population was aged on by one year. Data on births and deaths from mid-year 2001 to mid-year 2002 were then obtained at postcode level from the vital events branch of the NRS and the corresponding data zone code matched on. Births were added to the aged-on population and deaths subtracted.
Once the armed forces personnel have been removed and the population aged on and adjusted for births and deaths, the population needs to be adjusted for migration flows. The migration dataset produced by the mid-year estimates process is at a postcode level and so these postcodes are used to match on the corresponding data zone. The data is then summarised by in-migrants and out-migrants and the net migration calculated. It is this net migration which is used to make the data zone migration adjustment.
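The postcode matching and summarising step might be sketched as below. The record layout (a postcode and a direction per move), the function name and the example postcodes and codes are all hypothetical, and dropping unmatched postcodes is an assumption made only to keep the example short.

```python
from collections import defaultdict


def summarise_migration(moves, postcode_to_dz):
    """Summarise postcode-level moves into in, out and net flows by data zone.

    moves: iterable of (postcode, direction), direction being "in" or "out"
    postcode_to_dz: postcode -> data zone lookup
    """
    flows = defaultdict(lambda: {"in": 0, "out": 0})
    for postcode, direction in moves:
        dz = postcode_to_dz.get(postcode)
        if dz is None:
            continue  # assumption: unmatched postcodes are skipped
        flows[dz][direction] += 1
    return {
        dz: {"in": f["in"], "out": f["out"], "net": f["in"] - f["out"]}
        for dz, f in flows.items()
    }


lookup = {"AA1 1AA": "DZ_A", "AA1 1AB": "DZ_A", "BB2 2BB": "DZ_B"}
moves = [("AA1 1AA", "in"), ("AA1 1AB", "in"), ("AA1 1AA", "out"), ("BB2 2BB", "out")]
print(summarise_migration(moves, lookup))
```

The `net` value for each data zone is the figure that would feed the migration adjustment.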
Ward level data on asylum seekers, used to calculate the 2002 ward level SAPE, already existed. To apportion these data to data zone level, a data zone corresponding to the relevant ward was randomly selected for each asylum seeker in Glasgow. Once the data zone asylum seeker data had been created, it was added to the population previously adjusted for migration flows (Section 2C).
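Random apportionment from wards to data zones can be illustrated as follows. The sketch assumes each data zone linked to a ward is equally likely to be chosen, which the text does not state. The function name, the `seed` parameter and the ward and data zone labels are hypothetical.

```python
import random
from collections import Counter


def apportion_randomly(ward_counts, ward_to_data_zones, seed=0):
    """Assign each person in a ward to one randomly chosen data zone.

    ward_counts: ward -> number of asylum seekers
    ward_to_data_zones: ward -> list of data zones corresponding to that ward
    seed: fixed so that the allocation can be reproduced
    """
    rng = random.Random(seed)
    dz_counts = Counter()
    for ward, count in ward_counts.items():
        zones = ward_to_data_zones[ward]
        for _ in range(count):
            dz_counts[rng.choice(zones)] += 1  # assumption: uniform choice
    return dict(dz_counts)


wards = {"Ward 1": 30, "Ward 2": 12}
zones = {"Ward 1": ["DZ1", "DZ2", "DZ3"], "Ward 2": ["DZ4", "DZ5"]}
allocation = apportion_randomly(wards, zones, seed=1)
print(allocation)
print(sum(allocation.values()))  # 42, the ward totals are preserved
```

Whatever the random draw, the data zone counts always sum back to the ward totals.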
Once the adjustments for migration and asylum seekers have been carried out, the 2002 armed forces (AF) population is added back in. To create the 2002 armed forces population at data zone level, the 2001 armed forces population is made consistent with the 2002 MYE total. It is then used in conjunction with the 2002 MYE AF age-sex distribution to obtain the 2002 AF distribution by data zone, sex and single year of age using the RAS system. The RAS system is a method of controlling cells of a matrix to desired row and column totals. The desired row and column totals must be known and must sum to the same total. Initial cell values must be assigned, but these need not sum to the desired totals. Further information on the RAS principle can be found in Section 3.
The final adjustment to the population is for unmeasured migration. This adjustment is based on the 2001 Census, which showed that previous population estimates had overestimated the population of Scotland by some 50,000. To ensure that migration estimates do not continue to be overestimated an adjustment was included in the 2002 mid-year estimates. Further information on this adjustment can be found in the 'Mid-2002 Estimates, Notes and Definitions' on this website.
In order to create the unmeasured migration data zone population, the ward level data for both in- and out-migration was apportioned to data zone using the same method that was used for the asylum seekers data.
The parts created above were used in conjunction with the births and deaths data in the following way:
- Remove 2001 Armed Forces Population from 2001 SAPE;
- Age on the resultant population;
- Add on Births;
- Subtract Deaths;
- Adjust for Migration;
- Add in Asylum Seekers;
- Adjust for Unmeasured Migration;
- Add in 2002 Armed Forces Population;
- Make consistent with the 2002 Mid-Year Estimate (MYE) for Council areas.
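The sequence of steps above can be sketched for a single data zone and sex, with each component held as a list indexed by single year of age. This is a simplified illustration: the function and argument names are hypothetical, the last age is assumed to be an open-ended group, and the final step (making the results consistent with the council area MYE) is left out because it works across data zones.

```python
def roll_forward(pop, af_prev, births, deaths, net_migration,
                 asylum, unmeasured, af_new):
    """Roll one data zone/sex population forward by a year.

    All list arguments are indexed by single year of age; births is a
    single number. The last index is assumed to be an open-ended age group.
    """
    n = len(pop)
    # Remove the previous year's armed forces population.
    civilian = [p - a for p, a in zip(pop, af_prev)]
    # Age on by one year; the open-ended top group keeps its members.
    aged = [0.0] * n
    for age in range(n - 1):
        aged[age + 1] += civilian[age]
    aged[n - 1] += civilian[n - 1]
    # Add births at age 0.
    aged[0] += births
    # Subtract deaths, then apply the migration-related adjustments and
    # add the new armed forces population back in.
    return [
        aged[i] - deaths[i] + net_migration[i] + asylum[i]
        + unmeasured[i] + af_new[i]
        for i in range(n)
    ]


result = roll_forward(
    pop=[10, 20, 30], af_prev=[0, 2, 0], births=4, deaths=[0, 1, 3],
    net_migration=[1, -2, 0], asylum=[0, 0, 0],
    unmeasured=[0, 0, -1], af_new=[0, 0, 2],
)
print(result)  # [5.0, 7.0, 46.0]
```

Three ages are used only to keep the arithmetic easy to follow by hand.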
The 2003 and 2004 data zone estimates were created in the same way as the 2002 estimates, with the following differences:
- The 2002 and 2003 data zone SAPE were in turn used as the reference dataset instead of the 2001 SAPE;
- There was no nine-week period to take into account;
- The asylum seeker data at ward level for 2003 and 2004 were not available. Instead the following methodology was applied.
The age-sex distribution and total number of asylum seekers used in creating the mid-year estimates are applied to ward level data on asylum seekers provided by Glasgow City Council. The ward level data are made consistent with the MYE asylum seeker totals by age and sex and then apportioned to data zones (by randomly selecting a data zone within the ward). Once the data zone asylum seeker data have been created, they are added to the population previously adjusted for migration.
The RAS principle is a method of controlling cells of a matrix to desired row and column totals.
The desired row and column totals must be known and must sum to the same total. Initial cell values must be assigned, but these need not sum to the desired totals.
The method involves summing the initial cell values to give initial row and column totals and then choosing whether to adjust rows or columns first (whichever is least likely to be well reflected by the initial cell values; columns are used for the purpose of this example).
For each column, calculate a column factor (desired column total divided by initial column total) and multiply the column by that factor, so that the cell values now approximately agree with the desired column total.
Sum the rows to obtain new row totals and calculate a row factor for each row (desired row total divided by new row total).
Multiply each row by its factor, so that the cell values now approximately agree with the desired row total.
Repeat the column and row steps. After a limited number of iterations the cell values converge towards stable values that agree with both sets of totals.
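A minimal sketch of the procedure, adjusting columns first as in the example above, is given below. The function name `ras`, the iteration limit and the tolerance are hypothetical choices, and the sketch assumes the initial cell values are not negative.

```python
import math


def ras(matrix, row_targets, col_targets, max_iterations=100, tol=1e-9):
    """Scale the cells of a matrix to desired row and column totals."""
    if not math.isclose(sum(row_targets), sum(col_targets)):
        raise ValueError("row and column totals must sum to the same total")
    cells = [[float(value) for value in row] for row in matrix]
    n_rows, n_cols = len(cells), len(cells[0])

    for _ in range(max_iterations):
        # Column step: factor = desired column total / current column total.
        for j in range(n_cols):
            col_total = sum(cells[i][j] for i in range(n_rows))
            if col_total > 0:
                factor = col_targets[j] / col_total
                for i in range(n_rows):
                    cells[i][j] *= factor
        # Row step: factor = desired row total / new row total.
        for i in range(n_rows):
            row_total = sum(cells[i])
            if row_total > 0:
                factor = row_targets[i] / row_total
                cells[i] = [value * factor for value in cells[i]]
        # Rows now match; stop once the columns also match.
        col_error = max(
            abs(sum(cells[i][j] for i in range(n_rows)) - col_targets[j])
            for j in range(n_cols)
        )
        if col_error < tol:
            break
    return cells


initial = [[1, 2], [3, 4]]  # initial values need not sum to the targets
fitted = ras(initial, row_targets=[40, 60], col_targets=[30, 70])
for row in fitted:
    print([round(value, 3) for value in row])
```

In the armed forces application described earlier, the initial cell values would be the 2001 AF population and the targets would come from the 2002 MYE.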
Please contact our Statistics Customer Services if you need any further information.
E-mail: [email protected]