National Records of Scotland


2001 Census Ethnicity Reports - Report on Coding of Other Ethnic Group


4. Text Descriptors

Table 2 (in the Tick and Text Data section of the report) shows that there were 8,200 cases in which some text was captured on the form. To make this text amenable to analysis, it underwent some grooming. Grooming involved a certain amount of fresh coding, but its aim was to distil the text entry or entries for each person into a descriptor of ethnicity. The text was groomed as follows.

  • Spelling was corrected.

  • The terms used were standardised, e.g. adjectives were preferred to country names; Scottish was preferred to Scots, English to Anglo, and Filipino to all variations denoting belonging to the Philippines (even Filipina when the person was female). Some synonyms were intentionally kept as separate descriptors, e.g. Amerindian and Native-American, Black-Caribbean and African-Caribbean, since the form-filler may have deliberately chosen not to use the alternative form.

  • Qualifying words such as ‘Half’ were extracted into a separate item in the analysis, ‘type of descriptor’. In another such transformation, ‘Mother English Father Nigerian’ became ‘English Nigerian’, with ‘Mother-Father’ recorded in ‘type of descriptor’. Separating ancestry information from ethnic descriptors helps standardise the descriptors while still retaining the information provided about a person’s particular ancestry.

  • Sometimes, although the person may be of mixed ethnicity, only one descriptor remains. For example, ‘Half Jamaica’ becomes the single descriptor ‘Jamaican’, with ‘Half’ appearing in ‘type of descriptor’.

  • Combinations of terms were kept, and hyphenated, if one term was seen as qualifying a descriptor rather than constituting a separate descriptor, e.g. White-Scottish, Kurdish-Turkish, African-Caribbean.

  • The final stage of grooming was to remove duplicate descriptors for an individual that may have come from text written in several text boxes.
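The grooming steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the procedure actually used for the census: the lookup tables `SPELLING` and `STANDARD`, the `QUALIFIERS` set and the `groom` function are all hypothetical, and the real grooming relied on human judgement (for instance, keeping Amerindian and Native-American apart) that a simple lookup cannot reproduce. The sketch corrects spelling, maps terms to a standard adjective, moves qualifying words into a separate ‘type of descriptor’ string, keeps hyphenated combinations intact and removes duplicates arising from several text boxes.

```python
# Hypothetical lookup tables for illustration; the report's actual
# spelling and standardisation lists are not reproduced here.
SPELLING = {"jamacian": "jamaican", "philipino": "filipino"}
STANDARD = {
    "scots": "Scottish",
    "anglo": "English",
    "jamaica": "Jamaican",      # country name -> adjective
    "nigeria": "Nigerian",
    "philippines": "Filipino",
    "filipina": "Filipino",
}
# Hypothetical set of qualifying words moved into 'type of descriptor'.
QUALIFIERS = {"half", "mother", "father"}


def groom(entries: list[str]) -> tuple[list[str], str]:
    """Distil one person's text entries into descriptors plus a type string."""
    descriptors: list[str] = []
    qualifiers: list[str] = []
    for entry in entries:
        # Split on spaces only, so hyphenated combinations such as
        # "White-Scottish" survive as a single descriptor.
        for token in entry.split():
            word = SPELLING.get(token.lower(), token.lower())
            if word in QUALIFIERS:
                qualifiers.append(word.capitalize())
            else:
                descriptors.append(STANDARD.get(word, word.title()))
    # Final stage: drop duplicates that came from several text boxes,
    # keeping the order in which descriptors were first seen.
    descriptors = list(dict.fromkeys(descriptors))
    return descriptors, "-".join(qualifiers)
```

For example, `groom(["Mother English Father Nigerian"])` returns `(["English", "Nigerian"], "Mother-Father")`, and `groom(["Half Jamaica"])` returns `(["Jamaican"], "Half")`. Splitting on spaces is a simplifying assumption: an unhyphenated multi-word entry such as ‘Native American’ would be split into two tokens, so a fuller version would need a list of recognised phrases.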

Grooming produced 8,192 descriptors; for the remaining 8 persons the text was ‘No ethnic group’ or similar, or was indecipherable. Appendix D lists all descriptors with more than 20 occurrences.

The text on the forms distilled down to 481 distinct descriptors, the most common of which was ‘Arab’ (Appendix D). Unqualified, ‘Arab’ accounted for 974 cases; including forms with a qualification such as ‘Bahraini’, the ‘Arab’ descriptor covered 1,053 cases. Japanese accounted for 931 cases and Filipino for 840.
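The kind of counting described above can be illustrated with a small tallying sketch. It assumes, hypothetically, that each groomed descriptor is a string and that a qualified form of ‘Arab’ is stored as a hyphenated combination such as ‘Bahraini-Arab’; the report does not say exactly how qualified forms were recorded. The function names `tally` and `arab_counts` and the toy data are illustrative only and are not census figures.

```python
from collections import Counter


def tally(descriptors: list[str], threshold: int = 20) -> tuple[int, dict[str, int]]:
    """Return the number of distinct descriptors and those occurring
    more than `threshold` times (the Appendix D cut-off is 20)."""
    counts = Counter(descriptors)
    frequent = {d: n for d, n in counts.most_common() if n > threshold}
    return len(counts), frequent


def arab_counts(descriptors: list[str]) -> tuple[int, int]:
    """Count 'Arab' unqualified, then including qualified forms.

    Assumption (hypothetical): a qualified form is hyphenated,
    e.g. 'Bahraini-Arab'.
    """
    unqualified = sum(1 for d in descriptors if d == "Arab")
    including_qualified = sum(1 for d in descriptors if "Arab" in d.split("-"))
    return unqualified, including_qualified


# Toy data for illustration only; these are not census counts.
toy = ["Arab"] * 3 + ["Bahraini-Arab"] + ["Japanese"] * 2
print(tally(toy, threshold=2))  # (3, {'Arab': 3})
print(arab_counts(toy))         # (3, 4)
```

The strict "more than" comparison mirrors the Appendix D wording, so a descriptor with exactly 20 occurrences would not be listed.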