National Records of Scotland

Preserving the past, Recording the present, Informing the future

2001 Census Ethnicity Reports - Report on Coding of Any Mixed Background

4. Text Descriptors

Table 2 (in the Tick and Text Data section of the report) shows that there were 9,776 cases in which some text was captured on the form. To make this text amenable to analysis, it was groomed. Grooming entailed a certain amount of fresh coding, but the aim was to distil the text entry or entries for each person into one or more descriptors of ethnicity. The text was groomed as follows.

  • Spelling was corrected.

  • The terms used were standardised, e.g. adjectives were preferred to country names, Scottish to Scots, English to Anglo, and Filipino to all variations of Philippines (including Filipina when the person was female). Some synonyms were nevertheless left as separate descriptors, e.g. Amerindian and native-American, Black-Caribbean and African-Caribbean, to keep the terms as close as possible to what the form-filler wrote.

  • Qualifying words such as ‘Half’ were transferred to a separate item in the analysis, ‘type of descriptor’. Similarly, ‘Mother English Father Nigerian’ became ‘English Nigerian’ with ‘Mother-Father’ in ‘type of descriptor’. Separating the information on ancestry from the ethnic descriptors helps standardise the descriptors while keeping the information provided about a person’s particular ancestry.

  • Sometimes, although the person may be of mixed ethnicity, only one descriptor is left. For example, ‘Half Jamaica’ becomes the single descriptor ‘Jamaican’, while ‘Half’ appears in ‘type of descriptor’.

  • Combinations of terms were kept, and hyphenated, if one term was seen as qualifying a descriptor rather than constituting a separate descriptor, e.g. White-Scottish, Kurdish-Turkish, African-Caribbean.

  • The final stage of grooming was to remove duplicate descriptors that may have come from text written in several text boxes.
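
The grooming steps above can be illustrated with a minimal Python sketch. It is not the procedure actually used for the report: the lookup tables STANDARD_TERMS and QUALIFIERS and the function groom are hypothetical names introduced here, the tables hold only the handful of examples quoted above, and spelling correction is left out. The sketch shows standardisation of terms, transfer of qualifying words to ‘type of descriptor’, retention of hyphenated combinations, and removal of duplicates across text boxes.

```python
import re

# Hypothetical lookup tables, filled only with the examples quoted in the text.
# The report does not publish the full lists that were used.
STANDARD_TERMS = {
    "scots": "Scottish",
    "anglo": "English",
    "philippines": "Filipino",
    "filipina": "Filipino",
    "jamaica": "Jamaican",
    "nigeria": "Nigerian",
}
QUALIFIERS = {"half": "Half", "mother": "Mother", "father": "Father"}


def groom(text_boxes):
    """Return (descriptors, type_of_descriptor) for one person's text entries."""
    descriptors = []
    qualifiers = []
    for text in text_boxes:
        # A hyphenated combination such as African-Caribbean is kept as one term.
        for word in re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", text):
            key = word.lower()
            if key in QUALIFIERS:
                # Qualifying words go to 'type of descriptor', not the descriptors.
                if QUALIFIERS[key] not in qualifiers:
                    qualifiers.append(QUALIFIERS[key])
                continue
            # Standardise known variants; otherwise just normalise the capitals.
            term = STANDARD_TERMS.get(key, word.title())
            # Drop duplicates that come from text written in several boxes.
            if term not in descriptors:
                descriptors.append(term)
    return descriptors, "-".join(qualifiers)


print(groom(["Mother English Father Nigerian"]))
print(groom(["Half Jamaica"]))
print(groom(["Scots", "scottish Filipina"]))
```

Under these assumptions, the first call yields the descriptors English and Nigerian with ‘Mother-Father’ as the type of descriptor, and the second yields the single descriptor Jamaican with ‘Half’ as the type of descriptor, matching the examples given above.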