Forschungen

Reducing 3-dimensional to 2-dimensional data

SOFA statistics is free, easy to learn and provides all basic functions that you usually need. Today, however, I came across a problem when trying to recode the data. I checked their help Wiki but couldn’t find the answer. So I tried to figure out my own way and post it below.

Here is the problem: The data comes from a questionnaire where people entered their language skills. Obviously, each individual can give more than one answers if they know more than one language. I therefore ended up with a table like this:

ID languages
1 A, B, C
2 C
3 C, D
4 A, C, D

„ID“ stands for the individual persons (each with a unique number) and the capital letters are each one language.

If I enter this date into SOFA, it interprets all selected languages for each person as one string:
„A, B, C“ is treated as one language, „C“ as another one, „C, D“ as the third language, and „A, C, D“ as the fourth. Of course, the software doesn’t know that „A, B, C“ is a list of many items, rather than a new name for one item.

What I need, however, is the possibility to process the elements inside a field independently, i.e. as list of values assigned to a person for one particular question.

For example, I could analyse the frequency: 2 persons replied with „A“ and 4 persons with „C“.

I therefore need to recode the column „languages“ into new columns, one for each language, using a function like „Does x contain y?“ that returns „yes“ or „no“.

The table would then look like this:

ID A B C D
1 y y y n
2 n n y n
3 n n y y
4 y n y y

The trick can be easily done in a spreadsheet. I tried it with Libre Office (that’s basically Open Office), but it should work as well with Excel, possibly with a slightly different formula.

The result looks like this:

ID original A B C D
1 A, B, C TRUE FALSE FALSE FALSE
2 C FALSE TRUE FALSE TRUE
3 C,D FALSE FALSE TRUE TRUE
4 A,C,D FALSE TRUE TRUE TRUE

Where you see TRUE and FALSE, I used the formula

=ISNUMBER(SEARCH(<field on top>;<field with originals on left>))

where you insert the right positions for the <>. For example, the formula

=ISNUMBER(SEARCH(C1;B2))

goes into the intersection of „A,B,C“ and „A“ – assuming that the field C1 contains the value „A“ and the field B2 contains the value „A,B,C“. You probably need to adjust it to your table.

Filling the rest of the formulas is quite simple. In Libre Office you first make the formula work for one field and then you copy and paste it into the other fields – the software will make the necessary changes.

Depending on the format of the intersecting fields, you might see „1“ and „0“ instead of „TRUE“ and „FALSE“.