Writing Good Online Test Questions

Blackboard

Multiple-Choice Questions (MCQs) are the most common format for computer-aided tests. Because they are often poorly written and constructed, there can be a feeling that this type of question is not suitable for assessment in Higher Education (HE). The argument runs that they cannot test higher-level thinking, that they test answer recognition rather than answer construction, and that they allow for guessing.

However, well-constructed MCQs allow for objective testing with high test validity, and can measure knowledge, understanding, analysis and the application of knowledge. Writing MCQs requires skill and can be time-consuming, and questions are often seen outside the context of the test as a whole. Attempts at writing questions which test higher-level thinking can result in complicated questions which do more to test the student’s understanding of English than their subject knowledge.

“I wouldn’t start from here!”

Like all testing, writing MCQs starts with the learning objectives. Those which are measurable and observable, or which can be written in these terms, are suitable for assessment by MCQ.

e.g. “Understand the appearance of sooty mildew on leaves.”

How is this demonstrated by the student?

e.g. “Describe the features of sooty mildew, the conditions of its growth and the means of eradicating it” states clearly what the student is to do.

Learning objectives which clearly reflect the level of knowledge, understanding, application or critical thinking required of the student will result in a test with higher validity.

Components of MCQs

  • Item = the whole question
  • Stem = the actual question asked
  • Responses / choices / options = the answer options given
  • Distractors = incorrect responses

Constructing the Stem

  • Only test one thing per question
  • Construct the stem to elicit the correct answer, not to have students pick out a wrong answer; identifying wrong answers does little to aid learning
  • Keep it concise and clear: you are not testing the student’s ability to read English
  • Put as much of the item as possible in the stem, i.e. do not repeat information in the options
  • Don’t try to mislead students or include trick information
  • Use negatives with caution
  • Avoid acronyms or abbreviations which might be interpreted in a misleading way

Constructing the Answers

  • Make sure that there is only one, unambiguous correct answer
  • All answers must be equally plausible
  • Never use “All of the above” or “None of the above”
  • Including a “joke” option demeans the exercise and effectively reduces the number of choices
  • Don’t make the correct answer guessable from its position, e.g. by cycling through a, b, c, … or by placing it most often at b

Feedback

  • Make sure that feedback addresses errors in individual responses
  • Include “feed forward” information
  • Write feedback whether or not you intend to use it in an examination; the questions might be used on another occasion

Test Construction

  • Provide questions at different levels of difficulty
  • Eliminate effective guessing:
    • If 4 choices are given, a student who guesses every answer may score 25%, but equally may score 0% or 100%. The average score for a group of students who all guess every answer will be 25%. Well-constructed questions make it harder for a student to guess the right answer.
    • Ask 2 questions which test the same objective in different ways. It is unlikely that students will guess correctly twice.
    • Negative marking: if 4 distractors and one right answer are given, the probability of a student guessing the right answer is 0.2 (20%). A 0.25-mark penalty for an incorrect answer then gives an average score of 0% across a cohort for a completely guessed examination (0.2 x 1 - 0.8 x 0.25 = 0; see the sketch after this list). A “don’t know” option is sometimes used where negative marking is employed: +1 is given for a correct answer, -1 for an incorrect answer and 0 for “don’t know”. This is useful for gaining feedback on students’ learning after the examination, and gives less confident students an alternative to guessing.
  • Shuffle the answers
  • Randomise the questions
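
To make the guessing arithmetic above concrete, here is a minimal sketch in Python (the option counts, marks and penalties are the examples from the text, not fixed rules):

    # Expected marks per question if a student guesses uniformly at random.
    def expected_guess_score(n_options: int, correct_mark: float, penalty: float) -> float:
        p_correct = 1 / n_options
        return p_correct * correct_mark - (1 - p_correct) * penalty

    # Four options, no penalty: guessing averages 25% of the marks.
    print(expected_guess_score(4, 1.0, 0.0))   # 0.25

    # Five options (4 distractors + 1 key) with a 0.25-mark penalty:
    # guessing averages zero across a cohort, as described above.
    print(expected_guess_score(5, 1.0, 0.25))  # 0.0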

The Validity of Questions

Always get someone else to test your questions before you use them.

Item Analysis

Some statistics available from computer-aided assessment (CAA) systems allow the level of difficulty of items to be assessed. An ideal test would contain only a few items answered correctly by more than 90% of students (too easy) or by fewer than 30% (too hard). Blackboard’s “Item Analysis” feature provides these figures.
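
As a rough sketch of what such a report computes, the facility of an item is the proportion of students answering it correctly; the response data below is invented for illustration:

    # Facility = proportion of students answering an item correctly.
    responses = {
        # item id -> 1 (correct) / 0 (incorrect), one entry per student
        "Q1": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # facility 1.0: too easy
        "Q2": [1, 0, 1, 1, 0, 1, 0, 1, 1, 0],  # facility 0.6: a useful item
        "Q3": [0, 0, 1, 0, 0, 0, 0, 1, 0, 0],  # facility 0.2: too hard
    }

    for item, scores in responses.items():
        facility = sum(scores) / len(scores)
        flag = "  <- review" if facility > 0.9 or facility < 0.3 else ""
        print(f"{item}: facility {facility:.1f}{flag}")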

Item Discrimination Indices

The results of students in the top and bottom quartiles (for example) are compared to discriminate between those who know the answer and those who don’t. The index lies between -1.0 and +1.0; the closer it is to +1.0, the more effectively the question discriminates between the two groups of students. A value of 0.6 indicates that the question is fit for purpose. A value of less than 0.1 indicates that a large number of weaker students answered the item correctly, and the item will need to be reviewed.
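
A minimal sketch of this calculation, assuming the simple top-minus-bottom-quartile definition described above (the scores are invented):

    # Discrimination index: proportion of the top quartile answering the item
    # correctly minus the proportion of the bottom quartile doing so.
    def discrimination_index(item_scores, total_scores):
        ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
        q = max(1, len(ranked) // 4)             # students per quartile
        bottom, top = ranked[:q], ranked[-q:]
        p_top = sum(item_scores[i] for i in top) / q
        p_bottom = sum(item_scores[i] for i in bottom) / q
        return p_top - p_bottom                  # between -1.0 and +1.0

    totals = [9, 3, 7, 8, 2, 6, 4, 10, 5, 1]     # invented overall test scores
    item = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]        # invented results for one item
    print(f"discrimination: {discrimination_index(item, totals):+.2f}")  # +1.00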

Bloom’s Taxonomy

Anderson and Krathwohl published the Revised Bloom’s Taxonomy in 2001, with six levels: Remembering, Understanding, Applying, Analysing, Evaluating and Creating.

MCQs can be written which reflect the bottom 4 levels without too much difficulty. Evaluating is more difficult to test, but can be done.

The Good, the Bad and the Ugly

Bad:
“one cannot ever know positively whether or not supernatural phenomena, especially gods, exist.” (Thomas Huxley in 1876) This is generally accepted as the correct definition of
– the correct definition of agnosticism
– the correct definition of deism
– the correct definition of atheism
– the correct definition of fideism

Better:
“one cannot ever know positively whether or not supernatural phenomena, especially gods, exist.” (Thomas Huxley in 1876) This is generally accepted as the correct definition of
– agnosticism
– deism
– atheism
– fideism


Bad:
A spinach plant was showing poor growth and flecks of black dust on the leaves. The most suitable solution to this problem is
– Increase the fertiliser
– Spraying with an anti-fungal agent*
– Cut back and burn affected leaves
– An increase in humidity would eradicate the disease

*This is the only grammatically consistent option, and it is the correct answer.

Better:
A spinach plant was showing poor growth and flecks of black dust on the leaves. The most suitable solution to this problem is
– To increase the fertiliser
– Spraying with an anti-fungal agent (the correct answer)
– Cutting back and burning affected leaves
– Increasing the humidity around the leaves





Bad:
A spinach plant was showing poor growth and flecks of black dust on the leaves. The most likely cause of this is
– An infestation of sooty mildew*
– Drought
– Over-watering
– Aphids

*Students will guess this answer because it is longer than the others.

Better:
A spinach plant was showing poor growth and flecks of black dust on the leaves. The most likely cause of this is
– An infestation of sooty mildew (the correct answer)
– A lack of water in the growing season
– Too much water in the growing season
– An attack of aphids




Bad:
Which of the following is a symptom of sooty mildew on spinach?*
– Yellowing edges to the leaves
– Failure to put on growth
– Black, powdery spots on the leaves
– Thin, white patches on the leaves

*This question tests recall only.

Better:
A row of spinach is beginning to show signs of black, powdery patches on the inner surfaces of the outer leaves. Watering of the roots is increased, but moisture is kept off the leaves. The problem does not improve.
This is most likely to be caused by:
– Powdery mildew
– Sooty mildew
– Aphid infestation
– Over-watering


Bad (this question speaks for itself…):
As the level of fertility approaches its nadir, what is the most likely ramification for the citizenry of a developing nation?
– a decrease in the labour force participation rate of women
– a dispersing effect on population concentration
– a downward trend in the youth dependency ratio
– a broader base in the population pyramid
– an increase in the infant mortality rate

Better:
What is the most likely effect on developing nations of a significant drop in fertility?
– Fewer women join the labour force
– Populations begin to disperse
– The youth dependency ratio decreases
– The base of the population pyramid broadens
– The infant mortality rate increases



Improving your questions

Give Scenarios.

Understanding can often be demonstrated by using a description of a situation where knowledge needs to be applied. Where judgement is an acceptable part of the decision-making process, distractors need to be graded in their distance from the correct answer:

i.e.
Answer No. 1: Most likely
Answer No. 2: Next most likely
Answer No. 3: Not likely
Answer No. 4: Not at all likely

e.g.
A row of spinach is beginning to show signs of black, powdery patches on the inner surfaces of the outer leaves. What measure could be taken to best treat this condition?

  • Spray with an anti-fungal solution:  this is the correct answer
  • Increase the fertiliser:  this may have some effect, but will not cure the problem
  • Decrease the amount of water: this will have a small effect, but will weaken the plant further
  • Spray the leaves regularly to increase the humidity: this will make matters worse.

If necessary, 2 marks could be given for answer number 1 and 1 mark for answer number 2.
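
A minimal sketch of such a graded marking scheme, using the example above (the option texts and mark values are taken from the example; a real assessment tool would store these in its own format):

    # Graded marking: the most likely answer earns full marks, the next most
    # likely earns partial credit, the remaining options earn nothing.
    marks = {
        "Spray with an anti-fungal solution": 2,  # most likely
        "Increase the fertiliser": 1,             # next most likely
        "Decrease the amount of water": 0,        # not likely
        "Spray the leaves regularly": 0,          # not at all likely
    }

    def score(response: str) -> int:
        return marks.get(response, 0)             # unlisted responses score 0

    print(score("Increase the fertiliser"))  # 1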

Using Images

Most question types allow images to be added to the question stem. These can be most useful where students are being asked about data, e.g. identifying a particular number or set of numbers, interpreting graphs or comparing visual information. Images need to be in a format suitable for display over the web, and of a suitable size to be displayed within the assessment software.

Some question types ask for a “hotspot” to be clicked, or for a marker or series of markers to be dragged onto an image. Others use drag-and-drop or drag-to-match interactions.

Other Question Types

Most online testing tools enable the use of several different types of question, such as fill-in-the-gaps, matching, matrix and ordering questions. These can be very useful in assessing higher-level learning and the upper levels of Bloom’s Taxonomy.

Assertion-Reasoning questions are designed to test higher-level learning, although there is no convincing research to show that this is the case.

Each assertion is matched with a reason. In some cases students are asked to match the statements from a drop-down list, in others they are asked to say whether the assertion and the reason are true or false:

Q: Sooty mildew in spinach is common in damp years BECAUSE the disease is caused by a fungus which thrives in damp conditions
A: True, True

Q: Sooty mildew in spinach is diagnosed by the appearance of white spots on the outer margins of leaves BECAUSE the leaf structure is damaged at this point
A: False, False

Q: The main diagnostic feature of sooty mildew in spinach is black spots on the leaves BECAUSE this is a residue left by the vector host insects
A: True, False

More complex assertion-reasoning questions can end up as a test of logic. Sometimes the reason is correct, but not for the assertion given, and students are asked to state this.
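
One way to picture the format: each item pairs an assertion with a reason, and the key is a pair of truth values. A minimal sketch using the items above (the data layout is an assumption for illustration, not any particular tool’s format):

    # Assertion-reasoning items as (assertion, reason, key) records, where the
    # key is (assertion is true, reason is true). Marking is an exact match.
    items = [
        ("Sooty mildew in spinach is common in damp years",
         "the disease is caused by a fungus which thrives in damp conditions",
         (True, True)),
        ("The main diagnostic feature of sooty mildew in spinach is black spots on the leaves",
         "this is a residue left by the vector host insects",
         (True, False)),
    ]

    def mark(key, answer):
        return 1 if answer == key else 0          # 1 mark for an exact match

    print(mark(items[1][2], (True, False)))  # 1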

Mentimeter Training

Workshop Name – MENTIMETER: AN INTRODUCTION
This workshop introduces the different types of questions that can be built to stimulate interaction through a ‘fun factor’. The anonymity may encourage engagement, particularly from students who are less confident.

Scheduled sessions are bookable via StaffSpace 

In addition, Mentimeter has developed its own self-directed ‘Beginner’s Course’, Getting Started with Mentimeter. Note that you will need to enrol yourself so that you can download your personalised certificate at the end of the course. The course covers both the presenter and voter aspects of the tool.

Tests, Pools and Surveys

Blackboard

You can use tests and surveys to measure student knowledge, gauge progress, and gather information from students. Points can be assigned to test questions for grading, but survey questions are not scored.

Survey results are anonymous, but you can see whether a student has completed a survey and view aggregate results for each survey question.

The end of module evaluation survey is available to download from the following web link:

To set up a module evaluation within Blackboard please refer to this guidance: Module Evaluation in Blackboard.