Time: 12 December 2018, 1-2 pm

Place: B705

### Abstract

Item response theory (IRT) can be used to measure students' latent abilities. For example, a latent ability in mathematics can be measured through manifest results on mathematics tests. Different groups may take different tests. To place the abilities of the different groups on a common scale, the tests share a set of common questions called anchor items. When the groups differ substantially in ability, this is referred to as the non-equivalent groups with anchor test (NEAT) design, and the method of placing the results of students from different grades (e.g. school years) on a single score scale is called vertical scaling.

In this talk, vertical scaling is considered under the one-parameter logistic (1PL) model.
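For reference, the 1PL model is commonly written as follows. The notation is an assumption made here for illustration, not taken from the talk: $\theta_i$ is the latent ability of student $i$, $b_j$ is the difficulty of item $j$, and $X_{ij} \in \{0, 1\}$ indicates a correct response.

$$
P(X_{ij} = 1 \mid \theta_i, b_j) = \frac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)}
$$

Only the difference $\theta_i - b_j$ enters the model, so abilities and difficulties live on the same scale, and that scale is fixed only up to a common shift.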

We consider concurrent calibration, which entails a single analysis of the data from both tests.
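One way to read "a single analysis" is as one likelihood over all students and all items. The sketch below uses notation assumed here rather than taken from the talk: $J_1$ and $J_2$ are the item sets of the two tests, $J_1 \cap J_2$ is the set of anchor items, $g(i) \in \{1, 2\}$ is the test taken by student $i$, and $x_{ij}$ is the observed response. Items a student was not given are simply left out of the product.

$$
L(\theta, b) = \prod_{i} \prod_{j \in J_{g(i)}} P_{ij}^{\,x_{ij}} \,(1 - P_{ij})^{1 - x_{ij}},
\qquad
P_{ij} = \frac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)}
$$

Because each anchor item difficulty $b_j$, $j \in J_1 \cap J_2$, appears in the terms of both groups, the two tests are tied to one scale without a separate linking step. This is the joint form of the likelihood. A marginal approach would instead integrate $\theta_i$ out over an assumed ability distribution.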

Our objective is to examine the method of concurrent calibration for the NEAT design in detail.

First, we compare different estimation methods: joint maximum likelihood estimation, and marginal maximum likelihood estimation combined with weighted maximum likelihood estimation. Second, we consider different ways of designing the two tests, namely different choices of item difficulties. To do this, we simulate two achievement tests under different scenarios for the ability distributions of the students taking the two tests.
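A simulation of this kind could be set up roughly as below. This is a minimal sketch and not the speaker's actual study: the function names `simulate_neat` and `jml_1pl`, the sample sizes, the item counts, the normal distributions, and the ability shift of 1.0 between groups are all assumptions chosen for illustration. The estimator is a crude joint maximum likelihood routine using alternating, step-limited Newton updates. It does not cover the marginal or weighted likelihood methods mentioned above.

```python
import numpy as np

def simulate_neat(n_per_group=500, n_unique=20, n_anchor=10,
                  mean_shift=1.0, seed=0):
    """Simulate 1PL responses for two groups whose tests share anchor items.

    All defaults are illustrative assumptions, not values from the talk.
    Item layout: [unique to test 1 | anchor items | unique to test 2].
    """
    rng = np.random.default_rng(seed)
    n_items = 2 * n_unique + n_anchor
    b = rng.normal(0.0, 1.0, size=n_items)            # item difficulties
    theta = np.concatenate([
        rng.normal(0.0, 1.0, n_per_group),            # group taking test 1
        rng.normal(mean_shift, 1.0, n_per_group),     # group taking test 2
    ])
    group = np.repeat([0, 1], n_per_group)
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    x = (rng.random(p.shape) < p).astype(float)
    # Responses are missing by design for items a group was not given.
    x[group == 0, n_unique + n_anchor:] = np.nan
    x[group == 1, :n_unique] = np.nan
    return x, theta, b, group

def jml_1pl(x, n_iter=50, bound=6.0):
    """Crude joint ML for the 1PL model on the stacked data matrix."""
    obs = ~np.isnan(x)
    x0 = np.where(obs, x, 0.0)
    theta = np.zeros(x.shape[0])
    b = np.zeros(x.shape[1])
    for _ in range(n_iter):
        # Newton step for abilities, using observed responses only.
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
        step = ((x0 - p) * obs).sum(axis=1) / (p * (1 - p) * obs).sum(axis=1)
        # Bound the estimates so perfect and zero scores stay finite.
        theta = np.clip(theta + np.clip(step, -1.0, 1.0), -bound, bound)
        # Newton step for difficulties (gradient has the opposite sign).
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
        step = ((x0 - p) * obs).sum(axis=0) / (p * (1 - p) * obs).sum(axis=0)
        b = b - np.clip(step, -1.0, 1.0)
        b = b - b.mean()   # fix the scale origin: mean difficulty is zero
    return theta, b

x, theta_true, b_true, group = simulate_neat()
theta_hat, b_hat = jml_1pl(x)
print("correlation of true and estimated difficulties:",
      np.corrcoef(b_true, b_hat)[0, 1])
print("estimated mean ability by group:",
      theta_hat[group == 0].mean(), theta_hat[group == 1].mean())
```

Varying `mean_shift`, or drawing the unique items of each test from different difficulty ranges, gives different scenarios for the ability distributions and the test design. Joint maximum likelihood estimates are known to be biased for short tests, so the numbers this sketch produces are illustrative only.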