  • Pre-Conference | Training Sessions

    Training Session, AA

    Quality Control Tools in Support of Reporting Accurate and Valid Test Scores

    Thursday April 07, 2016 8:00 AM - Thursday April 07, 2016 12:00 PM

    Meeting Room Level, Meeting Room 6

    Aster Tessema, American Institute of Certified Public Accountants; Oliver Zhang, The College Board; Alina von Davier, Educational Testing Service

    All testing companies focus on ensuring that test scores are valid, reliable, and fair. Significant resources are allocated to meeting the guidelines of well-known organizations such as AERA/NCME and the International Test Commission (Allalouf, 2007; ITC, 2011).

    In this workshop we will discuss traditional QC methods, the operational testing process, and new QC tools for monitoring the stability of scores over time.

    We will provide participants with a practical understanding of:

    • The importance of flow charts and documentation of procedures
    • The use of software tools to monitor tasks
    • How to minimize the number of handoffs
    • How to automate activities
    • The importance of trend analysis to detect anomalies
    • The importance of applying detective and preventive controls
    • Having a contingency plan

    We will also show how to apply QC techniques from manufacturing to monitor scores. We will discuss applying traditional QC charts (Shewhart and CUSUM charts), time series models, and change-point models to the means of scale scores to detect abrupt changes (Lee & von Davier, 2013). We will also discuss QC methods for the process of automated and human scoring of essays (Wang & von Davier, 2014).
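
    As a concrete illustration of the kind of monitoring discussed above, the sketch below computes a simple tabular CUSUM on a series of scale-score means in Python. The target mean, standard deviation, reference value k, and decision interval h are illustrative assumptions, not values from the workshop materials.

        # Minimal CUSUM sketch for monitoring scale-score means across administrations.
        # All numbers (target, sigma, k, h, and the example means) are hypothetical.
        import numpy as np

        def tabular_cusum(means, target, sigma, k=0.5, h=4.0):
            """Flag administrations where accumulated drift exceeds the decision interval h."""
            z = (np.asarray(means) - target) / sigma      # standardized score means
            c_plus, c_minus, flags = 0.0, 0.0, []
            for zi in z:
                c_plus = max(0.0, c_plus + zi - k)        # accumulated upward drift
                c_minus = max(0.0, c_minus - zi - k)      # accumulated downward drift
                flags.append(c_plus > h or c_minus > h)
            return flags

        # Hypothetical monthly scale-score means; the upward drift at the end is flagged.
        print(tabular_cusum([500.2, 499.8, 500.5, 501.0, 503.4, 504.1, 505.0],
                            target=500.0, sigma=1.5))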

    Training Session, BB

    IRT Parameter Linking

    Thursday April 07, 2016 8:00 AM - Thursday April 07, 2016 12:00 PM

    Meeting Room Level, Meeting Room 7

    Wim van der Linden and Michelle Barrett, Pacific Metrics

    The problem of IRT parameter linking arises when the values of the parameters for the same items or examinees in different calibrations need to be compared. So far, the problem has mainly been conceptualized as an instance of the problem of invariance of the measurement scale for the ability parameters, in the tradition of S. S. Stevens’ interval scales. In this half-day training session, we show that the linking problem has little to do with arbitrary units and zeros of measurement scales but is the result of a more fundamental problem inherent in all IRT models: a general lack of identifiability of their parameters. This redefinition of the linking problem allows us to formally derive the linking functions required to adjust for the differences in parameter values between separate calibrations. It also leads to new, efficient statistical estimators of the linking parameters, the derivation of their standard errors, and the use of current optimal test-design methods to design linking studies with minimal error. All of these results have been established for both the current dichotomous and polytomous IRT models. The results will be presented during four one-hour lectures appropriate for psychometricians with an interest in and/or practical experience with IRT parameter linking problems.
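
    As a point of reference for the form such linking functions take, the familiar linear transformation for the dichotomous 2PL case is shown below; it is a standard textbook illustration rather than the session’s own derivation:

        $\theta^{*} = A\theta + B, \qquad a_i^{*} = a_i / A, \qquad b_i^{*} = A b_i + B,$

    which leaves the response probabilities unchanged because $a_i^{*}(\theta^{*} - b_i^{*}) = a_i(\theta - b_i)$ for any choice of the constants $A$ and $B$.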

    Training Session, CC

    21st Century Skills Assessment: Design, Development, Scoring, and Reporting of Character Skills

    Thursday April 07, 2016 8:00 AM - Thursday April 07, 2016 5:00 PM

    Meeting Room Level, Meeting Room 5

    Patrick Kyllonen and Jonas Bertling, Educational Testing Service

    This workshop will provide training, discussion, and hands-on experience in developing methods for assessing, scoring, and reporting on students’ social-emotional and self-management or character skills. The workshop will focus on (a) reviewing the kinds of character skills most important to assess based on current research; (b) standard and innovative methods for assessing character skills, including self-, peer-, teacher-, and parent-report rating scales, forced-choice (rankings), anchoring vignettes, and situational judgment methods; (c) cognitive lab approaches for item tryout; (d) classical and item response theory (IRT) scoring procedures (e.g., 2PL, partial credit, nominal response model); (e) validation strategies, including the development of rubrics and behaviorally anchored rating scales, and correlations with external variables; (f) the use of anchors in longitudinal growth studies; (g) reliability from classical test theory (alpha, test-retest), item response theory, and generalizability theory; and (h) reporting issues. These topics will be covered in the workshop where appropriate, but the sessions within the workshop will tend to be organized around item types (e.g., forced-choice, anchoring vignettes). Examples will be drawn from various assessments, including PISA, NAEP, SuccessNavigator, FACETS, and others. The workshop is designed for a broad audience of assessment developers, analysts, and psychometricians working in either applied or research settings.

    Training Session, DD

    Introduction to Standard Setting

    Thursday April 07, 2016 8:00 AM - Thursday April 07, 2016 5:00 PM

    Meeting Room Level, Meeting Room 2

    Chad Buckendahl, Alpine Testing Solutions; Jennifer Dunn, Measured Progress; Karla Egan, National Center for the Improvement of Educational Assessment; Lisa Keller, University of Massachusetts Amherst; Lee LaFond, Measured Progress

    As states adopt new standards and assessments, the expectations placed on psychometricians from a political perspective have been increasing. The purpose of this training session is to provide a practical introduction to the standard setting process while addressing common policy concerns and expectations.

    This training will follow the Evidence-Based Standard Setting (EBSS) framework. The first third of the session will touch upon some of the primary pre-meeting developmental and logistical activities as well as the EBSS steps of defining outcomes and developing relevant research as guiding validity evidence.

    The middle third of the session will focus on the events of the standard setting meeting itself. The session facilitators will walk participants through the phases of a typical standard setting, and participants will experience a training session on the Bookmark, Angoff, and Body of Work methods, followed by practice rating rounds with discussion.

    The final third of the training session will give an overview of what happens following a standard setting meeting. This will be carried out through a panel discussion with an emphasis on policy expectations and the importance of continuing to gather evidence in support of the standard.

    Training Session, EE

    Analyzing NAEP Data Using Plausible Values and Marginal Estimation with AM

    Thursday April 07, 2016 8:00 AM - Thursday April 07, 2016 5:00 PM

    Meeting Room Level, Meeting Room 16

    Emmanuel Sikali, National Center for Education Statistics; Young Yee Kim, American Institutes for Research

    Since results from the National Assessment of Educational Progress (NAEP) serve as a common metric for all states and select urban districts, many researchers are interested in conducting studies using NAEP data. However, NAEP data pose many challenges for researchers because of the assessment’s special design features. This class intends to provide analytic strategies and hands-on practice for researchers who are interested in NAEP data analysis. The class consists of two parts: (1) instruction on the psychometric and sampling designs of NAEP and the data analysis strategies required by these design features and (2) demonstration of NAEP data analysis procedures with hands-on practice. The first part covers the marginal maximum likelihood estimation approach to obtaining scale scores and appropriate variance estimation procedures; the second part covers two approaches to NAEP data analysis, i.e., the plausible values approach and the marginal estimation approach with item response data. The demonstration and hands-on practice will be conducted with a free software program, AM, using a mini-sample public-use NAEP data file released in 2011. Intended participants are researchers, including graduate students, education practitioners, and policy analysts, who are interested in NAEP data analysis.
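
    As a sketch of what the plausible values approach involves computationally, the Python fragment below combines a statistic computed on each plausible value using Rubin’s combining rules; the function name, the example numbers, and the assumption that sampling variances are available (e.g., from jackknife replicate weights) are illustrative, not AM output.

        # Combine a statistic computed separately on each plausible value (PV).
        # `estimates` are the per-PV statistics; `sampling_vars` are the matching
        # sampling variances (e.g., from the jackknife). Numbers are hypothetical.
        import numpy as np

        def combine_plausible_values(estimates, sampling_vars):
            m = len(estimates)
            point = np.mean(estimates)                 # final point estimate
            within = np.mean(sampling_vars)            # average sampling variance
            between = np.var(estimates, ddof=1)        # variance across PVs (measurement error)
            total = within + (1 + 1 / m) * between     # combined variance
            return point, np.sqrt(total)

        est, se = combine_plausible_values([281.3, 280.9, 281.7, 281.1, 281.5],
                                           [1.21, 1.18, 1.25, 1.20, 1.22])
        print(est, se)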

    Training Session, FF

    Multidimensional Item Response Theory: Theory and Applications and Software

    Thursday April 07, 2016 8:00 AM - Thursday April 07, 2016 5:00 PM

    Meeting Room Level, Meeting Room 4

    Lihua Yao, Defense Manpower Data Center; Mark Reckase, Michigan State University; Rich Schwarz, ETS

    This session discusses the theory and applications of multidimensional item response theory (MIRT), multidimensional computerized adaptive testing (MCAT), and MIRT linking. Software demonstrations and hands-on exercises cover multidimensional multiple-group calibration, multidimensional linking, and MCAT simulation. The session is intended for researchers who are interested in MIRT and MCAT.
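
    For orientation, a commonly used compensatory MIRT model for dichotomous items (shown here only as a reference point for the session topics) is

        $P(X_{ij} = 1 \mid \boldsymbol{\theta}_i) = \frac{1}{1 + \exp\left[-(\mathbf{a}_j^{\top}\boldsymbol{\theta}_i + d_j)\right]},$

    where $\boldsymbol{\theta}_i$ is examinee $i$’s ability vector, $\mathbf{a}_j$ is item $j$’s vector of discrimination (slope) parameters, and $d_j$ is its intercept.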

    Training Session, GG

    New Weighting Methods for Causal Mediation Analysis

    Thursday April 07, 2016 1:00 PM - Thursday April 07, 2016 5:00 PM

    Meeting Room Level, Meeting Room 3

    Guanglei Hong, University of Chicago

    Many important research questions in education relate to how interventions work. A mediator characterizes the hypothesized intermediate process. Conventional methods for mediation analysis generate biased results when the mediator-outcome relationship depends on the treatment condition. These methods also tend to have a limited capacity for removing confounding associated with a large number of covariates. This workshop teaches the ratio-of-mediator-probability weighting (RMPW) method for decomposing total treatment effects into direct and indirect effects in the presence of treatment-by-mediator interactions. RMPW is easy to implement and requires relatively few assumptions about the distribution of the outcome, the distribution of the mediator, and the functional form of the outcome model. We will introduce the concepts of causal mediation, explain the intuitive rationale of the RMPW strategy, and delineate the parametric and nonparametric analytic procedures. Participants will gain hands-on experience with a free stand-alone RMPW software program. We will also provide SAS, Stata, and R code and will distribute related readings. The target audience includes graduate students, early career scholars, and advanced researchers who are familiar with multiple regression and have had prior exposure to binary and multinomial logistic regression. Each participant will need to bring a laptop for hands-on exercises.
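
    To make the weighting idea concrete, the sketch below computes ratio-of-mediator-probability weights for treated units in Python; the column names are hypothetical, and plain logistic regression stands in for whatever mediator model an analyst would actually use, so this is an illustration of the general strategy rather than the workshop’s software.

        # RMPW weight for a treated unit with observed binary mediator value m:
        #   w = Pr(M = m | Z = 0, X) / Pr(M = m | Z = 1, X).
        # `df` is a pandas DataFrame; column names (treat, mediator, x1, x2) are hypothetical.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def rmpw_weights(df, treat="treat", mediator="mediator", covariates=("x1", "x2")):
            X = list(covariates)
            ctrl, trt = df[df[treat] == 0], df[df[treat] == 1]

            # Mediator models fitted separately under control and treatment
            m_ctrl = LogisticRegression().fit(ctrl[X], ctrl[mediator])
            m_trt = LogisticRegression().fit(trt[X], trt[mediator])

            # Probability of each treated unit's observed mediator value under both models
            m_obs = trt[mediator].to_numpy()
            p1_ctrl = m_ctrl.predict_proba(trt[X])[:, 1]
            p1_trt = m_trt.predict_proba(trt[X])[:, 1]
            p_ctrl = np.where(m_obs == 1, p1_ctrl, 1 - p1_ctrl)
            p_trt = np.where(m_obs == 1, p1_trt, 1 - p1_trt)
            return p_ctrl / p_trt                      # one weight per treated unit

    Averaging the treated outcomes with these weights approximates the counterfactual mean outcome under treatment but with the mediator distribution of the control condition, which can then be contrasted with the ordinary treated and control means to separate direct and indirect effects.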

    Training Session, II

    Computerized Multistage Adaptive Testing: Theory and Applications (Book by Chapman and Hall)

    Thursday April 07, 2016 1:00 PM - Thursday April 07, 2016 5:00 PM

    Meeting Room Level, Meeting Room 6

    Duanli Yan, Educational Testing Service; Alina von Davier, ETS; Kyung Chris Han

    This workshop provides a general overview of computerized multistage test (MST) designs and their important concepts and processes. The focus of the workshop will be on MST theory and applications, including alternative scoring and estimation methods, classification tests, routing and scoring, linking, and test security, as well as a live demonstration of the MST software MSTGen (Han, 2013). This workshop is based on the edited volume by Yan, von Davier, and Lewis (2014). The volume is structured to take the reader through all the operational aspects of the test, from design to post-administration analyses. The training course consists of a series of lectures and hands-on examples in the following four sessions:

    • MST Overview, Design, and Assembly
    • MST Routing, Scoring, and Estimations
    • MST Applications
    • MST Simulation Software

    The session describes the MST design, why it is needed, and how it differs from other test designs, such as linear tests and computerized adaptive test (CAT) designs.
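
    As a toy illustration of routing (not material from the workshop or from MSTGen), the fragment below assigns an examinee to an easier, medium, or harder stage-2 module based on the number-correct score on a stage-1 router; the cut scores and module names are made up.

        # Toy two-stage MST routing by number-correct score on the stage-1 router.
        # Cut scores and module labels are hypothetical.
        def route_stage2(stage1_responses, low_cut=4, high_cut=8):
            number_correct = sum(stage1_responses)     # 0/1-scored router items
            if number_correct < low_cut:
                return "stage2_easy"
            elif number_correct < high_cut:
                return "stage2_medium"
            return "stage2_hard"

        print(route_stage2([1, 0, 1, 1, 0, 1, 1, 1, 0, 1]))   # 7 correct -> "stage2_medium"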

    This course is intended for people who have some basic understanding of item response theory and CAT. 

    Training Session, JJ

    Landing Your Dream Job for Graduate Students

    Friday April 08, 2016 8:00 AM - Friday April 08, 2016 12:00 PM

    Ballroom Level, Renaissance West B

    Deborah Harris and Xin Li, ACT, Inc.

    This training session will address practical topics that graduate students in measurement are interested in regarding finding a job and starting a career. It will concentrate on what to do now, while they are still in school, to best prepare for a job (including finding a dissertation topic, selecting a committee, maximizing experiences as a student through networking, internships, and volunteering, and answering questions about what types of coursework an employer looks for and what makes a good job talk). It will also cover how to locate, interview for, and obtain a job (including how to find where jobs are and how to apply for them by targeting cover letters, references, and resumes), what to expect in the interview process (including job talks, questions to ask, and negotiating an offer), and what comes next after starting the first post-PhD job (including adjusting to the environment, establishing a career path, publishing, finding mentors, balancing work and life, and becoming active in the profession). The session is interactive and geared toward addressing participants’ questions during the session. Resource materials are provided on all relevant topics.

    Training Session, KK

    Bayesian Analysis of IRT Models using SAS PROC MCMC

    Friday April 08, 2016 8:00 AM - Friday April 08, 2016 12:00 PM

    Meeting Room Level, Meeting Room 4

    Clement Stone, University of Pittsburgh

    There is growing interest in Bayesian estimation of IRT models, in part because of the appeal of the Bayesian paradigm as well as the advantages of these methods with small sample sizes, more complex models (e.g., multidimensional models), and simultaneous estimation of item and person parameters. Software such as SAS and WinBUGS has become available that makes Bayesian analysis of IRT models more accessible to psychometricians, researchers, and scale developers.

    SAS PROC MCMC offers several advantages over other software, and the purpose of this training session is to illustrate how SAS can be used to implement a Bayesian analysis of IRT models. After a brief review of Bayesian methods and IRT models, PROC MCMC is introduced. This introduction includes discussion of a template for estimating IRT models as well as convergence diagnostics and the specification of prior distributions. Also discussed are extensions for more complex models (e.g., multidimensional, mixture) and methods for comparing models and evaluating model fit.
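
    As a point of reference for the kind of model template discussed, a Bayesian two-parameter logistic (2PL) specification with commonly used (illustrative, not session-prescribed) priors can be written as

        $P(X_{ij} = 1 \mid \theta_i, a_j, b_j) = \frac{1}{1 + \exp[-a_j(\theta_i - b_j)]}, \qquad \theta_i \sim N(0, 1), \quad \log a_j \sim N(0, 1), \quad b_j \sim N(0, 2^2).$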

    The instructional approach will combine lecture and demonstration. Considerable code and output will be discussed and shared. An overall objective is for attendees to be able to extend the examples to their own testing applications. Some understanding of SAS programs and SAS procedures is helpful.

    Training Session, LL

    flexMIRT®: Flexible Multilevel Multidimensional Item Analysis and Test Scoring

    Friday April 08, 2016 8:00 AM - Friday April 08, 2016 5:00 PM

    Meeting Room Level, Meeting Room 2

    Li Cai, University of California - Los Angeles; Carrie R. Houts, Vector Psychometric Group, LLC

    There has been a tremendous amount of progress in item response theory (IRT) in the past two decades. flexMIRT® is IRT software that offers multilevel, multidimensional, and multiple-group item response models. flexMIRT® also offers users the ability to obtain recently developed model fit indices, fit diagnostic classification models, and fit models with non-normal latent densities, among other advanced features. This training session will introduce users to the flexMIRT® system and provide valuable hands-on experience with the software.

    Training Session, MM

    Aligning ALDs and Item Response Demands to Support Teacher Evaluation Systems

    Friday April 08, 2016 8:00 AM - Friday April 08, 2016 5:00 PM

    Meeting Room Level, Meeting Room 5

    Steve Ferrara, Pearson School; Christina Schneider, The National Center for the Improvement of Educational Assessment

    A primary goal of achievement tests is to classify students into achievement levels that enable inferences about student knowledge and skill. Explicating, at the beginning of test design, how knowledge and skills differ in complexity and empirical item difficulty is critical to those inferences. In this session we demonstrate for experts in assessment design, standard setting, formative assessment, or teacher evaluation how emerging practices in statewide tests for developing achievement level descriptors (ALDs), training item writers to align items to ALDs, and identifying item response demands can be used to support teachers in developing student learning objectives (SLOs) in nontested grades and subjects. Participants will analyze ALDs, practice writing items aligned to those ALD response demands, and analyze classroom work products from teachers who used some of these processes to create SLOs. We will apply a framework for connecting ALDs (Egan et al., 2012), the ID Matching standard setting method (Ferrara & Lewis, 2012), and item difficulty modeling techniques (Ferrara et al., 2011; Schneider et al., 2013) to a process that generalizes from statewide tests to SLOs, thereby supporting construct validity arguments for student achievement indicators used for teacher evaluation.

    Training Session, NN

    Best Practices for Lifecycles of Automated Scoring Systems for Learning and Assessment

    Friday April 08, 2016 8:00 AM - Friday April 08, 2016 5:00 PM

    Ballroom Level, Renaissance East

    Peter Foltz, Pearson; Claudia Leacock, CTB/McGraw Hill; André Rupp and Mo Zhang, Educational Testing Service

    Automated scoring systems are designed to evaluate performance data in order to assign scores, provide feedback, and/or facilitate teaching-learning interactions. Such systems are used in K-12 and higher education for such areas as ELA, science, and mathematics, as well as in professional domains such as medicine and accounting, across various use contexts. Over the past 20 years, there has been rapid growth around research on the underlying theories and methods of automated scoring, the development of new technologies, and ways to implement automated scoring systems effectively. Automated scoring systems are developed by a diverse community of scholars and practitioners encompassing such fields as natural language processing, linguistics, speech science, statistics, psychometrics, educational assessment, and learning and cognitive sciences. As the application of automated scoring continues to grow, it is important for the NCME community to have an overarching understanding of the best practices for designing, evaluating, deploying, and monitoring such systems. In this training session, we provide participants with such an understanding via a mixture of presentations, individual and group-level discussions, and structured and free-play demonstration activities. We utilize systems that are both proprietary and freely available, and provide participants with resources that empower them in their own future work.

    Training Session, OO

    Test Equating Methods and Practices

    Friday April 08, 2016 8:00 AM - Friday April 08, 2016 5:00 PM

    Meeting Room Level, Meeting Room 3

    Michael Kolen and Robert Brennan, University of Iowa

    The need for equating arises whenever a testing program uses multiple forms of a test that are built to the same specifications. Equating is used to adjust scores on test forms so that scores can be used interchangeably. The goals of the session are for attendees to be able to understand the principles of equating, to conduct equating, and to interpret the results of equating in reasonable ways. The session focuses on conceptual issues; practical issues are also considered.
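
    For example, one of the simplest equating functions in this tradition is linear equating, which matches the first two moments of the score distributions on the two forms:

        $l_Y(x) = \frac{\sigma(Y)}{\sigma(X)}\bigl(x - \mu(X)\bigr) + \mu(Y),$

    so a score $x$ on Form X is mapped to the Form Y score that lies the same number of standard deviations from its form’s mean.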

    Training Session, PP

    Diagnostic Measurement: Theory, Methods, Applications, and Software

    Friday April 08, 2016 8:00 AM - Friday April 08, 2016 5:00 PM

    Ballroom Level, Renaissance West A

    Jonathan Templin and Meghan Sullivan, University of Kansas

    Diagnostic measurement is a field of psychometrics that focuses on providing actionable feedback from multidimensional tests. This workshop provides a hands-on introduction to the terms, techniques, and methods used for diagnosing what students know, thereby giving researchers access to information that can be used to guide decisions regarding students’ instructional needs. Upon completion of the workshop, participants will be able to understand the rationale and motivation for using diagnostic measurement methods. Furthermore, participants will be able to understand the types of data typically used in diagnostic measurement along with the information that can be obtained from implementing diagnostic models. Participants will become well-versed in the state-of-the-art techniques currently used in practice and will be able to use and estimate diagnostic measurement models using new software developed by the instructor.
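
    As one illustration of the class of models involved (a standard example of a diagnostic classification model, not necessarily the default in the instructor’s software), the DINA model specifies

        $P(X_{ij} = 1 \mid \boldsymbol{\alpha}_i) = (1 - s_j)^{\eta_{ij}} g_j^{\,1 - \eta_{ij}}, \qquad \eta_{ij} = \prod_{k} \alpha_{ik}^{\,q_{jk}},$

    where $\boldsymbol{\alpha}_i$ is examinee $i$’s binary attribute profile, $q_{jk}$ indicates whether item $j$ requires attribute $k$, and $s_j$ and $g_j$ are the item’s slip and guessing parameters.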

    Training Session, QQ

    Effective Item Writing for Valid Measurement

    Friday April 08, 2016 1:00 PM - Friday April 08, 2016 5:00 PM

    Ballroom Level, Renaissance West B

    Anthony Albano, University of Nebraska-Lincoln; Michael Rodriguez, University of Minnesota-Twin Cities

    In this training session, participants will learn to write and critique high-quality test items by implementing item-writing guidelines and validity frameworks for item development. Educators, researchers, test developers, and other test users are encouraged to participate.

    Following the session, participants should be able to: implement empirically based guidelines in the item-writing process; describe procedures for analyzing and validating items; apply item-writing guidelines in the development of their own items; and review items from peers and provide constructive feedback based on adherence to the guidelines. The session will consist of short presentations with small-group and large-group activities. Materials will be contextualized within common testing applications (e.g., classroom assessment, response to intervention, progress monitoring, summative assessment, entrance examination, licensure/certification).

    Participants are encouraged to bring a laptop computer, as they will be given access to a web application that facilitates collaboration in the item-writing process; those participating in the session in person and remotely will use the application to create and comment on each other’s items online. This practice in item writing will allow participants to demonstrate understanding of what they have learned and receive feedback on their items from peers and the presenters.