## INTRODUCTION

## METHODS

### Subjects

*SD*=2.27, *M*=26).

### Instrument

*Teacher as facilitator*. The indicators of *teacher as facilitator* were adapted from Dolmans and Ginn’s tutor effectiveness questionnaire [8]. The following sections elaborate on the development of indicators for each factor:

### Statistical analysis

*lavaan* and *sem*. Each individual construct of the PBL implementation questionnaire was tested separately before testing the structural model. To improve each construct’s model fit, re-specification was conducted by removing items with small standardized loading estimates. As a consequence, the total number of items differed between constructs.
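As a sketch of this per-construct workflow, the snippet below fits a single-factor CFA with lavaan’s `cfa()` and inspects the standardized loadings. The built-in Holzinger–Swineford data and its `visual` items are stand-ins for one PBL construct; the study’s actual item data are not reproduced here.

```r
# Illustrative single-construct CFA with lavaan; the built-in
# HolzingerSwineford1939 data and 'visual' items stand in for one PBL
# construct (an assumption, not the study's data).
library(lavaan)

model <- 'visual =~ x1 + x2 + x3'
fit <- cfa(model, data = HolzingerSwineford1939)

# Standardized loadings: items loading weakly would be candidates for
# removal during re-specification, as described above.
standardizedSolution(fit)
```

In the study’s workflow, each of the six constructs would be fitted this way before the full structural model is estimated.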

## RESULTS

χ²(384, N=207)=713.564, P<0.001, root mean square error of approximation (RMSEA)=0.065, comparative fit index (CFI)=0.923. The model consisted of 30 observed variables (N=207). For a model with 30 or more observed variables and N<250, the suggested fit statistics are CFI≥0.92 and RMSEA<0.08 [15]. Therefore, this model was used as the final measurement model of the PBL questionnaire without re-specification. The validity of a measurement model depends on both establishing acceptable goodness-of-fit and finding specific evidence of construct validity [16]. Therefore, the main objective of this study was not only to assure the goodness-of-fit of the PBL implementation questionnaire but also to assess its construct validity.

*small group*) to 0.921 (*teacher as facilitator* and *self-directed learning*). The alpha coefficient for the total items was 0.963, indicating that the questionnaire was internally consistent in measuring the target construct. The omega hierarchical coefficient (*ωh*) for the PBL implementation scale was 0.97, confirming that the indicators of the PBL implementation scale measure a common latent variable [17] (i.e. the implementation of PBL at the institution). The omega hierarchical coefficient was calculated using the *psych* package in the R statistical software.
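The ωh computation can be sketched with `psych::omega()`. The built-in `bfi` personality items below are a stand-in for the 30-item PBL response matrix, which is not reproduced here.

```r
# Sketch of the omega-hierarchical computation with psych::omega().
# psych's built-in 'bfi' data (25 personality items) stands in for the
# 30-item PBL response matrix (assumption, not the study's data).
library(psych)

om <- omega(bfi[, 1:25], nfactors = 5, plot = FALSE)
om$omega_h   # variance attributable to a single general factor
```

A ωh near 1, as reported above for the PBL scale, indicates that one general factor accounts for most of the reliable variance across all items.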

## DISCUSSION

### Face validity

### Convergent validity

*student-centred learning*) to 0.661 (*real-world problems*). Both the factor loadings and the AVE indicate that the variance of each item in the PBL implementation questionnaire is explained more by its latent construct than by measurement error. The CR values were above the suggested level of 0.7, ranging from 0.804 (*student-centred learning*) to 0.906 (*self-directed learning*). This indicates that the indicators of each construct are strongly interrelated [18].
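The AVE and CR figures quoted above follow the standard formulas over standardized loadings λ: AVE is the mean of λ², and CR=(Σλ)²/((Σλ)²+Σ(1−λ²)). A minimal sketch, applied to the small-group loadings reported elsewhere in this article:

```r
# AVE and composite reliability (CR) from standardized loadings:
# AVE = mean(lambda^2); CR = (sum lambda)^2 / ((sum lambda)^2 + sum(1 - lambda^2))
ave <- function(lambda) mean(lambda^2)
cr  <- function(lambda) sum(lambda)^2 / (sum(lambda)^2 + sum(1 - lambda^2))

# Small-group loadings reported in this article (0.75, 0.60, 0.97):
lambda <- c(0.75, 0.60, 0.97)
ave(lambda)   # ~0.62, above the 0.5 threshold
cr(lambda)    # ~0.83, within the reported 0.804-0.906 range
```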

### Discriminant validity

*problem as stimulus* (0.433), SCL and *real-world problem* (0.398), SCL and *teacher as facilitator* (0.171), SCL and SDL (0.477), and SCL and *small group* (0.301). This means that the indicators of the SCL factor measure a specific construct that is not measured by the other factors. The other factors showed similar results, with all AVE values higher than the inter-construct squared correlations. Another way to show discriminant validity is the average shared squared variance (ASV): discriminant validity is achieved when the AVE is greater than the ASV. The ASV is computed by averaging a construct’s squared correlations with the other constructs; for example, the ASV of SCL=(0.433+0.398+0.171+0.477+0.301)/5=0.358. Table 3 shows that the AVE values of all factors are higher than their ASV, which indicates discriminant validity. Finally, the absence of cross-loading in the PBL measurement model, i.e. no indicator loading on more than one construct, also supports its discriminant validity. Fig. 1 shows that all indicators load on only one factor.
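The ASV arithmetic for SCL can be reproduced directly; the slight difference from the reported 0.358 presumably reflects rounding of the published squared correlations.

```r
# ASV of SCL: the average of its squared correlations with the other
# five factors, using the values quoted above.
sq_corr <- c(0.433, 0.398, 0.171, 0.477, 0.301)
mean(sq_corr)   # 0.356, close to the reported 0.358 (rounding in the inputs)
```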

### Factors with three indicators

*student-centred learning* and *small group* factors. A three-indicator model, or *just-identified* model, by nature leads to a perfect fit, as there are just enough degrees of freedom to estimate all the parameters (degrees of freedom=0). Just-identified models cannot test theory, because their fit is perfect by construction. However, a model with three-indicator factors is acceptable, particularly when other factors have more than three indicators [16]. In the present study, these three-indicator factors are acceptable because the measurement model includes other factors that each consist of more than three indicators: *problem as stimulus* (six indicators), *real-world problems* (four indicators), *teacher as facilitator* (eight indicators), and *self-directed learning* (six indicators). Although goodness-of-fit does not apply to a just-identified model, the model can still be evaluated in terms of the interpretability and strength of its parameter estimates (e.g. the magnitude of the factor loadings) [18]. In the present study, the questionnaire was reviewed by experts on PBL and the chosen methodology; their agreement on the validity of the questionnaire provides sufficient evidence of good interpretability. Finally, the factor loadings of the *student-centred learning* (0.71, 0.92, and 0.63) and *small group* (0.75, 0.60, and 0.97) factors were all satisfactory.
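The just-identified claim can be checked by counting: a single-factor model with three indicators yields 3(3+1)/2=6 observed variances and covariances, against 6 free parameters (two loadings, with one fixed for scaling, plus three error variances and one factor variance).

```r
# Degrees of freedom for a single-factor CFA with p = 3 indicators.
p <- 3
moments <- p * (p + 1) / 2   # unique observed variances/covariances: 6
free <- (p - 1) + p + 1      # loadings (one fixed) + error variances + factor variance: 6
moments - free               # 0: just-identified, hence a perfect fit
```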

### Correlated measurement errors

*teacher as facilitator* and *self-directed learning*. In cross-sectional studies, there should be no correlated measurement errors; that is, the indicators should measure nothing other than the construct they are intended to measure. However, correlated measurement errors are acceptable in panel studies, because the shared variance between indicators may come from a prior measurement effect. A correlated measurement error can also be justified in a cross-sectional study when there is evidence of *source* or *method effects*. Method effects exist when the measurement approach, rather than the substantive latent factors, causes differential covariance among items [18]. Possible method effects related to the present study are the *scale format* and *scale anchor*, *similar item wording*, and *social desirability* [19].

*teacher as facilitator* factor in the PBL questionnaire. Item B14_E1 (‘The tutors have a clear picture about their strengths/weaknesses as a tutor’) and item B14_E2 (‘The tutors are clearly motivated to fulfil their role as a tutor’) were suspected to have similar levels of social desirability compared with the other items, one possible cause of their correlated errors. The correlated measurement errors in the present study are also acceptable because the variance of most items came from the latent construct rather than from measurement error, with two strong pieces of evidence: the factor loadings of most items were higher than 0.70, and the AVE values of most constructs were higher than 0.5. Therefore, although the measurement errors were correlated, indicating the existence of an unknown construct, most of the items’ variance still came from the latent factor, not from the unknown construct. Finally, the correlated errors existed within single factors, with no inter-factor correlated errors. Thus, the correlated errors did not violate the model’s underlying theory.
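In lavaan model syntax, such a within-factor error covariance is freed with the `~~` operator. The sketch below uses the two items named above plus a placeholder third item (a hypothetical name, not from the questionnaire):

```r
# lavaan syntax freeing the error covariance between B14_E1 and B14_E2;
# B14_E3 is a placeholder item name, not the questionnaire's.
model <- '
  facilitator =~ B14_E1 + B14_E2 + B14_E3
  B14_E1 ~~ B14_E2      # within-factor correlated measurement error
'
```

Because the covariance is specified within a single factor, the model’s underlying theory is preserved, as argued above.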