Biosfer: Jurnal Pendidikan Biologi

The quality of question items needs to be known in order to produce more accurate measurements.


INTRODUCTION
Evaluation is one of the stages that must be carried out in any learning activity. Rahmadhani (2014) states that the teacher's role as an evaluator is very important for conducting a good and objective evaluation; in addition, as part of pedagogic competence, the teacher must be able to compile quality questions. An evaluation tool or test must be of good quality so that it does not distort the measurement of students' abilities. A good test must be reliable and have good discriminating power and an appropriate level of difficulty (Arvianto, 2016).
The Final Semester Assessment is very important. Its purpose is to evaluate students' achievement of the learning competencies taught by the teacher over one semester. Susanto et al. (2015) said that the final semester score illustrates students' mastery of the competencies learned at school during one semester, so good quality questions are needed.
Based on interviews with one biology teacher at the State Senior High Schools in Gondokusuman, the final exam questions were designed by the biology teachers and have never been analyzed, so their quality is not yet known. According to Hasanah et al. (2016), final exam questions, as a measuring tool, need to be analyzed before being administered to students. Based on this description of the problem, the biology teachers in Gondokusuman District have not taken steps to develop questions according to standards. Such steps are needed to obtain quality questions. The quality of the questions can be known only once they have been analyzed (Rahmadhani, 2014). Based on a review of the documented results of the odd-semester final exam for 2020/2021, many students have not reached the Minimum Criterion of Mastery Learning (MCML): 10% of students at 6th State Senior High School Yogyakarta and 83.88% at 9th State Senior High School Yogyakarta did not pass the MCML. According to Kunandar (2014), if most students score below the MCML, the cause may be questions that are too difficult and do not refer to the substance of the material. It could also be that the students did not understand the teacher's instruction. Conversely, if almost all students get very high scores, there are several possible explanations: the questions are too easy, the questions do not follow the rules for writing good questions, or the test administration is so loose that students can cooperate or cheat.
One way to determine students' abilities and results is to conduct an item analysis. Lubis and Prastowo (2017) said that analyzing item quality is critical for measuring the achievement of student competence. Item analysis consists of two types, namely qualitative analysis and quantitative analysis. Febriani (2016) said that item analysis is carried out to test the feasibility of each item based on its difficulty level and discriminating power, because not all items are suitable for use. Item revision is based not only on the difficulty index and discriminating power of the questions but also on the effectiveness of the distractors for each item.
In this study, the researchers used Rasch model analysis with the help of Winsteps software version 4.5.2 to analyze the quantitative data, while the qualitative data were analyzed using a question review card. Measurement results from the Rasch model can be calibrated; moreover, the model is not deterministic, so it can identify the object being measured carefully (Sumintono & Widhiarso, 2013). The Rasch model can be used to analyze validity, reliability, and the fit of persons and items simultaneously. The Rasch model has advantages over other methods (Tenant et al., 2004; American Educational Research Association, 2014), especially Classical Test Theory (CTT): it provides a linear scale with equal intervals, can predict missing data so that the analysis results are more accurate, produces standard errors of measurement for the instrument so that the precision of the calculations increases, can detect model misfit, and produces replicable measurements (Stolt et al., 2022). This study aims to determine the quality of the items in the Final Semester Assessment of State Senior High Schools in the Gondokusuman District, covering validity, reliability, difficulty level, discriminating power, and distractor effectiveness.
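As an illustration of the non-deterministic character of the Rasch model mentioned above, the dichotomous form of the model can be sketched in a few lines. This is a minimal sketch and not the Winsteps implementation; the function name and the example values are hypothetical and are not taken from the study's data.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct answer under the dichotomous Rasch model.

    Both arguments are in logits. The probability depends only on the
    difference between person ability and item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical values for illustration only.
for difficulty in (-1.0, 0.0, 1.0):
    p = rasch_probability(0.0, difficulty)
    print(f"ability=0.0, difficulty={difficulty:+.1f}, P(correct)={p:.3f}")
```

When ability and difficulty are equal, the probability of a correct answer is 0.5, which is why an item whose measure matches a student's ability is treated as moderate for that student.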

Research Design
This research is a quantitative descriptive study using data from students' answer sheets for the final semester assessment. The research was conducted at Universitas Yogyakarta Dahlan, which is located at Jalan Ring Road Selatan, Tamanan, Banguntapan, Bantul, Special Region of Yogyakarta. The research was carried out in April-May 2021, during the 2020/2021 academic year. The data were the final semester assessment answer sheets of all class X MIPA students at the state senior high schools in Gondokusuman sub-district, Yogyakarta, for the 2020/2021 academic year that had not yet been analyzed, a total of 462 answer sheets. The data collection techniques were observation and documentation. The data were analyzed with the Rasch model using Winsteps software version 4.5.2, which assesses validity, reliability, and the fit of persons and items simultaneously. The data analysis technique was descriptive qualitative and quantitative analysis.

Population and Samples
In the Gondokusuman sub-district there are three State Senior High Schools, namely 3rd State Senior High School Yogyakarta, 6th State Senior High School Yogyakarta, and 9th State Senior High School Yogyakarta. However, the students' answer sheets for the Final Semester Assessment of class X MIPA at 3rd State Senior High School Yogyakarta had already been analyzed. The sampling technique was saturated sampling: all student answer sheets for the final semester assessment from State Senior High Schools in the Gondokusuman sub-district that had not yet been analyzed were used. Data collection used observation and documentation. The data analysis used descriptive qualitative and quantitative techniques.

Instrument
The instruments used in this study were multiple-choice question review cards, questions, and answer sheets. The data obtained were the question grids, questions, student answer sheets, and answer keys for the Final Semester Assessment of Biology for class X MIPA at 6th and 9th State Senior High School Yogyakarta. Two lecturers of the Biology Education study program validated the question review card. The multiple-choice question review card is shown in Table 2.

Table 2. Multiple-choice question review card
A. MATERIAL
2. The distractors function
3. There is one correct or most correct answer
B. CONSTRUCTION
4. The question stem is formulated clearly and firmly
5. The question stem and the answer choices contain only the statements that are required
6. The question stem does not give a clue about the correct answer
7. The question stem does not contain double negative statements
8. The answer choices are homogeneous and logical in terms of material
9. The length of the answer choices is relatively the same
10. The answer choices do not contain the statement "All of the answer choices above are wrong" or "All of the answer choices above are correct"
11. Answer choices in the form of numbers or times are arranged in order of magnitude or chronologically
12. Pictures, graphs, tables, diagrams, and the like are clear and functional
13. Items do not depend on the previous question
C. LANGUAGE
14. The question uses language that follows the rules of Indonesian
15. The language used is communicative
16. The question does not use the local language
17. Answer choices do not repeat words or phrases that do not form a unified meaning
(Source: Kunandar, 2014)

Procedure
In the first stage, the researcher asked permission to conduct research at 6th and 9th State Senior High School Yogyakarta. In the second stage, the researcher interviewed biology teachers at both schools to determine whether the Final Semester Assessment questions for the class X Biology subject in the 2020/2021 academic year had been analyzed. In the third stage, the researcher collected data in the form of question grids, questions, answer keys, and student answer sheets. In the fourth stage, the researcher conducted qualitative and quantitative analyses. The qualitative analysis used a question review card, and the aspects analyzed were material, construction, and language. The quantitative analysis used Winsteps software version 4.5.2. Winsteps is a Rasch model tool for analyzing scores generated from test instruments. It reports Outfit MNSQ, Outfit ZSTD, Point Measure Correlation, Item Reliability, and Cronbach's Alpha. The Outfit MNSQ describes how well the data fit the model used. Cronbach's Alpha describes the reliability of the items (Azizah & Wahyuningsih, 2020). The aspects examined were validity, reliability, level of difficulty, discriminating power, and distractor effectiveness. If the analysis results meet the requirements, the item can be entered into the question bank; an item that does not meet the requirements is not used. The research procedure is described in more detail in Figure 1.

Data Analysis Techniques
The data analysis technique in this research is descriptive qualitative and quantitative. The data analyzed are from the Final Semester Assessment of class X for the 2020/2021 academic year. The test at 6th State Senior High School Yogyakarta consists of 50 questions and involved 248 students, and the test at 9th State Senior High School Yogyakarta consists of 40 questions and involved 214 students. The qualitative descriptive analysis uses a question review card covering material, construction, and language aspects, while the quantitative descriptive analysis uses Winsteps software version 4.5.2. The quantitative aspects are validity, reliability, level of difficulty, discriminating power, and distractor effectiveness. The item fit order (level of item fit) indicates validity, Cronbach's alpha indicates reliability, the item measure indicates the difficulty level of the item, and the separation value indicates the discriminating power. Furthermore, the data in the output table of item category/option/distractor frequencies (measure order) indicate the effectiveness of the distractors.
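The item fit screening can be sketched as a small function that applies the three ranges reported in the validity results (Outfit MNSQ 0.5 to 1.5, Outfit ZSTD -2.0 to +2.0, point measure correlation 0.4 to 0.85; Sumintono & Widhiarso, 2015). This is an illustrative sketch, not the Winsteps procedure. The class and function names and the example values are hypothetical, and the number of criteria an item must satisfy is exposed as a parameter because it is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class ItemFit:
    """Fit statistics for one item, as read from an item fit order table."""
    item: int
    outfit_mnsq: float
    outfit_zstd: float
    pt_measure_corr: float


def criteria_met(fit: ItemFit) -> int:
    """Count how many of the three fit criteria the item satisfies."""
    checks = [
        0.5 < fit.outfit_mnsq < 1.5,
        -2.0 < fit.outfit_zstd < 2.0,
        0.4 < fit.pt_measure_corr < 0.85,
    ]
    return sum(checks)


def is_fit(fit: ItemFit, min_criteria: int = 3) -> bool:
    """Assumption: an item is fit when at least `min_criteria` criteria hold."""
    return criteria_met(fit) >= min_criteria


# Hypothetical fit statistics, not taken from the study's tables.
items = [
    ItemFit(item=1, outfit_mnsq=1.02, outfit_zstd=0.3, pt_measure_corr=0.52),
    ItemFit(item=2, outfit_mnsq=1.80, outfit_zstd=2.5, pt_measure_corr=0.21),
]
for fit in items:
    print(fit.item, criteria_met(fit), is_fit(fit))
```

Keeping the count of satisfied criteria separate from the final decision makes it easy to rerun the screening under a stricter or more lenient rule.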

Quantitative Analysis
a. Validity
Item fit order (level of item fit) indicates validity. The results of the item validity analysis can be seen in Table 3.

Table 3. Item validity analysis results (entries recovered in source order): 38, 36, 46, 44, 24, 43, 18, 41, 50, 23, 30, 26, 29, 31, 35, 33, 40, and 49 | 18 | 36% | Valid | 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 20, 21, 22, 28, 32, 34, 47, 25, 45, 19, 48, 37, 1, 39, 42 | 7, 18, 15, 35, 14, 10, 22, 24, 28, 32, 34, 1, 4, 5, 6, 31, 19, 36, 37, 8, 11, 12, 21, 13, 16, 20, 26, 29, 23, 9, [...]

Based on Table 3, the validity of the final semester assessment questions at 9th State Senior High School Yogyakarta is better than at 6th State Senior High School Yogyakarta. Whether an item is valid can be judged from its Infit and Outfit values. An item is declared fit if its Outfit MNSQ value is between 0.5 and 1.5, its Outfit ZSTD value is between -2.0 and +2.0, and its point measure correlation is between 0.4 and 0.85 (Sumintono & Widhiarso, 2015). According to Widyaningsih and Yusuf (2018), a misfitting item indicates misconceptions in students' understanding or problems in working on the question. Rahmani et al. (2015) [...]

In addition to item validity, unidimensionality needs to be analyzed. According to Misbach and Sumintono (2014), unidimensionality is important for knowing whether the instrument developed measures what it should measure. The raw variance explained by measures is 32.9% and 43.1%, which shows that the minimum unidimensionality requirement of 20% has been met. In addition, the variance that the instrument cannot explain is below 7%, which indicates good independence of the items (Misbach & Sumintono, 2014). Thus, the instrument is valid enough to measure students' abilities (Novinda et al., 2019).

b. Reliability
Cronbach's alpha indicates reliability. The results of the reliability analysis of the questions can be seen in Table 4.
Bulqis (2019) states that reliability means dependable and trustworthy: reliability analysis shows how consistently an instrument measures, so that the questions can be tested in schools and equivalent institutions. Based on the Winsteps analysis results in Table 4, the person measure values are 2.70 and 0.57 logits. These values show the average ability of all students on the items. Cronbach's alpha, which measures reliability as the interaction between persons and items as a whole, has low and poor values of 0.47 and 0.62. These Cronbach's alpha values describe data that do not vary much, but this does not affect validity.
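For readers who want to see how a Cronbach's alpha value arises from scored answer sheets, the textbook formula can be sketched as follows. This is a minimal sketch assuming complete dichotomous (0/1) scores with no missing responses; it is not the Winsteps computation, and the function name and toy score matrix are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores: list[list[int]]) -> float:
    """Cronbach's alpha for a students-by-items matrix of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("at least two items are required")
    item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary, alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Toy data: four students, three items (not the study's data).
toy_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(cronbach_alpha(toy_scores), 2))
```

The formula shows why data with little variation in total scores lead to a low alpha: as the variance of the total scores shrinks towards the sum of the item variances, the value approaches zero.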
The person reliability values in Table 4 are 0.21 and 0.64, whereas the item reliability values are 0.96 and 0.98. Hasanah et al. (2016) said that the closer the reliability value is to 1, the higher (better) the reliability of the test. It can thus be seen that the consistency of the students' answers is weak, but the quality of the items in the instrument is exceptional. According to Wahyudi et al. (2020), whether the items developed are of good quality can be seen from the item reliability value. This result shows that the quality of the items for the Final Semester Assessment of Biology at 6th and 9th State Senior High School Yogyakarta is outstanding because the item reliability is above 0.94 (Sumintono & Widhiarso, 2015).
In the Rasch model, reliability can also be seen from the person separation and item separation values. The greater the person separation value, the better the test, because it can cover the range of respondents' abilities. High item separation values likewise indicate better measurement (Sumintono & Widhiarso, 2015). So, it can be seen that the item reliability of the Final Semester Assessment of Biology at 6th and 9th State Senior High School Yogyakarta is excellent because it is above 0.94.

c. Difficulty level
The item measure indicates the difficulty level of the item. The results of the item difficulty analysis can be seen in Table 5. Sari and Herawati (2014) stated that the difficulty level measures how easy or difficult a question is. Analyzing the difficulty level is very important because it is used to calibrate the questions into easy, medium, and difficult criteria, so that the proportion of each criterion in the question sheet can be considered. In Rasch model analysis, the difficulty level can be found in the item measure output table. Under the Rasch model, an item's difficulty level is determined mainly by the students' responses to the item; this distinguishes it from conventional analysis.
According to Sabekti and Khoirunnisa (2018), the difficulty level of items can be classified by comparing the measure of each item with the standard deviation (SD) of the item measures. Misbach and Sumintono (2014) said that if the average item logit is not 0.0, then the instrument as a whole is not good. After misfitting items and outliers were screened out, the items of the odd-semester Final Semester Assessment of class X Biology at 6th State Senior High School Yogyakarta fall into four categories: 2 items (11.11%) are very difficult, 9 items (50%) are difficult, 4 items (22.22%) are easy, and 3 items (16.66%) are very easy. The items of the Final Semester Assessment of class X Biology at 9th State Senior High School Yogyakarta likewise fall into four categories: 5 items (15.62%) are very difficult, 9 items (28.12%) are difficult, 13 items (40.62%) are easy, and 5 items (15.62%) are very easy.
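The four-category grouping described above can be sketched as a comparison of each item measure with the SD of the item measures. The exact cut points are an assumption of this sketch (above +1 SD very difficult, 0.0 to +1 SD difficult, -1 SD to 0.0 easy, below -1 SD very easy, with the mean item measure centered at 0.0 logits); the text states only that measures are compared with the SD. The function name and item measures are hypothetical.

```python
from statistics import pstdev


def classify_difficulty(measure: float, sd: float) -> str:
    """Assign an item to one of four difficulty categories.

    Assumes item measures are centered at 0.0 logits and that `sd` is the
    standard deviation of the item measures.
    """
    if measure > sd:
        return "very difficult"
    if measure >= 0.0:
        return "difficult"
    if measure >= -sd:
        return "easy"
    return "very easy"


# Hypothetical item measures in logits (not the study's data).
measures = {1: 1.62, 2: 0.35, 3: -0.48, 4: -1.49}
sd = pstdev(list(measures.values()))
for item, measure in measures.items():
    print(item, measure, classify_difficulty(measure, sd))
```

Because the cut points depend on the SD of the calibrated items, the same item can change category when misfitting items are removed and the remaining items are recalibrated.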
Syadiah and Hamdu (2020) said that the most difficult questions can be identified from the highest logit values. This corresponds to the total score column, which shows how many respondents answered the question correctly (Widyaningsih & Yusuf, 2018). According to Irmalasari et al. (2016), this follows the theory that if the difficulty level is lower than the student's ability, the question is relatively easy. Conversely, the question is relatively difficult if the difficulty level is higher than the student's ability. If the difficulty level and the student's ability are balanced, the question is classified as moderate. According to Erfan et al. (2020), a good quality question is one that is neither too difficult nor too easy. Questions with a low difficulty level, that is, logit values below -1, must be revised (Ibnu et al., 2019).

d. Distinguishing power
The separation value indicates the distinguishing power. The results of the analysis of the distinguishing power can be seen in Table 6. Kunandar (2014) stated that the distinguishing power of a question is its ability to distinguish students who have mastered the material from those who have not. The Rasch model does not contain a discrimination parameter; all items are assumed to have equal discriminating power. Item difficulty is the only item parameter in the Rasch model (Sumintono & Widhiarso, 2015). Alfarisa and Purnama (2019) said that in the Rasch model (1PL) the only item characteristic considered is the difficulty level of the item, while discriminating power is considered constant. According to Sumintono and Widhiarso (2015), a more accurate grouping of distinguishing power is given by the strata separation (H). Ibnu et al. (2019) said that the greater the separation value, the better the quality of the instrument, because it can identify groups of items and groups of respondents. Susdelina et al. (2018) stated that Rasch model analysis differs from classical test theory in that it distinguishes high-ability from low-ability students by analysis at the level of individual ability. In addition, respondent groups can be identified from the respondent separation index. The results of the distinguishing power analysis using the Rasch model can be seen in the separation column of the summary statistics output table.
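The relation between reliability, separation, and strata separation (H) can be sketched with the standard Rasch measurement formulas, G = sqrt(R / (1 - R)) and H = (4G + 1) / 3. This is an illustrative sketch only: the function names are hypothetical, and because the reliabilities reported in the text are rounded, the separation values printed here will only approximate those in the Winsteps summary statistics table.

```python
import math


def separation_from_reliability(reliability: float) -> float:
    """Separation index G implied by a reliability coefficient R."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in the interval [0, 1)")
    return math.sqrt(reliability / (1.0 - reliability))


def strata(separation: float) -> float:
    """Number of statistically distinct strata, H = (4G + 1) / 3."""
    return (4.0 * separation + 1.0) / 3.0


# Reliabilities reported in the text: person 0.21 and 0.64, item 0.96 and 0.98.
for label, reliability in [
    ("person", 0.21),
    ("person", 0.64),
    ("item", 0.96),
    ("item", 0.98),
]:
    g = separation_from_reliability(reliability)
    print(f"{label} reliability={reliability:.2f}, G={g:.2f}, H={strata(g):.2f}")
```

The sketch makes the pattern in the results visible: item reliabilities close to 1 imply several distinguishable levels of item difficulty, whereas low person reliability implies that the test separates students into few ability groups.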
e. Distractor effectiveness
The effectiveness of the distractors (Tables 7 and 8) was interpreted using the criteria from Oktanin and Sukirno (2015). The effectiveness of the distractors can be seen in the output table of item category/option/distractor frequencies: measure order. Tables 7 and 8 show that more students answered the items correctly than incorrectly. Case and Donahue (2008) said that distractors that work make the items more difficult. Distractors also reduce random guessing, which improves the performance of the questions. Thus, the distractors in the final semester assessment of class X Biology at 6th and 9th State Senior High School Yogyakarta do not function effectively, as can be seen from the number of distractors that work properly.
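A distractor frequency check of the kind read from the option frequency table can be sketched as follows. The criteria of Oktanin and Sukirno (2015) are not reproduced in the text, so this sketch assumes a widely used rule of thumb in which a distractor functions if at least 5% of test takers choose it. The function name, the five-option format, and the responses are hypothetical.

```python
from collections import Counter


def distractor_report(
    responses: list[str],
    key: str,
    options: str = "ABCDE",
    min_share: float = 0.05,
) -> dict[str, tuple[float, bool]]:
    """Share of test takers choosing each distractor and whether it functions.

    Assumption: a distractor functions when its share is at least `min_share`.
    """
    if not responses:
        raise ValueError("no responses given")
    counts = Counter(responses)
    report = {}
    for option in options:
        if option == key:
            continue  # the keyed answer is not a distractor
        share = counts.get(option, 0) / len(responses)
        report[option] = (share, share >= min_share)
    return report


# Hypothetical responses of 100 students to one item keyed "A".
responses = list("A" * 70 + "B" * 20 + "C" * 8 + "D" * 2)
for option, (share, functions) in distractor_report(responses, key="A").items():
    print(option, f"{share:.0%}", "functions" if functions else "does not function")
```

An item where most distractors fall below the threshold behaves like a question with fewer options, which makes guessing easier and helps explain why many students answer correctly.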

Qualitative Analysis
Qualitative analysis uses a question review card that covers material, construction, and language aspects. The results of the qualitative analysis are shown in Table 9. Sukiman (2012) said that theoretical or qualitative analysis can be done before or after the trial. Each item is examined in detail against the requirements for content (material), construction, and language.
In this study, the qualitative analysis was conducted by three reviewers, namely alumni of the biology education study program, peers, and the researchers, using a panel technique. Each reviewer carried out the review in a separate place so that the reviews were objective and the reviewers did not influence each other. Each reviewer was given the review format and assessment guidelines. Reviewers were also invited to make corrections directly on the text of the questions, provide comments or suggestions, and rate each item as good, to be revised, or to be replaced.
The qualitative analysis of the Final Semester Assessment of class X Biology showed that, in the material aspect, 13 items at 6th State Senior High School Yogyakarta and 2 items at 9th State Senior High School Yogyakarta did not match the indicators. In the construction aspect, 2 items (6th State Senior High School Yogyakarta) and 6 items (9th State Senior High School Yogyakarta) did not match the indicators. In the language aspect, all items matched the indicators.
Qualitative analysis also showed that three questions are not covered by the question grid. When questions are written, each item should be aligned with the existing grid (Mujimin, 2010). If the questions match the grid, the test results can be used to determine students' actual competence. Ambiyar and Panyahuti (2020) said that the indicators contained in the grid are the guidelines for preparing questions. Therefore, teachers need the ability to compile question grids. The teacher is the most significant factor in preparing questions (Rahmadhani, 2014). A teacher needs special abilities, such as developing question ideas, understanding the characteristics of learners, and mastering question-writing techniques, so that the questions given to learners follow standards. Lubis and Prastowo (2017) said that a teacher must be able to compose quality questions in order to know the extent to which students understand the material that has been taught.

CONCLUSION
The end-of-semester assessment questions in this study have very good item reliability. However, some questions should be improved in their material and construction aspects so that better questions can be included in the question bank for the following end-of-semester assessments. Biology teachers are expected to trial and analyze items before using them as test instruments, in line with the guidelines for question development.