The Development of an Authentic Assessment Instrument to Measure Science Process Skills and Achievement Based on Students' Performance

This study aims to develop an authentic assessment instrument based on students' performance that can build science process skills and measure the cognitive aspects of students' achievement. The research and development procedure is adapted from Borg and Gall's model and consists of information collection, planning, early product development, preliminary field testing, first product revision, main field testing, second product revision, and dissemination. The preliminary field test involved 15 participants and the main field test involved 31 participants at MAN III Yogyakarta, using the One-Shot Case Study design. The results of the study are as follows: (1) according to the judgment of experts, physics teachers, and peers, the instruments were in general valid; (2) the instruments were reliable in both the preliminary field test and the main field test; and (3) the instruments could build science process skills and measure the cognitive aspects of achievement of senior high school students on the topic of heat and temperature.


INTRODUCTION
The quality of learning and the achievement of the desired goals are reflected in the assessment system used. Assessment is an activity that collects information in the form of facts; it is carried out intentionally, systematically, and continuously, and it is used to assess student competence. Assessment provides feedback about student learning progress to students, parents, and educators (Earl & Giles 2011). It helps educators make decisions about student needs and guides the planning of learning programs. Assessment is an integral part of the learning program. Educators pay attention to evidence of learning from the daily activities carried out by students. This evidence shows what students already know and what they still need to learn. Suharto (2015) showed that assessment is a process of gathering information to measure enrichment as a result of student learning.
Assessment systems vary, and their use depends on the character of the learning. Physics as a subject has the dimensions of scientific processes and products (Safarati 2017), scientific attitudes (Morales 2015), and application in daily life (Powietrzyńska & Gangji 2016), taken holistically and not separated from three aspects: cognitive, affective, and psychomotor. The assessment process applied must be able to measure these aspects. Such a holistic assessment can serve as a benchmark for the progress of student achievement. Achievement refers to the abilities that students have after a learning experience. These abilities consist of the cognitive aspects,
e-Jurnal: http://doi.org/10.21009/1
on learning assessment activities. To further increase the quantity as well as the variety of the performance-based authentic assessments already available, improvement and development are needed, with a focus on measuring science process skills and achievement in the cognitive aspects.

RESEARCH METHODOLOGY
This research and development study was carried out from February to July in class X of MAN III Yogyakarta. The research subjects consisted of 15 students for the preliminary test and 31 students of class XA at MAN III Yogyakarta for the main test.
This research refers to the Borg and Gall model (Sugiono 2013), with procedures adapted into eight steps.
Step I is the preliminary study. In this step, two activities are carried out: a literature study and a field survey, both aimed at gathering information and references relating to performance-based authentic assessment models. The literature sources are education laws, journals, and textbooks, including electronic ones. The field survey consists of observations and interviews with several educators at MAN III Yogyakarta to gather information on the assessment already in use and on the implementation of authentic assessment.
Step II is planning. Planning comprises two activities: competency analysis, carried out to determine the performance-based authentic assessments to be developed, and the determination of learning objectives, which serve as the reference for constructing the rubric.
Step III is product development. It consists of: determining the performance-based authentic assessment criteria and setting limits in accordance with the material to be developed; developing the performance model product with reference to the learning objectives and the predetermined criteria, which results in a draft of the initial product; developing assessment rubrics as a guideline for evaluating physics learning based on the assessment activities carried out; and validating the product. Product validation involves preparing the research instruments used to collect research data, especially data on the quality and usefulness of the performance-based authentic assessment product; validation by experts, teachers, and peers, who assess the initial product and provide the input needed to improve it; and revision I, in which the initial product is improved in accordance with the input given by the experts, teachers, and peers.
Step IV is the preliminary field test, in which the product was tested on 15 students of class X at MAN III Yogyakarta.
Step V is revision II. The results of the preliminary field test are analyzed, and the product is then revised a second time with the teacher and peers to produce product II.
Step VI is the main field test, in which product II was tested in class XA at MAN III Yogyakarta.
Step VII is the final revision. The results of the main field test are analyzed, and the product is revised to produce the final product.
Step VIII is dissemination. The final product of the authentic assessment instrument is reproduced and disseminated to physics teachers from five schools in Yogyakarta.

Data, Instruments, and Data Collection Techniques
This research uses quantitative and qualitative data. The quantitative data were obtained from product validation and from student response data in the preliminary test. The qualitative data were obtained from the input of the validators and of the students involved regarding the appropriateness of the performance-based authentic assessment instrument.
The data collection instruments used are: (1) product validation instruments, filled in by the validators, namely two assessment instrument experts, two physics teachers, and peers, to assess the suitability of the products developed; (2) an observation sheet, in the form of a checklist of student performance in science process skills during experiments. This sheet is equipped with assessment guidelines. It is used to measure the extent of student performance during the trial implementation, and it contains the performance indicators that measure the science process skills mastered by students in carrying out the experiments; (3) interviews, to gather information on the suitability of the instruments as input for improvement; (4) a student questionnaire, used at the time of testing to determine student responses to the learning activities and assessments conducted; (5) peer assessment sheets on involvement in activities; (6) peer assessment sheets on interaction attitude in group activities; (7) a student self-assessment sheet; (8) science process skills questions, consisting of seventy multiple-choice questions with reasons, given after a series of performance-based learning activities; (9) cognitive achievement questions, consisting of 40 multiple-choice questions covering all competencies in the temperature and heat material as set out in the lesson plan, which also contains the indicators of cognitive abilities. The cognitive achievement questions were validated by two research instrument experts and tested empirically on 15 students in the preliminary test. Once the questions were found to be reliable and valid, they were tested on 31 students.

Data Analysis
The data analysis concerns validity and reliability. The instruments were analyzed in the following steps.
Product Appropriateness Analysis
1) The validation scores given by the experts, teachers, and peers were analyzed with descriptive statistics, and the content validity coefficient was calculated with Aiken's V formula. The instrument was judged valid on the basis of Aiken's V coefficient, which ranges from 0 to 1 (Azwar 2014, 134).
2) The validation scores and the peer assessment sheet scores, which are quantitative data, were converted into qualitative data stating the product quality on a four-level scale. The average score obtained from the calculation was then converted into a qualitative value. The performance-based authentic assessment instrument was declared feasible for use in learning if the average product assessment by the validators and the response of the research subjects in the preliminary and main field tests were at least in the good category.
3) The scores from the student questionnaire and the observation sheet, which are quantitative data, were converted into qualitative data stating the product quality on a five-level scale. The average score obtained from the calculation was converted into a qualitative value using the score conversion criteria for a five-level scale. The performance-based authentic assessment instrument was declared feasible for use in learning if the average product assessment by the validators and the response of the research subjects in the preliminary and main field tests were at least in the good category.
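As an illustration of the Aiken's V coefficient mentioned above, the following is a minimal sketch for a single item, using V = Σ(r − lo) / (n(c − 1)), where r is each rater's score, lo is the lowest rating category, c is the number of rating categories, and n is the number of raters. The function name, the 1 to 4 rating scale, and the example ratings are assumptions for illustration only; they are not taken from the study's data.

```python
def aikens_v(ratings, lowest=1, highest=4):
    """Aiken's V for one item: sum(r - lowest) / (n * (highest - lowest)).

    `highest - lowest` equals c - 1, where c is the number of rating categories.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < lowest or r > highest for r in ratings):
        raise ValueError("rating outside the assumed scale")
    n = len(ratings)
    s = sum(r - lowest for r in ratings)
    return s / (n * (highest - lowest))


# Hypothetical ratings from five validators on an assumed 1-4 scale.
example_ratings = [4, 3, 4, 4, 3]
print(round(aikens_v(example_ratings), 2))  # 0.87
```

A value near 1 means the raters agree that the item is relevant, and a value near 0 means they agree that it is not.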

Validity and Reliability Analysis of the Cognitive Achievement Questions
The cognitive achievement questions were validated by testing them on 15 students in the preliminary test and 31 students in the main field test. The tested questions were analyzed for validity and reliability using the Rasch model with the help of the Winsteps software; the item validity criteria follow Sumintono & Widhiarso (2013, 111). Question reliability was reviewed from the person reliability, item reliability, and Cronbach's alpha values (Sumintono & Widhiarso 2013, 109).
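The item screening step can be sketched as follows. This is a minimal illustration and not the Winsteps procedure itself: it assumes that the fit statistics (MNSQ, ZSTD, and Pt Measure Corr) have already been exported from the Rasch software, and it only reports which criteria each item fails. The accepted ranges follow the limits quoted in the results of this paper, with the ZSTD range assumed to be symmetric (-2.0 to +2.0). The class name, function name, and example values are hypothetical.

```python
from dataclasses import dataclass

# Accepted ranges (exclusive bounds). The ZSTD lower bound of -2.0 is an assumption.
MNSQ_RANGE = (0.5, 1.5)
ZSTD_RANGE = (-2.0, 2.0)
PT_CORR_RANGE = (0.5, 0.85)


@dataclass
class ItemFit:
    item: int
    mnsq: float
    zstd: float
    pt_corr: float


def failed_criteria(fit):
    """Return the names of the fit criteria that the item does not meet."""
    checks = {
        "MNSQ": (fit.mnsq, MNSQ_RANGE),
        "ZSTD": (fit.zstd, ZSTD_RANGE),
        "Pt Measure Corr": (fit.pt_corr, PT_CORR_RANGE),
    }
    return [
        name
        for name, (value, (low, high)) in checks.items()
        if not (low < value < high)
    ]


# Hypothetical fit statistics, not taken from the study.
items = [
    ItemFit(item=1, mnsq=1.02, zstd=0.3, pt_corr=0.61),
    ItemFit(item=2, mnsq=1.87, zstd=2.6, pt_corr=0.12),
]
for fit in items:
    print(fit.item, failed_criteria(fit))
```

How many failed criteria justify discarding an item is a decision rule that the sketch deliberately leaves to the analyst.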

RESULTS AND DISCUSSION
The preliminary study consisted of a literature study and a survey. The literature study collected information from references on research into developing performance-based authentic assessment models. The literature consulted consisted of education laws, journals, and textbooks, including electronic sources.
In the survey, data were collected through observation and interviews with several educators and students at MAN III Yogyakarta. Observation of several physics classes showed that learning was mostly teacher-centered; experiments were carried out, but not often; and the learning sources used were the textbooks provided and student worksheets. MAN III Yogyakarta already has a physics laboratory, which is intended to help students hone and develop what they have learned from the material studied.
In the planning step, the Core Competences and Basic Competences were analyzed. The senior high school physics Core Competences and Basic Competences were analyzed to determine the learning activities, the material, and the kind of research intended.
The early product planning step produced drafts of the lesson plan, the temperature and heat material, the student worksheets, the performance-based authentic assessment observation sheet, the process skill questions, the achievement test questions, the student questionnaire, the self-assessment sheet, the peer assessment sheet on involvement in activities, and the peer assessment sheet on interaction attitude in group activities.
The student worksheets were arranged with reference to the criteria of authentic assessment and performance and to the temperature and heat material, with the aim of helping students in the learning process and making it easier to assess student performance. The observation sheet was arranged by (a) determining the performance-based authentic assessment criteria, which set the assessment indicators and the limits adjusted to the material developed, with reference to the learning objectives that had been set; and (b) developing the scoring rubric as an assessment guideline for class X physics learning based on the assessment activities carried out. The results of product validation by expert judgment are presented in TABLE 1.
[TABLE 1: product validation by expert judgment; only the value 0.89 survives]
The preliminary product was tested in class X of MAN III Yogyakarta with a sample of 15 students. In this test, the average performance scores on the experiments were, in sequence, 57 (very good) on student worksheet I, 59 (very good) on student worksheet II, and 49 (good) on student worksheet III. Of the 15 students, on student worksheet I six students received a very good predicate and 11 students a good predicate. On student worksheet II, 11 students received a very good predicate and four students a good predicate. On student worksheet III, 14 students received a very good predicate and one student a fair predicate. These results show that the students performed well in applying science process skills, because almost all students carried out the experiment steps in sequence. Across the 14 indicators of the performance-based authentic assessment, the average scores were, in sequence, 60.21 (very good) on student worksheet I, 63.71 (very good) on student worksheet II, and 61.25 (very good) on student worksheet III.
On student worksheet I, student performance was very good on seven indicators (50%): use of the senses to identify the activity, data collection and finding similarities and differences in every object/event, use of a stopwatch to measure time, use of the balance to measure mass, use of the thermometer to measure temperature, and measurement of the temperature increase. Even so, the educator's role as a guide remained. Student performance was in the good criterion on seven indicators (50%): conversion of measurement units, measurement of the time required, drawing conclusions/statements based on evidence/results, estimating the events that happened, explaining the observation results in accordance with the observations and existing theory, and discussing the experiment/observation results. In converting the results in particular, students still needed to be reminded and guided by an educator. On student worksheet II, student performance was very good on 6 indicators (42.86%): use of the senses to identify the experiment activity, finding similarities and differences in every object/event, use of the balance to measure mass, use of the thermometer to measure temperature, measurement of the temperature increase/decrease, and calculation of the heat absorbed/released. Student performance was in the good criterion on 7 indicators (50%): collecting relevant evidence suited to the experiment, conversion of measurement units, measurement of the mass of water, drawing conclusions/statements based on evidence/results, estimating the events that happened, and explaining the observation results in accordance with the observations and existing theory. In the experiment on student worksheet II, students still needed guidance, especially in determining and calculating the specific heat of the metal, because performance on this indicator was only fair (7.14%).
On student worksheet III, student performance was very good on 3 indicators (25%): use of the senses to identify the experiment activity, use of the thermometer to measure temperature, and measurement of the temperature increase. It was good on 7 indicators (50%): collecting relevant evidence in accordance with the experiment, finding similarities and differences in every object/event, use of a stopwatch to measure time, use of a measuring instrument to measure volume, conversion of the units of the measurement results, drawing conclusions/statements based on the facts/observation results, and discussing the experiment/observation results. Performance was fair on two indicators (16.66%): estimating the events that happened and explaining the observation results in accordance with the observations and existing theory.
To find out the students' response to learning with the performance-based authentic assessment, the students were asked to fill in a questionnaire. Their response was positive, and they found the learning interesting because they learned through real, hands-on experiments.
According to the analysis of the achievement test, Cronbach's alpha was 0.85 (very good criterion), item reliability was 0.67 (fair criterion), and person reliability was 0.74, so the achievement questions were reliable, feasible to use, and declared valid, with the note that 6 questions needed to be discarded because they did not fit, that is, they fell in the outlier or misfit zone according to the output values of MNSQ, ZSTD, and Pt Measure Corr. The misfit zone lies outside the ranges 0.5 < MNSQ < 1.5, -2.0 < ZSTD < +2.0, and 0.5 < Pt Measure Corr < 0.85. The questions to be deleted are numbers 12, 27, 28, 2, 17, and 30. The results of the achievement test analysis are shown in TABLE 2.
TABLE 2 (surviving rows)
Maximum score: 82.5
Number of students who reached the minimum mastery criteria: 8
Total number of students: 15
Number of invalid questions: 6
In the main field test, the instruments used were the performance-based authentic assessment products, which had been revised and analyzed on the basis of the preliminary test. The main field test was carried out in class XA of MAN III Yogyakarta, with a total of 31 students. The class was divided into five groups of 6-7 students each. The main field test used the One-Shot Case Study design, in which a group is given a treatment and the results are then observed.
In the main field test, student performance was assessed on student worksheets I, II, and III through the application of science process skills. The results show an increase in skills. This can be seen from the average scores: 57 on worksheet I, 58 on worksheet II, and 50 on worksheet III, each in the very good category. The maximum performance scores were 63 on worksheet I, 64 on worksheet II, and 58 on worksheet III. The minimum performance scores were 51 on worksheet I, 53 on worksheet II, and 39 on worksheet III. Of the 31 students, on worksheet I 16 students (51.61%) received a very good predicate and 15 students (48.39%) a good predicate. On worksheet II, 20 students (64.52%) received a very good predicate and 11 students (35.48%) a good predicate. On worksheet III, 16 students (51.61%) received a very good predicate and 15 students (48.39%) a good predicate. These results show that the students applied science process skills well, even though some students did not take the work seriously. Average performance on the 14 indicators of the authentic assessment also increased.
On worksheet I, student performance was very good on 8 indicators (57.14%): finding similarities and differences in each object and event, using a stopwatch to measure time, measuring the time needed, measuring the temperature increase, drawing conclusions and statements from evidence or results, communicating the experimental results in the form of a chart, and explaining the results of observations in accordance with the observations and existing theory. Even so, the educator's role as a guide remained. Student performance met the good criterion on 6 indicators (42.86%): use of the senses to identify the activity, data collection and finding similarities and differences in each object and event, use of the balance to measure mass, conversion of measurement units, estimation of events, and explanation of the results of observations in accordance with the observations and existing theory. In using the balance, students were still guided by teachers because of the difficulty in bringing the balance into equilibrium.
On student worksheet II, student performance was very good on 8 indicators (57.14%): use of the senses to identify activities, gathering evidence, finding similarities and differences in each object and event, using the balance to measure mass, using a thermometer to measure temperature, converting measurement units, measuring the mass of water, measuring the increase and decrease in temperature, and explaining the results of observations in accordance with the observations and existing theory. Student performance met the good criterion on 6 indicators (28.57%), namely collecting measuring instruments, drawing conclusions based on results, estimating events that will occur, and discussing the results of observations.
On worksheet III, student performance was very good on 9 indicators (75%): using a stopwatch to measure time, using a measuring instrument to measure volume, using a thermometer to measure temperature, converting measurement units, measuring the temperature increase, drawing conclusions and statements based on evidence and results, estimating the events that occur, and explaining the results of observations in accordance with the observations and existing theory. Performance was good on 3 indicators (25%): use of the senses to identify activities, gathering evidence, and finding similarities and differences in each object and event.
On the written test, the students' average score was 76.02. This is good because it meets the minimum mastery criterion. Of the 31 students, 22 reached the minimum mastery criterion and nine did not, with a maximum score of 87 and a minimum score of 50. This suggests that almost all students have a good understanding of the material. The results of the science process skills test show a maximum student score of 61 and a minimum score of 37, with an average score of 51.77 and an average value of 74.99. Of the 31 students, ten (32.26%) received a very good predicate, 20 (64.52%) a good predicate, and only one (3.22%) a fair predicate. This illustrates that almost all students can apply process skills well in learning. Of the 17 questions, four had a good predicate and 15 a very good predicate, with an average score of 94.35, which indicates that the process skill questions are feasible to use.
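The percentages reported for the main field test can be checked with simple arithmetic. The sketch below recomputes some of them from the counts given above; the helper name `percentage` is an assumption introduced for illustration.

```python
def percentage(count, total):
    """Share of `count` in `total`, as a percentage rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, 2)


TOTAL_STUDENTS = 31  # main field test, class XA

# Students who reached the minimum mastery criterion on the written test.
print(percentage(22, TOTAL_STUDENTS))  # 70.97

# Students with a very good and a good predicate on the process skills test.
print(percentage(10, TOTAL_STUDENTS))  # 32.26
print(percentage(20, TOTAL_STUDENTS))  # 64.52
```

The same helper applies to the preliminary test, where 8 of 15 students reached the minimum mastery criterion.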
Based on the input and notes from the main field test, the final revision was carried out on the performance-based authentic assessment instrument, which consists of the lesson plan, the heat and temperature material, the student worksheets, the performance-based authentic assessment observation sheet, the process skill questions, the achievement test questions, the student questionnaire, the self-assessment sheet, the peer assessment sheet on involvement in activities, and the peer assessment sheet on interaction attitude in group activities. The final revision resulted in the final product of the performance-based authentic assessment instrument, which was then disseminated to physics teachers in 5 schools in Yogyakarta, namely State Senior High School 3 Yogyakarta, State Senior High School 4 Yogyakarta, State Senior High School 6 Yogyakarta, State Senior High School 11 Yogyakarta, and MAN I Yogyakarta, so that they can use the assessment product.

SUMMARY
Based on the research process, it can be concluded that the instruments developed, reviewed as a whole according to the judgment of experts, physics teachers, and peers, are in the valid category, and that in the preliminary and main field tests they are in general in the valid and reliable categories. The performance-based authentic assessment instrument developed for heat and temperature material for high school / MA students can build science process skills. This is shown by the performance observations, with scores of 57 on worksheet I, 58 on worksheet II, and 50 on worksheet III, each with a very good predicate, and by the science process skills test, with an average value of 51.77 in the good category.
The authentic assessment instrument developed can also measure achievement in the cognitive aspects of heat and temperature material for high school / MA students. This is evidenced by the test results, which reached an average value of 70.66. In the preliminary test, 53.33% of students met the minimum mastery criteria or more. In the main field test, the average cognitive achievement was 76.02, with 70.97% of students meeting the minimum mastery criteria or more.