This folder includes the script and input used for the analysis of which xmi files contain UML models.

Links: file containing ALL links identified by crawling GitHub.

Script_as_eclipse_project: the zipped Eclipse project with the script for
- the identification of xmi files that contain UML models. For reuse, make sure that the paths to the input files (urls-1-80000, urls-80001-320000, urls-320001-840000, and urls-840001-1240000) are correct.
- the identification of duplicates among .xmi, .uml, or .mof files. Again, the paths to the input files need to be set accordingly.
- NOTE: the folder for the output files must already exist at the specified path.

Results: The folder "Results" includes the results of the automated run in February/March 2016. There is 1 summary file for each data chunk. Each summary file includes:
- mof: list of links whose files include a schema reference to the OMG MOF Schema
- xmi: list of links whose files include a schema reference to the OMG XMI Schema (but not to the UML or MOF Schema)
- uml: list of links whose files include a schema reference to the OMG UML Schema (but not to the MOF Schema)
- NoOMGXMIorUML: list of links whose files include no schema reference to the OMG XMI, UML, or MOF Schema
- NotDownloaded: list of files that could not be downloaded automatically during that run
- .uml files: list of found .uml files

UMLSchema=UML_Manual_Check_Results: The Excel file contains data from 4 open source projects, which together contain 53 links to xmi files (between 1 and 33 per project). All xmi files were subjected to 2 checks: a) a manual check of the actual content, and b) a semi-automated check for UML schema references. The agreement between these results is used to argue that the check for UML schema references can substitute for manually checking the content of all xmi files.
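The precedence among the summary-file categories listed under Results (MOF first, then UML, then XMI, otherwise NoOMGXMIorUML) can be illustrated with a small Java sketch. This is not the code from Script_as_eclipse_project: the class name, the enum, and the namespace substrings used as markers are assumptions chosen for illustration, and the original script may detect schema references differently.

```java
import java.util.Locale;

public class SchemaClassifier {

    // Hypothetical category names mirroring the lists in each summary file.
    enum Category { MOF, UML, XMI, NO_OMG_XMI_OR_UML }

    // ASSUMED markers for OMG schema references (lower-cased namespace fragments);
    // the markers used by the original script are not documented in this README.
    private static final String[] MOF_MARKERS = { "omg.org/spec/mof" };
    private static final String[] UML_MARKERS = { "omg.org/spec/uml" };
    private static final String[] XMI_MARKERS = { "omg.org/spec/xmi", "omg.org/xmi" };

    // True if the text contains at least one of the given markers.
    private static boolean containsAny(String text, String[] markers) {
        for (String m : markers) {
            if (text.contains(m)) {
                return true;
            }
        }
        return false;
    }

    // Classifies file content with the precedence described in the README:
    // any MOF reference wins, then UML (without MOF), then XMI only.
    static Category classify(String content) {
        String c = content.toLowerCase(Locale.ROOT);
        if (containsAny(c, MOF_MARKERS)) {
            return Category.MOF;
        }
        if (containsAny(c, UML_MARKERS)) {
            return Category.UML;
        }
        if (containsAny(c, XMI_MARKERS)) {
            return Category.XMI;
        }
        return Category.NO_OMG_XMI_OR_UML;
    }

    public static void main(String[] args) {
        String sample = "<xmi:XMI xmlns:xmi=\"http://www.omg.org/spec/XMI/20131001\" "
                + "xmlns:uml=\"http://www.omg.org/spec/UML/20131001\"/>";
        System.out.println(classify(sample)); // prints UML
    }
}
```

Checking the markers in a fixed order is what makes the categories mutually exclusive, matching the "(but not ...)" qualifiers in the summary-file description.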
Duplicates: The results of the duplicate analysis of .xmi and .uml files can be found in the files: Duplicates_chunks1.4_xmi_summary (uml files stored as .xmi) and Duplicates_chunks1.4_uml_summary (uml files stored as .uml).
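The README does not state how duplicates were identified. One common approach, shown here only as a hedged Java sketch, is to treat files with byte-identical content as duplicates and group their links by a SHA-256 digest of the content. The class and method names are hypothetical, and the example links are placeholders, not data from the Results folder.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateFinder {

    // Hex-encoded SHA-256 digest of the file content.
    static String sha256(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : hash) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // ASSUMPTION: "duplicate" means identical content. Groups links by content
    // digest and keeps only groups that contain more than one link.
    static Map<String, List<String>> findDuplicates(Map<String, String> contentByLink) {
        Map<String, List<String>> linksByHash = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : contentByLink.entrySet()) {
            linksByHash.computeIfAbsent(sha256(e.getValue()), k -> new ArrayList<>())
                       .add(e.getKey());
        }
        linksByHash.values().removeIf(links -> links.size() < 2);
        return linksByHash;
    }

    public static void main(String[] args) {
        // Placeholder links and contents for illustration only.
        Map<String, String> files = new LinkedHashMap<>();
        files.put("https://example.org/a.xmi", "<xmi:XMI/>");
        files.put("https://example.org/b.xmi", "<xmi:XMI/>");
        files.put("https://example.org/c.uml", "<uml:Model/>");
        System.out.println(findDuplicates(files).values());
        // prints [[https://example.org/a.xmi, https://example.org/b.xmi]]
    }
}
```

Hashing keeps memory usage low when many large files are compared, because only one fixed-size digest per file has to be held in the map. Files that differ only in whitespace or element order would not be detected by this strict notion of duplication.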