Title: The CrossCult Knowledge Base
Authors: Andreas Vlachidis (UCL); Antonis Bikakis (UCL); Melissa Terras (UCL); Yannick Naudet (LIST); Louis Deladiennee (LIST); Daphne Manessi (TEI-A); Evgenia Vasilakaki (TEI-A); Ioannis Triantafyllou (TEI-A); Joseph Padfield (NG); Kalliopi Kontiza (NG)
Description: The CrossCult Knowledge Base (CCKB) is a comprehensive structure of semantic definitions and formalisms, developed to facilitate interoperable connections between the cultural heritage datasets contributing to CrossCult. It is written in OWL 2 (the standard ontology language for the Semantic Web) and enables augmentation, semantic linking, semantic-based reasoning and retrieval across disparate data resources.
Title: Pilot 1 National Gallery collection ontology extensions
Description: Ontology Extensions
Title: Pilot 1 National Gallery collection literals
Description: These datasets contain information about the venue participating in pilot 1, the National Gallery (NG), London, and about the NG collection. They will be used to model the ontology of the paintings in the NG collection. The datasets will comprise the extensions of the upper-level ontology used to describe the types and relationships required to capture the NG collection, together with the actual literal data related to the collection. They will be created by NG staff with assistance from other CROSSCULT researchers, as part of other related NG projects and an ongoing programme within the NG to develop and enhance its collection information. The final literal dataset will take the form of structured XML files, with the information modelled in OWL. The types and relationships captured within these files will be documented within the overall ontology, which will also take the form of structured XML. The overall ontology will be based on the CIDOC-CRM cultural heritage standard, along with other related ontologies and enhancements that have been selected and/or developed within the CROSSCULT project. The core (Tombstone) data will be retrieved from NG collection information held in the National Gallery Collection Management System (CMS), TMS (The Museum System™), and will be presented to the CROSSCULT app via a new API.
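As a rough illustration of "structured XML files, with the information modelled in OWL", the sketch below serialises one painting's title as RDF/XML using CIDOC-CRM terms. The base URI, the inventory number and the painting title are placeholders invented for the example. The choice of `E22_Man-Made_Object`, `P102_has_title` and `E35_Title` is an assumption about how a painting might be modelled, not the project's actual schema.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
# Hypothetical base URI for illustration; not a real NG endpoint.
BASE = "http://example.org/ng/"

for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("crm", CRM)):
    ET.register_namespace(prefix, uri)


def painting_to_rdfxml(inventory_no: str, title: str) -> str:
    """Serialise one painting's title as an RDF/XML string."""
    root = ET.Element(f"{{{RDF}}}RDF")
    # E22 class name as used in CIDOC-CRM versions before 7.0 (assumed here).
    painting = ET.SubElement(
        root,
        f"{{{CRM}}}E22_Man-Made_Object",
        {f"{{{RDF}}}about": BASE + "painting/" + inventory_no},
    )
    has_title = ET.SubElement(painting, f"{{{CRM}}}P102_has_title")
    title_node = ET.SubElement(has_title, f"{{{CRM}}}E35_Title")
    label = ET.SubElement(title_node, f"{{{RDFS}}}label")
    label.text = title
    return ET.tostring(root, encoding="unicode")


# Placeholder inventory number and title.
xml_text = painting_to_rdfxml("NG0000", "Example Painting")
print(xml_text)
```

A dedicated RDF library would normally be preferable for real data. The standard-library version here only shows the shape of such a file.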
Title: Pilot 1 user profiles
Description: This dataset will include all the information used to create the profiles of the visitors participating in the experiments of pilot 1, together with the data calculated or determined by the CrossCult platform to describe and categorise users and their interests, activities, preferences and interaction with the collection and the CrossCult Pilot 1 app. The dataset's content will be used to model the profiles of the CrossCult Pilot 1 app users and so help to personalise the recommendations offered to them.
Title: Pilot 1 user-generated-content
Description: The information in this dataset will be provided by the users who take part in the Gallery visits and contribute content while participating in the experiments using the CrossCult Pilot 1 app. This content will represent the users’ reinterpretation of the NG collection information, allowing them to reflect on their experiences. It will include data literals from any subgroups of the NG collection created by user input, preferences and searches.
Title: Pilot 1 NG routes and tours
Description: This dataset will combine data on the locations of paintings with location-tracking records of the users participating in the experiments using the CrossCult pilot 1 app. It will begin with a small set of predefined, suggested NG tours, based on the available data and the defined CrossCult pilot 1 reflection points. The dataset will then be augmented by the tours recommended to users during the profiling and recommendation processes, records of user visits, and new user-defined tours. This dataset will record people’s visits and activities in a reusable form that can be used by other users or by the system. Some degree of review or scoring could be included, based on automated monitoring of their use and on user-defined values. The final dataset will take the form of structured XML files.
Title: Pilot 1 multimedia images
Description: This dataset will contain the multimedia resources (images) provided by the venue in pilot 1 that will be used to create the stories presented to users.
The dataset will include zoomable images as well as simple static thumbnails and JPEGs. The data will use the most widely used formats for multimedia resources. The IIIF and MPEG-7 standards will be used to describe the relevant details of the files involved, covering their technical characteristics and the most appropriate descriptions of their contents.
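To make the distinction between zoomable images and static thumbnails concrete, the sketch below builds request URLs following the IIIF Image API 3.0 pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The server address and the image identifier are hypothetical, since the text does not name the pilot's image endpoint.

```python
from urllib.parse import quote

# Hypothetical image server; the pilot's real endpoint is not given in the text.
IIIF_BASE = "https://iiif.example.org/image"


def iiif_image_url(identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build a IIIF Image API 3.0 request URL for one image."""
    # The identifier is percent-encoded, including any slashes it contains.
    ident = quote(identifier, safe="")
    return f"{IIIF_BASE}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Full-size image, and a thumbnail scaled to 200 pixels wide.
full = iiif_image_url("NG0000")
thumb = iiif_image_url("NG0000", size="200,")
print(full)
print(thumb)
```

A zooming viewer would request many such URLs with different `region` and `size` values, while a static thumbnail needs only one.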
Title: Pilot 1 user tracking and observations
Description: This dataset will contain all the data collected from observation and location tracking of the users participating in the experiments using the CROSSCULT Pilot 1 app. It will include information about the users’ interaction with the NG collection information.
The dataset will comprise anonymised information about the users who participate in the NG visits using the CROSSCULT app. Additionally, for users who give their permission, location information can be retrieved automatically from their devices. User agreement must be obtained before tracking and processing data about the user.
The dataset generated from the observation will provide information on user behaviour. The exact make-up of the fields included in this dataset will be determined as part of the work carried out within CROSSCULT.
Title: Pilot 1 game scenarios
Description: This dataset will contain information about the content available in the game scenarios for the users participating in the experiments of pilot 1 using the CROSSCULT app. It will have generic structures for quizzes in addition to a fixed, predefined set of questions with default question types. The fixed set of questions will be selected in relation to the pilot 1 reflection points and the available related data. While users interact with the CROSSCULT app, new sets of questions will be generated based on user options, preferences, activities and profiling. Personalisation and context-awareness are also two important elements implemented in CROSSCULT technologies. It would also be possible to relate the results of quizzes to the user activities with the CROSSCULT app and the National Gallery, allowing researchers to explore how well different interactions with the collection relate to a user’s ability to answer questions.
The dataset will also include anonymised data related to which quizzes are used and how well different users have performed in each attempted quiz.
Title: Pilot 2 ontology extensions
Description: Ontology Extensions
Title: Pilot 2 literals
Description: These datasets enclose additional information about the four venues participating in pilot 2 and the elements contained in their exhibitions and facilities not included in the general dataset. The datasets will be composed of the extensions of the upper-level ontology used to describe the types and relationships required and the actual literal data related to the collections. This way, the datasets will be used to model the knowledge about the items from the four venues participating in pilot 2 in order to permit the discovery of additional relationships that could be presented to visitors to enhance their understanding and to be used in games to improve the entertainment aspect of the experience.
These datasets will be composed of the ontology(ies) used to describe the types and relationships required to capture the items from the venues. They will take the form of structured XML files, with the information modelled according to the OWL standard. The types and relationships captured within these files will be documented within the overall ontology, which will also take the form of structured XML files. The overall ontology will be based on the CIDOC-CRM cultural heritage standard, along with other related ontologies and enhancements that have been selected and/or developed within the CROSSCULT project.
No elaborated data following the same structure will be reused, and the original data will be extracted from the sources provided by the venues. The basic data will be augmented by related concepts such as geographical data and related vocabulary terms, and connected to external sources of information (Wikidata, Wikipedia) and other museums' APIs (BM, Europeana).
Most of the data (both data structures and instances) describing the venues' items will be covered by the general datasets, so the expected size of these specific datasets is quite small, on the order of several KB.
The ontology may be used for further semantics-related projects to help define and describe similar collections and as part of more general semantic research. The literals may be used in further research to examine the relationships captured in the dataset, analyse the collections' connections to broader world history, explore the visualisation of complex cultural heritage datasets and/or serve as an example of how other collections might be structured and described with these kinds of tools.
Title: Pilot 2 visitor profiles
Description: This dataset will include all the information used to create the profiles of the visitors participating in the experiments of pilot 2. This way, the dataset contents will be used to model the users that will play the games of the pilot and so help the apps to personalise the options to be offered to them in order to get a more appealing experience for every visitor.
This dataset will comprise information about the visitors who participate in the games, including psycho-demographic data, cognitive style, and interests mined from social networks. All the information will be anonymised and the storage will be organised by assigning an anonymised identifier to every participant.
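One possible way to organise storage around an anonymised identifier per participant is sketched below. The class, the `P-` prefix, the field names and the sample record are all hypothetical. The sketch only shows the idea of replacing direct identifiers with a stable random token. The lookup table from real identity to token would need to be kept apart from the dataset, or discarded, for the records to count as anonymised rather than merely pseudonymised.

```python
import secrets


class Pseudonymiser:
    """Assigns a stable random identifier to each participant."""

    def __init__(self):
        # Maps a real-world key (e.g. an e-mail address) to its random token.
        self._ids = {}

    def anon_id(self, participant_key: str) -> str:
        """Return the participant's token, creating it on first use."""
        if participant_key not in self._ids:
            self._ids[participant_key] = "P-" + secrets.token_hex(8)
        return self._ids[participant_key]

    def anonymise(self, record: dict, key_field: str, drop: tuple) -> dict:
        """Copy a record, dropping direct identifiers and adding the token."""
        cleaned = {k: v for k, v in record.items()
                   if k != key_field and k not in drop}
        cleaned["participant_id"] = self.anon_id(record[key_field])
        return cleaned


p = Pseudonymiser()
# Made-up sample record; field names and values are illustrative only.
raw = {"email": "visitor@example.org", "name": "A. Visitor",
       "cognitive_style": "example-style", "interests": ["sculpture"]}
clean = p.anonymise(raw, key_field="email", drop=("name",))
print(clean)
```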
Title: Pilot 2 multimedia contents
Description: This dataset includes the multimedia resources linked to the venue involved in pilot 2 that may be used to create stories to be told to the participants in the experiments. They are the multimedia assets appearing in the narratives, for instance to illustrate questions or descriptions.
The intended resources include text, images, video and audio clips, animations, 3D shapes, AR contents, external links, and any other formats. The multimedia resources will use the most common formats (gif, jpeg, png, avi, mpg, mp4, doc, txt, odt,…), so integration and reuse difficulties are not foreseen. Transcoding may be used wherever necessary.
Descriptive metadata containing semantic characterizations for every resource will rely on MPEG-7 standard vocabularies as far as possible.
Title: Pilot 2 game scenarios
Description: This dataset will contain information about the content available in the game scenarios developed by CROSSCULT social sciences experts for the users participating in the experiments of pilot 2. It will include details of the questions and answers provided to the users within the quizzes presented as part of pilot 2. It will contain predefined graphs of concepts and relationships to explore, attached sets of choices for the concepts that will be left blank (including different sets for different individual/team profiles and locations) and attached multimedia contents.
The dataset will be presented in the form of XML files indicating the venues involved, the structure of the concepts and relationships in the graph, the concepts left blank and the possible sets of answers.
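The XML schema for these scenario files is not specified in the text, so the sketch below invents a minimal one purely for illustration. All element names, attribute names and content are assumptions. It shows how a graph of concepts and relationships, with one concept left blank and a set of candidate answers, might be read back into memory.

```python
import xml.etree.ElementTree as ET

# Hypothetical scenario file; the layout and content are placeholders.
SCENARIO_XML = """
<scenario venue="example-venue">
  <concept id="c1" label="Concept A"/>
  <concept id="c2" blank="true">
    <choice correct="true">Option 1</choice>
    <choice>Option 2</choice>
    <choice>Option 3</choice>
  </concept>
  <relationship from="c1" to="c2" label="related to"/>
</scenario>
"""


def load_scenario(xml_text):
    """Parse a scenario into (venue, concepts by id, list of edges)."""
    root = ET.fromstring(xml_text)
    concepts = {}
    for c in root.findall("concept"):
        choices = c.findall("choice")
        concepts[c.get("id")] = {
            "label": c.get("label"),
            "blank": c.get("blank") == "true",
            "choices": [ch.text for ch in choices],
            # The correct answer, if one of the choices is marked as such.
            "answer": next((ch.text for ch in choices
                            if ch.get("correct") == "true"), None),
        }
    edges = [(r.get("from"), r.get("to"), r.get("label"))
             for r in root.findall("relationship")]
    return root.get("venue"), concepts, edges


venue, concepts, edges = load_scenario(SCENARIO_XML)
blanks = [cid for cid, c in concepts.items() if c["blank"]]
print(venue, blanks, edges)
```

Different sets of choices for different individual or team profiles and locations could be represented by adding further attributes to the choice elements, again as an assumption rather than a documented design.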
Title: Pilot 2 visitor interactions and staff observations
Description: This dataset will include the contributions of participants during the experiments. The information (linked to the corresponding profiles from the CROSSCULT-DS-pilot_3_Visitor_Profiles dataset) will comprise records of individual/collective interactions, plus observations from the guiding staff of the venues (e.g. about the mood of the participants, amount of idle time, reception of the micro-augmentations, etc.).
The data will take the form of XML files structuring the different kinds of contributions and contributors at every stage of the user experience.
The information of this dataset will be provided by the participants in the proposed experiences. No reuse of additional information is foreseen, as this information cannot be enhanced by any external contribution.
Title: Pilot 2 interview transcripts
Description: This dataset provides the user responses to the interviews that will be performed during the evaluation phase of the project and will be designed around the Unified Theory of Acceptance and Use of Technology (UTAUT).
A simple set of Dublin Core metadata will be used to describe the dataset.
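A simple Dublin Core description of such a dataset might look like the record produced below. Only the title comes from this entry. The wrapper element and the remaining values are placeholders chosen for illustration, and the choice of elements is an assumption rather than the project's actual metadata profile.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)


def dublin_core_record(fields):
    """Build a simple Dublin Core record from (element, value) pairs."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields:
        el = ET.SubElement(root, f"{{{DC}}}{name}")
        el.text = value
    return root


record = dublin_core_record([
    ("title", "Pilot 2 interview transcripts"),
    ("type", "Dataset"),
    ("format", "text/plain"),      # placeholder value
    ("language", "en"),            # placeholder value
    ("rights", "To be determined"),  # placeholder value
])
print(ET.tostring(record, encoding="unicode"))
```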
It will be created during the pilot 2 tests in order to understand how users accept and perceive the concept. No reuse of additional information is foreseen, as this information cannot be enhanced by any external contribution.
The information volume that will be stored is quite small, several Kbytes for each participant at most, yielding up to a few Mbytes per experimentation session.
The dataset could be used by any interested researchers. It is expected that the dataset can contribute to an understanding of the users’ acceptance of the CROSSCULT technology and to an article based on this study. It will also inform the evaluation framework.
Title: Pilot 3 ontology extensions
Description: Ontology Extensions
Title: Pilot 3 literals
Description: This dataset encloses additional information about the Archaeological Museum of Tripolis, where pilot 3 will take place and the elements contained in its exhibitions. The dataset will be composed of the extensions of the upper-level ontology used to describe the types and relationships required and the actual literal data related to the collections. This way, the datasets will be used to model the knowledge about the items from the Archaeological Museum of Tripolis participating in pilot 3 in order to permit the discovery of additional relationships that could be presented to visitors to enhance their understanding and to be used in games to improve the entertainment aspect of the experience.
These datasets will be composed of the ontology(ies) used to describe the types and relationships required to capture the items from the venue participating in pilot 3.
They will take the form of structured XML files, with the information modelled according to the OWL standard. The types and relationships captured within these files will be documented within the overall ontology, which will also take the form of structured XML files. The overall ontology will be based on the CIDOC-CRM cultural heritage standard, along with other related ontologies and enhancements that have been selected and/or developed within the CROSSCULT project.
No elaborated data following the same structure will be reused, and the original data will be extracted from the sources provided by the venues. The basic data will be augmented by related concepts such as geographical data and related vocabulary terms, and connected to external sources of information (Wikidata, Wikipedia) and other museums' APIs (BM, Europeana).
Most of the data (both data structures and instances) describing the venues' items will be covered by the general datasets, so the expected size of these specific datasets is quite small, on the order of several KB.
The ontology may be used for further semantics-related projects to help define and describe similar collections and as part of more general semantic research. The literals may be used in further research to examine the relationships captured in the dataset, analyse the collections' connections to broader world history, explore the visualisation of complex cultural heritage datasets and/or serve as an example of how other collections might be structured and described with these kinds of tools.
Title: Pilot 3 visitor profiles
Description: This dataset will include all the information used to create the profiles of the visitors participating in the experiments of pilot 3. This way, the dataset contents will be used to model the users that will play the games of the pilot and so help the apps to personalise the options to be offered to them in order to get a more appealing experience for every visitor.
This dataset will comprise information about the visitors who participate in the games, including psycho-demographic data, cognitive style, and interests mined from social networks. All the information will be anonymised and the storage will be organised by assigning an anonymised identifier to every participant.
No elaborated data following the same structure will be reused, and all the information of this dataset will be provided by the users through their participation in mini games designed to extract cognitive profiles before their museum visit and, for visitors who give their permission, through direct access to the information of their social networks to infer interesting profile traits.
Title: Pilot 3 multimedia contents
Description: This dataset includes the multimedia resources linked to the venue involved in pilot 3 that may be used to create stories to be told to the participants in the experiments. They are the multimedia assets appearing in the narratives, for instance to illustrate questions or descriptions.
The intended resources include text, images, video and audio clips, animations, 3D shapes, AR contents, external links, and any other formats. The multimedia resources will use the most common formats (gif, jpeg, png, avi, mpg, mp4, doc, txt, odt,…), so integration and reuse difficulties are not foreseen. Transcoding may be used wherever necessary.
Descriptive metadata containing semantic characterizations for every resource will rely on MPEG-7 standard vocabularies as far as possible.
Original data provided by the museum will be reused to build the stories. In addition to the resources provided by the venue itself, some other material will be linked from open online sites such as Wikipedia and Wikimedia. It is foreseen to include short clips (up to 30 seconds long and always respecting copyright laws) of Greek music, from ancient to traditional and modern, as well as world music.
Title: Pilot 3 Game Scenarios
Description: This dataset will contain information about the content available in the game scenarios developed by CROSSCULT social sciences experts for the users participating in the experiments of pilot 3. It will include details of the questions and answers provided to the users within the quizzes presented as part of pilot 3, pre-visit experience.
No elaborated data following the same structure will be reused. All the dataset resources will be extracted from the contents involved in the composition of the pilot 3 games and will be in the form of XML files.
As all the information will be in the form of XML files, no significant amount of information will be stored. The estimate is several Kbytes per game.
Title: Pilot 3 visitor interactions and staff observations
Description: This dataset will include the contributions of participants during the experiments. The information (linked to the corresponding profiles from the CROSSCULT-DS-pilot_3_Visitor_Profiles dataset) will comprise records of individual/collective interactions, plus observations from the guiding staff of the venues (e.g. about the mood of the participants, amount of idle time, reception of the micro-augmentations, etc.).
The data will take the form of XML files structuring the different kinds of contributions and contributors at every stage of the user experience.
The information of this dataset will be provided by the participants in the proposed experiences. No reuse of additional information is foreseen, as this information cannot be enhanced by any external contribution.
Title: Pilot 3 interview transcripts
Description: This dataset provides the user responses to the interviews that will be performed during the evaluation phase of the project and will be designed around the Unified Theory of Acceptance and Use of Technology (UTAUT).
A simple set of Dublin Core metadata will be used to describe the dataset.
It will be created during the pilot 3 tests in order to understand how users accept and perceive the concept. No reuse of additional information is foreseen, as this information cannot be enhanced by any external contribution.
The information volume that will be stored is quite small, several Kbytes for each participant at the most, yielding up to a few Mbytes per experimentation session.
The dataset could be used by any interested researchers. It is expected that the dataset can contribute to an understanding of the users’ acceptance of the CROSSCULT technology and to an article based on this study. It will also inform the evaluation framework.