The authors observe that several of the quality issues frequent in DBpedia, which cannot be reliably detected automatically, can be identified with human involvement. The study focuses on verifying four types of quality issues that are frequent in DBpedia triples: (1) incorrect object values in a triple, (2) incorrect data types, (3) incorrect language tags, and (4) incorrect links. The paper investigates three main research questions: (1) whether and to what extent these error types can be detected by crowds; (2) how crowds with diverse skill sets (e.g., experts vs. laymen) perform on these tasks; and (3) how the results of the different crowdsourcing approaches compare.
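The four issue types can be illustrated with a small sketch that turns a suspect triple into the kind of verification microtask a crowd worker would receive. The names and task structure below are assumptions for illustration, not taken from the paper:

```python
# Hypothetical sketch: pairing suspect DBpedia triples with the crowd question
# for each of the four issue types discussed above. Names and task structure
# are illustrative assumptions, not taken from the paper.

ISSUE_QUESTIONS = {
    "object_value": "Is the object value correct for this subject and predicate?",
    "datatype": "Is the datatype of the literal correct?",
    "language_tag": "Is the language tag of the literal correct?",
    "link": "Does this link point to a relevant external resource?",
}

def make_microtask(triple, issue_type):
    """Build a verification unit a crowd worker can answer."""
    if issue_type not in ISSUE_QUESTIONS:
        raise ValueError(f"unknown issue type: {issue_type}")
    subject, predicate, obj = triple
    return {
        "subject": subject,
        "predicate": predicate,
        "object": obj,
        "question": ISSUE_QUESTIONS[issue_type],
        "answers": ["correct", "incorrect", "cannot tell"],
    }

# A literal typed as xsd:date where an integer is expected (issue type 2).
task = make_microtask(
    ("dbr:Berlin", "dbo:populationTotal", '"3.5"^^xsd:date'), "datatype"
)
```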
To investigate their research questions, the authors employ two different crowdsourcing genres: expert contests on the one hand, and traditional micro-task crowdsourcing on Amazon Mechanical Turk (AMT) on the other. The Find-Fix-Verify workflow is used in both genres.
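As a rough illustration, a Find-Fix-Verify pipeline can be sketched as three staged functions, with the worker callbacks standing in for crowd contributions. All function names and the toy data here are assumptions, not the authors' implementation:

```python
# Minimal Find-Fix-Verify sketch; the worker callbacks stand in for
# crowd input (experts or micro-task workers).

def find_stage(triples, find_worker):
    """Stage 1 (Find): workers flag triples that look erroneous."""
    return [t for t in triples if find_worker(t)]

def fix_stage(flagged, fix_worker):
    """Stage 2 (Fix): workers propose a corrected triple for each flagged one."""
    return [(t, fix_worker(t)) for t in flagged]

def verify_stage(fixes, verify_workers, quorum=2):
    """Stage 3 (Verify): several workers vote; keep fixes meeting the quorum."""
    accepted = []
    for original, proposed in fixes:
        votes = sum(1 for worker in verify_workers if worker(original, proposed))
        if votes >= quorum:
            accepted.append((original, proposed))
    return accepted

# Toy run: one incorrect object value gets found, fixed and verified.
triples = [
    ("dbr:Berlin", "dbo:populationTotal", "3669491"),
    ("dbr:Berlin", "dbo:country", "dbr:France"),  # wrong object value
]
flagged = find_stage(triples, lambda t: t[2] == "dbr:France")
fixes = fix_stage(flagged, lambda t: (t[0], t[1], "dbr:Germany"))
accepted = verify_stage(fixes, [lambda o, p: True, lambda o, p: True])
```

Splitting the work this way lets each stage be staffed by a different crowd, which is exactly what makes it possible to compare the genres on the same task.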
The paper provides several interesting lessons. Firstly, by contrasting the HC-based results with state-of-the-art quality assessment tools, it is shown that the majority of errors can only be detected with HC techniques. This provides a good example of a task that currently cannot be reliably automated. Secondly, experiments confirmed that expert and laymen crowds can reliably detect the error types under investigation, each crowd having their own strengths.
Thirdly, experiments show that workflows combining and exploiting the synergies of crowds with complementary aptitudes (i.e., experts and lay workers) are a promising way to improve quality assessment results.

Knowledge bases such as DBpedia are becoming an important asset for scientists and practitioners, but suffer from a number of flaws that can be traced back to missing or factually wrong information. The authors investigate how the contributions of workers operating on micro-work platforms can be organised to select the correct type for an entity, e.g., a class from a large type hierarchy. As a real-world hierarchy can easily contain thousands of classes, there exists a fundamental trade-off between the precision that can be obtained by automatic systems and the cost of engaging experts.
The paper contributes an analysis of the main design dimensions that affect human-enhanced workflows comprising both automated and crowdsourced components, and reports on their performance in terms of precision (correctness of entity typing) and cost (amount of required manual work). Workflows include three main steps: (1) a prediction step, where a list of candidate classes for a given entity is generated automatically or by the crowd; (2) an error detection step, where the output is manually checked; and (3) an error correction step.
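The three steps can be sketched as a single routine in which the crowd callbacks are placeholders for manual work. This is an illustration of the step structure only; the component names and toy hierarchy are assumptions, not the authors' system:

```python
# Illustrative predict -> detect -> correct routine for entity typing.
# `predict`, `crowd_check` and `crowd_correct` are assumed stand-ins for the
# automatic and manual components described above.

def type_entity(entity, predict, crowd_check, crowd_correct):
    candidates = predict(entity)                  # (1) prediction step
    best = candidates[0] if candidates else None
    if best is not None and crowd_check(entity, best):  # (2) error detection
        return best
    return crowd_correct(entity, candidates)            # (3) error correction

# Toy components over a two-class hierarchy fragment.
predict = lambda e: ["dbo:Place", "dbo:City"] if e == "dbr:Berlin" else []
crowd_check = lambda e, cls: cls == "dbo:City"   # worker rejects too-generic types
crowd_correct = lambda e, cands: next((c for c in cands if c == "dbo:City"), None)

result = type_entity("dbr:Berlin", predict, crowd_check, crowd_correct)
```

The cost/precision trade-off then reduces to how often steps (2) and (3), the manual ones, are actually invoked.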
The authors focus on three types of workflows, whose main variations affect the prediction step. Experiments were conducted on untyped DBpedia entities and demonstrated the intrinsic complexity of the entity typing problem. Even when humans are involved, three main issues seem to affect classification precision: (1) the lack of domain-specific expertise of crowd workers; (2) the unbalanced structure of the type hierarchy; and (3) the ambiguity of some entities.
Results clearly indicate the need for further investigation, in terms of both workflow design and optimization strategies.

This paper addresses an important problem related to named entity recognition (NER) performed on noisy social media microposts, e.g., tweets. The basic assumption of the authors is that some types of social media microposts are more amenable to crowdsourcing than others. To prove their hypothesis, the authors study the impact of micropost content on the accuracy of human annotations. For this, experiments were performed using a game with a purpose for NER called Wordsmith, which sourced workers from the CrowdFlower crowdsourcing platform.
Two research questions and two hypotheses guided these experiments. On the one hand, the authors investigated the effect of micropost features on the accuracy and speed of entity annotation performed by non-expert crowd workers.
The authors measured the number and type of entities recognized, as well as the length and sentiment of each post. On the other hand, they also investigated whether crowd workers prefer some NER tasks over others. Specifically, they measured the number of skipped annotations, the precision of the annotations, the time spent and the overall user interface interaction.
The experiments confirmed that features such as micropost length and the number and type of mentioned entities are good indicators of how well crowds will perform NER on posts: shorter posts with fewer entities are more often correctly annotated than longer posts with more entities, and crowd workers are better at identifying entities of type person and location than organizations or miscellaneous entities. This work on characterizing which posts are amenable to processing with HC paves the way to building hybrid human-machine NER workflows, where each post is assigned to either the human or the machine component of the system based on its characteristics.
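A hybrid routing step of this kind could be sketched as follows; the function name, thresholds, and routing direction are invented for illustration and are not taken from the experiments:

```python
# Hedged sketch of assigning a micropost to the human or machine NER component
# based on the indicators reported above (length, expected entity count).
# Thresholds and the routing policy are illustrative assumptions.

def route_post(text, expected_entities, max_len=120, max_entities=2):
    """Send short posts with few entities to the crowd, the rest to machine NER."""
    if len(text) <= max_len and expected_entities <= max_entities:
        return "crowd"
    return "machine"

short = route_post("Met Obama in Berlin today!", 2)
long_ = route_post("A long press release mentioning many organisations..." * 5, 6)
```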
Based on our investigation of a decade of papers at the intersection of Semantic Web and Human Computation, as well as the papers in this special issue, we draw the following conclusions on the evolution of this inter-disciplinary research area. The continued interest in the area is evidenced by the ten papers submitted to this special issue. We observe, however, an increasing number of papers published in venues of other research communities, especially those that benefit from the combination of Semantic Web and Human Computation approaches.
Research is published in general computer science venues, as well as in venues of specialized communities, such as NLP, Bioinformatics, or data and software engineering. Surprisingly, this line of research is weakly represented in venues related to Human Computation and Human Computer Interaction. The SW4HC papers primarily focused on exploring the use of semantics for knowledge representation, while the use of these technologies to support data integration and reasoning was only addressed to a limited extent.
We believe this to be a promising avenue for future research. For instance, recent HC work focusing on the analysis of task properties, e.g., their complexity, could benefit from such semantic capabilities. We also identified initial work on using Linked Data to publish research results in order to support research reproducibility [2,13,30,31], which we hope will be adopted on a larger scale by the community. Our search found a large number of papers which do not necessarily use one of the research areas to support the other, but rather use the two areas in combination, i.e., as complementary parts of a larger solution.
In terms of the research challenges defined by Sarasua et al., HC genres all have their strengths and weaknesses, which opens up opportunities for their combined use. Our search revealed a high number of very diverse papers at the intersection of Semantic Web and Human Computation research, yet no focused surveys of this area; in-depth surveys of its subtopics would therefore be valuable.
One expected benefit of such in-depth surveys is that they could further refine and extend the current set of topics and scenarios envisioned for this line of work by Sarasua et al. For instance, we identified emerging clusters of papers around topics such as using HC to support the evaluation of Semantic Web research (HC4SW-Evaluation), or relying on Linked Data as a technology for openly publishing research data. In the area of using Human Computation for Semantic Web research (HC4SW), there are a few trending topics, both in the overall paper corpus we collected and in the special issue papers.
Last but not least, to lower the overhead of adopting and using HC in SW, there is a need for reusable tools and user interfaces for common Semantic Web tasks. One such example from the area of Natural Language Processing is the open-source GATE Crowdsourcing plugin [5], which offers infrastructural support for automatically mapping documents to crowdsourcing units and back, as well as for automatically generating reusable crowdsourcing interfaces for NLP classification and selection tasks.
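The document-to-units mapping such tools automate can be sketched as follows. This is a generic illustration; the function names and unit format are assumptions, not the GATE Crowdsourcing plugin's actual API:

```python
# Generic sketch of mapping a document to per-sentence crowdsourcing units and
# merging worker labels back; function names and the unit format are
# assumptions, not the GATE Crowdsourcing plugin API.

def to_units(doc_id, text):
    """Split a document into one classification unit per (naive) sentence."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [{"doc": doc_id, "idx": i, "text": s} for i, s in enumerate(sentences)]

def merge_back(units, labels):
    """Re-attach one worker label per unit to the source document."""
    return {(u["doc"], u["idx"]): label for u, label in zip(units, labels)}

units = to_units("d1", "Alice met Bob in Paris. Bob left.")
merged = merge_back(units, ["contains-entities", "contains-entities"])
```

Keeping the document/offset bookkeeping in reusable infrastructure, rather than in each experiment, is precisely the overhead reduction argued for above.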
We also found that the adoption of Semantic Web technologies to support Human Computation systems is currently limited and is focused on the formal knowledge representation capabilities of these technologies, but falls short of exploring more advanced capabilities made possible by semantics such as data integration and automated reasoning.
We conclude that, while this special issue reports on important advances on a number of fundamental research challenges, there are ample, so far unexplored, opportunities for future work in the context of this maturing, diverse and multi-disciplinary research area.

References

[1] M. Acosta, A. Zaveri, E. Simperl, D. Kontokostas, F. Flöck and J. Lehmann, Detecting Linked Data quality issues via crowdsourcing: A DBpedia study, Semantic Web 9(3) (2018).
[2] V.W. Anelli, A. Calì, T. Di Noia, M. Palmonari and A. Ragone, in: H. Fujita, M. Ali, A. Selamat, J. Sasaki and M. Kurematsu, eds, Springer International Publishing, Cham.
[3] A. Ballatore and P. Mooney, Conceptualising the geographic world: The dimensions of negotiation in crowdsourced cartography, Int. J. Geogr. Inf. Sci. 29(12) (2015).
[4] T. Berners-Lee, J. Hendler and O. Lassila, The Semantic Web, Scientific American 284(5) (2001), 34–43.
[5] K. Bontcheva, I. Roberts, L. Derczynski and D.