SUCHOMEL, Šimon and Michal BRANDEJS. Determining Window Size from Plagiarism Corpus for Stylometric Features. In Experimental IR Meets Multilinguality, Multimodality, and Interaction. Toulouse, France: Springer International Publishing, 2015, p. 293-299. ISBN 978-3-319-24026-8.
Basic information
Original name Determining Window Size from Plagiarism Corpus for Stylometric Features
Authors SUCHOMEL, Šimon (203 Czech Republic, belonging to the institution) and Michal BRANDEJS (203 Czech Republic, guarantor, belonging to the institution).
Edition Toulouse, France, Experimental IR Meets Multilinguality, Multimodality, and Interaction, p. 293-299, 7 pp. 2015.
Publisher Springer International Publishing
Other information
Original language English
Type of outcome Proceedings paper
Field of Study Informatics
Country of publisher France
Confidentiality degree is not subject to a state or trade secret
Publication form printed version "print"
RIV identification code RIV/00216224:14330/15:00084706
Organization Fakulta informatiky – Repository
ISBN 978-3-319-24026-8
ISSN 0302-9743
Keywords in English plagiarism; average word frequency class; stylometry; text classification; intrinsic plagiarism
Links LG13010, research and development project.
Changed by RNDr. Daniel Jakubík, učo 139797. Changed: 2/9/2020 09:52.
Abstract
The sliding window is a common method for computing a profile of a document with unknown structure. This paper outlines an experiment with a stylometric word-based feature to determine the optimal size of the sliding window. The experiment was conducted for a vocabulary richness method called ‘average word frequency class’ using the PAN 2015 source retrieval training corpus for plagiarism detection. The paper shows the pros and cons of stop-word removal for sliding-window document profiling and discusses the use of the selected feature for intrinsic plagiarism detection. The experiment resulted in a recommendation to set the sliding window to around 100 words in length when computing the text profile with the average word frequency class stylometric feature.
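The profiling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the commonly used definition of a word's frequency class, floor(log2(f(w*) / f(w))), where w* is the most frequent word in a reference corpus. The function names, the `step` parameter, the toy frequency table, and the choice to skip words missing from the reference table are all assumptions made for illustration; only the window size of about 100 words comes from the abstract.

```python
import math
from collections import Counter


def word_frequency_class(word, freq, max_freq):
    """Frequency class of a word: 0 for the most frequent word in the
    reference table, higher values for rarer words (assumed definition)."""
    return math.floor(math.log2(max_freq / freq[word]))


def awfc_profile(tokens, freq, window=100, step=50):
    """Average word frequency class for each sliding window over `tokens`.

    `freq` maps words to counts in a reference corpus (hypothetical input).
    Words absent from `freq` are skipped, which is an assumption of this
    sketch. A window with no known words yields None.
    """
    max_freq = max(freq.values())
    profile = []
    last_start = max(len(tokens) - window, 0)
    for start in range(0, last_start + 1, step):
        known = [t for t in tokens[start:start + window] if t in freq]
        if known:
            classes = [word_frequency_class(t, freq, max_freq) for t in known]
            profile.append(sum(classes) / len(classes))
        else:
            profile.append(None)
    return profile


# Toy reference counts and a tiny window, for illustration only.
toy_freq = Counter({"the": 8, "cat": 2, "sat": 1})
toy_profile = awfc_profile(["the", "cat", "the", "sat"], toy_freq, window=2, step=2)
```

In a setting like the one the abstract describes, abrupt changes between neighbouring window values would be the signal of interest for intrinsic plagiarism detection. Stop words could be filtered out of `tokens` before profiling to compare both variants.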
Type Name Uploaded/Created by Uploaded/Created Rights
Determining_Window_Size_from_Plagiarism_Corpus_for_Stylometric_Features.pdf Licence Creative Commons  File version 2/9/2020

Properties

Name
Determining_Window_Size_from_Plagiarism_Corpus_for_Stylometric_Features.pdf
Address within IS
https://repozitar.cz/auth/repo/19266/899438/
Address for the users outside IS
https://repozitar.cz/repo/19266/899438/
Address within Manager
https://repozitar.cz/auth/repo/19266/899438/?info
Address within Manager for the users outside IS
https://repozitar.cz/repo/19266/899438/?info
Uploaded/Created
Wed 2/9/2020 09:52

Rights

Right to read
  • anyone on the Internet
Right to upload
 
Right to administer:
  • a concrete person Mgr. Lucie Vařechová, uco 106253
  • a concrete person RNDr. Daniel Jakubík, uco 139797
  • a concrete person Mgr. Jolana Surýnková, uco 220973
Attributes
 
... Licence Creative Commons  File version 17/11/2015

Properties

Name
...
Address within IS
https://repozitar.cz/auth/repo/19266/253714/
Address for the users outside IS
https://repozitar.cz/repo/19266/253714/
Address within Manager
https://repozitar.cz/auth/repo/19266/253714/?info
Address within Manager for the users outside IS
https://repozitar.cz/repo/19266/253714/?info
Uploaded/Created
Tue 17/11/2015 00:50

Rights

Right to read
  • anyone on the Internet
Right to upload
 
Right to administer:
  • a concrete person Mgr. Bc. Růžena Zemanová, uco 134451
  • a concrete person RNDr. Daniel Jakubík, uco 139797
Attributes
 