A. Mussina, S. Aubakirov, P. Trigo


The growth of data in social networks drives demand for data analysis, and event detection is a field of increasing interest to researchers. Real-life events are actively discussed in virtual space, and event detection results can be used in a variety of applications, from digital marketing to collecting data about natural disasters. Consequently, new event detection algorithms keep emerging alongside improvements to existing solutions. This paper proposes improvements to the SEDTWik (Segmentation-based Event Detection from Tweets using Wikipedia) algorithm. SEDTWik is designed to detect events without contextual guidance: its detection process does not support topic-guided or multi-topic-guided (semi-supervised) event detection. As a result, some interesting, narrowly focused events are not detected, because they are only weakly relevant in a broad context (e.g., Wikipedia) even though they are relevant within a narrower, conditioned context. There is therefore a need for an adaptive perspective in which data is analysed against a set of narrower topics of interest. This paper shows that SEDTWik gains expressive power when extended with multi-topic semi-supervision. The proposal is evaluated on Events2012, a well-known corpus with labeled events. In Events2012, each event is assigned a category, so events are grouped by topic; SEDTWik with topic dictionaries was evaluated across all categories. The main part of the article also explains how topic dictionaries are constructed from the labeled tweets in Events2012. At this stage of the research, only unigrams were used in all tasks. SEDTWik with dictionaries showed improved accuracy and found more events within a given category.
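The abstract states only that topic dictionaries were built from Events2012 labeled tweets using unigrams; the actual construction procedure is described in the full text. The Python sketch below illustrates one plausible way to build per-category unigram dictionaries and use them as topic guidance. The frequency-ratio selection rule, the thresholds (`min_count`, `min_ratio`, `threshold`), the tokenizer, and all function names are hypothetical assumptions, not the authors' method, and the toy tweets are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical tokenizer (an assumption, not the authors' preprocessing):
# lowercase, drop URLs and @mentions, keep word-like unigrams and hashtags.
TOKEN_RE = re.compile(r"[a-z0-9#']+")


def tokenize(text):
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return TOKEN_RE.findall(text)


def build_topic_dictionaries(labeled_tweets, min_count=2, min_ratio=1.5):
    """Build one unigram dictionary per event category.

    labeled_tweets: iterable of (category, text) pairs, e.g. tweets
    carrying an Events2012 category label (assumed input format).
    Hypothetical selection rule: keep a unigram for a category if it
    occurs at least `min_count` times there and its relative frequency
    in the category is at least `min_ratio` times its relative
    frequency in the whole labeled corpus.
    """
    per_category = {}
    overall = Counter()
    for category, text in labeled_tweets:
        tokens = tokenize(text)
        per_category.setdefault(category, Counter()).update(tokens)
        overall.update(tokens)

    corpus_total = sum(overall.values())
    dictionaries = {}
    for category, counts in per_category.items():
        category_total = sum(counts.values())
        dictionaries[category] = {
            token
            for token, n in counts.items()
            if n >= min_count
            and (n / category_total) / (overall[token] / corpus_total) >= min_ratio
        }
    return dictionaries


def topic_relevance(text, dictionary):
    """Fraction of a tweet's unigrams that belong to a topic dictionary."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(token in dictionary for token in tokens) / len(tokens)


def select_topic_tweets(tweets, dictionary, threshold=0.2):
    """Hypothetical pre-filter: keep tweets relevant to one topic so that
    an unsupervised detector (e.g. SEDTWik) runs on a topic-focused stream."""
    return [t for t in tweets if topic_relevance(t, dictionary) >= threshold]


# Toy usage with invented tweets (not Events2012 data).
toy = [
    ("sports", "Goal for Chelsea in the final"),
    ("sports", "Chelsea win the final"),
    ("sports", "what a goal"),
    ("disasters", "Earthquake hits the coast"),
    ("disasters", "earthquake damage on the coast"),
    ("disasters", "the storm is coming"),
]
dicts = build_topic_dictionaries(toy)
print(sorted(dicts["sports"]))  # ['chelsea', 'final', 'goal']
print(topic_relevance("Earthquake near the coast", dicts["disasters"]))  # 0.5
```

Comparing each unigram's frequency inside a category with its frequency in the whole corpus keeps common words out of every dictionary: in the toy data, "the" occurs in both categories and is excluded from both.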

Keywords

event detection with multi-topic semi-supervision, SEDTWik, social media, dictionary, Events2012.



References

Mussina, A.B., Aubakirov, S.S., & Trigo, P. (2021). An Architecture for Real-Time Massive Data Extraction from Social Media. Communications in Computer and Information Science, 138–145.

Morabia, K., Bhanu Murthy, N. L., Malapati, A., & Samant, S. (2019). SEDTWik: segmentation-based event detection from tweets using Wikipedia. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop,

Li, C., Sun, A., & Datta, A. (2012). Twevent: segment-based event detection from tweets. Proceedings of the 21st ACM International Conference on Information and Knowledge Management – CIKM ’12,

McMinn, A.J., Moshfeghi, Y., & Jose, J.M. (2013). Building a large-scale corpus for evaluating event detection on twitter. Proceedings of the 22nd ACM International Conference on Information & Knowledge Management – CIKM ’13, 409–418. https://doi.

Bekoulis, G., Deleu, J., Demeester, T., & Develder, C. (2019). Sub-event detection from twitter streams as a sequence labeling problem. arXiv preprint arXiv:1903.05396.

Chen, X., Zhou, X., Sellis, T., & Li, X. (2018). Social event detection with retweeting behavior correlation. Expert Systems with Applications, 114, 516–523.

Lu, X. S., Zhou, M., Qi, L., & Liu, H. (2019). Clustering-Algorithm-Based Rare-Event Evolution Analysis via Social Media Data. IEEE Transactions on Computational Social Systems, 6(2), 301–310. https://

Goswami, A., & Kumar, A. (2016). A survey of event detection techniques in online social networks. Social Network Analysis and Mining, 6(1).

Cui, W., Wang, P., Du, Y., Chen, X., Guo, D., Li, J., & Zhou, Y. (2017). An algorithm for event detection based on social media data. Neurocomputing, 254, 53–58.

Papers with Code - The latest in Machine Learning. (2021, August 25). Papers with Code. Retrieved August 25, 2021, from

Hamborg, F., Breitinger, C., & Gipp, B. (2019). Giveme5w1h: A universal system for extracting main events from news articles. arXiv preprint arXiv:1909.02766.

Du, X., & Cardie, C. (2020). Event extraction by answering (almost) natural questions. arXiv preprint

Liu, X., Luo, Z., & Huang, H. (2018). Jointly multiple events extraction via attention-based graph information aggregation. arXiv preprint arXiv:1809.09078.

ENwiki-latest-all-titles. (2021). Wikimedia Downloads. Retrieved August 26, 2021, from http://

Wikipedia Keyphraseness. (2021). Aixin’s Homepage. Retrieved August 26, 2021, from https://

Mussina, A., & Aubakirov, S. (2017). Dictionary extraction based on statistical data. KazNU Bulletin. Mathematics, Mechanics, Computer Science Series, 94(2), 72–82.

Barr, I. (2016, April 20). Heavy Metal and Natural Language Processing - Part 1. Degenerate State. Retrieved September 20, 2016, from

SEDTWik-Event-Detection-from-Tweets. (2020, July 13). Github. Retrieved August 26, 2021, from




ISSN (print): 2707-9031
ISSN (online): 2707-904X

Articles are open access under the Creative Commons License.

EXPO Business Center, Block C.1.
Kazakhstan, 010000