Nowadays, many people are immersed in the world of social media. With the current pandemic, this engagement has only increased, as people often rely on social media platforms to express their emotions, find comfort, meet like-minded individuals, and form communities. This extensive use of social media has many downsides, and one of them is cyberbullying. Cyberbullying is a form of online harassment that is both unsettling and troubling. It can take many forms, but the most common is text. Cyberbullying is common on social media, and victims often end up in a state of mental breakdown instead of taking action against the bully. On the majority of social networks, automated detection of these situations requires intelligent systems. We propose a cyberbullying detection system to address this issue. In this work, we propose a deep learning framework that evaluates real-time Twitter tweets or social media posts and correctly identifies any cyberbullying content in them. Recent studies have shown that deep neural network-based approaches are more effective than conventional techniques at detecting cyberbullying texts. Additionally, our application can recognise cyberbullying posts written in English, Hindi, and Hinglish (multilingual data).

Much research in this area using various machine learning and deep learning techniques has yielded significant results in detecting and preventing cyberbullying. However, most works have used mostly English data for training and testing, while a few included languages such as Bangla, Arabic, and Urdu. Little to no work has addressed the rise of cyberbullying in a country like India, where most Hindi-speaking people write English text mixed with Hindi words in Latin script, and many others write Hindi in Devanagari script. We plan to combat this problem by incorporating such data into our proposed learning algorithm so that cyberbullying can be detected in real-time tweets.

Data have been collected from three sources and then combined. One contains English texts, another contains Hindi texts, and the last contains a combination of Hindi and English texts. Because we acquired these three datasets from different sources, they cannot be compiled in their original form: their classification labels differ. To proceed with these datasets, we first have to adopt a single classification scheme, and for this purpose a 0-1 classifier was used, which tells us whether or not the text contains cyberbullying content, thus making it a black-and-white task for training our model and eliminating any gray areas. Data cleaning is essential before classification: symbols, URLs, emails, stopwords, white space, numbers, punctuation, and single tokens are removed (with the spaCy tokenizer), and stemming and lemmatization are applied. Word embedding is a natural language processing methodology for mapping words or phrases from a lexicon to corresponding vectors of real numbers, which are then used for word prediction and semantics.
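The label unification and data cleaning steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the text mentions the spaCy tokenizer, while this example uses only the Python standard library as a stand-in. The per-dataset label names in `LABEL_MAPS`, the tiny `STOPWORDS` set, and the helper names `unify_label` and `clean_text` are all hypothetical, since the original label vocabularies and stopword lists are not given. Stemming and lemmatization are omitted.

```python
import re
import unicodedata
from typing import List

# Hypothetical label vocabularies for the three datasets (assumption:
# the source does not list the actual labels). Each maps to 0 or 1.
LABEL_MAPS = {
    "english": {"none": 0, "racism": 1, "sexism": 1},
    "hindi": {"non-offensive": 0, "offensive": 1},
    "hinglish": {"no": 0, "yes": 1},
}

# Tiny illustrative stopword set (assumption: a real list is far larger).
STOPWORDS = {"the", "a", "is", "are", "you", "hai", "है"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")


def unify_label(dataset: str, label: str) -> int:
    """Map a dataset-specific label to 1 (cyberbullying) or 0 (not)."""
    return LABEL_MAPS[dataset][label.strip().lower()]


def clean_text(text: str) -> List[str]:
    """Remove URLs, emails, numbers, punctuation, symbols, stopwords,
    extra white space, and single-character tokens."""
    text = URL_RE.sub(" ", text)
    text = EMAIL_RE.sub(" ", text)
    text = text.lower()
    # Replace punctuation (P*), symbols (S*), and numbers (N*) with spaces.
    # Combining marks (M*) are kept so Devanagari words stay intact.
    chars = [
        " " if unicodedata.category(ch)[0] in "PSN" else ch
        for ch in text
    ]
    tokens = "".join(chars).split()  # split() also collapses white space
    return [t for t in tokens if len(t) > 1 and t not in STOPWORDS]


if __name__ == "__main__":
    print(unify_label("hindi", "offensive"))
    print(clean_text("You are SO dumb!!! http://t.co/xyz 123 :)"))
    print(clean_text("तू बेवकूफ है।"))
```

Filtering by Unicode category instead of an ASCII-only pattern is a deliberate choice here: it lets the same function handle English, Hinglish in Latin script, and Hindi in Devanagari without splitting words at vowel signs.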