Ongoing (48 months) - Started: 2018 Ends: 2022
Within this project, we propose to develop a multilingual surveillance system capable of detecting emerging crowds by identifying rising events that foster high focus, high energy and high emotion on social media. Our fundamental hypothesis is that virtual crowds exhibit characteristics similar to those of real crowds, which may allow them to be modeled in terms of complex computer systems by relying on advanced natural language processing and machine learning techniques. The project lies at the intersection of several important research topics, namely urban informatics, natural language processing for social media, predictive analytics over big social data and image sentiment analysis.
Ongoing (24 months) - Started: 2019 Ends: 2021
The objective of this project is to build an NLP cloud platform that enables researchers and users to use language processing components and resources, following the software-as-a-service paradigm. The focus is on multilingual text analysis, based on an open-source infrastructure and compliant with relevant NLP standards.
Current Editorial and Reviewer Board
HultigCrawler is a text crawler that recursively crawls all the text from a given website. The crawled data is stored as items, each consisting of a URL, Title, Tags and Text. These items are then saved into a database using Scrapy pipelines.
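As an illustration of the pipeline step, here is a minimal sketch of a Scrapy-style item pipeline. It is not HultigCrawler's actual code: the use of SQLite, the class name `SQLitePipeline`, the table name `pages`, the lowercase item keys and the default file `crawl.db` are all assumptions made for the example. Scrapy pipelines are plain Python classes, so the sketch needs only the standard library.

```python
import sqlite3


class SQLitePipeline:
    """Scrapy-style item pipeline storing crawled pages in SQLite (assumed backend)."""

    def __init__(self, db_path="crawl.db"):  # hypothetical default path
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the database and
        # create the (hypothetical) "pages" table if it is missing.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS pages ("
            "url TEXT PRIMARY KEY, title TEXT, tags TEXT, text TEXT)"
        )

    def process_item(self, item, spider):
        # Called for every crawled item: store its four fields.
        # Tags are joined into one comma-separated string for simplicity.
        self.conn.execute(
            "INSERT OR REPLACE INTO pages (url, title, tags, text) "
            "VALUES (?, ?, ?, ?)",
            (item["url"], item["title"], ",".join(item["tags"]), item["text"]),
        )
        self.conn.commit()
        return item  # hand the item on to any later pipeline

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.conn.close()
```

In a Scrapy project, a class like this would be enabled through the `ITEM_PIPELINES` setting. Keying the table on the URL means a page that is crawled twice overwrites its earlier row instead of being duplicated.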
A crawler that extracts data from social networks. It was developed in Java, with an interface built in Java Swing. To get started, visit the developer pages of the social networks the crawler works with and follow the steps there to obtain an access token. Then enter that access token in the application tab of each social network you want to crawl and press start.
ExtremeSentilex is a lexicon of extreme sentiments built on SentiWordNet and SenticNet. An article describing the research in full will be made available soon. For now, the resulting files are available for download: one file with the lexicon together with the datasets we classified in order to validate it, one file with the lexicon only, and one with the classified datasets only.
Senta Web aims to provide an online way of automatically extracting expressions formed by sequences of lexicographic units (e.g. characters, words, punctuation marks), contiguous or non-contiguous, that act as syntactic-semantic units with their own meaning.
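To make the distinction between contiguous and non-contiguous sequences concrete, the sketch below enumerates candidate word sequences inside a sliding window. It covers candidate generation only and is not Senta Web's method: how candidates are scored and selected is not described here, and the function name `candidate_sequences` and its `max_len` and `window` parameters are hypothetical.

```python
from itertools import combinations


def candidate_sequences(tokens, max_len=3, window=4):
    """Return (words, is_contiguous) pairs for ordered selections of
    2..max_len tokens that fit inside a window of `window` tokens."""
    candidates = []
    for start in range(len(tokens)):
        # Positions available after the anchor token at `start`.
        following = range(start + 1, min(start + window, len(tokens)))
        for n in range(2, max_len + 1):
            for rest in combinations(following, n - 1):
                positions = (start,) + rest
                words = tuple(tokens[i] for i in positions)
                # Contiguous means no token was skipped between the
                # first and last selected positions.
                contiguous = positions[-1] - positions[0] == n - 1
                candidates.append((words, contiguous))
    return candidates


for words, contiguous in candidate_sequences(["in", "spite", "of"], max_len=2):
    print(words, "contiguous" if contiguous else "non-contiguous")
```

For the three-token input above, the pair ("in", "of") is reported as non-contiguous because it skips "spite", while ("in", "spite") and ("spite", "of") are contiguous.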