Download the dataset(s) from the Drive folder. The full_dataset.rar contains all 2 million tweets; optionally, you can download parts of the dataset from the Parts folder, each part (dataset*.rar) containing 200,000 tweets. Run python_path.bat to add the PYTHONPATH environment variable, make any necessary changes to config.py in *engine\utilities*, and then run python init.py in Command Prompt to start the engine.

Control of the engine starts with manager.py, which uses multiprocessing and subprocess to spawn the extractor, preprocessor and postprocessor as separate processes. config.py in the utilities package stores tuning parameters such as the 'alarm' times, the file limit, etc.

Based on the aggregation, the top 100 entities are found and the tweets for each entity are clubbed into one collection; sentiment analysis is done on each tweet before it is dumped into its collection. Treating each of the 100 collections as a separate document, LDA is performed; if 100 documents prove too few, the larger documents can be split into smaller ones. The tweets are then iterated over individually to find the topic each one belongs to. For each topic, the URLs that seem most relevant are extracted, the corresponding webpages are downloaded and parsed, and a portion of the main content can be displayed after extraction. The graph is approximated as before, but the time span it should cover still has to be discussed. For each topic, the portal displays the graph, the related tweets, and summarizations of the URLs along with their hyperlinks.
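As a rough illustration of that control structure, here is a minimal sketch of how a manager.py could spawn the three stages with multiprocessing; the stage functions are placeholders, since the actual entry points aren't shown here.

```python
# Minimal sketch of manager.py-style control: each pipeline stage runs as
# its own process. The stage functions below are placeholders standing in
# for the real extractor, preprocessor and postprocessor packages.
from multiprocessing import Process

def extractor():       # placeholder: pull raw tweets from the dataset
    print("extracting tweets...")

def preprocessor():    # placeholder: clean and tokenize the tweets
    print("preprocessing tweets...")

def postprocessor():   # placeholder: build topics, graphs and summaries
    print("postprocessing topics...")

if __name__ == "__main__":
    stages = [Process(target=f, name=f.__name__)
              for f in (extractor, preprocessor, postprocessor)]
    for p in stages:
        p.start()      # run the stages concurrently
    for p in stages:
        p.join()       # wait for every stage to finish
```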
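The config.py in the utilities package could then look something like the sketch below; the parameter names and values are illustrative assumptions, not the project's actual settings.

```python
# Sketch of a config.py under engine/utilities. Every name and value here
# is an illustrative assumption.
ALARM_INTERVAL_SECONDS = 600   # how often the 'alarm' wakes a stage
FILE_LIMIT = 500               # maximum number of intermediate files to keep
DATASET_DIR = "data/"          # where the extracted dataset files live
NUM_TOPICS = 20                # number of LDA topics to extract
```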
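A sketch of the aggregation step, under assumptions not spelled out above: entity extraction is left as a pluggable function, and NLTK's VADER analyzer stands in for whatever sentiment tool the engine actually uses.

```python
# Sketch: find the top 100 entities, score each tweet's sentiment, and club
# the tweets into per-entity collections. Entity extraction and the choice
# of NLTK's VADER analyzer are assumptions.
from collections import Counter, defaultdict
from nltk.sentiment.vader import SentimentIntensityAnalyzer  # needs the vader_lexicon data

def club_tweets(tweets, entities_of):
    """tweets: list of tweet texts; entities_of: callable returning a tweet's entities."""
    counts = Counter(e for t in tweets for e in entities_of(t))
    top_entities = {e for e, _ in counts.most_common(100)}    # keep the top 100 entities

    sia = SentimentIntensityAnalyzer()
    collections = defaultdict(list)
    for text in tweets:
        score = sia.polarity_scores(text)["compound"]         # sentiment first...
        for entity in entities_of(text):
            if entity in top_entities:
                collections[entity].append({"text": text, "sentiment": score})
    return collections                                        # ...then dump into the collections
```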
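The LDA step could then treat each of those collections as one document and assign individual tweets to their most likely topic; gensim is an assumed library choice here.

```python
# Sketch: run LDA with each entity collection treated as one document, then
# assign individual tweets to their most likely topic. gensim is an assumed
# library choice.
from gensim import corpora, models

def build_lda(collections, num_topics=20):
    # One "document" per collection: all of its tweets' tokens concatenated.
    docs = [[w for tweet in tweets for w in tweet["text"].lower().split()]
            for tweets in collections.values()]
    dictionary = corpora.Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]
    lda = models.LdaModel(corpus, num_topics=num_topics, id2word=dictionary)
    return lda, dictionary

def topic_of(text, lda, dictionary):
    # Pick the highest-probability topic for a single tweet.
    bow = dictionary.doc2bow(text.lower().split())
    topics = lda.get_document_topics(bow)
    return max(topics, key=lambda t: t[1])[0] if topics else None
```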
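Finally, a sketch of pulling URLs out of tweets and keeping a portion of the main content of the pages behind them; requests and BeautifulSoup are assumed choices, and the "longest paragraphs" heuristic is purely illustrative.

```python
# Sketch: extract URLs from tweet text, download each page, and keep a
# portion of its main content. requests/BeautifulSoup and the longest-
# paragraph heuristic are assumptions, not the engine's actual method.
import re
import requests
from bs4 import BeautifulSoup

URL_RE = re.compile(r"https?://\S+")

def urls_in(text):
    return URL_RE.findall(text)

def main_content(url, max_chars=500):
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
    paragraphs.sort(key=len, reverse=True)     # crude guess at the main content
    return " ".join(paragraphs)[:max_chars]
```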