From 82787cc428bc7a4cba4941018b97bf632518c5c9 Mon Sep 17 00:00:00 2001
From: Josh Fraser
Date: Mon, 27 Jan 2020 19:52:52 -0800
Subject: [PATCH 1/9] update readme

---
 README.md | 58 +++++++++++++++++++++++++++++++-------------------------
 1 file changed, 33 insertions(+), 25 deletions(-)

diff --git a/README.md b/README.md
index b693d91..bc0c466 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,5 @@
 ![origin_github_banner](https://user-images.githubusercontent.com/673455/37314301-f8db9a90-2618-11e8-8fee-b44f38febf38.png)
-
+
 Head to https://www.originprotocol.com/developers to learn more about what we're building and how to get involved.

 # Telegram Bot
@@ -8,33 +8,38 @@ Head to https://www.originprotocol.com/developers to learn more about what we're
 - Bans users for posting messagses matching specified patterns
 - Bans users with usernames matching specified patterns
 - Records logs of converstations
+- Translates foreign languages to English using Google Translate

 ## Installation

- - Required: Python 3.x, pip, PostgreSQL
- - Create virtualenv
- - Clone this repo
- - `pip install --upgrade -r requirements.txt`
+- Required: Python 3.x, pip, PostgreSQL
+- Create virtualenv
+- Clone this repo
+- `pip install --upgrade -r requirements.txt`

 ## Database setup
- - Store database URL in environment variable.
- ```
- export TELEGRAM_BOT_POSTGRES_URL="postgresql://:@localhost:5432/"
- ```
- - Run: `python model.py` to setup the DB tables.
+
+- Store database URL in environment variable.
+
+```
+export TELEGRAM_BOT_POSTGRES_URL="postgresql://:@localhost:5432/"
+```
+
+- Run: `python model.py` to setup the DB tables.

 ## Setup

- - Create a Telegram bot by talking to `@BotFather` : https://core.telegram.org/bots#creating-a-new-bot
- - Use `/setprivacy` with `@BotFather` in order to allow it to see all messages in a group.
- - Store your Telegram Bot Token in environment variable `TELEGRAM_BOT_TOKEN`. It will look similar to this:
+- Create a Telegram bot by talking to `@BotFather` : https://core.telegram.org/bots#creating-a-new-bot
+- Use `/setprivacy` with `@BotFather` in order to allow it to see all messages in a group.
+- Store your Telegram Bot Token in environment variable `TELEGRAM_BOT_TOKEN`. It will look similar to this:

- ```
- export TELEGRAM_BOT_TOKEN="4813829027:ADJFKAf0plousH2EZ2jBfxxRWFld3oK34ya"
- ```
- - Create your Telegram group.
- - Add your bot to the group like so: https://stackoverflow.com/questions/37338101/how-to-add-a-bot-to-a-telegram-group
- - Make your bot an admin in the group
+```
+export TELEGRAM_BOT_TOKEN="4813829027:ADJFKAf0plousH2EZ2jBfxxRWFld3oK34ya"
+```
+
+- Create your Telegram group.
+- Add your bot to the group like so: https://stackoverflow.com/questions/37338101/how-to-add-a-bot-to-a-telegram-group
+- Make your bot an admin in the group

 ## Configuration with ENV vars

@@ -44,11 +49,12 @@ Head to https://www.originprotocol.com/developers to learn more about what we're
 - `CHAT_IDS` : **REQUIRED**. Comma-seperated list of IDs of chat(s) that should be monitored. To find out the ID of a chat, add the bot to a chat and type some messages there. The bot log will report an error that it got messages `from chat_id not being monitored: XXX` where XXX is the chat ID. e.g. `-240532994,-150531679`
 - `TELEGRAM_BOT_TOKEN` : **REQUIRED**. Token for bot to control. e.g. `4813829027:ADJFKAf0plousH2EZ2jBfxxRWFld3oK34ya`
 - `TELEGRAM_BOT_POSTGRES_URL` : **REQUIRED**. URI for postgres instance to log activity to. e.g. `postgresql://localhost/postgres`
-- `DEBUG` : If set to anything except `false`, will put bot into debug mode. This means that all actions will be logged into the chat itself, and more things will be logged.
-- `ADMIN_EXEMPT` : If set to anything except `false`, admin users will be exempt from monitoring. Reccomended to be set, but useful to turn off for debugging.
+- `DEBUG` : If set to anything except `false`, will put bot into debug mode. This means that all actions will be logged into the chat itself, and more things will be logged.
+- `ADMIN_EXEMPT` : If set to anything except `false`, admin users will be exempt from monitoring. Reccomended to be set, but useful to turn off for debugging.
 - `NOTIFY_CHAT` : ID of chat to report actions. Can be useful if you have an admin-only chat where you want to monitor the bot's activity. E.g. `-140532994`

 Sample bash file to set `MESSAGE_BAN_PATTERNS`:
+
 ```
 read -r -d '' MESSAGE_BAN_PATTERNS << 'EOF'
 # ETH Address
@@ -62,13 +68,15 @@ EOF

 ## Attachements

-By default, any attachments other than images or animations will cause the message to be hidden.
+By default, any attachments other than images or animations will cause the message to be hidden.

 ## Running

 ### Locally
- - Run: `python bot.py` to start logger
- - Messages will be displayed on `stdout` as they are logged.
+
+- Run: `python bot.py` to start logger
+- Messages will be displayed on `stdout` as they are logged.

 ### On Heroku
- - You must enable the worker on Heroku app dashboard. (By default it is off.)
+
+- You must enable the worker on Heroku app dashboard. (By default it is off.)
From 4e1e58612318cc4d7e49cc7175117a48898e0e4d Mon Sep 17 00:00:00 2001
From: Josh Fraser
Date: Mon, 27 Jan 2020 20:08:00 -0800
Subject: [PATCH 2/9] setup textblob for sentiment analysis

---
 README.md                    |  2 +-
 bin/install_textblob_corpora | 19 +++++++++++++++++++
 bin/post_compile             |  9 +++++++++
 model.py                     |  3 ++-
 requirements.txt             |  1 +
 5 files changed, 32 insertions(+), 2 deletions(-)
 create mode 100644 bin/install_textblob_corpora
 create mode 100644 bin/post_compile

diff --git a/README.md b/README.md
index bc0c466..ce6b554 100644
--- a/README.md
+++ b/README.md
@@ -8,7 +8,7 @@ Head to https://www.originprotocol.com/developers to learn more about what we're
 - Bans users for posting messagses matching specified patterns
 - Bans users with usernames matching specified patterns
 - Records logs of converstations
-- Translates foreign languages to English using Google Translate
+- Logs an English translation of any foreign languages using Google Translate

 ## Installation

diff --git a/bin/install_textblob_corpora b/bin/install_textblob_corpora
new file mode 100644
index 0000000..47e2819
--- /dev/null
+++ b/bin/install_textblob_corpora
@@ -0,0 +1,19 @@
+#!/usr/bin/env bash
+
+source $BIN_DIR/utils
+
+echo "-----> Starting corpora installation"
+
+# Assumes NLTK_DATA environment variable is already set
+# $ heroku config:set NLTK_DATA='/app/nltk_data'
+
+# Install the default corpora to NLTK_DATA directory
+python -m textblob.download_corpora
+
+# Open the NLTK_DATA directory
+cd ${NLTK_DATA}
+
+# Delete all of the zip files in the NLTK DATA directory
+find . -name "*.zip" -type f -delete
+
+echo "-----> Finished corpora installatio"
diff --git a/bin/post_compile b/bin/post_compile
new file mode 100644
index 0000000..6078c43
--- /dev/null
+++ b/bin/post_compile
@@ -0,0 +1,9 @@
+#!/usr/bin/env bash
+
+if [ -f bin/install_textblob_corpora ]; then
+    echo "-----> Running install_textblob_corpora"
+    chmod +x bin/install_textblob_corpora
+    bin/install_textblob_corpora
+fi
+
+echo "-----> Post-compile done"
diff --git a/model.py b/model.py
index 98a96d5..729dd88 100644
--- a/model.py
+++ b/model.py
@@ -30,9 +30,10 @@ class Message(Base):
     language_code = Column(String)
     english_message = Column(String)
     chat_id = Column(BigInteger)
+    polarity = Column(Numeric)
+    subjectivity = Column(Numeric)
     time = Column(DateTime, default=func.now())

-
 class MessageHide(Base):
     __tablename__ = 'telegram_message_hides'
     id = Column(Integer, primary_key=True)
diff --git a/requirements.txt b/requirements.txt
index f5a5e0e..c34709e 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -4,3 +4,4 @@ SQLAlchemy==1.2.2
 configparser==3.5.0
 Unidecode==1.0.22
 googletrans==2.4.0
+textblob

From 79ec4c1abf7981a84f842b5661ab7c1162d98af8 Mon Sep 17 00:00:00 2001
From: Josh Fraser
Date: Mon, 27 Jan 2020 20:12:04 -0800
Subject: [PATCH 3/9] add ipython

---
 requirements.txt | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/requirements.txt b/requirements.txt
index c34709e..f75b495 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -4,4 +4,6 @@ SQLAlchemy==1.2.2
 configparser==3.5.0
 Unidecode==1.0.22
 googletrans==2.4.0
-textblob
+textblob==0.15.3
+ipython==5.5.0
+

From 6fd5bd1020dff46fef89139fd77cb4fc8b229855 Mon Sep 17 00:00:00 2001
From: Stan James
Date: Mon, 27 Jan 2020 20:14:20 -0800
Subject: [PATCH 4/9] Fix disabling of `DEBUG` and `ADMIN_EXEMPT`

As written (by unknown idiot), there was no way to actually set `DEBUG` and
`ADMIN_EXEMPT` env vars in a way that disabled them (as claimed by docs), as
their value was converted to uppercase
and then tested for a lower case `false`.

Fixed to forcing to _lowercase_ with `.lower()`.
---
 bot.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/bot.py b/bot.py
index 6303b98..eac1a92 100644
--- a/bot.py
+++ b/bot.py
@@ -26,12 +26,12 @@ class TelegramMonitorBot:
     def __init__(self):
         self.debug = (
             (os.environ.get('DEBUG') is not None) and
-            (os.environ.get('DEBUG').upper() != "false"))
+            (os.environ.get('DEBUG').lower() != "false"))

         # Are admins exempt from having messages checked?
         self.admin_exempt = (
             (os.environ.get('ADMIN_EXEMPT') is not None) and
-            (os.environ.get('ADMIN_EXEMPT').upper() != "false"))
+            (os.environ.get('ADMIN_EXEMPT').lower() != "false"))

         if (self.debug):
             print("🔵 debug:", self.debug)

From d6bdb9655ab048d2cd44ea6736bdb6cbcddad8ed Mon Sep 17 00:00:00 2001
From: Josh Fraser
Date: Mon, 27 Jan 2020 20:26:26 -0800
Subject: [PATCH 5/9] add basic sentiment analysis

---
 bot.py | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/bot.py b/bot.py
index 6303b98..c4431a8 100644
--- a/bot.py
+++ b/bot.py
@@ -19,6 +19,7 @@ import re
 import unidecode
 from mwt import MWT
 from googletrans import Translator
+from textblob import TextBlob


 class TelegramMonitorBot:
@@ -303,20 +304,26 @@ class TelegramMonitorBot:

         return bool_set

-
     def log_message(self, user_id, user_message, chat_id):
         try:
             s = session()
             language_code = english_message = ""
+            polarity = subjectivity = 0.0
             try:
+                # translate to English & log the original language
                 translator = Translator()
                 translated = translator.translate(user_message)
                 language_code = translated.src
                 english_message = translated.text
+                # run basic sentiment analysis on the translated English string
+                analysis = TextBlob(english_message)
+                polarity = analysis.sentiment.polarity
+                subjectivity = analysis.sentiment.subjectivity
             except Exception as e:
                 print(e.message)
-            msg1 = Message(user_id=user_id, message=user_message,
-                           chat_id=chat_id, language_code=language_code, english_message=english_message)
+            msg1 = Message(user_id=user_id, message=user_message, chat_id=chat_id,
+                           language_code=language_code, english_message=english_message, polarity=polarity,
+                           subjectivity=subjectivity)
             s.add(msg1)
             s.commit()
             s.close()

From 64de9239829508a1df7a49cb278ae6015c12f078 Mon Sep 17 00:00:00 2001
From: Josh Fraser
Date: Mon, 27 Jan 2020 20:28:14 -0800
Subject: [PATCH 6/9] add Numeric type to imports

---
 model.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/model.py b/model.py
index 729dd88..35d62cf 100644
--- a/model.py
+++ b/model.py
@@ -1,4 +1,4 @@
-from sqlalchemy import Column, DateTime, BigInteger, String, Integer, ForeignKey, func
+from sqlalchemy import Column, DateTime, BigInteger, String, Integer, Numeric, ForeignKey, func
 from sqlalchemy.orm import relationship, backref
 from sqlalchemy.ext.declarative import declarative_base
 import os

From dc18b2a7a3fe196a9005a3a575425bcae394a21e Mon Sep 17 00:00:00 2001
From: Josh Fraser
Date: Mon, 27 Jan 2020 21:44:51 -0800
Subject: [PATCH 7/9] update readme to explain sentiment analysis

---
 README.md | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index ce6b554..88db6a8 100644
--- a/README.md
+++ b/README.md
@@ -9,6 +9,7 @@ Head to https://www.originprotocol.com/developers to learn more about what we're
 - Bans users with usernames matching specified patterns
 - Records logs of converstations
 - Logs an English translation of any foreign languages using Google Translate
+- Uses textblob for basic sentiment analysis of both polarity and subjectivity

 ## Installation

@@ -53,6 +54,20 @@ export TELEGRAM_BOT_TOKEN="4813829027:ADJFKAf0plousH2EZ2jBfxxRWFld3oK34ya"
 - `ADMIN_EXEMPT` : If set to anything except `false`, admin users will be exempt from monitoring. Reccomended to be set, but useful to turn off for debugging.
 - `NOTIFY_CHAT` : ID of chat to report actions. Can be useful if you have an admin-only chat where you want to monitor the bot's activity. E.g. `-140532994`

+## Download the corpus for Textblob
+
+For sentiment analysis to work, you'll need to download the latest corpus file for textblob. You can do this by running:
+
+```
+python -m textblob.download_corpora
+```
+
+If you're running the bot on Heroku, set an environment variable named `NLTK_DATA` to `/app/nltk_data` by running:
+
+```
+heroku config:set NLTK_DATA='/app/nltk_data'
+```
+
 Sample bash file to set `MESSAGE_BAN_PATTERNS`:

 ```
@@ -66,7 +81,7 @@ read -r -d '' MESSAGE_BAN_PATTERNS << 'EOF'
 EOF
 ```

-## Attachements
+## Attachments

 By default, any attachments other than images or animations will cause the message to be hidden.


From 3a12a90198c881d9e4192403fedb5e0aa3f218be Mon Sep 17 00:00:00 2001
From: Josh Fraser
Date: Mon, 27 Jan 2020 21:46:54 -0800
Subject: [PATCH 8/9] update readme to explain sentiment analysis

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 88db6a8..1716735 100644
--- a/README.md
+++ b/README.md
@@ -68,6 +68,8 @@ If you're running the bot on Heroku, set an environment variable named `NLTK_DAT
 heroku config:set NLTK_DATA='/app/nltk_data'
 ```

+## Message ban patterns
+
 Sample bash file to set `MESSAGE_BAN_PATTERNS`:

 ```

From 52dfb482a71820243022a948dc6916c1f88ad2bd Mon Sep 17 00:00:00 2001
From: Josh Fraser
Date: Mon, 27 Jan 2020 21:48:00 -0800
Subject: [PATCH 9/9] fix typos

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 1716735..4155d63 100644
--- a/README.md
+++ b/README.md
@@ -5,9 +5,9 @@ Head to https://www.originprotocol.com/developers to learn more about what we're
 # Telegram Bot

 - Deletes messages matching specified patterns
-- Bans users for posting messagses matching specified patterns
+- Bans users for posting messages matching specified patterns
 - Bans users with usernames matching specified patterns
-- Records logs of converstations
+- Records logs of conversations
 - Logs an English translation of any foreign languages using Google Translate
 - Uses textblob for basic sentiment analysis of both polarity and subjectivity
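
Review note on PATCH 4/9: the `DEBUG` and `ADMIN_EXEMPT` expressions in `TelegramMonitorBot.__init__` repeat the same check, which is how the `.upper()` slip ended up in both places. A minimal sketch of the same rule as one function, assuming a hypothetical helper named `env_flag` that is not part of this series. It is meant to mirror the patched behavior: unset means off, and any value other than `false` (in any letter case) means on.

```python
import os


def env_flag(name, environ=None):
    """Hypothetical helper: True when `name` is set to anything except "false".

    `environ` defaults to os.environ; passing a plain dict makes the
    function easy to exercise without touching the real environment.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    # Unset means disabled. Otherwise compare case-insensitively by
    # lowercasing the value, which is the fix made in PATCH 4/9.
    return value is not None and value.lower() != "false"


if __name__ == "__main__":
    for raw in (None, "false", "FALSE", "1", ""):
        fake_env = {} if raw is None else {"DEBUG": raw}
        print(repr(raw), "->", env_flag("DEBUG", fake_env))
```

With something like this in place, `__init__` could read `self.debug = env_flag('DEBUG')` and `self.admin_exempt = env_flag('ADMIN_EXEMPT')`. As in the patched code, an empty string still counts as enabled, since only the literal `false` turns a flag off.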