diff --git a/README.md b/README.md
index b693d91..4155d63 100644
--- a/README.md
+++ b/README.md
@@ -1,40 +1,46 @@
 ![origin_github_banner](https://user-images.githubusercontent.com/673455/37314301-f8db9a90-2618-11e8-8fee-b44f38febf38.png)
- 
+
 Head to https://www.originprotocol.com/developers to learn more about what we're building and how to get involved.
 
 # Telegram Bot
 
 - Deletes messages matching specified patterns
-- Bans users for posting messagses matching specified patterns
+- Bans users for posting messages matching specified patterns
 - Bans users with usernames matching specified patterns
-- Records logs of converstations
+- Records logs of conversations
+- Logs an English translation of messages written in other languages, using Google Translate
+- Uses TextBlob to score the polarity and subjectivity of each message's English translation
 
 ## Installation
 
- - Required: Python 3.x, pip, PostgreSQL
- - Create virtualenv
- - Clone this repo
- - `pip install --upgrade -r requirements.txt`
+- Required: Python 3.x, pip, PostgreSQL
+- Create a virtualenv
+- Clone this repo
+- `pip install --upgrade -r requirements.txt`
 
 ## Database setup
- - Store database URL in environment variable.
- ```
- export TELEGRAM_BOT_POSTGRES_URL="postgresql://:@localhost:5432/"
- ```
- - Run: `python model.py` to setup the DB tables.
+
+- Store the database URL in the `TELEGRAM_BOT_POSTGRES_URL` environment variable:
+
+```
+export TELEGRAM_BOT_POSTGRES_URL="postgresql://:@localhost:5432/"
+```
+
+- Run: `python model.py` to set up the DB tables.
 
 ## Setup
 
- - Create a Telegram bot by talking to `@BotFather` : https://core.telegram.org/bots#creating-a-new-bot
- - Use `/setprivacy` with `@BotFather` in order to allow it to see all messages in a group.
- - Store your Telegram Bot Token in environment variable `TELEGRAM_BOT_TOKEN`. It will look similar to this:
+- Create a Telegram bot by talking to `@BotFather`: https://core.telegram.org/bots#creating-a-new-bot
+- Use `/setprivacy` with `@BotFather` to disable privacy mode, so the bot can see all messages in a group.
+- Store your Telegram bot token in the `TELEGRAM_BOT_TOKEN` environment variable. It will look similar to this:
 
- ```
- export TELEGRAM_BOT_TOKEN="4813829027:ADJFKAf0plousH2EZ2jBfxxRWFld3oK34ya"
- ```
- - Create your Telegram group.
- - Add your bot to the group like so: https://stackoverflow.com/questions/37338101/how-to-add-a-bot-to-a-telegram-group
- - Make your bot an admin in the group
+```
+export TELEGRAM_BOT_TOKEN="4813829027:ADJFKAf0plousH2EZ2jBfxxRWFld3oK34ya"
+```
+
+- Create your Telegram group.
+- Add your bot to the group like so: https://stackoverflow.com/questions/37338101/how-to-add-a-bot-to-a-telegram-group
+- Make your bot an admin in the group.
 
 ## Configuration with ENV vars
 
@@ -44,11 +50,28 @@ Head to https://www.originprotocol.com/developers to learn more about what we're
 - `CHAT_IDS` : **REQUIRED**. Comma-seperated list of IDs of chat(s) that should be monitored. To find out the ID of a chat, add the bot to a chat and type some messages there. The bot log will report an error that it got messages `from chat_id not being monitored: XXX` where XXX is the chat ID. e.g. `-240532994,-150531679`
 - `TELEGRAM_BOT_TOKEN` : **REQUIRED**. Token for bot to control. e.g. `4813829027:ADJFKAf0plousH2EZ2jBfxxRWFld3oK34ya`
 - `TELEGRAM_BOT_POSTGRES_URL` : **REQUIRED**. URI for postgres instance to log activity to. e.g. `postgresql://localhost/postgres`
-- `DEBUG` : If set to anything except `false`, will put bot into debug mode. This means that all actions will be logged into the chat itself, and more things will be logged. 
-- `ADMIN_EXEMPT` : If set to anything except `false`, admin users will be exempt from monitoring. Reccomended to be set, but useful to turn off for debugging. 
+- `DEBUG` : If set to anything except `false`, will put bot into debug mode. This means that all actions will be logged into the chat itself, and more things will be logged.
+- `ADMIN_EXEMPT` : If set to anything except `false`, admin users will be exempt from monitoring. Recommended to be set, but useful to turn off for debugging.
 - `NOTIFY_CHAT` : ID of chat to report actions. Can be useful if you have an admin-only chat where you want to monitor the bot's activity. E.g. `-140532994`
 
+## Download the corpora for TextBlob
+
+For sentiment analysis to work, you'll need to download the TextBlob corpora. You can do this by running:
+
+```
+python -m textblob.download_corpora
+```
+
+If you're running the bot on Heroku, set an environment variable named `NLTK_DATA` to `/app/nltk_data` by running:
+
+```
+heroku config:set NLTK_DATA='/app/nltk_data'
+```
+
+## Message ban patterns
+
 Sample bash file to set `MESSAGE_BAN_PATTERNS`:
+
 ```
 read -r -d '' MESSAGE_BAN_PATTERNS << 'EOF'
 # ETH Address
@@ -60,15 +83,17 @@ read -r -d '' MESSAGE_BAN_PATTERNS << 'EOF'
 EOF
 ```
 
-## Attachements
+## Attachments
 
-By default, any attachments other than images or animations will cause the message to be hidden. 
+By default, any attachments other than images or animations will cause the message to be hidden.
 
 ## Running
 
 ### Locally
- - Run: `python bot.py` to start logger
- - Messages will be displayed on `stdout` as they are logged.
+
+- Run: `python bot.py` to start the logger.
+- Messages will be displayed on `stdout` as they are logged.
 
 ### On Heroku
- - You must enable the worker on Heroku app dashboard. (By default it is off.)
+
+- You must enable the worker on the Heroku app dashboard. (By default it is off.)
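
The README treats `DEBUG` and `ADMIN_EXEMPT` as enabled when set to anything except `false`, and `CHAT_IDS` as a comma-separated list of chat IDs. A minimal sketch of parsing those values, assuming a case-insensitive comparison (like the `.lower()` check this diff introduces in `bot.py`) and integer chat IDs; `env_flag` and `parse_chat_ids` are hypothetical helper names, not functions from the bot.

```python
import os


def env_flag(name, environ=os.environ):
    """Return True if `name` is set to anything other than "false".

    The comparison is case-insensitive, so "False" and "FALSE" also
    disable the flag. An unset variable counts as disabled.
    """
    value = environ.get(name)
    return value is not None and value.lower() != "false"


def parse_chat_ids(raw):
    """Split a comma-separated CHAT_IDS string into integer chat IDs.

    Surrounding whitespace and empty entries (e.g. from a trailing comma)
    are ignored. Group chat IDs are negative, which int() handles directly.
    """
    return [int(part) for part in raw.split(",") if part.strip()]


if __name__ == "__main__":
    # Hypothetical example environment, not real bot configuration.
    example_env = {
        "DEBUG": "True",
        "ADMIN_EXEMPT": "FALSE",
        "CHAT_IDS": "-240532994,-150531679",
    }
    print(env_flag("DEBUG", example_env))
    print(env_flag("ADMIN_EXEMPT", example_env))
    print(parse_chat_ids(example_env["CHAT_IDS"]))
```

With the earlier `.upper() != "false"` test, the comparison could never match, because an upper-cased string is never equal to lowercase `"false"`; any value, including `false`, therefore enabled the flag.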
diff --git a/bin/install_textblob_corpora b/bin/install_textblob_corpora
new file mode 100644
index 0000000..47e2819
--- /dev/null
+++ b/bin/install_textblob_corpora
@@ -0,0 +1,19 @@
+#!/usr/bin/env bash
+
+source $BIN_DIR/utils
+
+echo "-----> Starting corpora installation"
+
+# Assumes NLTK_DATA environment variable is already set
+# $ heroku config:set NLTK_DATA='/app/nltk_data'
+
+# Install the default corpora to NLTK_DATA directory
+python -m textblob.download_corpora
+
+# Open the NLTK_DATA directory
+cd ${NLTK_DATA}
+
+# Delete all of the zip files in the NLTK_DATA directory
+find . -name "*.zip" -type f -delete
+
+echo "-----> Finished corpora installation"
diff --git a/bin/post_compile b/bin/post_compile
new file mode 100644
index 0000000..6078c43
--- /dev/null
+++ b/bin/post_compile
@@ -0,0 +1,9 @@
+#!/usr/bin/env bash
+
+if [ -f bin/install_textblob_corpora ]; then
+  echo "-----> Running install_textblob_corpora"
+  chmod +x bin/install_textblob_corpora
+  bin/install_textblob_corpora
+fi
+
+echo "-----> Post-compile done"
diff --git a/bot.py b/bot.py
index ad1f3af..d5a6e1d 100644
--- a/bot.py
+++ b/bot.py
@@ -19,6 +19,7 @@ import re
 import unidecode
 from mwt import MWT
 from googletrans import Translator
+from textblob import TextBlob
 
 
 class TelegramMonitorBot:
@@ -26,12 +27,12 @@ class TelegramMonitorBot:
     def __init__(self):
         self.debug = (
             (os.environ.get('DEBUG') is not None) and
-            (os.environ.get('DEBUG').upper() != "false"))
+            (os.environ.get('DEBUG').lower() != "false"))
 
         # Are admins exempt from having messages checked?
         self.admin_exempt = (
             (os.environ.get('ADMIN_EXEMPT') is not None) and
-            (os.environ.get('ADMIN_EXEMPT').upper() != "false"))
+            (os.environ.get('ADMIN_EXEMPT').lower() != "false"))
 
         if (self.debug):
             print("🔵 debug:", self.debug)
@@ -304,20 +305,26 @@ class TelegramMonitorBot:
 
         return bool_set
 
-
     def log_message(self, user_id, user_message, chat_id):
         try:
             s = session()
             language_code = english_message = ""
+            polarity = subjectivity = 0.0
             try:
+                # translate to English & log the original language
                 translator = Translator()
                 translated = translator.translate(user_message)
                 language_code = translated.src
                 english_message = translated.text
+                # run basic sentiment analysis on the translated English string
+                analysis = TextBlob(english_message)
+                polarity = analysis.sentiment.polarity
+                subjectivity = analysis.sentiment.subjectivity
             except Exception as e:
-                print(e.message)
-            msg1 = Message(user_id=user_id, message=user_message,
-                chat_id=chat_id, language_code=language_code, english_message=english_message)
+                print(e)
+            msg1 = Message(user_id=user_id, message=user_message, chat_id=chat_id,
+                           language_code=language_code, english_message=english_message, polarity=polarity,
+                           subjectivity=subjectivity)
             s.add(msg1)
             s.commit()
             s.close()
diff --git a/model.py b/model.py
index 98a96d5..35d62cf 100644
--- a/model.py
+++ b/model.py
@@ -1,4 +1,4 @@
-from sqlalchemy import Column, DateTime, BigInteger, String, Integer, ForeignKey, func
+from sqlalchemy import Column, DateTime, BigInteger, String, Integer, Numeric, ForeignKey, func
 from sqlalchemy.orm import relationship, backref
 from sqlalchemy.ext.declarative import declarative_base
 import os
@@ -30,9 +30,10 @@ class Message(Base):
     language_code = Column(String)
     english_message = Column(String)
     chat_id = Column(BigInteger)
+    polarity = Column(Numeric)
+    subjectivity = Column(Numeric)
     time = Column(DateTime, default=func.now())
 
-
 class MessageHide(Base):
     __tablename__ = 'telegram_message_hides'
     id = Column(Integer, primary_key=True)
diff --git a/requirements.txt b/requirements.txt
index f5a5e0e..f75b495 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -4,3 +4,6 @@ SQLAlchemy==1.2.2
 configparser==3.5.0
 Unidecode==1.0.22
 googletrans==2.4.0
+textblob==0.15.3
+ipython==5.5.0
+
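
In `log_message`, the language code, English text, polarity and subjectivity start as empty or zero defaults and are overwritten only as each step succeeds, so the `Message` row can still be written if translation or sentiment analysis raises. That only works if the `except` block does not raise in turn: on Python 3, which the README requires, exceptions have no `message` attribute, so `print(e.message)` itself raises `AttributeError`. A minimal, stdlib-only sketch of the fallback pattern follows. The translator and sentiment scorer are passed in as plain callables (hypothetical stand-ins for `googletrans` and `TextBlob`) so it runs offline, and `analyze_message` and `Analysis` are illustrative names. For reference, TextBlob reports polarity in [-1.0, 1.0] and subjectivity in [0.0, 1.0].

```python
from collections import namedtuple

# Mirrors the extra fields log_message() passes to Message(); illustrative only.
Analysis = namedtuple(
    "Analysis", ["language_code", "english_message", "polarity", "subjectivity"]
)


def analyze_message(text, translate, score):
    """Translate `text` and score its sentiment, keeping defaults on failure.

    Hypothetical interfaces (stand-ins for googletrans and TextBlob):
      translate(text) -> (source_language_code, english_text)
      score(english_text) -> (polarity, subjectivity)
    """
    language_code = english_message = ""
    polarity = subjectivity = 0.0
    try:
        language_code, english_message = translate(text)
        # Only reached if translation succeeded; a scoring failure keeps the
        # translation but leaves polarity/subjectivity at 0.0.
        polarity, subjectivity = score(english_message)
    except Exception as e:
        # Python 3 exceptions have no `.message`; print the exception itself.
        print(e)
    return Analysis(language_code, english_message, polarity, subjectivity)


if __name__ == "__main__":
    def fake_translate(text):
        return "es", "I love this project"

    def failing_translate(text):
        raise RuntimeError("translation service unavailable")

    def fake_score(text):
        return 0.5, 0.6

    print(analyze_message("Me encanta este proyecto", fake_translate, fake_score))
    print(analyze_message("Hola", failing_translate, fake_score))
```

Injecting the two callables keeps the control flow identical to `log_message` while making both the success path and the failure path easy to exercise without network access.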