This work was partially supported by the Norte Portugal
Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership
Agreement, through the European Regional Development Fund (ERDF),
within project “CybersSeCIP” (NORTE-01-0145-FEDER-000044).

We live in a world where computers are constantly changing
the way we do things. People spend many hours on their phones or
computers, whether for work or leisure. The danger is that these
users, often unsuspecting, can be targeted at any time and can fall
victim to many types of scams and phishing attacks. Such attacks can
harm users by stealing valuable credentials or money, or by installing
malicious software on their devices, all without the user being aware
of what has just happened. In a business environment, they can lead
to mass data breaches that could end up costing a company millions of
euros. Because many users are not trained to recognize phishing texts,
an alternative solution is needed to help prevent them from falling
into these traps. In this paper we investigate Natural Language
Processing (NLP), a field that draws heavily on Machine Learning (ML),
as a way to generate solutions to the problem of phishing. We
investigate different NLP solutions (Word2Vec, Doc2Vec and BERT) and
different ML solutions (RNN, LSTM, CNN and TF-IDF). All of these
approaches provide good classification results, with F1-scores
ranging from 90.03 to 98.94.
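To make two of the terms above concrete (TF-IDF text features and the F1-score used to report results), here is a minimal, stdlib-only Python sketch. It is an illustration under stated assumptions, not the method evaluated in this paper: the six training messages are invented toy data, the tokenizer is a bare whitespace split, the smoothed IDF formula is one common variant (the default weighting of scikit-learn's `TfidfVectorizer`), and the nearest-centroid classifier is a simple stand-in swapped in for brevity, not one of the RNN, LSTM or CNN models studied here. All function and variable names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data (1 = phishing, 0 = legitimate); not from this paper's dataset.
TRAIN = [
    ("verify your account password now", 1),
    ("urgent click link to claim your prize", 1),
    ("your bank account is suspended click here", 1),
    ("meeting moved to thursday afternoon", 0),
    ("lunch with the team tomorrow", 0),
    ("please review the attached project report", 0),
]
TEST = [
    ("click here to verify your password", 1),
    ("team meeting tomorrow afternoon", 0),
]


def tokenize(text: str) -> list[str]:
    # Lowercase and split on whitespace; a real pipeline would also handle punctuation and URLs.
    return text.lower().split()


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    # Smoothed IDF: idf(t) = ln((1 + N) / (1 + df(t))) + 1, where df(t) counts documents containing t.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def normalize(vec: dict[str, float]) -> dict[str, float]:
    # Scale a sparse vector to unit L2 norm (empty vectors stay empty).
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def tfidf(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Raw term count times IDF, then L2-normalized; terms not seen during fitting are ignored.
    counts = Counter(t for t in doc if t in idf)
    return normalize({t: c * idf[t] for t, c in counts.items()})


def dot(a: dict[str, float], b: dict[str, float]) -> float:
    # For unit-length vectors this is the cosine similarity.
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def f1_score(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    # F1 = 2PR / (P + R): harmonic mean of precision P and recall R for the positive class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


train_docs = [tokenize(text) for text, _ in TRAIN]
idf = fit_idf(train_docs)

# Nearest-centroid stand-in: sum each class's TF-IDF vectors, then re-normalize
# (same direction as the class mean).
centroids: dict[int, dict[str, float]] = {}
for label in (1, 0):
    total: Counter = Counter()
    for doc, (_, y) in zip(train_docs, TRAIN):
        if y == label:
            total.update(tfidf(doc, idf))
    centroids[label] = normalize(dict(total))


def predict(text: str) -> int:
    # Assign the class whose centroid is most cosine-similar to the message.
    vec = tfidf(tokenize(text), idf)
    return max(centroids, key=lambda label: dot(vec, centroids[label]))


y_true = [y for _, y in TEST]
predictions = [predict(text) for text, _ in TEST]
score = f1_score(y_true, predictions)
print(predictions, score)
```

Rare terms such as "password" or "verify" receive a higher IDF weight than words spread across many messages (such as "your"), which is what lets sparse TF-IDF features separate the two toy classes. The paper's reported F1-scores (90.03 to 98.94) are presumably this same harmonic mean expressed on a 0 to 100 scale.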