Grouping similar sentences for faster intent training

Back in December 2020, we successfully extended a powerful Natural Language Processing (NLP) model called SentenceTransformers to the Swedish language. We hate to brag, but after publishing the blog post about our achievement, we received a lot of attention 😎 So we decided to continue with another exciting project using the Swedish SentenceTransformers model, in which we aim to semi-automate the intent training process by grouping similar sentences. Sounds complicated, doesn't it? Don't worry, we will keep the explanation as simple as possible, so please keep on reading 👀

Problems with the intent training process

We already explained in the last post that Ebbot responds to you based on the purpose of your messages (intents), which he learns through example phrases. To provide the best customer experience, we continuously use data from real conversations between Ebbot and chatbot users, which are stored inside Ebbot's training center, to teach Ebbot new intents or, in some cases, to provide more examples that improve his accuracy in detecting existing intents.

Even though we love having a lot of data, it is sometimes extremely difficult for our Customer Implementation Manager team to sort out training data for Ebbot when there are thousands of sentences in the training center. Imagine going through that many sentences and deciding their intents. It's a lot of work! If only we could group all the similar sentences together, the training process would be so much faster... 🤔

Faster intent training by clustering sentences

As in our past projects, we used Streamlit to build the web app for grouping similar sentences. Combining our Swedish SentenceTransformers model with UKPLab's community detection function gives the app three functions: grouping a list of at least five sentences, and processing a .csv file exported from the training center with the results returned as either another .csv file or a .txt file.
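To give a feel for what community detection does, here is a minimal sketch in plain Python. It is a simplified stand-in written for this post, not UKPLab's actual function and not our production code: the function name `group_similar`, the threshold of 0.75, the minimum group size and the tiny three-dimensional "embeddings" are all assumptions made for illustration. In the real app, the vectors would come from the Swedish SentenceTransformers model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_similar(embeddings, threshold=0.75, min_size=2):
    """Greedy, threshold-based grouping of sentence embeddings.

    Hypothetical helper: for every sentence, collect all sentences whose
    cosine similarity to it is at least `threshold`, then keep the largest
    non-overlapping groups that have at least `min_size` members.
    """
    n = len(embeddings)
    candidates = []
    for i in range(n):
        members = [
            j for j in range(n)
            if cosine(embeddings[i], embeddings[j]) >= threshold
        ]
        if len(members) >= min_size:
            candidates.append(members)

    # Largest candidate groups get the first pick of sentences.
    candidates.sort(key=len, reverse=True)

    assigned = set()
    groups = []
    for members in candidates:
        free = [j for j in members if j not in assigned]
        if len(free) >= min_size:
            groups.append(free)
            assigned.update(free)
    return groups


# Toy vectors standing in for real sentence embeddings (assumed values).
sentences = [
    "Where is my package?",
    "I can't find my package",
    "My package has not arrived",
    "How do I reset my password?",
    "I forgot my password",
    "asdfgh",
]
embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.95, 0.05, 0.0],
    [0.0, 1.0, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.0, 1.0],
]

groups = group_similar(embeddings, threshold=0.75, min_size=2)
for group in groups:
    print([sentences[i] for i in group])
```

With these toy vectors, the three package sentences end up in one group and the two password sentences in another, while the last sentence is left out because nothing is similar enough to it.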

The special part about our app is that it uses our spam classifier to remove all spam messages, and it also offers a keyword suggestion feature to help decide the intent of each group. Fortunately for us, Multi-RAKE delivers exactly what we need. At the moment, Multi-RAKE supports up to 26 languages, so with just a few lines of code, you could have the same feature too! 😉
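If you are curious about the idea behind RAKE-style keyword extraction, the sketch below shows it in plain Python. Please treat it as a simplified illustration only: it does not use the Multi-RAKE library, and the function names, the tiny English stopword list and the example text are all assumptions made for this post. The idea is to cut the text into candidate phrases at stopwords and punctuation, then rank phrases whose words tend to appear inside longer phrases.

```python
import re
from collections import defaultdict

# Tiny, hypothetical stopword list; a real one would be much longer
# and specific to the language of the sentences.
STOPWORDS = {"where", "when", "is", "my", "the", "a", "to", "i", "do", "how"}


def candidate_phrases(text):
    """Split text into phrases (lists of words) at punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[.,!?;:\n]+", text.lower()):
        current = []
        for word in re.findall(r"\w+", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake_keywords(text, top_n=3):
    """Return the `top_n` (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # how often each word occurs
    degree = defaultdict(int)  # total length of the phrases a word occurs in
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)

    # A phrase's score is the sum of degree/frequency over its words.
    scores = {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in phrases
    }
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# One group of similar sentences, joined into a single text (made-up example).
group_text = "Where is my package? Track my package delivery. When is the package delivery?"
keywords = rake_keywords(group_text)
for phrase, score in keywords:
    print(f"{phrase}: {score:.2f}")
```

For this made-up group, "package delivery" comes out on top, which is a reasonable hint for what the intent of the group should be called.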

Allow us to show off a little how flexible our clustering function is! For privacy reasons, we have to censor the information in the image below. We hope that you understand 👇

Our app allows you to download results as a .csv file

Even though the program already works smoothly and we cannot wait to roll it out for our clients, it will take a little longer to integrate it into the system. Meanwhile, we encourage you to follow our LinkedIn for weekly updates. Or if you have any questions, let's have a little chat and we will tell you more about us!

Mia

February 25, 2021
