Grouping similar sentences for faster intent training

Back in December 2020, we successfully extended a powerful Natural Language Processing (NLP) model called SentenceTransformers to the Swedish language. We hate to brag, but after publishing the blog post about our achievement, we received a lot of attention 😎 So we decided to continue with another exciting project using the Swedish SentenceTransformers model, in which we aim to semi-automate the intent training process by grouping similar sentences. Sounds complicated, doesn't it? Don't worry, we will make the explanation as simple as possible, so please keep on reading 👀

Problems with the intent training process

We already explained in the last post that Ebbot responds to you based on the purpose of your messages (intents), which he learns through example phrases. To provide the best customer experience, we continuously use data from real conversations between Ebbot and chatbot users, which are stored inside Ebbot's training center, to teach Ebbot new intents or, in some cases, to provide more examples that improve his accuracy in detecting existing intents.

Even though we love having a lot of data, it is sometimes extremely difficult for our Customer Implementation Manager team to sort out training data for Ebbot when there are thousands of sentences in the training center. Imagine going through that many sentences and deciding the intent of each one. It's a lot of work! If only we could group all the similar sentences together, the training process would be so much faster... 🤔

Faster intent training by clustering sentences

As in our past projects, we used Streamlit to build the web app for grouping similar sentences. Combining our Swedish SentenceTransformers model with UKPLab's community detection function gives the app three functions: grouping a list of at least five sentences, and taking a .csv file exported from the training center and returning the results either as another .csv file or as a .txt file.
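To give a feel for what "community detection" means here, below is a minimal sketch in plain Python. It is not our production code and not UKPLab's implementation (in the sentence-transformers library that function is `util.community_detection`). It is a simplified, greedy version of the same idea: sentences whose embeddings have a cosine similarity above a threshold end up in the same group. The toy 3-dimensional vectors, the function name `group_similar`, and the default values for `threshold` and `min_size` are all assumptions made for illustration. In the real app, the vectors would come from the Swedish SentenceTransformers model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_similar(embeddings, threshold=0.75, min_size=2):
    """Simplified, greedy threshold-based grouping (illustrative only).

    Returns a list of groups, each group being a list of sentence indices.
    """
    n = len(embeddings)
    # For every sentence, collect all sentences (itself included) that are
    # at least `threshold` similar to it.
    neighbours = [
        [j for j in range(n) if cosine(embeddings[i], embeddings[j]) >= threshold]
        for i in range(n)
    ]
    # Try the largest candidate groups first.
    candidates = sorted(
        (c for c in neighbours if len(c) >= min_size), key=len, reverse=True
    )
    assigned = set()
    groups = []
    for candidate in candidates:
        # Keep only sentences that have not been placed in a group yet.
        free = [i for i in candidate if i not in assigned]
        if len(free) >= min_size:
            groups.append(free)
            assigned.update(free)
    return groups


# Toy data: hand-made vectors standing in for real sentence embeddings.
sentences = [
    "Where is my order?",
    "Track my package",
    "Has my parcel shipped?",
    "Reset my password",
    "I forgot my password",
]
vectors = [
    [0.90, 0.10, 0.00],
    [0.80, 0.20, 0.10],
    [0.85, 0.15, 0.05],
    [0.10, 0.90, 0.10],
    [0.00, 0.95, 0.20],
]

groups = group_similar(vectors)
for group in groups:
    print([sentences[i] for i in group])
```

With these toy vectors, the three delivery questions land in one group and the two password questions in another. Raising `threshold` makes the groups stricter, and `min_size` drops groups that are too small to be worth labelling.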

The special part about our app is that it uses our spam classifier to remove all spam messages, and it also offers a keyword suggestion feature to help decide the intent of each group. Fortunately for us, Multi-RAKE delivers exactly what we need. At the moment, RAKE supports up to 26 languages, so with just a few lines of code, you could have the same feature too! 😉
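If you are curious how RAKE-style keyword suggestion works under the hood, here is a small sketch of the core idea in plain Python. It is a simplified re-implementation for illustration, not the Multi-RAKE library and not the code in our app. The tiny English stop word list and the function name `rake_keywords` are assumptions; a real setup would use a full stop word list for the target language. The idea is that stop words and punctuation split the text into candidate phrases, each word is scored by its degree divided by its frequency, and a phrase's score is the sum of its word scores.

```python
import re
from collections import defaultdict

# Tiny illustrative stop word list (an assumption, far from complete).
STOPWORDS = {"a", "the", "i", "my", "is", "has", "where", "to", "how", "do", "can"}


def rake_keywords(sentences, stopwords=STOPWORDS, top_n=3):
    """Return the top_n (phrase, score) pairs for a group of sentences."""
    phrases = []
    for sentence in sentences:
        # Split on punctuation so that phrases never cross it.
        for fragment in re.split(r"[^\w\s]+", sentence.lower()):
            current = []
            for word in fragment.split():
                if word in stopwords:
                    # A stop word ends the current candidate phrase.
                    if current:
                        phrases.append(current)
                    current = []
                else:
                    current.append(word)
            if current:
                phrases.append(current)

    freq = defaultdict(int)    # how often each word occurs
    degree = defaultdict(int)  # total length of the phrases a word occurs in
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)

    scores = {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in phrases
    }
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


group = ["Where is my order?", "Track my package", "Has my parcel shipped?"]
keywords = rake_keywords(group)
print(keywords)
```

Longer phrases made of words that appear together tend to score higher, which is why "parcel shipped" comes out on top for this toy group.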

Allow us to show off a little of how flexible our clustering function is! For privacy reasons, we have to censor the information in the image below. We hope that you understand 👇

Our app allows you to download results as a .csv file

Even though the program already works smoothly and we cannot wait to roll it out for our clients, it will take a while longer to integrate it into the system. Meanwhile, we encourage you to follow our LinkedIn for weekly updates. Or, if you have any questions, let's have a little chat and we will tell you more about us!

Mia
February 25, 2021