Machine Learning In Browser
You have to grow from the inside out. None can teach you, none can make you spiritual. There is no other teacher but your own soul.
-By Swami Vivekanand
This post shows how to run an NLP model in the browser using tensorflow.js and node.js. The example model blocks abusive or obscene content from being posted in the blog's comment box.
Topics:
1) Front end (blog page made using HTML and CSS)
2) Prerequisites for the NLP model in tfjs
3) Node server and model loading
4) Model prediction
Introduction: Writing blogs online is very common, but people sometimes post abusive content in the comment section, which is a form of cyberbullying. To prevent that, this implementation screens each comment and does not add it to the blog if it is abusive.
Front end (blog page made using HTML and CSS)
The front end consists of a single HTML page named predict.html. It is a basic page built with HTML, CSS, JavaScript, and Bootstrap.
Prerequisites for the NLP model in tfjs
The following steps are required before running the NLP model in the browser with tensorflow.js.
i) Converting the Python model into a tfjs model
To load the model in the browser over HTTP, tensorflow.js needs a model.json file. The model trained in Python is saved in model.h5 format, so it has to be converted first, for example with the tensorflowjs_converter tool from the tensorflowjs Python package.
ii) Preprocessing in JavaScript
Text preprocessing is traditionally an important first step for natural language processing (NLP) tasks. It transforms text into a more digestible form so that machine learning algorithms can perform better. To know more, see: https://towardsdatascience.com/nlp-text-preprocessing-a-practical-guide-and-template-d80874676e79
Once the Python model has been converted to a tfjs model, every preprocessing rule that was applied in Python during training has to be reimplemented in JavaScript, so that text is transformed the same way before it is given to the model. Below is some of the preprocessing code in JavaScript.
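// Strip basic punctuation from a single word and lowercase it.
function word_preprocessor(word) {
  // Inside a character class "|" is a literal character, so no separators are needed.
  word = word.replace(/[-.,?!]+/g, '');
  word = word.toLowerCase();
  // Return '.' as a placeholder when nothing is left of the word.
  return word !== '' ? word : '.';
}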
const stopwords = ['ke','ka','mein','ki','hai','yah','aur','se','hain','ko','par','iss','hota','jo','kar','me','gaya','karne','kiya','liye','apne',
  'ne','bani','nahi','toh','hi','ya','avam','diya','ho','iska','tha','dhvara','hua','tak','saath','karna','vaale','baad','liya','aap',
  'kuchh','sakte','kisi','ye','iska','sabse','ismein','the','do','hone','vah','ve','karte','bahut','kaha','varg','kai','karein',
  'hoti','apni','unke','thi','yadi','hui','jaa','na','ise','kehte','kahte','jab','hote','koi','hue','va','abhi','jaise',
  'sab','karta','unki','tarah','uss','aadi','kul','raha','iski','sakta','rahe','unka','issi','rakhein','apna','pe','uske','to','bhi','or','kya'];
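// Drop Hinglish stopwords from a sentence.
function remove_stopwords(str) {
  const res = [];
  const words = str.split(' ');
  for (let i = 0; i < words.length; i++) {
    // Strip periods before comparing against the stopword list.
    const word_clean = words[i].split('.').join('');
    if (!stopwords.includes(word_clean)) {
      res.push(word_clean);
    }
  }
  return res.join(' ');
}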
stopwords is an array of Hinglish stopwords. str.split(' ') splits the sentence into words. The for loop then compares each word against the stopword array and keeps only the words that are not in it.
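The same idea can be sketched with a Set, which makes each lookup constant-time instead of scanning the whole array. This is only an illustrative variant: the short STOPWORDS list and the name removeStopwords are assumptions made for the example, not part of the original code.

```javascript
// Hypothetical, shortened stopword list used only for this illustration.
const STOPWORDS = new Set(['ke', 'ka', 'hai', 'aur', 'ko', 'bhi']);

// Remove stopwords from a sentence and return the remaining words.
function removeStopwords(sentence) {
  return sentence
    .toLowerCase()
    .split(/\s+/)                          // split on any run of whitespace
    .map((w) => w.replace(/[.,?!]/g, ''))  // strip basic punctuation
    .filter((w) => w !== '' && !STOPWORDS.has(w))
    .join(' ');
}

console.log(removeStopwords('Ram aur Shyam ka ghar hai.'));
```

Whatever variant is used, it has to match the stopword list and cleaning rules that were applied in Python during training.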
// Maximum number of tokens the model accepts per sentence.
const MAX_SEQUENCE_LENGTH = 250;

// Convert an array of words into a fixed-length array of vocabulary ids.
function make_sequences(words_array) {
  let sequence = [];
  words_array.slice(0, MAX_SEQUENCE_LENGTH).forEach(function (word) {
    word = word_preprocessor(word);
    let id = words_vocab[word];
    if (id === undefined) {
      // Word not in the vocabulary: fall back to the <UNK> id.
      sequence.push(words_vocab['<UNK>']);
    } else {
      sequence.push(id);
    }
  });
  // Pad short sequences with zeros up to MAX_SEQUENCE_LENGTH.
  if (sequence.length < MAX_SEQUENCE_LENGTH) {
    let pad_array = Array(MAX_SEQUENCE_LENGTH - sequence.length);
    pad_array.fill(0);
    sequence = sequence.concat(pad_array);
  }
  return sequence;
}
var words_vocab = { "madarchod": 1, "meri": 2, "hai": 3, "patni": 4, "randi": 5, "lund": 6, "ker": 7, "sara": 8, "ek": 9, "taqid": 10, "wo": 11, "mere": 12, "chinaal": 13, "rahi": 14, "dost": 15, "bhaut": 16, "wife": 17, "huwe": 18, "bhai": 19, "chut": 20}
The model expects its input as an array of numbers, so a tokenizer has to convert each sentence into the same format the model was trained on. words_vocab is an object that maps each token the model was trained on to its id. To predict on a sentence, we first split it into words, then look each word up in words_vocab and take its id. If a word is not present in words_vocab, it is assigned the unknown id. Because the sequence length is fixed, shorter sentences are padded with zeros.
For additional information on preprocessing in JavaScript, see: https://blog.logrocket.com/natural-language-processing-for-node-js/
As you know, neural networks can't work with words, only with numbers, so words have to be represented as numbers. This is not a difficult task: we enumerate all unique words and use each word's number in place of the word. The words and their numbers are stored in a vocabulary. The vocabulary should include an "unknown" token (<UNK>), because a new string may contain words that are not in the vocabulary, and a "padding" token (<PAD>), because the neural network needs all strings to have the same size, so when a string is shorter than the required length, the gaps are filled with this token.
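A minimal sketch of such a vocabulary is shown below. It assumes id 0 is reserved for <PAD> and id 1 for <UNK>; the function names buildVocab and encode and the sample sentences are made up for the illustration. In practice the ids must come from the tokenizer used during training, not be rebuilt in the browser.

```javascript
// Build a word -> id vocabulary from a list of sentences.
// Assumption of this sketch: id 0 is <PAD>, id 1 is <UNK>.
function buildVocab(sentences) {
  const vocab = new Map([['<PAD>', 0], ['<UNK>', 1]]);
  for (const sentence of sentences) {
    for (const word of sentence.toLowerCase().split(/\s+/)) {
      if (word !== '' && !vocab.has(word)) {
        vocab.set(word, vocab.size); // next free id
      }
    }
  }
  return vocab;
}

// Encode a sentence as exactly maxLen ids: truncate long input, pad short input.
function encode(sentence, vocab, maxLen) {
  const ids = sentence
    .toLowerCase()
    .split(/\s+/)
    .filter((w) => w !== '')
    .slice(0, maxLen)
    .map((w) => (vocab.has(w) ? vocab.get(w) : vocab.get('<UNK>')));
  while (ids.length < maxLen) {
    ids.push(vocab.get('<PAD>'));
  }
  return ids;
}

const vocab = buildVocab(['wo mera dost hai', 'mera bhai']);
console.log(encode('mera naya dost', vocab, 5)); // [ 3, 1, 4, 0, 0 ]
```

Here "naya" is not in the vocabulary, so it is encoded as the <UNK> id, and the two trailing zeros are padding.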
Node server and model loading
To install the Node dependencies and start the server, run npm install && node server.js in your terminal. After that, you can access your files in the browser at http://localhost:81
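let model; // shared with the prediction handler
(async function () {
  // Load the converted model from the static folder served by the node server.
  model = await tf.loadLayersModel('http://localhost:81/model/nlp_model/model.json');
  // Remove the loading indicator once the model is ready.
  $('.loading-model').remove();
})();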
Once the server is running, pass the URL of the model to tf.loadLayersModel(). In server.js we define a static folder that holds the model, the JS scripts, and all other files. The code below creates the node server.
const express = require("express");
const app = express();
// Log every incoming request.
app.use(function (req, res, next) {
  console.log(`${new Date()} - ${req.method} request for ${req.url}`);
  next();
});
// Serve the model, scripts, and pages from the static folder.
app.use(express.static("../static"));
app.listen(81, function () {
  console.log("Serving static on http://localhost:81");
});
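If the page and the model are served by the same Express server, the model URL can be derived from the page's origin instead of being hard-coded, so the host and port cannot drift apart. The sketch below assumes that setup; modelUrl is a hypothetical helper name, and the path is the one used in the loading code of this post.

```javascript
// Build the absolute model URL from a page origin.
function modelUrl(origin) {
  return new URL('/model/nlp_model/model.json', origin).href;
}

console.log(modelUrl('http://localhost:81'));

// In the browser this could be used as:
//   model = await tf.loadLayersModel(modelUrl(window.location.origin));
```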
Model prediction
After the node server is set up, the function below reads the sentence from the HTML page and makes the prediction.
$("#get_ner_button").click(async function () {
  $(".main-result").html(" ");
  let comment = $('#input_text').val();
  let words = comment.split(' ');
  let sequence = make_sequences(words);
  // Build an int32 tensor and add a batch dimension: shape [1, sequence length].
  let tensor = tf.tensor1d(sequence, 'int32').expandDims(0);
  let predictions = await model.predict(tensor).data();
  console.log(predictions);
  // predictions[0] is the abusive score, predictions[1] the obscene score.
  if (predictions[0] > 0.6 && predictions[1] > 0.6) {
    $(".main-result").append("<h1>Sorry we can't add your comment it is Abusive and Obscene!!!!!!</h1>");
  } else if (predictions[0] > 0.6) {
    $(".main-result").append("<h1>Sorry we can't add your comment it is Abusive!!!</h1>");
  } else if (predictions[1] > 0.6) {
    $(".main-result").append("<h1>Sorry we can't add your comment it is Obscene!!!!</h1>");
  } else {
    $(".main-result").append("<h1>Clean data comment added successfully</h1>");
    // Use .text() so the comment is inserted as text, not parsed as HTML.
    $(".main-result").append($("<h1>").text(comment));
  }
});
tf.tensor1d(): Converts the sequence of word ids into a 1-dimensional tensor. expandDims(0) then adds a batch dimension so that the tensor can be used for prediction.
model.predict(): Runs the model on the tensor. Calling .data() on the result gives an array of prediction probabilities.
After the prediction, an if-else chain classifies the sentence as abusive, obscene, or both, using a threshold of 0.6 on each probability.
If the comment is explicit, it is not added; if it is clean, it is added successfully.
Output:
I am adding a YouTube video that will guide you and that you can use as a reference.
Thank You,
Happy Machine Learning,
Author: Vikas Maurya.