
How to Write an Assignment on ATT&CK Predictions Using rcATT in Python

Here we discuss how to implement an assignment on ATT&CK predictions using rcATT in Python. The description and tips below can serve as a guideline for students working on similar assignments.

DESCRIPTION OF THE ASSIGNMENT

In this Python assignment, rcATT is a tool for predicting which tactics and techniques of the ATT&CK framework are described in cyber threat reports. The user pastes a report into the text area provided and submits it for prediction. Next to the name of each tactic or technique, the tool shows a percentage that indicates how likely it is that the tactic or technique is present in the report. The version of the code below also accepts several reports at once, separated by ten tilde characters (~~~~~~~~~~), and saves the predictions to CSV files.

CODE

##########################################################
# INTRODUCTION #
##########################################################
# rcATT is a tool to predict tactics and techniques
# from the ATT&CK framework, using multilabel text
# classification and post processing.
# Interface: graphical
# Version: 1.00
# Author: Valentine Legoy
# Date: 2019_10_22

from flask import Flask, render_template, url_for, request, send_file
import joblib
import re
import classification_tools.preprocessing as prp
import classification_tools.postprocessing as pop
import classification_tools.save_results as sr
import classification_tools as clt
from operator import itemgetter
import pandas as pd
import datetime

# Starts the GUI tool on Flask
app = Flask(__name__)

@app.route('/')
def home():
    return render_template('home.html')


@app.route('/save', methods=['POST'])
def save():
    """
    Save predictions either in the training set or in a JSON file under STIX format.
    """
    if request.method == 'POST':
        formdict = request.form.to_dict()
        save_type1 = "filesave"
        save_type2 = "trainsave"

        # save to a JSON file in STIX format
        if save_type1 in formdict:
            references = []
            for key, value in formdict.items():
                if key in clt.ALL_TTPS:
                    references.append(clt.STIX_IDENTIFIERS[clt.ALL_TTPS.index(key)])
            file_to_save = sr.save_results_in_file(
                re.sub("\r\n", " ", request.form['hidereport']),
                request.form['name'],
                request.form['date'],
                references)
            return send_file(file_to_save, as_attachment=True)  # send the saved .json file as a download

        # save in the custom training set
        if save_type2 in formdict:
            references = []
            for key, value in formdict.items():
                if key in clt.ALL_TTPS:
                    references.append(key)
            sr.save_to_train_set(
                re.sub("\r\n", "\t", prp.remove_u(request.form['hidereport'].encode('utf8').decode('ISO-8859-1'))),
                references)
    return ('', 204)


@app.route('/', methods=['POST'])
def retrain():
    """
    Train the classifier again based on the new data added by the user.
    """
    if request.method == 'POST':
        clt.train(False)
    return ('', 204)

@app.route('/predict', methods=['POST'])
def predict():
    """
    Predict the techniques and tactics for the report entered by the user.
    """
    report_to_predict = ""
    pred_all, pred_tactics_all, pred_techniques_all = [], [], []

    if request.method == 'POST':
        report_to_predict = prp.remove_u(request.form['message'].encode('utf8').decode('ISO-8859-1'))

        # My code: use ten "~" characters in a row to split multiple reports
        report_list = report_to_predict.split("~~~~~~~~~~")

        # Loop through the reports and process the results of each one
        filenumber = 1
        for i in range(len(report_list)):
            # load postprocessing and min-max confidence score for both tactics and techniques predictions
            parameters = joblib.load("classification_tools/data/configuration.joblib")
            min_prob_tactics = parameters[2][0]
            max_prob_tactics = parameters[2][1]
            min_prob_techniques = parameters[3][0]
            max_prob_techniques = parameters[3][1]

            pred_tactics, predprob_tactics, pred_techniques, predprob_techniques = clt.predict(report_list[i], parameters)

            # change decision value into confidence score to display and prepare results to display
            # (k is used here so the inner loop does not reuse the outer loop variable i)
            pred_to_display_tactics = []
            for k in range(len(predprob_tactics[0])):
                conf = (predprob_tactics[0][k] - min_prob_tactics) / (max_prob_tactics - min_prob_tactics)
                if conf < 0:
                    conf = 0.0
                elif conf > 1:
                    conf = 1.0
                pred_to_display_tactics.append([clt.CODE_TACTICS[k], clt.NAME_TACTICS[k], pred_tactics[0][k], conf*100])

            pred_to_display_techniques = []
            for j in range(len(predprob_techniques[0])):
                conf = (predprob_techniques[0][j] - min_prob_techniques) / (max_prob_techniques - min_prob_techniques)
                if conf < 0:
                    conf = 0.0
                elif conf > 1:
                    conf = 1.0
                pred_to_display_techniques.append([clt.CODE_TECHNIQUES[j], clt.NAME_TECHNIQUES[j], pred_techniques[0][j], conf*100])

            pred_to_display_tactics = sorted(pred_to_display_tactics, key=itemgetter(3), reverse=True)
            pred_to_display_techniques = sorted(pred_to_display_techniques, key=itemgetter(3), reverse=True)

            # My code
            pred_to_display_tactics_output = [[filenumber, 'Tactics'] + pr for pr in pred_to_display_tactics]
            pred_tactics_all.extend(pred_to_display_tactics_output)
            pred_to_display_techniques_output = [[filenumber, 'Techniques'] + pr for pr in pred_to_display_techniques]
            pred_techniques_all.extend(pred_to_display_techniques_output)
            pred_to_display_tactics_output = pred_to_display_tactics_output + pred_to_display_techniques_output
            csv_output = pd.DataFrame(pred_to_display_tactics_output, columns=['FileNumber', 'Type', '1', '2', '3', '4'])

            # This saves a separate result file for each report directly in the folder rcATT runs from:
            # if you input 3 reports, there will be "result_1.csv, result_2.csv, result_3.csv" in the folder rcATT
            # and a final combined file "result_all.csv" which merges the 3 files above
            filename = './result_' + str(filenumber) + '.csv'
            csv_output.to_csv(filename, index=False)
            filenumber += 1  # move to next file

        pred_all = pred_tactics_all + pred_techniques_all
        csv_pred_all = pd.DataFrame(pred_all, columns=['FileNumber', 'Type', '1', '2', '3', '4'])
        csv_pred_all.to_csv('./result_all.csv', index=False)

    return render_template('result.html', report=request.form['message'],
                           predictiontact=pred_to_display_tactics,
                           predictiontech=pred_to_display_techniques)


if __name__ == '__main__':
    app.run(debug=True)

Assignment Code Explanation

As with most Python applications, the code begins by importing the libraries it needs, each with its own task. Besides rcATT's own classification_tools package (imported as prp, pop, sr and clt), it relies on several third-party and standard-library modules.

First comes Flask, a lightweight web framework for Python that is often described as a microframework. It does not include an ORM (object-relational mapper) or other database layer, but it provides essentials such as URL routing, request handling and templating.

The Flask class itself comes from the flask package. Flask offers suggestions but does not enforce any dependencies or project layout. Next is render_template, which renders a template file from the application's templates folder with the Jinja2 template engine and returns the resulting HTML. url_for builds the URL for a given endpoint (view function), so a link can be changed in one place instead of in every template that uses it; it is imported here, although this file does not call it directly. request gives access to the incoming HTTP request, including the submitted form fields.

send_file sends the contents of a file to the client. It is configurable: for example, as_attachment=True, which the save route uses, tells the browser to download the file instead of displaying it, and options such as mimetype control the response headers.
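A minimal sketch of the download behaviour, assuming Flask 2.x is installed. The /download route and the JSON file are hypothetical stand-ins for the file that sr.save_results_in_file returns; the sketch only shows how as_attachment=True changes the response.

```python
import json
import os
import tempfile

from flask import Flask, send_file

app = Flask(__name__)

# Hypothetical file standing in for the one sr.save_results_in_file returns.
result_path = os.path.join(tempfile.mkdtemp(), "report_results.json")
with open(result_path, "w") as fh:
    json.dump({"placeholder": True}, fh)


@app.route("/download")
def download():
    # as_attachment=True adds "Content-Disposition: attachment; ..." so the
    # browser saves the file instead of displaying it.
    return send_file(result_path, as_attachment=True, mimetype="application/json")


client = app.test_client()
resp = client.get("/download")
print(resp.status_code, resp.mimetype, resp.headers["Content-Disposition"])
resp.close()
```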

joblib is a set of tools for lightweight pipelining in Python, including efficient saving and loading of Python objects on disk; it is often used to persist scikit-learn models. In this file it is only used through joblib.load, which reads rcATT's saved configuration (the post-processing settings and the minimum and maximum scores used to compute confidence) from classification_tools/data/configuration.joblib.

re is Python's regular expression module. A regular expression is a sequence of characters that defines a search pattern. Here, re.sub replaces the "\r\n" line breaks of the submitted report with spaces or tabs.
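A small sketch of the two re.sub calls in save(), using a made-up report. Browsers normally submit line breaks in a textarea as "\r\n", which is what the pattern matches.

```python
import re

# Made-up report text as submitted from a textarea: lines end in "\r\n".
report = "The actor sent phishing emails.\r\nThe payload ran PowerShell."

# File-save branch: join the lines with spaces (same call as in save()).
flat = re.sub("\r\n", " ", report)

# Training-set branch: join the lines with tabs instead.
tabbed = re.sub("\r\n", "\t", report)

print(flat)
print(repr(tabbed))
```

Because "\r\n" contains no special regular-expression characters, report.replace("\r\n", " ") would give the same result here.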

itemgetter comes from the operator module, which provides function versions of Python's operators: comparisons, logical and arithmetic operations, and sequence operations such as indexing. itemgetter(n) returns a callable object that fetches item n from its argument, that is, an element of a list by index or a value of a dictionary by key. The code uses it as the sort key when ranking predictions.
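A quick illustration of itemgetter, using a made-up row in the [code, name, prediction, confidence] layout that predict() builds.

```python
from operator import itemgetter

# Made-up display row: [code, name, binary prediction, confidence %].
row = ["TA0001", "Initial Access", 1, 87.5]

get_conf = itemgetter(3)        # callable equivalent to lambda r: r[3]
print(get_conf(row))            # 87.5 (index 3 is the fourth element)

# With several indices, itemgetter returns a tuple.
print(itemgetter(0, 3)(row))    # ('TA0001', 87.5)

# It also fetches dictionary values by key.
print(itemgetter("name")({"code": "TA0001", "name": "Initial Access"}))  # Initial Access
```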

Finally, pandas is imported under the alias pd. pandas is a widely used data-analysis library in data science and machine learning; here it builds tables of predictions and writes them to CSV files.

The line app = Flask(__name__) creates the application object. Its argument is the name of the current module (__name__), whose value depends on the Python file in which it is evaluated; Flask uses it to know where to look for resources such as the templates folder.

Next, the @app.route('/') decorator tells Flask which URL should trigger the function defined right below it, in this case the root URL /.

The home function simply returns render_template('home.html'), so visiting / displays the home page. By default, a route only answers GET requests (plus HEAD and OPTIONS, which Flask handles automatically), and any other method receives a 405 Method Not Allowed response. To accept other HTTP methods, a route must list them explicitly, which is why the /save, / (retrain) and /predict routes that handle form submissions are declared with methods=['POST'].
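A minimal sketch of this routing behaviour, assuming Flask 2.x is installed. render_template_string is swapped in for render_template so that no templates folder is needed, and the route bodies are simplified placeholders rather than rcATT's logic. Flask's test client shows which status code each request receives.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)


@app.route("/")
def home():
    # Stands in for render_template('home.html'); GET is allowed by default.
    return render_template_string("<h1>{{ title }}</h1>", title="rcATT")


@app.route("/predict", methods=["POST"])
def predict():
    # POST only: a GET request here gets 405 Method Not Allowed.
    fields = request.form.to_dict()
    return render_template_string("{{ n }} field(s)", n=len(fields))


@app.route("/save", methods=["POST"])
def save():
    # Empty body with 204 No Content: the browser stays on the current page.
    return ("", 204)


client = app.test_client()
print(client.get("/").status_code)                          # 200
print(client.get("/predict").status_code)                   # 405
print(client.post("/predict", data={"message": "x"}).data)  # b'1 field(s)'
print(client.post("/save").status_code)                     # 204
```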

Next is the save function, which takes no parameters. Its route only accepts POST, so the if request.method == 'POST' check is always true here, although it does no harm. Inside it, request.form.to_dict() converts the submitted form fields into an ordinary dictionary, formdict, that maps each field name to its value.

Two string variables then hold "filesave" and "trainsave". These are form field names: whichever of them appears in formdict tells the function which save action the user requested.

If "filesave" is in formdict, the function creates an empty references list and loops over formdict.items(), which yields each key together with its value. Every key that also appears in clt.ALL_TTPS (the list of known tactic and technique codes) is a tactic or technique the user selected, and the STIX identifier stored at the same position in clt.STIX_IDENTIFIERS is appended to references. The function then calls sr.save_results_in_file with the report text (its "\r\n" line breaks replaced by spaces), the name and date fields from the form, and the references, and returns the resulting file to the browser with send_file(file_to_save, as_attachment=True).

If "trainsave" is in formdict instead, the loop appends the selected keys themselves to references. The report text is re-encoded, cleaned with prp.remove_u (a helper from rcATT's preprocessing module), and its line breaks are replaced with tabs using re.sub. The cleaned text and the references are then passed to sr.save_to_train_set, which adds them to the custom training set.
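A stand-alone sketch of how the selected tactics and techniques are picked out of the form data. ALL_TTPS and STIX_IDENTIFIERS are hypothetical stand-ins for clt.ALL_TTPS and clt.STIX_IDENTIFIERS (assumed to be parallel lists), the STIX identifiers are placeholders, and the form field names are assumptions.

```python
# Hypothetical stand-ins for clt.ALL_TTPS and clt.STIX_IDENTIFIERS:
# position k in one list matches position k in the other.
ALL_TTPS = ["TA0001", "TA0002", "T1566"]
STIX_IDENTIFIERS = ["x-tactic--1", "x-tactic--2", "x-technique--3"]  # placeholders


def selected_ttps(formdict):
    """Keep only the form keys that name a known tactic or technique."""
    return [key for key in formdict if key in ALL_TTPS]


def to_stix_ids(ttps):
    """Map each code to the identifier at the same list position."""
    return [STIX_IDENTIFIERS[ALL_TTPS.index(code)] for code in ttps]


# What request.form.to_dict() might return after the user ticks two boxes
# and clicks the "filesave" button (field names are assumptions).
formdict = {"hidereport": "report text", "name": "r1",
            "TA0002": "on", "T1566": "on", "filesave": "Save"}

print(selected_ttps(formdict))               # ['TA0002', 'T1566'] (training-set branch)
print(to_stix_ids(selected_ttps(formdict)))  # ['x-tactic--2', 'x-technique--3'] (file branch)
```

ALL_TTPS.index(code) scans the list on every lookup; building a dictionary such as dict(zip(ALL_TTPS, STIX_IDENTIFIERS)) once would make each lookup constant time.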

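At the end, save returns ('', 204): an empty body with HTTP status code 204 No Content, which tells the browser that the request succeeded and that it should stay on the current page. The file-save branch returns earlier, with the downloaded file.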

The next route is also the root URL /, but for POST requests only, so it does not conflict with home, which handles GET. The function attached to it is retrain, which takes no arguments and, as its docstring says, trains the classifier again using the new data added by the user.

Inside, after the same POST check, it calls clt.train(False) to retrain the classifier; the meaning of the False argument is defined in rcATT's classification_tools package, which is not shown here. Like save, retrain returns an empty response with status 204.

Next, the /predict route accepts POST requests and is handled by the predict function, which also takes no arguments.

predict first initializes report_to_predict to an empty string and creates three empty lists: pred_all, pred_tactics_all and pred_techniques_all. If the request method is POST, it reads the submitted text from request.form['message'], re-encodes it with .encode('utf8').decode('ISO-8859-1'), and cleans it with prp.remove_u. It then splits the text on a separator of ten tilde characters ("~~~~~~~~~~"), so that several reports can be submitted at once; report_list holds one string per report.
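A sketch of the multi-report split with made-up text. split_reports is a hypothetical helper, and stripping whitespace and dropping empty chunks is an optional tidy-up that is not part of the original code.

```python
SEPARATOR = "~" * 10  # ten tildes, as in predict()

form_text = (
    "First report about phishing.\r\n"
    "~~~~~~~~~~\r\n"
    "Second report about credential dumping."
)

reports = form_text.split(SEPARATOR)
print(len(reports))           # 2
print(repr(reports[0]))       # 'First report about phishing.\r\n'

# A trailing separator leaves an empty chunk behind.
trailing = "Only report.\r\n" + SEPARATOR
print(trailing.split(SEPARATOR))


def split_reports(text, separator=SEPARATOR):
    """Hypothetical helper: split, strip whitespace, drop empty chunks."""
    return [chunk.strip() for chunk in text.split(separator) if chunk.strip()]


print(split_reports(form_text))
print(split_reports(trailing))
```

Without that tidy-up, a trailing separator produces an empty string that the original loop would still send to clt.predict and save as an extra result file.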

An integer variable, filenumber, starts at 1 and numbers the reports. A loop then runs with i from 0 to len(report_list) - 1, one iteration per report. At the start of each iteration, joblib.load reads the configuration file at the path classification_tools/data/configuration.joblib into parameters. Because this file does not change between reports, loading it once before the loop would give the same result with less disk access.

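From parameters, the code reads the bounds used for scaling: min_prob_tactics and max_prob_tactics come from parameters[2][0] and parameters[2][1], and min_prob_techniques and max_prob_techniques from parameters[3][0] and parameters[3][1]. It then calls clt.predict with the current report, report_list[i], and parameters. The call returns four values: pred_tactics and pred_techniques, the predictions for each tactic and technique, and predprob_tactics and predprob_techniques, the corresponding decision values.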
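A minimal sketch of saving and loading a configuration with joblib, assuming joblib is installed. The contents of config are hypothetical: only indices 2 and 3 mirror what predict() reads (the minimum and maximum decision values for tactics and for techniques), since rcATT's real configuration file is not shown in the source. The file is loaded once, outside any per-report loop.

```python
import os
import tempfile

import joblib

# Hypothetical configuration: items 0 and 1 are placeholders; only the
# layout of items 2 and 3 mirrors what predict() reads.
#   config[2] = (min, max) decision values for tactics
#   config[3] = (min, max) decision values for techniques
config = ["placeholder-0", "placeholder-1", (-1.5, 2.5), (-3.0, 1.0)]

path = os.path.join(tempfile.mkdtemp(), "configuration.joblib")
joblib.dump(config, path)          # persist the object to disk

parameters = joblib.load(path)     # load once, before looping over reports
min_prob_tactics, max_prob_tactics = parameters[2]
min_prob_techniques, max_prob_techniques = parameters[3]
print(min_prob_tactics, max_prob_tactics, min_prob_techniques, max_prob_techniques)
```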

Next, the decision values are turned into confidence scores. For tactics, the code creates an empty list, pred_to_display_tactics, and loops over every index of predprob_tactics[0]. For each tactic it computes conf = (score - min_prob_tactics) / (max_prob_tactics - min_prob_tactics), where score is that tactic's decision value. This is min-max normalization: it maps the configured minimum to 0 and the maximum to 1. The techniques are handled the same way in a second loop with the iterator j, using min_prob_techniques and max_prob_techniques.
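The scaling step can be written as a small function. This is a sketch: to_confidence is a hypothetical helper, and the bounds are made-up values rather than rcATT's configuration. It applies conf = (score - min) / (max - min), clips the result to [0, 1] and returns a percentage, as the two loops in predict() do. It assumes max > min; equal bounds would raise ZeroDivisionError, in the original code as well.

```python
def to_confidence(score, lo, hi):
    """Hypothetical helper: min-max scale a decision value to a clipped percentage."""
    conf = (score - lo) / (hi - lo)
    conf = min(max(conf, 0.0), 1.0)   # same effect as the if/elif clipping
    return conf * 100


# Made-up bounds for illustration.
lo, hi = -1.5, 2.5
print(to_confidence(0.5, lo, hi))    # 50.0
print(to_confidence(-4.0, lo, hi))   # 0.0 (below the minimum, clipped)
print(to_confidence(3.0, lo, hi))    # 100.0 (above the maximum, clipped)
```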

Inside each loop, conf is clipped to the range 0 to 1: a value below 0 is reset to 0.0 and a value above 1 is reset to 1.0, because a decision value can fall outside the configured bounds. The code then appends a row [code, name, prediction, conf*100] to the display list, so the confidence is shown as a percentage. After the loops, pred_to_display_tactics and pred_to_display_techniques are sorted with key=itemgetter(3) and reverse=True. itemgetter(3) fetches the element at index 3 of each row, which is the fourth element (the confidence percentage), and reverse=True sorts in descending order, so the most likely tactics and techniques come first.
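A short sketch of the ranking step with made-up rows in the same [code, name, prediction, confidence] layout used by predict(). It shows that itemgetter(3) selects the confidence and that reverse=True produces descending order.

```python
from operator import itemgetter

# Made-up rows: [code, name, binary prediction, confidence %].
rows = [
    ["TA0001", "Initial Access", 1, 62.0],
    ["TA0002", "Execution", 1, 91.5],
    ["TA0003", "Persistence", 0, 12.25],
]

# key=itemgetter(3) sorts on confidence; reverse=True puts the highest first.
ranked = sorted(rows, key=itemgetter(3), reverse=True)
print([r[0] for r in ranked])   # ['TA0002', 'TA0001', 'TA0003']
```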

In the next step, the code builds pred_to_display_tactics_output, a list of lists in which every row of pred_to_display_tactics is prefixed with the file number and the label 'Tactics'. This list is added to pred_tactics_all with extend, which appends the elements of one list to the end of another. Techniques are processed the same way: pred_to_display_techniques_output prefixes each row with the file number and 'Techniques', and is added to pred_techniques_all.

Finally, the two lists for the current report are concatenated with +, so that pred_to_display_tactics_output now holds both the tactics rows and the techniques rows:

pred_to_display_tactics_output = pred_to_display_tactics_output + pred_to_display_techniques_output
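The row labelling and concatenation can be sketched as follows. label_rows is a hypothetical helper, and the rows are illustrative values, not real predictions.

```python
# Made-up display rows for one report.
tactics = [["TA0002", "Execution", 1, 91.5]]
techniques = [["T1059", "Command and Scripting Interpreter", 1, 77.0]]


def label_rows(file_number, kind, rows):
    """Hypothetical helper: prepend the report number and a type label to every row."""
    return [[file_number, kind] + row for row in rows]


tactics_all = []                                  # plays the role of pred_tactics_all
tactics_all.extend(label_rows(1, "Tactics", tactics))

per_file = label_rows(1, "Tactics", tactics) + label_rows(1, "Techniques", techniques)
for row in per_file:
    print(row)
# [1, 'Tactics', 'TA0002', 'Execution', 1, 91.5]
# [1, 'Techniques', 'T1059', 'Command and Scripting Interpreter', 1, 77.0]
```

Note that + builds a new list, while extend modifies the accumulator in place, which is why the original code can keep collecting rows across reports in pred_tactics_all and pred_techniques_all.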

These rows are then placed in a pandas DataFrame, the main two-dimensional table structure of pandas:

csv_output = pd.DataFrame(pred_to_display_tactics_output, columns = ['FileNumber', 'Type', '1', '2', '3', '4'])

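The DataFrame takes the combined output list, and each inner list becomes one row. Its columns are FileNumber, Type and four generically named columns '1' to '4', which hold the code, the name, the prediction and the confidence percentage. Creating the DataFrame does not write a CSV file; that happens in the next step.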

A string variable, filename, is then built by concatenation: './result_' + str(filenumber) + '.csv', for example ./result_1.csv for the first report. csv_output.to_csv(filename, index=False) writes the DataFrame to that file. By default, to_csv also writes the DataFrame's index (the row numbers 0, 1, 2, ...) as the leftmost column; index=False leaves it out.
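A sketch of the CSV step, assuming pandas is installed. The row is made up. Calling to_csv without a path returns the CSV text instead of writing a file, which makes the effect of index=False easy to see; the file itself is written to a temporary folder rather than the current directory.

```python
import os
import tempfile

import pandas as pd

columns = ["FileNumber", "Type", "1", "2", "3", "4"]
rows = [[1, "Tactics", "TA0002", "Execution", 1, 91.5]]   # made-up row
df = pd.DataFrame(rows, columns=columns)

file_number = 1
filename = "result_" + str(file_number) + ".csv"          # result_1.csv

# Without a path, to_csv returns the CSV text.
with_index = df.to_csv().splitlines()
without_index = df.to_csv(index=False).splitlines()
print(with_index[0])      # ,FileNumber,Type,1,2,3,4
print(without_index[0])   # FileNumber,Type,1,2,3,4

out_path = os.path.join(tempfile.mkdtemp(), filename)
df.to_csv(out_path, index=False)   # what predict() does for each report
```

Descriptive column names such as 'Code', 'Name', 'Prediction' and 'Confidence' instead of '1' to '4' would make the CSV files easier to read; that is a suggestion, not part of the original code.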

filenumber is then incremented by 1 so that the next report gets the next file number.

After the loop ends, the rows from every report are combined into one final list by concatenating all the tactics rows with all the techniques rows:

  pred_all = pred_tactics_all + pred_techniques_all

pred_all is then placed in a DataFrame with the same columns:

 csv_pred_all = pd.DataFrame(pred_all, columns=['FileNumber', 'Type', '1', '2', '3', '4'])

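csv_pred_all.to_csv('./result_all.csv', index=False) then writes this DataFrame to ./result_all.csv, again without the index column. Finally, predict returns render_template('result.html', ...), which renders the results page with the submitted text and the sorted tactics and techniques. Note that pred_to_display_tactics and pred_to_display_techniques hold only the results of the last report processed, so when several reports are submitted the page shows the last report's predictions while the CSV files contain all of them. The closing if __name__ == '__main__': block starts Flask's development server with app.run(debug=True) when the script is run directly; debug mode enables the interactive debugger and automatic reloading.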

