It looks like CN116233013 is a patent that corresponds to this thesis.
The present invention belongs to the field of network security technology, and in particular relates to a method for identifying Tor Over VPN anonymous network traffic and its service type. The method extracts spatiotemporal features of Tor Over VPN traffic and combines them with CNN, Transformer, and other models for identification.
Step 1, split the input traffic sample into sessions based on the five-tuple (source address, destination address, source port, destination port, and protocol), and label, group, and number the traffic according to its traffic type;
Step 2, for each flow, extract the payload length, OpenVPN header protocol field, and heartbeat packet features of the packets with sequence numbers 0 to N1;
Step 3, for each flow, extract the payload length, payload information, and polling data features of the packets with sequence numbers N1 to N2, and convert these features into a two-dimensional grayscale image;
Step 4, for each flow, extract the length, payload information, inter-packet delay, MSS packet ratio, and number of interactions of the packets with sequence numbers N2 to N3 to form a spatiotemporal feature vector;
Step 5, match the traffic against the features extracted in step 2 to identify the OpenVPN tunnel traffic;
Step 6, construct a Tor Over VPN anonymous network traffic identification model based on the two-dimensional grayscale image and the CNN model;
Step 7, construct a service type identification model based on the spatiotemporal feature vector and the Transformer model;
Step 8, for the traffic sample to be detected, execute steps 1 to 4, then identify the OpenVPN tunnel traffic according to step 5, and use the models constructed in steps 6 and 7 to identify the Tor Over VPN anonymous network traffic and its service type, respectively.
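To make step 1 concrete, here is a rough sketch of splitting packets into sessions by five-tuple. The `Packet` record, `session_key`, and `split_sessions` are hypothetical names of my own, and treating both directions of a connection as one session is my assumption; the patent text does not say how it handles direction.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical packet record; field names are illustrative only.
    ts: float       # capture timestamp in seconds
    src: str        # source address
    sport: int      # source port
    dst: str        # destination address
    dport: int      # destination port
    proto: str      # transport protocol, e.g. "TCP" or "UDP"
    payload: bytes  # transport-layer payload


def session_key(p):
    # Assumption: order the two endpoints so that both directions of a
    # connection map to the same key (a bidirectional session).
    a, b = (p.src, p.sport), (p.dst, p.dport)
    lo, hi = (a, b) if a <= b else (b, a)
    return (lo, hi, p.proto)


def split_sessions(packets):
    # Group packets by canonical five-tuple, in timestamp order.
    sessions = defaultdict(list)
    for p in sorted(packets, key=lambda p: p.ts):
        sessions[session_key(p)].append(p)
    return dict(sessions)
```

Each value in the returned dict is one session's packets in time order, which is the unit that steps 2 to 4 would then slice by sequence number.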
Compared with the prior art, the present invention has the following significant advantages:
- By mining the length sequence, protocol fingerprint and heartbeat mechanism characteristics of the VPN tunnel establishment phase, the OpenVPN tunnel traffic is detected based on the rule matching method, which improves the accuracy of encrypted tunnel traffic detection.
- By converting attributes such as payload length, payload information, and polling data into grayscale images and combining them with the CNN model to identify Tor Over VPN traffic, the detection capability of Tor Over VPN traffic is improved.
- To address the problem of identifying the service type carried by Tor Over VPN traffic, we conduct an in-depth multi-dimensional analysis of the differences in the spatiotemporal characteristics of various types of traffic, select effective features to construct a unified feature vector, and use the Transformer spatiotemporal sequence model to perform refined identification, thereby improving the accuracy of service type identification at the Tor behavior level.
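For a sense of what the per-flow "spatiotemporal" features from step 4 might look like, here is a minimal sketch. The abstract does not define its terms, so everything here is my guess: `flow_features` is a hypothetical name, the MSS of 1460 bytes is an assumed value, "MSS packet ratio" is read as the fraction of packets whose payload is at least one MSS, and "number of interactions" is read as the number of direction changes.

```python
def flow_features(packets, mss=1460):
    """Compute simple per-flow features.

    packets: list of (timestamp, direction, payload_len) tuples in time
    order, where direction is +1 (client to server) or -1 (server to
    client). The tuple layout and the mss default are assumptions.
    """
    if not packets:
        return {"lengths": [], "delays": [], "mss_ratio": 0.0, "interactions": 0}

    # Signed payload lengths: the sign encodes the direction.
    lengths = [d * n for _, d, n in packets]

    # Inter-packet delay between consecutive packets.
    delays = [b[0] - a[0] for a, b in zip(packets, packets[1:])]

    # Assumed definition: share of packets carrying a full-MSS payload.
    mss_ratio = sum(1 for _, _, n in packets if n >= mss) / len(packets)

    # Assumed definition: count of direction changes within the flow.
    interactions = sum(1 for a, b in zip(packets, packets[1:]) if a[1] != b[1])

    return {
        "lengths": lengths,
        "delays": delays,
        "mss_ratio": mss_ratio,
        "interactions": interactions,
    }
```

A sequence model such as a Transformer would presumably consume the per-packet sequences (`lengths`, `delays`), with the scalar statistics appended, but the abstract does not spell out the encoding.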
So it's like 90% of papers of this kind: they take some random traffic features, throw them into some random machine learning classifiers, and get some output.
The part about transforming features into a grayscale bitmap and then running a CNN over the 2D bitmap is similar to what CN111753290 does for software classification.
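The grayscale trick usually amounts to something like the following sketch: treat each byte as one pixel (0 to 255), truncate or zero-pad to a fixed size, and reshape into a square. The function name `bytes_to_grayscale` and the 28x28 default are my assumptions; neither patent abstract states the image size or the exact byte layout.

```python
def bytes_to_grayscale(data: bytes, side: int = 28):
    """Map raw bytes to a side x side grid of pixel intensities.

    Assumption: one byte per pixel, row-major order, truncating long
    inputs and zero-padding short ones.
    """
    n = side * side
    padded = data[:n].ljust(n, b"\x00")
    # Slice the flat byte string into rows; list() turns bytes into ints.
    return [list(padded[r * side:(r + 1) * side]) for r in range(side)]
```

The resulting grid can be fed to any image classifier. The 2D layout carries no real spatial meaning for traffic bytes, so the CNN is effectively just learning patterns over byte offsets.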