[Paper Analysis] Inferring Browsing Activity on Smartphones

keemnh 2024. 5. 21. 16:38

On Inferring Browsing Activity on Smartphones via USB Power Analysis Side-Channel

※ This post was written as an analysis of the paper above.

 

 

 

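Why I Chose This Paper

I am working on similar research, and I read this paper because I wanted to know in more detail how it detects browser activity. I was also curious about which user activities it detects, as well as the attack technique and the experimental setup. I chose it as a related paper to fill the gaps in my own research and to deepen my understanding.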

 

 

I. Contributions and Findings

To fully characterize the side channel associated with USB power consumption, the authors analyzed how webpage identification accuracy is affected by variables pertinent to mobile devices,

e.g., battery charging level, wireless connection (WiFi or LTE), and taps on the screen.

Other variables they examined include the availability of browser cache, training and testing signals collected on different smartphones, the time elapsed between the collection of training and testing signals, the geographical proximity between the user and the web server, the duration of power traces, and the use of encrypted (TLS) connections.

 

a) Impact of Battery Charge Level and Taps on Screen:

Reduced webpage identification accuracy.

- Charging from a 30% battery level:

Identification accuracy decreased compared to when the battery was fully charged.

However, even with the decrease in accuracy, it was still possible to reliably infer browsing information.

- Taps on the screen:

Taps added significant noise to the power traces, making webpage identification challenging.

The results show that this factor caused a significant degradation in identification performance.

 

b) Impact of Other Variables:

- Time elapsed between collection of training and testing traces :

Training traces older than 30 days led to a significant drop in identification accuracy.

This suggests that traces used to train the classifiers should be updated frequently to improve the attack’s success rate.

 

The authors were able to reliably identify webpages under both WiFi and LTE, even when the training power traces were collected using one type of connectivity (e.g., LTE) and the testing traces were obtained with another (e.g., WiFi).

(this is not related to my research.)

 

- Using different smartphones for training and testing :

Accuracy dropped significantly.

However, using two smartphones for training and a different smartphone for testing reduced the drop in webpage identification accuracy.

 

- When the user did not tap on the screen :

Enabling browser cache improved identification accuracy.

However, for power traces collected when the user tapped on the screen and while the smartphone was charging, enabling cache led to a decrease in webpage identification accuracies.

 

- Increasing the geographical distance between the smartphone and the host serving the webpages :

Reduced identification accuracy.

The authors divided webpages into foreign (hosted outside the continental United States) and local (hosted within the United States).

- They observed that locally hosted webpages had slightly higher identification accuracy than foreign-hosted webpages.

 

- Retrieving webpages via secure connections (HTTPS or, more specifically, TLS):

This did not have a measurable impact on identification accuracy.

 

 

c) Experiment Results:

They used machine learning algorithms to identify which webpage the user visited out of a closed set of fifty webpages. They achieved identification accuracies as high as 98.8% with 2-second traces.

Even in the worst case (i.e., when the cache was enabled, the user tapped on the screen, and the battery was charging from 30%), they achieved an identification accuracy of 54.2% with 6-second traces.

When training and testing traces were collected using different smartphones, identification accuracy was at least 44.5%. This is significantly higher than choosing one out of fifty webpages at random (which leads to 2% baseline accuracy).

 

 

 

II. EXPERIMENT SETUP

 

- Attack Timing and attack model

They collected power traces while webpages were loading on two smartphone models: Samsung Galaxy S4 and Samsung Galaxy S6.

- They used the homepages of the 50 most popular (non-adult) websites, based on Alexa ranking.

 

 

- To collect power traces during webpage loading:

They instrumented the USB charging circuit as shown in Figure 1.

The circuit connects a DC power supply, a smartphone, and a data acquisition card (DAQ).

It measures voltage variations (and therefore the corresponding power consumed) across a 0.1 Ω shunt resistor.

To satisfy the USB charging specifications, they connected the data pins of the USB cable using a 200 Ω resistor.

Fig. 1. Overview of the setup used to collect power traces

 

Most smartphones use lithium-ion (Li-ion) batteries due to their high energy density. The charging profile of Li-ion batteries encompasses two stages.

 

1) In the first stage, the smartphone charging circuit applies constant current to the battery.

This stage ends when the battery reaches a specific charging voltage (usually between 3.7 V and 4.2 V).

 

2) In the second stage, the battery is charged at a constant voltage, and the current gradually decreases until it reaches a termination value. Because the battery charging process could take several hours, the current used to charge the battery does not vary significantly while the smartphone is loading a webpage.

 

They used an Agilent E3630A DC power supply as the power source and measured the voltage drop across the shunt resistor using a National Instruments USB-6211 DAQ at a sampling rate of 200 kHz. They set the power supply to output a fixed voltage of 5.5 V. This voltage is higher than the nominal USB voltage of 5 V to compensate for the voltage drop introduced by the shunt resistor.

The resulting voltage was between 5.32 V and 5.48 V, which is within the tolerance of many modern smartphones [22]. The DAQ’s data output was connected to a laptop, which stored data for offline analysis using LabVIEW.
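As a rough sanity check of the measurement principle, the sketch below converts a shunt voltage drop into current and power using Ohm's law. It assumes the shunt resistor is the only source of voltage drop between the supply and the phone (cable resistance is ignored), and the function and constant names are my own, not the paper's. Under that assumption, the reported 5.32 V to 5.48 V range corresponds to roughly 0.2 A to 1.8 A.

```python
# Sketch only: names are illustrative and not taken from the paper.
# Assumption: the 0.1 ohm shunt is the only voltage drop after the supply.

SUPPLY_VOLTAGE = 5.5      # V, fixed output of the DC power supply
SHUNT_RESISTANCE = 0.1    # ohm, shunt resistor in series with the phone


def shunt_to_power(v_shunt: float) -> tuple[float, float, float]:
    """Return (current in A, phone-side voltage in V, phone power in W)."""
    current = v_shunt / SHUNT_RESISTANCE   # Ohm's law: I = V / R
    v_phone = SUPPLY_VOLTAGE - v_shunt     # voltage left after the shunt
    return current, v_phone, v_phone * current


if __name__ == "__main__":
    # The two ends of the phone-side voltage range mentioned in the text.
    for v_phone_target in (5.48, 5.32):
        drop = SUPPLY_VOLTAGE - v_phone_target
        current, v_phone, power = shunt_to_power(drop)
        print(f"drop={drop:.2f} V  I={current:.2f} A  P={power:.2f} W")
```

Because the DAQ samples this voltage drop at 200 kHz, applying the same conversion to every sample would turn the raw recording into a power trace.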

Figure 2 shows the power consumption traces collected while loading the homepages of google.com and youtube.com.

Fig. 2. Power traces collected during the first 6 seconds of automated webpage loading activity. The left and right panels show power traces collected while loading google.com, and youtube.com, respectively. The x axis shows time (in seconds) from the beginning of the webpage loading activity, and the y axis shows the power consumed by the smartphone.

 

 

They collected power traces in two modes: user-actuated and automated.

- User-actuated traces: the user initiates webpage loading by typing a URL in Mobile Chrome’s address bar.

- Automated traces: they developed an Android application that launches the Chrome browser and uses it to load the intended webpage. It allows 10 seconds for webpage loading (only the first 6 seconds of data were recorded), and then loads the next webpage.

 

Before each measurement

-> Closed all other applications on the smartphone

-> Set the screen brightness to a constant level.

 

Traces were collected under two conditions:

a. battery level (30% vs. 100%)

b. browser cache (enabled vs. disabled).

They chose these conditions because they impact smartphone energy consumption.

 

When the battery is fully charged, almost all power from the charger is used to load webpages.

In contrast, when the smartphone is charging, a sizable (almost constant) amount of power is used to charge the battery, hence affecting the traces.

Cache availability was chosen because cache misses increase network activity, and therefore radio activity. Retrieving data wirelessly requires more energy than loading it from local flash memory.

 

They collected 40 automated traces per webpage for each of the following combinations:

30% battery, cache; 30% battery, no cache; 100% battery, cache; 100% battery, no cache.

 

They used four Samsung Galaxy S4 devices.

To analyze the impact of a different smartphone model on the attack, they also used a Samsung Galaxy S6.

 

 

 

III. WEBPAGE IDENTIFICATION

 

The webpage identification process consists of training and testing phases.

In the training phase, they:

(1) Extracted frequency-domain features from the power traces;

(2) Trained a classifier (Random Forest [26]) on the extracted feature vectors.

The notes below cover feature extraction, classification, and trace segmentation.

 

Classifier Training and Testing:

They used Random Forest to classify power traces because in their experiments it outperformed other commonly used classifiers, such as SVM and Dynamic Time Warping (DTW).

They used the WEKA implementation of Random Forest.

 

They experimented with four training-testing scenarios.

a. The first scenario involved 40 power traces per webpage, collected using automated webpage loading.

This scenario was used when training and testing were performed with data from the same smartphone.

b. They trained the classifier using all 40 automatically collected traces, and performed testing with 10 traces collected via user-actuated page loading. This scenario was used when data was collected with user taps.

c. They trained the classifier using 40 traces per webpage, collected using automated webpage loading on one smartphone; they then used 40 traces collected from a different smartphone for testing.

d. They trained the classifier using 80 traces from two smartphones (40 traces from each device), and tested on 40 traces from a different smartphone.

 

Feature Extraction:

They transformed each power trace into its frequency-domain representation using the Fast Fourier Transform (FFT).

To reduce the impact of noise on individual frequencies, they divided the frequency range into equal-size bins.

- They settled on using 125 bins.

Figure 3 shows the result of feature extraction on the data in Figure 2. Each data point in Figure 3 represents a feature.

Fig. 3. Aggregate amplitudes corresponding to the first 60 bins computed from the power traces of google.com and youtube.com. (Corresponding time-domain traces are illustrated in Figure 2.) The x axis represents the index of each bin.
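The feature extraction step can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names are hypothetical, I assume the bins span the whole range from 0 Hz up to the Nyquist frequency, and I assume the magnitudes within a bin are summed (the notes only say "aggregate amplitudes"). A naive O(n²) DFT is used to keep the example dependency-free; for real 200 kHz traces an FFT routine such as `numpy.fft.rfft` would be the practical choice.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitudes of the non-negative-frequency DFT coefficients (naive O(n^2))."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        mags.append(abs(acc))
    return mags


def binned_features(samples, num_bins=125):
    """Sum spectrum magnitudes inside (nearly) equal-width frequency bins.

    Assumption: bins cover 0 Hz .. Nyquist and magnitudes are summed per bin.
    """
    mags = dft_magnitudes(samples)
    if len(mags) < num_bins:
        raise ValueError("trace too short for the requested number of bins")
    features = [0.0] * num_bins
    for k, mag in enumerate(mags):
        # Map coefficient index k to a bin index in [0, num_bins - 1].
        features[k * num_bins // len(mags)] += mag
    return features


if __name__ == "__main__":
    # Toy signal: a single sinusoid, 20 cycles over 64 samples.
    toy = [math.sin(2 * math.pi * 20 * t / 64) for t in range(64)]
    print(binned_features(toy, num_bins=4))
```

Each element of the returned list plays the role of one feature, so with the default of 125 bins every trace (or segment) becomes a 125-dimensional vector for the classifier.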

 

 

Trace Segmentation and Voting:

Variable network conditions, web-server load, and smartphone background applications introduce intermittent noise in power traces. To mitigate the effects of noise, they divided each trace into overlapping 0.5-second segments.

Feature extraction was performed on each segment, and the classifier was trained using segments from all traces.
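A minimal sketch of the segmentation step is shown below. The amount of overlap is not stated in these notes, so the 50% overlap used here is purely my assumption, and `segment_trace` is a hypothetical helper name.

```python
def segment_trace(samples, sample_rate_hz, segment_s=0.5, overlap=0.5):
    """Split a trace into fixed-length, overlapping segments.

    overlap is the fraction shared by consecutive segments
    (0.5 is an assumed value, not one reported in the paper).
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    seg_len = int(segment_s * sample_rate_hz)
    if seg_len < 1:
        raise ValueError("segment shorter than one sample")
    step = max(1, int(seg_len * (1 - overlap)))
    # Only full-length segments are kept; a short tail is dropped.
    return [samples[i:i + seg_len]
            for i in range(0, len(samples) - seg_len + 1, step)]


if __name__ == "__main__":
    # Toy example: 1.5 s of data at 8 Hz gives 4-sample segments with step 2.
    for seg in segment_trace(list(range(12)), sample_rate_hz=8):
        print(seg)
```

Under the assumed 50% overlap, a 6-second trace sampled at 200 kHz would yield 23 segments of 100,000 samples each, and each segment would then go through feature extraction on its own.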

 

 

Evaluation of Identification Performance:

To evaluate classifier performance, they calculated Rank 1 and Rank 5 identification accuracies.

- Rank 1: a trace is classified correctly if the most popular label assigned to the trace’s segments is the correct label for the trace.

- Rank 5: a trace is considered correctly classified if the correct label appears among the 5 most popular labels.

- For each rank, they also present the Normalized Rank-n Accuracy, which is defined as follows:

(Let p_n be the probability that the classifier correctly labels a trace for Rank-n. The probability of correctly guessing the website loaded by the smartphone is computed as p_n/n, and represents the probability that the adversary guesses the correct website label given the Rank-n output of the classifier.)
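The voting and Rank-n definitions above can be illustrated with a short sketch. The data layout (one list of per-segment predictions per trace) and all function names are my own assumptions; the per-segment predictions would come from the trained classifier, which is not modeled here.

```python
from collections import Counter


def top_n_labels(segment_labels, n):
    """Labels ranked by how many segments voted for them, most popular first."""
    return [label for label, _ in Counter(segment_labels).most_common(n)]


def rank_n_accuracy(traces, n):
    """traces: list of (true_label, [predicted label for each segment])."""
    hits = sum(true in top_n_labels(preds, n) for true, preds in traces)
    return hits / len(traces)


def normalized_rank_n_accuracy(traces, n):
    """p_n / n: chance of a correct guess when picking among the top n labels."""
    return rank_n_accuracy(traces, n) / n


if __name__ == "__main__":
    # Hypothetical per-segment predictions for four traces.
    traces = [
        ("google.com", ["google.com", "google.com", "youtube.com"]),
        ("youtube.com", ["google.com", "google.com", "youtube.com"]),
        ("yahoo.com", ["google.com", "youtube.com", "google.com"]),
        ("yahoo.com", ["yahoo.com", "yahoo.com", "yahoo.com"]),
    ]
    print(rank_n_accuracy(traces, 1))             # Rank 1 accuracy
    print(rank_n_accuracy(traces, 2))             # Rank 2 accuracy
    print(normalized_rank_n_accuracy(traces, 2))  # p_2 / 2
```

In this toy data, the second trace is a Rank 1 miss but a Rank 2 hit, which shows why the normalized value is useful: a larger n raises the raw accuracy but leaves the adversary with more candidates to choose from.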

 

 

A. Identification Accuracy on Automated Dataset

- Trace Duration: Increasing the duration of the traces improved identification accuracy, although 2-second traces were already sufficient to classify webpages with high confidence.

- Caching: Results show that enabling cache improved identification accuracy (see Table I).

Enabling cache improved identification accuracy for foreign-hosted websites more than for websites located within the United States. This is because the farther the host serving the content, the more network-related noise is added to the traces.

- Battery Level: Users connect their smartphones to charging ports at various battery levels. The authors were consistently able to classify traces with higher accuracy when the battery was fully charged.

Table I. Identification accuracy (in %) using frequency-domain features and classifier voting for the automated dataset collected using D1. For comparison, results from a Samsung Galaxy S6 (D5) are also reported. All experiments were performed using 125 features.

 

If the phone is charging, a substantial amount of the available current is directed to the battery, and therefore the fluctuations in power consumption due to webpage loading are limited.

Fig. 4. Power traces obtained while loading the same webpage (yahoo.com) with 30% and 100% battery level. The figure shows that the trace corresponding to 100% battery level exhibits a relatively higher dynamic range, because it is not capped by the 1.8 A limit which is often reached by the trace corresponding to 30% battery level.

 

 

 

B. Identification Accuracy on User-Actuated Dataset

 

Once user activity in the form of taps was included, identification accuracy dropped significantly due to tap-induced noise. This is because tap characteristics (e.g., tap location on the screen, timing, and duration) are different in each trace, which leads to noisy traces.

To validate this observation, they computed the average (intra-class) Dynamic Time Warping (DTW) distance between pairs of user-actuated traces, and between pairs of automated traces, under different caching and charging conditions.

 

While two seconds were sufficient to classify webpages with high confidence using automated traces, this was not the case with traces from the user-actuated dataset. Regardless of caching and charging, they achieved good Rank 1 accuracy with six-second traces.

 

Fig. 5. Average inter-class DTW distance on automated traces for each webpage. Measurements are performed with battery fully charged, and with cache enabled. Data is sorted in ascending order of average DTW distance

 

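DTW stands for Dynamic Time Warping.

It is used to measure how similar two time series are to each other.

Why use DTW?

  • Similarity can be compared even when the two time series have different lengths
  • Similarity can be compared even when the patterns are similar but shifted in time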
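To make the idea concrete, here is a minimal textbook DTW implementation together with a helper that averages the distance over all pairs of traces, in the spirit of the intra-class comparison described above. It is a sketch under my own assumptions: the local cost is the absolute difference, no warping-window constraint is applied, and the function names are hypothetical. The paper's exact DTW settings are not given in these notes.

```python
from itertools import combinations


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute difference as local cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)      # row 0 of the cumulative-cost table
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)
            # Cheapest way to reach (x, y): from above, from the left,
            # or diagonally.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


def mean_pairwise_dtw(traces):
    """Average DTW distance over all unordered pairs of traces."""
    pairs = list(combinations(traces, 2))
    return sum(dtw_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Same shape, different lengths and a time shift: distance is zero.
    print(dtw_distance([0, 1, 0, 0], [0, 0, 1, 0]))
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
    # Different values: distance is positive.
    print(dtw_distance([1, 2, 3], [2, 3, 4]))
```

A low average distance within one webpage's traces means the traces are consistent with each other, so a noticeably higher value for user-actuated traces than for automated ones would support the explanation that taps add noise.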

[Reference]

DTW basics and hands-on code (DTW 기본 설명 및 실습 코드)

blog.kubwa.co.kr

 

 

Overall

Their experiments show that although the presence of taps substantially reduces identification accuracy compared to automated collection of power traces, it is still possible to accurately classify six-second user-actuated traces.

 

 

 

 

V. IMPACT OF OTHER VARIABLES

 

They examined identification accuracy with respect to the following variables:

(1) different smartphones used for training and testing (training traces were collected from one or more smartphones that are not used for testing);

(2) LTE and WiFi training and testing;

(3) aging of training traces;

(4) domestic vs. foreign websites

(5) websites accessible via unencrypted connections (denoted as “HTTP”) vs. accessible through TLS-encrypted links (denoted as “HTTPS”).

 

A. Training and Testing Traces From Different Devices

- Using different smartphones for training and testing led to a significant drop in identification accuracy.

- On the other hand, by training on two devices and testing on a third, they were able to achieve identification accuracies above 80% with 6-second traces. ...

This is likely because the classifier generalizes better when trained on multiple devices, which account for more variety within the traces.

 

B. Training and Testing Using WiFi and LTE

They collected power traces while accessing websites over both WiFi and LTE and experimented with three training-testing configurations: (1) LTE training and LTE testing, (2) LTE training and WiFi testing, and (3) WiFi training and LTE testing.

 Results (see Table III) show that accuracy obtained when training and testing on LTE is comparable to that of training and testing on WiFi (in Table I).

 

 

C. Aging of Training Traces

Many of the webpages considered in this work contain content that changes over time.

To determine the impact of aging on training data, they collected testing traces 32 and 70 days after training, with cache enabled and a fully charged battery.

Identification accuracy dropped significantly with these aged training traces. This suggests that, in order to achieve good identification accuracy, training traces should be updated frequently.

 

 

D. Foreign vs. Domestic Websites

They tested this variable because the distance between the client and the host serving a webpage is known to affect packets’ delay and jitter.

The farther the host serving the content, the more variable will be its measured bandwidth and delay. In turn, this variability affects page loading, and hence the corresponding power traces.

-> Experiments show that the location of the host serving a webpage has a very small impact on identification accuracy.

 

E. HTTPS vs. HTTP Websites

They tested this variable because the use of encryption between the smartphone and the server can introduce noise in power traces.

TLS requires additional communication rounds to exchange TLS session keys before a connection can be established. This can potentially increase the variability of power traces.

Results show that there is no significant difference in identification accuracy between the two types of websites. This indicates that the attack is as effective for identifying securely transmitted webpages as with webpages transmitted without encryption.

 

 

CONCLUSION AND FUTURE WORK

 

In this paper, they demonstrated that it is possible to accurately infer browsing activity on a smartphone using USB power consumption measurements.

This work is the first to study this side-channel attack on smartphones, and to analyze a multitude of factors that affect the traces that are collected during the attack, such as:

battery charging level,

user interaction with the touchscreen,

trace length,

time between collection of training and testing traces,

WiFi and LTE connectivity,

training and testing device mismatch,

website characteristics such as the type of connection (HTTP or HTTPS),

and the location of the host serving the webpage relative to the smartphone.

 

Overall, results show that the attack is highly effective, because webpage loading generates power signatures that are:

(1) distinctive:

different webpages generate different power traces due to factors such as the amount of data (text, images, and videos) being retrieved, the number of TCP connections required to retrieve all webpage components, and the computational cost of the scripts running within the webpage;

 

(2) consistent: 

each time a particular page is loaded, it generates a power trace that is similar to its previous power traces.

 

 

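While doing my own research, I have been realizing that setting up environmental variables and measuring results is harder than I expected. This paper let me consider a wide range of variables and see how the authors dealt with them, which I found interesting.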