©2019 FireEye
About Us
• Michael Sikorski
• Philip Tully
• Jay Gibble
• Matthew Haigh
"HTTP/1.1 200 OK "
One String can Make a Difference
NanoHTTPD webserver produces extra whitespace
Cobalt Strike Server Detection
Continued for 7 years
Detection signature
Track threat actors, identify C2 addresses
https://blog.fox-it.com/2019/02/26/identifying-cobalt-strike-team-servers-in-the-wild/
Running Strings on larger binaries produces tens of thousands of strings.
Strings produces a ton of noise mixed in with important information.
What is a String?
• N characters + NULL
• No file format, context
• 0x31 0x33 0x33 0x37 0x00
– ‘1337’, right?
• Not necessarily:
– memory address
– CPU instructions
– data used by the program
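The ambiguity of those five bytes can be checked directly. The following is a minimal Python sketch that reads them two ways; the little-endian 32-bit reading is an assumption chosen for illustration (it matches how an x86 program would load the value), not something the slide specifies.

```python
import struct

# The five bytes from the slide: '1', '3', '3', '7', NULL.
raw = bytes([0x31, 0x33, 0x33, 0x37, 0x00])

# Interpretation 1: a NULL-terminated ASCII string.
as_text = raw.split(b"\x00", 1)[0].decode("ascii")

# Interpretation 2: the first four bytes as a little-endian 32-bit integer,
# for example a memory address or a constant used by the program.
(as_int,) = struct.unpack("<I", raw[:4])

print(repr(as_text))  # '1337'
print(hex(as_int))    # 0x37333331
```

Without a file format or surrounding context, a strings tool cannot tell which reading the program intended, so it reports the text.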
Wide Strings
• Also referred to as wide-character (Unicode) strings
• The Windows OS uses wide strings internally
– Microsoft’s encoding standard is UTF-16 LE
• Each wide character is two bytes
• C-style wide character strings are terminated with a double NULL (0x00, 0x00)
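A strings-style extractor that handles both encodings might look roughly like the sketch below. It is an illustration only, not the flarestrings implementation; the minimum length of 4 and the names `extract_strings`, `ASCII_RE`, and `WIDE_RE` are assumptions made for this example.

```python
import re

MIN_LEN = 4  # assumed minimum run length, a common default for strings tools

# Runs of printable ASCII bytes (space through tilde).
ASCII_RE = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)
# The same characters stored as UTF-16 LE: each one followed by a 0x00 byte.
WIDE_RE = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LEN)


def extract_strings(data: bytes) -> list[str]:
    """Return printable ASCII strings, then wide (UTF-16 LE) strings."""
    found = [m.group().decode("ascii") for m in ASCII_RE.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in WIDE_RE.finditer(data)]
    return found


# A made-up buffer: junk bytes, an ASCII string, then a wide string.
blob = b"\x01\x02Derby\x00\xff" + "cmd.exe".encode("utf-16-le") + b"\x00\x00"
print(extract_strings(blob))  # ['Derby', 'cmd.exe']
```

The wide pattern only covers characters in the ASCII range, which is a simplification; real UTF-16 text can contain code units whose high byte is not zero.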
Compilation
Source code:
#include <stdio.h>
int main() {
    printf("Derby");
    return 0;
}
Object file:
"Derby"
.EXE binary:
.data
0x56000:
"Derby"
Strings persist on disk throughout the compilation process.
The Strings Program
!This program cannot be run in DOS mode.
??3@YAXPAX@Z
??2@YAPAXI@Z
__CxxFrameHandler
_except_handler3
WSAStartup() error: %d
User-Agent: Mozilla/4.0 (compatible; MSIE 6.00; Windows NT 5.1)
GetLastInputInfo
SeShutdownPrivilege
%sIEXPLORE.EXE
SOFTWARE\Microsoft\Windows\CurrentVersion\App Paths\IEXPLORE.EXE
[Machine IdleTime:] %d days + %.2d:%.2d:%.2d
[Machine UpTime:] %-.2d Days %-.2d Hours %-.2d Minutes %-.2d Seconds
ServiceDll
SYSTEM\CurrentControlSet\Services\%s\Parameters
if exist "%s" goto selfkill
del "%s"
attrib -a -r -s -h "%s"
Inject '%s' to PID '%d' Successfully!
cmd.exe /c
Hi,Master [%d/%d/%d %d:%d:%d]
Malware Triage
Customer: suspected compromise
Incident Response: forensic analysis, identify malware sample
Reverse Engineer: binary triage, malware analysis
reverse engineers, SOC analysts, red teamers, incident responders, malware researchers
Knowing which strings are relevant often requires highly experienced analysts.
Strings Tells a Story
Relevance
domain names
IP addresses
URLs
filenames
registry paths
registry keys
HTTP user-agent strings
service configuration info
keylogger indicators (e.g. "[DELETE]", "[BS]")
third party libraries
PDB strings
function names
debugging messages
command line help/usage options
OSINT
runtime artifacts
compiler artifacts
Windows APIs
library code
localizations
locations
languages
error messages
random byte sequences
format specifiers
Relevance is subjective and its definition can vary significantly across analysts.
Hypothesis and Goals
• Develop a tool that can:
– efficiently identify and prioritize strings
– based on relevance for malware analysis
• StringSifter should:
– be easy to use
– generalize across personas, use cases, and downstream apps
– save time and money
• How does it work?
Rankings are Everywhere
Our Favorite Products Serve Up Rankings
• Search engines
– web
– e-commerce
• News feeds
– social networks
• Recommender systems
– ads
– movies
– music
Learning to Rank
• Create an optimal ordering of a list of items
• Precise individual item scores are less important than their relative ordering
• In classification, regression, and clustering we predict a class or single score
• LTR is rarely applied in security applications
LTR as Supervised Learning
• Rank items within unseen lists in a similar way to rankings within training lists
• Each item is associated with a set of features and an ordinal integer label
• The ordinal label is the teaching signal that encodes relevance level
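To make the supervised setup concrete, here is a small Python sketch of one training list and a check of how well a scoring function respects the ordinal labels. The feature values, labels, and the `score` function are all hypothetical and chosen only for illustration; they are not StringSifter's features or model.

```python
from itertools import combinations

# One training "list" = the strings of one binary. Each item carries a
# feature vector and an ordinal relevance label (higher = more relevant).
# All values below are made up for illustration.
training_list = [
    {"string": "http://evil.com", "features": [1.0, 0.2], "label": 3},
    {"string": "cmd.exe /c", "features": [0.6, 0.3], "label": 2},
    {"string": "_except_handler3", "features": [0.2, 0.5], "label": 1},
    {"string": "Vr}Y", "features": [0.0, 0.9], "label": 0},
]


def score(features):
    """Hypothetical scoring function f: feature vector -> real-valued score."""
    return 2.0 * features[0] - 1.0 * features[1]


def pairwise_accuracy(items):
    """Fraction of pairs with different labels that f puts in the right order."""
    correct = total = 0
    for a, b in combinations(items, 2):
        if a["label"] == b["label"]:
            continue
        total += 1
        hi, lo = (a, b) if a["label"] > b["label"] else (b, a)
        correct += score(hi["features"]) > score(lo["features"])
    return correct / total


ranked = sorted(training_list, key=lambda it: score(it["features"]), reverse=True)
print([it["string"] for it in ranked])
print(pairwise_accuracy(training_list))
```

Only the relative order of the scores matters here: rescaling `score` by any positive constant leaves both the ranking and the pairwise accuracy unchanged.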
Gradient Boosted Decision Trees
• Decision Trees
– greedily choose splits by Gini impurity
• Gradient Boosted Decision Trees (GBDTs)
– combine outputs from multiple Decision Trees
– reduce loss using gradient descent
– weighted sum of trees’ predictions as ensemble
• LightGBM
– GBDTs with an LTR objective function
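The boosting idea can be sketched in a few lines of plain Python. This is a toy illustration and not LightGBM: it uses depth-1 trees on a single feature, chooses splits by squared error rather than Gini impurity, and minimizes squared loss rather than an LTR objective. The function names, data, and hyperparameters are assumptions made for the example.

```python
def fit_stump(xs, residuals):
    """Find the threshold split on one feature that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_gbdt(xs, ys, n_trees=20, learning_rate=0.5):
    """Boost depth-1 trees on squared loss and return the ensemble predictor."""
    base = sum(ys) / len(ys)
    trees = []

    def predict(x):
        # Ensemble output: a weighted sum of the trees' predictions.
        return base + learning_rate * sum(tree(x) for tree in trees)

    for _ in range(n_trees):
        # For squared loss, the negative gradient is the residual y - f(x).
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return predict


xs = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]  # one made-up feature value per string
ys = [0, 0, 1, 2, 3, 3]              # made-up ordinal relevance labels
model = fit_gbdt(xs, ys)
print([round(model(x), 2) for x in xs])
```

Each round fits a new tree to what the current ensemble still gets wrong, which is the gradient descent step the slide refers to.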
EMBER Training Dataset
• Endgame Malware BEnchmark for Research
– v1 (1.1 million PE files scanned on or before 2017)
– https://arxiv.org/abs/1804.04637
– https://github.com/endgameinc/ember
– 400k train + test malware binaries from v1
– malware defined as flagged malicious by more than 40 VirusTotal (VT) vendors
• Ran Strings on 400k malware binaries
– produced 3+ billion individual strings (24 GB)
– performed sampling
– labeled according to heuristics and FLARE hand-labeling
Representing Strings as Features
• Natural Language Processing
– Markov model
– Entropy rate, English KL divergence
– Scrabble scores
• Host, Network IoCs
• Malware Regexes
– encodings (base64)
– format specifiers
– user agents
[Figure: example scoring. Values shown for the characters t, %, F: 0.02, 0.07, 0.01, 0.2, 0.2, 0.01, 0.03, 0.14, 0.05; threshold = 0.01. Example strings and values, in the order they appear: http://evil.com, SOFTWAREincludeevil.pdb, t%Ft, Vr}Y; 0.018, 0.014, 0.007, 0.001]
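A few of the feature families listed above can be sketched as follows. This is a hedged illustration, not StringSifter's feature extractor: the function and pattern names are hypothetical, plain Shannon character entropy stands in for the entropy rate, and the regexes are deliberately simple.

```python
import math
import re
from collections import Counter

# Standard English Scrabble letter values.
SCRABBLE = {
    **dict.fromkeys("aeilnorstu", 1), **dict.fromkeys("dg", 2),
    **dict.fromkeys("bcmp", 3), **dict.fromkeys("fhvwy", 4),
    "k": 5, **dict.fromkeys("jx", 8), **dict.fromkeys("qz", 10),
}

# Simplified patterns (assumptions): printf-style specifiers and http(s) URLs.
FORMAT_SPEC_RE = re.compile(r"%[-+#0]*\d*(?:\.\d+)?[diuoxXfFeEgGcsp]")
URL_RE = re.compile(r"https?://[^\s\"']+")


def shannon_entropy(s):
    """Character-level Shannon entropy in bits; 0.0 for an empty string."""
    if not s:
        return 0.0
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())


def featurize(s):
    """Map one string to a small dictionary of numeric and boolean features."""
    return {
        "length": len(s),
        "entropy": shannon_entropy(s),
        "scrabble": sum(SCRABBLE.get(ch, 0) for ch in s.lower()),
        "has_url": bool(URL_RE.search(s)),
        "n_format_specs": len(FORMAT_SPEC_RE.findall(s)),
    }


for example in ["http://evil.com", "WSAStartup() error: %d", "Vr}Y"]:
    print(example, featurize(example))
```

Each string becomes a fixed-length vector of such values, which is the form a ranking model consumes.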
quixotry \ˈkwik-sə-trē\ (n.)
behavior inspired by idealistic beliefs without regard to reality.
Example
Evaluation
• Normalized Discounted Cumulative Gain (NDCG)
– Normalized: divide DCG by the ideal DCG on a ground truth holdout dataset
– Discounted: divide each string’s predicted relevance by a monotonically increasing function (log of its ranked position)
– Cumulative: the cumulative gain, or summed total, of every string’s relevance
– Gain: the magnitude of each string’s relevance
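The four parts of the metric map onto a few lines of Python. This sketch assumes a linear gain (the relevance label itself) and the common log2(position + 1) discount, which avoids dividing by zero at position 1; other NDCG variants use an exponential gain instead.

```python
import math


def dcg(relevances):
    """Sum each relevance divided by log2(position + 1), positions from 1."""
    return sum(
        rel / math.log2(pos + 1) for pos, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances_in_predicted_order):
    """DCG of the predicted order divided by the DCG of the ideal order."""
    ideal = dcg(sorted(relevances_in_predicted_order, reverse=True))
    return dcg(relevances_in_predicted_order) / ideal if ideal > 0 else 0.0


# Ground-truth labels of four strings, listed in the order a model ranked them.
perfect = [3, 2, 1, 0]  # most relevant string ranked first
swapped = [0, 2, 1, 3]  # most relevant string ranked last
print(ndcg(perfect))  # 1.0
print(ndcg(swapped))
```

Because of the discount, mistakes near the top of the list cost more than mistakes near the bottom, which matches how an analyst reads ranked output.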
Results
StringSifter performs well on a holdout set of 7+ years of FLARE malware reports.
Putting it All Together
Open Sourcing StringSifter
• The tool is now live:
– https://github.com/fireeye/stringsifter
– pip install stringsifter
– Command line and Docker tools
– flarestrings <my_sample> | rank_strings
• Versatility
– FLOSS outputs
– live memory dumps
Tools demo
Install and Use
• Git + local pip install
– Easy access to source code
git clone https://github.com/fireeye/stringsifter.git
cd stringsifter
pip install -e .
flarestrings <my_sample> | rank_strings
• Pip install from PyPI
– If you just want to use the tool
pip install stringsifter
flarestrings <my_sample> | rank_strings
• Docker container
– Minimum impact to host
git clone https://github.com/fireeye/stringsifter.git
cd stringsifter
docker build -t stringsifter -f docker/Dockerfile .
docker run -v <malware_dir>:/samples -it stringsifter
flarestrings /samples/<my_sample> | rank_strings
flarestrings *
• There are many versions of "strings"
– GNU binutils, BSD, various Windows implementations
– Inconsistent features
• flarestrings
– Pure Python implementation of "strings"
– Consistent across platforms
– Prints both ASCII and wide strings
* FLARE => FireEye Labs Advanced Reverse Engineering
flarestrings Demo
StringSifter rank_strings Demo
rank_strings Options
rank_strings with --scores
rank_strings with --min-score
Other Use Cases and Future Work
• Rapid screening for potential capabilities
• Detect and handle packed / obfuscated binaries
– Tipoff for automated unpacker tooling
• Leverage feature vectors to focus triage
• Improve NLP
• Improve ranking performance on Mach-O, ELF
Community Support
• Plug into your malware analysis stack
• Seeking critical feedback
– improve accuracy and utility
– pertinent edge cases, non-PE files
– contribute via GitHub Issues
• Beginners and experts alike
• Thank you for your attention!
https://github.com/fireeye/stringsifter
pip install stringsifter

Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 

StringSifter: Learning to Rank Strings Output for Speedier Malware Analysis

  • 2. About Us (©2019 FireEye)
      – Michael Sikorski
      – Philip Tully
      – Jay Gibble
      – Matthew Haigh
  • 4. One String can Make a Difference
      – NanoHTTPD webserver produces extra whitespace
      – Cobalt Strike server detection
      – Continued for 7 years
      – Detection signature
      – Track threat actors, identify C2 addresses
      – https://blog.fox-it.com/2019/02/26/identifying-cobalt-strike-team-servers-in-the-wild/
  • 5. Running Strings on larger binaries produces tens of thousands of strings.
  • 6. Strings produces a ton of noise mixed in with important information.
  • 7. What is a String
      – N characters + NULL
      – No file format, no context
      – 0x31 0x33 0x33 0x37 0x00 is ‘1337’, right?
      – Not necessarily. The same bytes could be:
          – a memory address
          – CPU instructions
          – data used by the program
  • 8. Wide Strings
      – Can also be referred to as wide strings
      – The Windows OS uses wide strings internally
          – Microsoft’s encoding standard is UTF-16 LE
      – Each wide character is two bytes
      – C-style wide character strings are terminated with a double NULL (0x00, 0x00)
  • 9. Compilation
      – Source code: int main() { printf("Derby"); return 0; }
      – Object file: "Derby"
      – .EXE binary: .data, 0x56000: "Derby"
      – Strings persist on disk throughout the compilation process.
  • 10. The Strings Program (example output)
      !This program cannot be run in DOS mode.
      ??3@YAXPAX@Z
      ??2@YAPAXI@Z
      __CxxFrameHandler
      _except_handler3
      WSAStartup() error: %d
      User-Agent: Mozilla/4.0 (compatible; MSIE 6.00; Windows NT 5.1)
      GetLastInputInfo
      SeShutdownPrivilege
      %sIEXPLORE.EXE
      SOFTWARE\Microsoft\Windows\CurrentVersion\App Paths\IEXPLORE.EXE
      [Machine IdleTime:] %d days + %.2d:%.2d:%.2d
      [Machine UpTime:] %-.2d Days %-.2d Hours %-.2d Minutes %-.2d Seconds
      ServiceDll
      SYSTEM\CurrentControlSet\Services\%s\Parameters
      if exist "%s" goto selfkill
      del "%s"
      attrib -a -r -s -h "%s"
      Inject '%s' to PID '%d' Successfully!
      cmd.exe /c
      Hi,Master [%d/%d/%d %d:%d:%d]
  • 11. Malware Triage
      – Customer: suspected compromise
      – Incident Response: forensic analysis, identify malware sample
      – Reverse Engineer: binary triage, malware analysis
      – reverse engineers, SOC analysts, red teamers, incident responders, malware researchers
  • 12. Knowing which strings are relevant often requires highly experienced analysts.
  • 13. Strings Tells a Story
      – Relevance: domain names, IP addresses, URLs, filenames, registry paths, registry keys, HTTP user-agent strings, service configuration info, keylogger indicators (e.g. “[DELETE]”, “[BS]”), third party libraries, PDB strings, function names, debugging messages, command line help/usage options, OSINT, runtime artifacts, compiler artifacts, Windows APIs, library code, localizations, locations, languages, error messages, random byte sequences, format specifiers
  • 14. Relevance is subjective and its definition can vary significantly across analysts.
  • 15. Hypothesis and Goals
      – Develop a tool that can efficiently identify and prioritize strings based on relevance for malware analysis
      – StringSifter should:
          – be easy to use
          – generalize across personas, use cases, and downstream apps
          – save time and money
      – How does it work?
  • 17. Our Favorite Products Serve Up Rankings
      – Search engines: web, e-commerce
      – News feeds: social networks
      – Recommender systems: ads, movies, music
  • 18. Learning to Rank
      – Create an optimal ordering of a list of items
      – Precise individual item scores are less important than their relative ordering
      – In classification, regression, and clustering we predict a class or a single score
      – LTR is rarely applied in security applications
  • 19. LTR as Supervised Learning
      – Rank items within unseen lists in a similar way to rankings within training lists
      – Each item is associated with a set of features and an ordinal integer label
      – The ordinal label is the teaching signal that encodes relevance level
  • 20. Gradient Boosted Decision Trees
      – Decision Trees
          – greedily choose splits by Gini impurity
      – Gradient Boosted Decision Trees (GBDTs)
          – combine outputs from multiple Decision Trees
          – reduce loss using gradient descent
          – weighted sum of trees’ predictions as ensemble
      – LightGBM
          – GBDTs with an LTR objective function
  • 21. EMBER Training Dataset
      – Endgame Malware BEnchmark for Research
          – v1 (1.1 million PE files scanned on or before 2017)
          – https://arxiv.org/abs/1804.04637
          – https://github.com/endgameinc/ember
          – 400k train + test malware binaries from v1
          – malware defined as > 40 VT vendors say malicious
      – Ran Strings on 400k malware binaries
          – produced 3+ billion individual strings (24 GB)
          – performed sampling
          – labeled according to heuristics and FLARE hand-labeling
  • 22. Representing Strings as Features
      – Natural Language Processing
          – Markov model
          – Entropy rate, English KL divergence
          – Scrabble scores
      – Host, Network IoCs
      – Malware Regexes
          – encodings (base64)
          – format specifiers
          – user agents
      – [Figure: Markov model character transitions for "t % F" with threshold = 0.01; example strings http://evil.com, SOFTWAREincludeevil.pdb, t%Ft, Vr}Y; values shown 0.018, 0.014, 0.007, 0.001]
  • 23. quixotry (ˈkwik-sə-trē), n.: behavior inspired by idealistic beliefs without regard to reality.
  • 25. Evaluation: Normalized Discounted Cumulative Gain
      – Normalized: divide DCG by ideal DCG on a ground truth holdout dataset
      – Discounted: divides each string’s predicted relevance by a monotonically increasing function (log of its ranked position)
      – Cumulative: the cumulative gain or summed total of every string’s relevance
      – Gain: the magnitude of each string’s relevance
  • 26. Results: StringSifter performs well on a holdout set of 7+ years of FLARE malware reports.
  • 28. Open Sourcing StringSifter
      – The tool is now live:
          – https://github.com/fireeye/stringsifter
          – pip install stringsifter
          – Command line and Docker tools
          – flarestrings <my_sample> | rank_strings
      – Versatility
          – FLOSS outputs
          – live memory dumps
  • 30. Install and Use
      – Git + local pip install (easy access to source code)
          git clone https://github.com/fireeye/stringsifter.git
          cd stringsifter
          pip install -e .
          flarestrings <my_sample> | rank_strings
      – Pip install from PyPI (if you just want to use the tool)
          pip install stringsifter
          flarestrings <my_sample> | rank_strings
      – Docker container (minimum impact to host)
          git clone https://github.com/fireeye/stringsifter.git
          cd stringsifter
          docker build -t stringsifter -f docker/Dockerfile .
          docker run -v <malware_dir>:/samples -it stringsifter
          flarestrings /samples/<my_sample> | rank_strings
  • 31. flarestrings (FLARE = FireEye Labs Advanced Reverse Engineering)
      – There are many versions of "strings"
          – GNU binutils, BSD, various Windows implementations
          – Inconsistent features
      – flarestrings
          – Pure Python implementation of "strings"
          – Consistent across platforms
          – Prints both ASCII and wide strings
  • 37. Other Use Cases and Future Work
      – Rapid screening for potential capabilities
      – Detect and handle packed / obfuscated binaries
          – Tipoff for automated unpacker tooling
      – Leverage feature vectors to focus triage
      – Improve NLP
      – Improve ranking performance on Mach-O, ELF
  • 38. Community Support
      – Plug into your malware analysis stack
      – Seeking critical feedback
          – improve accuracy and utility
          – pertinent edge cases, non-PE files
          – contribute via GitHub Issues
      – Beginners and experts alike
      – Thank you for your attention!
      – https://github.com/fireeye/stringsifter
      – pip install stringsifter
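Slides 7, 8 and 31 describe what a "strings" tool looks for: runs of printable characters in either ASCII or UTF-16 LE (wide) form. The following is a minimal sketch of that idea, not the flarestrings implementation. The function name `extract_strings`, the minimum length of 4, and the choice of 0x20 to 0x7e as the printable range are all assumptions made for illustration.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return printable ASCII runs, then UTF-16 LE (wide) runs, found in data.

    Hypothetical helper: min_len=4 and the 0x20-0x7e range are assumptions.
    """
    # ASCII: min_len or more consecutive printable bytes.
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # Wide: printable byte followed by 0x00, repeated min_len or more times.
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)

    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_re.finditer(data)]
    return found


if __name__ == "__main__":
    # The bytes from slide 7, plus one ASCII and one wide string.
    blob = (
        b"\x31\x33\x33\x37\x00"
        + b"\x01Derby\x00\xff"
        + "evil.com".encode("utf-16-le")
        + b"\x00\x00"
    )
    for s in extract_strings(blob):
        print(s)
```

As slide 7 notes, a match such as '1337' only means the bytes happen to be printable; the tool has no file format or context to tell whether they are really text.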

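Slide 22 lists several natural-language features computed per string, including entropy and Scrabble scores. The sketch below shows how two such features might be computed. It is an illustration only: `shannon_entropy` is plain per-character Shannon entropy (a simpler stand-in for the entropy rate named on the slide), the standard English Scrabble letter values are used, and the names `featurize`, `shannon_entropy` and `scrabble_score` are hypothetical, not StringSifter's API.

```python
import math
from collections import Counter

# Standard English Scrabble letter values.
SCRABBLE_POINTS = {
    **dict.fromkeys("aeilnorstu", 1),
    **dict.fromkeys("dg", 2),
    **dict.fromkeys("bcmp", 3),
    **dict.fromkeys("fhvwy", 4),
    "k": 5,
    **dict.fromkeys("jx", 8),
    **dict.fromkeys("qz", 10),
}


def shannon_entropy(s: str) -> float:
    """Per-character Shannon entropy in bits (0.0 for an empty string)."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def scrabble_score(s: str) -> int:
    """Sum of Scrabble letter values; non-letters score 0."""
    return sum(SCRABBLE_POINTS.get(ch, 0) for ch in s.lower())


def featurize(s: str) -> dict:
    """Hypothetical feature vector for one string."""
    return {
        "length": len(s),
        "entropy": shannon_entropy(s),
        "scrabble": scrabble_score(s),
    }


if __name__ == "__main__":
    for candidate in ["http://evil.com", "quixotry", "t%Ft", "Vr}Y"]:
        print(candidate, featurize(candidate))
```

Features like these give the ranking model numeric signals that loosely separate natural-language or indicator-like strings from random byte sequences.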
Editor's Notes

  1. Introduce what binary triage is and how it relates to malware analysis. Add a slide about other users (incident response, SOC analysts, researchers). Move triage before incident response.
  2. Starts at hex 21 / 94 printable characters
  3. Reverse inference
  4. Traditional ML solves a prediction problem (classification or regression) on a single instance at a time. For example, in spam detection on email, you look at all the features associated with one email and classify it as spam or not. The aim of traditional ML is to come up with a class (spam or not spam) or a single numerical score for that instance. LTR solves a ranking problem on a list of items. The aim of LTR is to come up with an optimal ordering of those items. As such, LTR doesn’t care much about the exact score that each item gets, but cares more about the relative ordering among all the items.
  6. Learning to rank learns to directly rank items by training a model to predict the probability of a certain item ranking over another item. This is done by learning a scoring function where items ranked higher should have higher scores. The model can be trained via gradient descent on a loss function defined over these scores. For each item, gradient descent pushes the score up for every item that ranks below it and pushes the score down for every item that ranks above it. The “strength” of the push is determined by the difference in scores. To ensure that the model focuses on getting the higher ranks (which are generally more important) correct, we can weight the “strength” of the push by a factor that accounts for how important the ranking is.
  7. Discounted reflects the goal of having the most relevant strings ranked towards the top of our predictions. Normalization makes it possible to compare scores across samples, since the number of strings within different Strings outputs can vary widely. The ground truth is obtained from FLARE-identified relevant strings contained within historical malware reports.
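The NDCG metric described on slide 25 and in the note above can be sketched in a few lines. This is an illustrative version under stated assumptions, not the evaluation code used for the reported results: the gain is taken to be the raw relevance label, the discount is log2(position + 1), and the function names `dcg` and `ndcg` are hypothetical. Some libraries use an exponential gain (2^rel - 1) instead.

```python
import math


def dcg(relevances: list[float]) -> float:
    """Discounted cumulative gain of relevance labels listed in ranked order.

    Assumes linear gain and a log2(position + 1) discount, positions from 1.
    """
    return sum(
        rel / math.log2(pos + 1)
        for pos, rel in enumerate(relevances, start=1)
    )


def ndcg(ranked_relevances: list[float]) -> float:
    """DCG of the predicted ordering divided by DCG of the ideal ordering."""
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True))
    if ideal_dcg == 0:
        return 0.0  # no relevant items: define NDCG as 0
    return dcg(ranked_relevances) / ideal_dcg


if __name__ == "__main__":
    # Ground-truth labels of strings, in the order the model ranked them.
    print(ndcg([3, 2, 1, 0]))  # a perfect ordering
    print(ndcg([0, 1, 2, 3]))  # the worst ordering of the same labels
```

Because the score is divided by the ideal DCG for the same list, values from samples with very different numbers of strings stay comparable, which is the point the note makes about normalization.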