Python-Iocextract - Advanced Indicator Of Compromise (Ioc) Extractor
Monday, September 9, 2019
Labels: Decoding, Indicators of Compromise, IOC Extractor, Malware Research, Python-Iocextract, Threat Intelligence, Threat Sharing, Threatintel
Title : Python-Iocextract - Advanced Indicator Of Compromise (Ioc) Extractor
link : Python-Iocextract - Advanced Indicator Of Compromise (Ioc) Extractor
You are now reading the article Python-Iocextract - Advanced Indicator Of Compromise (Ioc) Extractor with the link address https://mederc.blogspot.com/2019/09/python-iocextract-advanced-indicator-of.html
Advanced Indicator of Compromise (IOC) extractor.
Overview
This library extracts URLs, IP addresses, MD5/SHA hashes, email addresses, and YARA rules from text corpora. It includes some encoded and "defanged" IOCs in the output, and optionally decodes/refangs them.
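As a rough illustration of one of these categories, the sketch below picks out hex strings whose length matches a common digest: 32 characters for MD5, 40 for SHA-1, 64 for SHA-256, 128 for SHA-512. The name `extract_hashes_sketch` and the length table are assumptions for this example, not iocextract's API or its actual patterns. Length alone is only a guess at the hash family (other 160-bit digests also produce 40 hex characters).

```python
import re
from typing import Iterator, Tuple

# Hypothetical mapping from hex-digest length to hash family (length alone is a guess).
DIGEST_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256", 128: "sha512"}

# A run of 32-128 hex characters bounded by word boundaries.
HEX_RUN = re.compile(r"\b[0-9a-fA-F]{32,128}\b")


def extract_hashes_sketch(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (family, digest) pairs for hex runs of a recognised digest length."""
    for match in HEX_RUN.finditer(text):
        digest = match.group(0)
        family = DIGEST_LENGTHS.get(len(digest))
        if family is not None:  # e.g. a 50-character hex run is skipped
            yield family, digest


sample = (
    "dropper md5 d41d8cd98f00b204e9800998ecf8427e, "
    "payload sha256 e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
)
for family, digest in extract_hashes_sketch(sample):
    print(family, digest)
```

A hex run longer than 128 characters never matches, because the trailing `\b` cannot fall between two hex characters.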
The Problem
It is common practice for malware analysts or endpoint software to "defang" IOCs such as URLs and IP addresses, in order to prevent accidental exposure to live malicious content. Being able to extract and aggregate these IOCs is often valuable for analysts. Unfortunately, existing "IOC extraction" tools often pass right by them, as they are not caught by standard regex.
For example, the simple defanging technique of surrounding periods with brackets:
127[.]0[.]0[.]1
Existing tools that use a simple IP address regex will ignore this IOC entirely.
The Solution
By combining specially crafted regex with some custom postprocessing, we are able to both detect and deobfuscate "defanged" IOCs. This saves time and effort for the analyst, who might otherwise have to manually find and convert IOCs into machine-readable format.
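To make the regex-plus-postprocessing idea concrete, here is a minimal sketch for IPv4 addresses only. It is not iocextract's code. The separator list (".", "[.]", "(.)", "{.}") and the name `extract_ipv4s_sketch` are assumptions chosen for illustration. A naive dotted-quad regex skips `127[.]0[.]0[.]1`. The flexible pattern catches it, and an optional postprocessing step turns bracketed dots back into plain dots.

```python
import re
from typing import Iterator

# A "standard" dotted-quad regex of the kind that misses defanged IOCs.
NAIVE_IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Hypothetical flexible pattern: each separator may be ".", "[.]", "(.)" or "{.}".
SEP = r"(?:\.|\[\.\]|\(\.\)|\{\.\})"
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
DEFANGED_IPV4 = re.compile(rf"\b{OCTET}(?:{SEP}{OCTET}){{3}}\b")

# Postprocessing step: collapse any bracketed dot back to a plain ".".
BRACKETED_DOT = re.compile(r"[\[\(\{]\.[\]\)\}]")


def extract_ipv4s_sketch(text: str, refang: bool = False) -> Iterator[str]:
    """Yield possibly-defanged IPv4 addresses, optionally refanged."""
    for match in DEFANGED_IPV4.finditer(text):
        ip = match.group(0)
        yield BRACKETED_DOT.sub(".", ip) if refang else ip


text = "C2 at 127[.]0[.]0[.]1, fallback 10.0.0.1"
print(NAIVE_IPV4.findall(text))                      # only the undefanged address
print(list(extract_ipv4s_sketch(text, refang=True)))
```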
A Simple Use Case
Many Twitter users post C2s or other valuable IOC information with defanged URLs. For example, this tweet from @InQuest:
Recommended reading and great work from @unit42_intel: https://researchcenter.paloaltonetworks.com/2018/02/unit42-sofacy-attacks-multiple-government-entities/ ... InQuest customers have had detection for threats delivered from hotfixmsupload[.]com since 6/3/2017 and cdnverify[.]net since 2/1/18.
If we run this through the extractor, we can easily pull out the URLs:
    https://researchcenter.paloaltonetworks.com/2018/02/unit42-sofacy-attacks-multiple-government-entities/
    hotfixmsupload[.]com
    cdnverify[.]net
Passing in refang=True at extraction time would remove the obfuscation, but since these are real IOCs, let's leave them defanged in our documentation. :)
Installation
You may need to install the Python development headers in order to install the regex dependency. On Ubuntu/Debian-based systems, try:
    sudo apt-get install python-dev
Then install iocextract from pip:
    pip install iocextract
If you have problems installing on Windows, try installing regex directly by downloading the appropriate wheel from PyPI and running e.g.:
    pip install regex-2018.06.21-cp27-none-win_amd64.whl
Usage
Try extracting some defanged URLs:
    >>> content = """
    ... I really love example[.]com!
    ... All the bots are on hxxp://example.com/bad/url these days.
    ... C2: tcp://example[.]com:8989/bad
    ... """
    >>> import iocextract
    >>> for url in iocextract.extract_urls(content):
    ...     print(url)
    ...
    hxxp://example.com/bad/url
    tcp://example[.]com:8989/bad
    example[.]com
    tcp://example[.]com:8989/bad
Note that some URLs may show up twice if they are caught by multiple regexes.
If you want, you can also "refang", or remove common obfuscation methods from IOCs:
    >>> for url in iocextract.extract_urls(content, refang=True):
    ...     print(url)
    ...
    http://example.com/bad/url
    http://example.com:8989/bad
    http://example.com
    http://example.com:8989/bad
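For intuition about what refanging does, the sketch below reproduces the four refanged URLs shown above using a few hypothetical rules. It replaces bracketed dots and maps `hxxp`/`hxxps`/`tcp` schemes to `http`/`https`, mirroring the `tcp://` to `http://` rewrite visible in that output. It also prefixes bare domains with `http://`. The function name and rule table are assumptions; iocextract's own refang logic handles more obfuscation styles than this.

```python
import re

# Hypothetical rules chosen to reproduce the refanged URLs shown above;
# iocextract's real refang logic covers more obfuscation styles than this.
BRACKETED_DOT = re.compile(r"[\[\(\{]\.[\]\)\}]")
SCHEME_MAP = {"hxxp": "http", "hxxps": "https", "tcp": "http"}


def refang_url_sketch(url: str) -> str:
    url = BRACKETED_DOT.sub(".", url)           # example[.]com -> example.com
    scheme, sep, rest = url.partition("://")
    if not sep:                                 # bare domain: assume http
        return "http://" + url
    scheme = scheme.lower()
    return f"{SCHEME_MAP.get(scheme, scheme)}://{rest}"


for raw in ["hxxp://example.com/bad/url", "tcp://example[.]com:8989/bad", "example[.]com"]:
    print(refang_url_sketch(raw))
```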
You can even extract and decode hex-encoded and base64-encoded URLs:
    >>> content = '612062756e6368206f6620776f72647320687474703a2f2f6578616d706c652e636f6d2f70617468206d6f726520776f726473'
    >>> for url in iocextract.extract_urls(content):
    ...     print(url)
    ...
    687474703a2f2f6578616d706c652e636f6d2f70617468
    >>> for url in iocextract.extract_urls(content, refang=True):
    ...     print(url)
    ...
    http://example.com/path
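The hex string above decodes to "a bunch of words http://example.com/path more words". A simplified way to recover such URLs is shown below. Search for the encoded form of "http://" (hex `687474703a2f2f`, or base64 text starting with `aHR0c`), decode the run, and cut at the first whitespace. This sketch is an assumption about the general approach, not iocextract's implementation. It only handles http/https, and for base64 it assumes the URL starts the encoded blob.

```python
import base64
import binascii
import re
from typing import Iterator

# Hex for "http", optional "s", then "://" (3a2f2f), followed by more hex byte pairs.
HEX_URL = re.compile(r"68747470(?:73)?3a2f2f(?:[0-9a-f]{2})+", re.IGNORECASE)
# Base64 text that starts with "http" begins with "aHR0c" (assumes the URL starts the blob).
B64_URL = re.compile(r"aHR0c[A-Za-z0-9+/]*={0,2}")


def _first_token(decoded: str) -> str:
    # The encoded run can continue past the URL ("... more words"); cut at whitespace.
    parts = decoded.split()
    return parts[0] if parts else ""


def decode_hex_urls_sketch(text: str) -> Iterator[str]:
    for match in HEX_URL.finditer(text):
        yield _first_token(bytes.fromhex(match.group(0)).decode("latin-1"))


def decode_b64_urls_sketch(text: str) -> Iterator[str]:
    for match in B64_URL.finditer(text):
        try:
            raw = base64.b64decode(match.group(0), validate=True)
        except binascii.Error:
            continue  # wrong length or padding: not a clean base64 blob, skip it
        yield _first_token(raw.decode("latin-1"))
```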
All extract_* functions in this library return iterators, not lists. The benefit of this behavior is that iocextract can process extremely large inputs with very low overhead. However, if for some reason you need to iterate over the IOCs more than once, you will have to save the results as a list:
    >>> list(iocextract.extract_urls(content))
    ['hxxp://example.com/bad/url', 'tcp://example[.]com:8989/bad', 'example[.]com', 'tcp://example[.]com:8989/bad']
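The practical consequence of returning iterators can be seen with any Python generator. The sketch uses a hypothetical stand-in, `fake_extract_urls`, not iocextract itself. A second pass over the same generator yields nothing, so the results are materialized once with `list()`. Because some URLs can be reported twice, `dict.fromkeys` is then used to drop repeats while keeping first-seen order.

```python
from typing import Iterator, List


def fake_extract_urls(text: str) -> Iterator[str]:
    """Hypothetical stand-in for an extract_* function: a generator that yields
    whitespace-separated tokens containing "://". Not iocextract's logic."""
    for token in text.split():
        if "://" in token:
            yield token


text = "see hxxp://a.example/x then tcp://b.example and again hxxp://a.example/x"

urls = fake_extract_urls(text)
first_pass: List[str] = list(urls)
second_pass: List[str] = list(urls)  # empty: a generator can only be consumed once

saved = list(fake_extract_urls(text))  # materialize once, reuse freely
unique = list(dict.fromkeys(saved))    # drop repeats, keep first-seen order

print(first_pass, second_pass, unique, sep="\n")
```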
A command-line tool is also included:
    $ iocextract -h
    usage: iocextract [-h] [--input INPUT] [--output OUTPUT] [--extract-emails]
                      [--extract-ips] [--extract-ipv4s] [--extract-ipv6s]
                      [--extract-urls] [--extract-yara-rules] [--extract-hashes]
                      [--custom-regex REGEX_FILE] [--refang] [--strip-urls]
                      [--wide]

    Advanced Indicator of Compromise (IOC) extractor. If no arguments are
    specified, the default behavior is to extract all IOCs.

    optional arguments:
      -h, --help            show this help message and exit
      --input INPUT         default: stdin
      --output OUTPUT       default: stdout
      --extract-emails
      --extract-ips
      --extract-ipv4s
      --extract-ipv6s
      --extract-urls
      --extract-yara-rules
      --extract-hashes
      --custom-regex REGEX_FILE
                            file with custom regex strings, one per line, with
                            one capture group each
      --refang              default: no
      --strip-urls          remove possible garbage from the end of urls.
                            default: no
      --wide                preprocess input to allow wide-encoded character
                            matches. default: no
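The `--custom-regex` help text describes a file format of one regex per line, each with one capture group. The sketch below shows one way to apply a file in that format. It is a guess at the semantics, not iocextract's implementation, and the name `extract_custom_sketch` is hypothetical. The CVE pattern is just an example rule.

```python
import re
from typing import Iterable, Iterator


def extract_custom_sketch(text: str, regex_lines: Iterable[str]) -> Iterator[str]:
    """Treat each non-blank line as one regex and yield its single capture group.

    Mirrors the file format described for --custom-regex ("one per line, with
    one capture group each"); it is not iocextract's implementation.
    """
    patterns = [re.compile(line.rstrip("\r\n")) for line in regex_lines if line.strip()]
    for pattern in patterns:
        if pattern.groups != 1:
            raise ValueError(f"expected exactly one capture group in {pattern.pattern!r}")
        for match in pattern.finditer(text):
            yield match.group(1)


# In practice regex_lines would come from open(REGEX_FILE); a list stands in here.
rules = ["(CVE-\\d{4}-\\d{4,})\n", "\n"]
print(list(extract_custom_sketch("CVE-2017-0199 and CVE-2017-11882", rules)))
```

Only the trailing newline is stripped from each line, so a pattern that intentionally starts with a space keeps it.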
Only URLs, emails, and IPv4 addresses can be "refanged".
Should I Use iocextract?
Are you...
Extracting possibly-defanged IOCs from plain text, like the contents of tweets or blog posts?
Yes! This is exactly what iocextract was designed for, and where it performs best. Want to go a step further and automate extraction and storage? Check out ThreatIngestor.
Extracting URLs that have been hex or base64 encoded?
Yes, but the CLI might not give you the best results. Try writing a Python script and calling iocextract.extract_encoded_urls directly.
Note that you will most likely end up with extra garbage at the end of URLs.
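If you post-process results yourself, a small heuristic can trim the most common trailing garbage. The helper `strip_url_garbage` below is hypothetical and is not the logic behind iocextract's `--strip-urls` or `strip=True`. It cuts at the first character that cannot appear unescaped in a URL (double quote, angle bracket, whitespace), then strips trailing punctuation that is more often prose or markup residue. The trade-off: a URL that genuinely ends in ")" or "." loses that character.

```python
import re

# '"', '<', '>' and whitespace cannot appear unescaped in a URL (RFC 3986), so cut there.
HARD_STOP = re.compile(r"[\"<>\s]")
# Characters that, at the very end, are more often prose or markup residue than URL.
SOFT_TRAILERS = ".,;:!?)]}'"


def strip_url_garbage(url: str) -> str:
    """Hypothetical heuristic cleanup; not the logic behind iocextract's --strip-urls."""
    url = HARD_STOP.split(url, maxsplit=1)[0]
    return url.rstrip(SOFT_TRAILERS)


print(strip_url_garbage('http://example.com/path">click here'))
```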
Extracting IOCs that have not been defanged, from HTML/XML/RTF?
Maybe, but you should consider using the --strip-urls CLI flag (or the strip=True parameter in the library), and you may still get some extra garbage in your output.
If you're extracting from HTML, consider using something like Beautiful Soup to first isolate the text content, and then pass that to iocextract, like this.
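If you would rather not add a third-party dependency, Python's standard-library `html.parser` can do a simpler version of the text-isolation step. This sketch uses a hypothetical `html_to_text` helper. It keeps text nodes, drops `<script>` and `<style>` contents, and returns one whitespace-normalized string that you could then pass to `iocextract.extract_urls`. It builds no document tree and is a swapped-in alternative, not the Beautiful Soup approach the text recommends.

```python
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    _SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default: "&amp;" arrives as "&"
        self.chunks: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Join with spaces so text from adjacent block elements does not run together.
    return " ".join(" ".join(parser.chunks).split())


page = "<p>C2 at <b>example[.]com</b></p><script>var x = 'ignored';</script><p>a &amp; b</p>"
print(html_to_text(page))
```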
Extracting IOCs that have not been defanged, from binary data like executables, or very large inputs?
Probably not. The regex in iocextract is designed to be flexible enough to catch defanged IOCs, so it performs significantly worse than a solution that is designed to catch only standard IOCs.
Consider using something like Cacador instead.