Strelka - Scanning Files At Scale With Python And ZeroMQ
Friday, September 20, 2019
You are now reading the article Strelka - Scanning Files At Scale Amongst Python Together With Zeromq with the link address https://mederc.blogspot.com/2019/09/strelka-scanning-files-at-scale-amongst.html
Strelka is a real-time file scanning system used for threat hunting, threat detection, and incident response. Based on the design established by Lockheed Martin's Laika BOSS and similar projects (see: related projects), Strelka's purpose is to perform file extraction and metadata collection at huge scale.
Strelka differs from its sibling projects in a few significant ways:
- Codebase is Python 3 (minimum supported version is 3.6)
- Designed for non-interactive, distributed systems (network security monitoring sensors, live response scripts, disk/memory extraction, etc.)
- Supports direct and remote file requests (Amazon S3, Google Cloud Storage, etc.) with optional encryption and authentication
- Uses widely supported networking, messaging, and data libraries/formats (ZeroMQ, protocol buffers, YAML, JSON)
- Built-in scan result logging and log management (compatible with Filebeat/ElasticStack, Splunk, etc.)
Frequently Asked Questions
"Who is Strelka?"
Strelka is one of the second-generation Soviet space dogs to achieve orbital spaceflight. The name is an homage to Lockheed Martin's Laika BOSS, one of the first public projects of this type and the one on which Strelka's core design is based.
"Why would I want a file scanning system?"
File metadata is an additional pillar of data (alongside network, endpoint, authentication, and cloud) that is effective in enabling threat hunting, threat detection, and incident response, and it can help event analysts and incident responders close visibility gaps in their environment. This type of system is especially useful for identifying threat actors during KC3 and KC7 (phases 3 and 7 of the Lockheed Martin Cyber Kill Chain: delivery, and actions on objectives). For examples of what Strelka can do, please read the use cases.
"Should I switch from my current file scanning system to Strelka?"
It depends: we recommend reviewing the features of each and choosing the most appropriate tool for your needs. We believe the most significant motivating factors for switching to Strelka are:
- Modern codebase (Python 3.6+)
- More scanners (40+ at release) and file types (60+ at release) than related projects
- Supports direct and remote file requests
- Built-in encryption and authentication for client connections
- Built using libraries and formats that allow cross-platform, cross-language support
"Are Strelka's scanners compatible with Laika BOSS, File Scanning Framework, or Assemblyline?"
Due to differences in design, Strelka's scanners are not directly compatible with Laika BOSS, File Scanning Framework, or Assemblyline. With some effort, most scanners can likely be ported to the other projects.
"Is Strelka an intrusion detection system (IDS)?"
Strelka shouldn't be thought of as an IDS, but it can be used for threat detection through YARA rule matching and downstream metadata interpretation. Strelka's design follows the philosophy established by other popular metadata collection systems (Bro, Sysmon, Volatility, etc.): it extracts data and leaves the decision-making up to the user.
"Does it work at scale?"
Everyone has their own definition of "at scale," but we have been using Strelka and systems like it to scan up to 100 million files each day for over a year and have never reached a point where the system could not scale to our needs. As file volume and variety increase, horizontally scaling the system should allow you to scan any number of files.
"Doesn't this use a lot of bandwidth?"
Yep! Strelka isn't designed to operate in limited-bandwidth environments, but we have experimented with solutions to this, and there are tricks you can use to reduce bandwidth. These are what we've found most successful:
- Reduce the total volume of files sent to Strelka
- Use a tracking system to send only unique files to Strelka (networked Redis servers are especially useful for this)
- Use traffic control (tc) to shape connections to Strelka
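The tracking-system trick works like this: hash each file, and send it only the first time its digest is seen. The code below is a minimal single-process sketch. SeenHashTracker and unique_payloads are hypothetical helpers. The in-memory set stands in for a shared store such as Redis, where a `SET key value NX` command (which succeeds only if the key does not already exist) would play the same role across several sensors.

```python
import hashlib
from typing import Iterable, Iterator, Set


class SeenHashTracker:
    """In-memory stand-in for a shared tracking store (hypothetical helper).

    A networked deployment would replace the set with something like Redis
    so that every sensor shares the same view of already-sent files.
    """

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def first_time(self, data: bytes) -> bool:
        """Return True only the first time this exact content is observed."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True


def unique_payloads(payloads: Iterable[bytes], tracker: SeenHashTracker) -> Iterator[bytes]:
    """Yield only payloads whose content has not been sent before."""
    for data in payloads:
        if tracker.first_time(data):
            yield data


# Three extracted files, two of them identical: only two are worth sending.
to_send = list(unique_payloads([b"MZ...", b"PK...", b"MZ..."], SeenHashTracker()))
```

Hashing the content rather than the filename means the same file extracted from different sessions is still sent only once.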
"Should I run my Strelka cluster on my Bro/Suricata network sensor?"
No! Strelka clusters run CPU-intensive processes that will negatively impact system-critical applications like Bro and Suricata. If you want to integrate a network sensor with Strelka, use strelka_dirstream.py. This utility is capable of sending millions of files per day from a single network sensor to a Strelka cluster without impacting system-critical applications.
"I have other questions!"
Please file an issue or contact the project team at TTS-CFC-OpenSource@target.com. The project lead can also be reached on Twitter at @jshlbrd.
Installation
The recommended operating system for Strelka is Ubuntu 18.04 LTS (Bionic Beaver). It may work with earlier versions of Ubuntu if the appropriate packages are installed. We recommend using the Docker container for production deployments, and we welcome pull requests that add instructions for installing on other operating systems.
Ubuntu 18.04 LTS
- Update packages together with install construct packages
apt-get update && apt-get install --no-install-recommends automake build-essential curl gcc git libtool make python3-dev python3-pip python3-wheel
- Install runtime packages
apt-get install --no-install-recommends antiword libarchive-dev libfuzzy-dev libimage-exiftool-perl libmagic-dev libssl-dev python3-setuptools tesseract-ocr unrar upx jq
- Install pip3 packages
pip3 install beautifulsoup4 boltons boto3 gevent google-cloud-storage html5lib inflection interruptingcow jsbeautifier libarchive-c lxml git+https://github.com/aaronst/macholibre.git olefile oletools pdfminer.six pefile pgpdump3 protobuf pyelftools pygments pyjsparser pylzma git+https://github.com/jshlbrd/pyopenssl.git python-docx git+https://github.com/jshlbrd/python-entropy.git python-keystoneclient python-magic python-swiftclient pyyaml pyzmq rarfile requests rpmfile schedule ssdeep tnefparse
- Install YARA
curl -OL https://github.com/VirusTotal/yara/archive/v3.8.1.tar.gz
tar -zxvf v3.8.1.tar.gz
cd yara-3.8.1/
./bootstrap.sh
./configure --with-crypto --enable-dotnet --enable-magic
make && make install && make check
echo "/usr/local/lib" >> /etc/ld.so.conf
ldconfig
- Install yara-python
curl -OL https://github.com/VirusTotal/yara-python/archive/v3.8.1.tar.gz
tar -zxvf v3.8.1.tar.gz
cd yara-python-3.8.1/
python3 setup.py build --dynamic-linking
python3 setup.py install
- Create Strelka directories
mkdir /var/log/strelka/ && mkdir /opt/strelka/
- Clone this repository
git clone https://github.com/target/strelka.git /opt/strelka/
- Compile the Strelka protobuf
cd /opt/strelka/server/ && protoc --python_out=. strelka.proto
- (Optional) Install the Strelka utilities
cd /opt/strelka/ && python3 setup.py -q build && python3 setup.py -q install && python3 setup.py -q clean --all
Docker
- Clone this repository
git clone https://github.com/target/strelka.git /opt/strelka/
- Build the container
cd /opt/strelka/ && docker build -t strelka .
Quickstart
By default, Strelka is configured to use a minimal "quickstart" deployment that allows users to try out the system. This configuration is not recommended for production deployments. Using two terminal windows, do the following:
Terminal 1
$ strelka.py
Terminal 2
$ strelka_user_client.py --broker 127.0.0.1:5558 --path <path to the file to scan>
$ cat /var/log/strelka/*.log | jq .
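The `cat /var/log/strelka/*.log | jq .` step can also be scripted. The code below is a minimal Python reader. It assumes each log file holds a stream of JSON values (one event per line, or arrays when events are bundled), which is the kind of input `jq .` consumes. The function names are hypothetical and not part of Strelka.

```python
import glob
import json
from typing import Any, Iterator


def iter_json_values(text: str) -> Iterator[Any]:
    """Yield every JSON value in a stream of concatenated values, as `jq .` would."""
    decoder = json.JSONDecoder()
    index = 0
    length = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip newlines/spaces here.
        while index < length and text[index].isspace():
            index += 1
        if index >= length:
            return
        value, index = decoder.raw_decode(text, index)
        yield value


def read_scan_results(pattern: str = "/var/log/strelka/*.log") -> Iterator[Any]:
    """Yield scan result events from every log file matching the glob pattern."""
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as handle:
            yield from iter_json_values(handle.read())
```

For example, `for event in read_scan_results(): print(json.dumps(event, indent=2))` prints roughly what jq shows.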
Terminal 1 runs a Strelka cluster (broker, four workers, and log rotation) with debug logging, and Terminal 2 is used to send file requests to the cluster and read the scan results.
Deployment
Utilities
Strelka's design as a distributed system creates the need for client-side and server-side utilities. Client-side utilities provide methods for sending file requests to a cluster, and server-side utilities provide methods for distributing and scanning files sent to a cluster.
strelka.py
strelka.py is a non-interactive, server-side utility that contains everything needed for running a large-scale, distributed Strelka cluster. This includes:
- Capability to run servers in any combination of broker/workers
- Broker distributes file tasks to workers
- Workers perform file analysis on tasks
- On-disk scan result logging
- Configurable log rotation and management
- Compatible with external log shippers (e.g. Filebeat, Splunk Universal Forwarder, etc.)
- Supports encryption and authentication for connections between clients and brokers
- Self-healing child processes (brokers, workers, log management)
This utility is managed with two configuration files: etc/strelka/strelka.yml and etc/strelka/pylogging.ini.
The help page for strelka.py is shown below:
usage: strelka.py [options]
runs Strelka as a distributed cluster.
optional arguments:
  -h, --help            show this help message and exit
  -d, --debug           enable debug messages to the console
  -c STRELKA_CFG, --strelka-config STRELKA_CFG
                        path to strelka configuration file
  -l LOGGING_INI, --logging-ini LOGGING_INI
                        path to python logging configuration file
strelka_dirstream.py
strelka_dirstream.py is a non-interactive, client-side utility used for sending files from a directory to a Strelka cluster in near real-time. This utility uses inotify to watch the directory and sends files to the cluster as soon as possible after they are written.
Additionally, for select file sources, this utility can parse metadata embedded in the file's filename and send it to the cluster as external metadata. Bro network sensors are currently the only supported file source, but other application-specific sources can be added.
Using the utility with Bro requires no modification of the Bro source code, but it does require the network sensor to run a Bro script that enables file extraction. We recommend using our stub Bro script (etc/bro/extract-strelka.bro) to extract files. Other extraction scripts will also work, but they will not parse Bro's metadata.
This utility is managed with one configuration file: etc/dirstream/dirstream.yml.
The help page for strelka_dirstream.py is shown below:
usage: strelka_dirstream.py [options]
sends files from a directory to a Strelka cluster in near real-time.
optional arguments:
  -h, --help            show this help message and exit
  -d, --debug           enable debug messages to the console
  -c DIRSTREAM_CFG, --dirstream-config DIRSTREAM_CFG
                        path to dirstream configuration file
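strelka_dirstream.py relies on inotify, which the Python standard library does not expose. The code below approximates the "send files soon after they are written" behavior with polling instead. It treats a file as finished once two consecutive passes see the same size. poll_ready_files is a hypothetical helper, and the size-stability heuristic is an assumption. It is not how dirstream itself decides that a file is complete.

```python
import os
from typing import Dict, List, Set


def poll_ready_files(directory: str, last_sizes: Dict[str, int], sent: Set[str]) -> List[str]:
    """Run one polling pass and return files whose size is unchanged since the previous pass.

    last_sizes and sent are caller-owned state carried between passes.
    """
    ready: List[str] = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file() or entry.path in sent:
                continue
            size = entry.stat().st_size
            # Same size on two consecutive passes: assume the writer has finished.
            if last_sizes.get(entry.path) == size:
                ready.append(entry.path)
                sent.add(entry.path)
            last_sizes[entry.path] = size
    return sorted(ready)
```

To use it, call the function in a loop with a short `time.sleep` between passes and hand each returned path to a client. This gives a crude near-real-time sender. inotify avoids both the polling delay and the repeated directory scans.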
strelka_user_client.py
strelka_user_client.py is a user-driven, client-side utility that is used for sending ad-hoc file requests to a cluster. This client should be used when file analysis is needed for a specific file or group of files. It is explicitly designed for users and should not be expected to perform long-lived or fully automated file requests. We recommend using this utility as an example of what is required in building new client utilities.
Using this utility, users can send three types of file requests:
- Individual file
- Directory of files
- Remote file (see: remote file requests)
The help page for strelka_user_client.py is shown below:
usage: strelka_user_client.py [options]
sends ad-hoc file requests to a Strelka cluster.
optional arguments:
  -h, --help            show this help message and exit
  -d, --debug           enable debug messages to the console
  -b BROKER, --broker BROKER
                        network address and network port of the broker (e.g. 127.0.0.1:5558)
  -p PATH, --path PATH  path to the file or directory of files to send to the broker
  -l LOCATION, --location LOCATION
                        JSON representation of a location for the cluster to retrieve files from
  -t TIMEOUT, --timeout TIMEOUT
                        amount of time (in seconds) to wait until a file transfer times out
  -bpk BROKER_PUBLIC_KEY, --broker-public-key BROKER_PUBLIC_KEY
                        location of the broker Curve public key certificate (this option enables curve encryption and must be used if the broker has curve enabled)
  -csk CLIENT_SECRET_KEY, --client-secret-key CLIENT_SECRET_KEY
                        location of the client Curve secret key certificate (this option enables curve encryption and must be used if the broker has curve enabled)
  -ug, --use-green      determines if PyZMQ green should be used, which can increase performance at the risk of message loss
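For scripted use, the command line can be assembled from the flags listed in the help page. The code below is a minimal sketch, and build_client_command is a hypothetical helper. The requirement that both Curve options be supplied together is an assumption drawn from the help text, which says each option must be used when the broker has Curve enabled.

```python
from typing import List, Optional


def build_client_command(
    broker: str,
    path: str,
    timeout: Optional[int] = None,
    broker_public_key: Optional[str] = None,
    client_secret_key: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for strelka_user_client.py from its documented flags."""
    command = ["strelka_user_client.py", "--broker", broker, "--path", path]
    if timeout is not None:
        command += ["--timeout", str(timeout)]
    # Assumption: a Curve-enabled broker needs both certificates, so require them as a pair.
    if (broker_public_key is None) != (client_secret_key is None):
        raise ValueError("broker_public_key and client_secret_key must be given together")
    if broker_public_key is not None and client_secret_key is not None:
        command += [
            "--broker-public-key", broker_public_key,
            "--client-secret-key", client_secret_key,
        ]
    return command
```

The list can be passed directly to `subprocess.run(command, check=True)`. Building a list rather than a single string avoids shell quoting problems with unusual file paths.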
generate_curve_certificates.py
generate_curve_certificates.py is a utility used for generating broker and client Curve certificates. This utility is required for setting up Curve encryption/authentication.
The help page for generate_curve_certificates.py is shown below:
usage: generate_curve_certificates.py [options]
generates curve certificates used by brokers and clients.
optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  path to store keys in (defaults to current working directory)
  -b, --broker          generate curve certificates for a broker
  -c, --client          generate curve certificates for a client
  -cf CLIENT_FILE, --client-file CLIENT_FILE
                        path to a file containing line-separated list of clients to generate keys for, useful for creating many client keys at once
validate_yara.py
validate_yara.py is a utility used for recursively validating a directory of YARA rules files. This can be useful when debugging issues related to the ScanYara scanner.
The help page for validate_yara.py is shown below:
usage: validate_yara.py [options]
validates YARA rules files.
optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  path to directory containing YARA rules
  -e, --error           boolean that determines if warnings should cause errors
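Recursive validation involves three steps: walk the directory, compile each rules file on its own, and collect the failures. The code below is a rough sketch of that process. It assumes rule files end in .yar or .yara, and by default it uses yara-python's `yara.compile(filepath=...)`. The function names and the compile_rule parameter are hypothetical. compile_rule exists so the walk can be exercised without YARA installed. This is not the validate_yara.py source.

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional

RULE_SUFFIXES = {".yar", ".yara"}  # assumption: rule files use these extensions


def find_rule_files(directory: str) -> List[Path]:
    """Recursively collect YARA rule files under directory, in sorted order."""
    return sorted(
        p for p in Path(directory).rglob("*")
        if p.is_file() and p.suffix.lower() in RULE_SUFFIXES
    )


def validate_rules(
    directory: str, compile_rule: Optional[Callable[[str], object]] = None
) -> Dict[str, str]:
    """Compile each rule file separately and return {path: error message} for failures."""
    if compile_rule is None:
        import yara  # yara-python, installed earlier in this guide

        def _yara_compile(path: str) -> object:
            return yara.compile(filepath=path)

        compile_rule = _yara_compile
    errors: Dict[str, str] = {}
    for path in find_rule_files(directory):
        try:
            compile_rule(str(path))
        except Exception as exc:  # e.g. yara.SyntaxError; keep going to report every bad file
            errors[str(path)] = str(exc)
    return errors
```

Compiling files one at a time means a single broken rule file is reported by name instead of failing the whole set at once.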
Configuration Files
Strelka uses YAML for configuring client-side and server-side utilities. We recommend using the default configurations and modifying the options as needed.
Strelka Configuration (strelka.py)
Strelka's cluster configuration file is stored in etc/strelka/strelka.yml and contains three sections: daemon, remote, and scan.
Daemon Configuration
The daemon configuration contains five sub-sections: processes, network, broker, workers, and logrotate.
The "processes" section controls the processes launched by the daemon. The configuration options are:
- "run_broker": boolean that determines if the server should run a Strelka broker process (defaults to True)
- "run_workers": boolean that determines if the server should run Strelka worker processes (defaults to True)
- "run_logrotate": boolean that determines if the server should run a Strelka log rotation process (defaults to True)
- "worker_count": number of workers to spawn (defaults to 4)
- "shutdown_timeout": amount of time (in seconds) that will elapse before the daemon forcibly kills child processes after they have received a shutdown command (defaults to 45 seconds)
The "network" section contains the following options:
- "broker": network address of the broker (defaults to 127.0.0.1)
- "request_socket_port": network port used by clients to send file requests to the broker (defaults to 5558)
- "task_socket_port": network port used by workers to receive tasks from the broker (defaults to 5559)
The "broker" section contains the following options:
- "poller_timeout": amount of time (in milliseconds) that the broker polls for client requests and worker statuses (defaults to 1000 milliseconds)
- "broker_secret_key": location of the broker Curve secret key certificate (enables Curve encryption, requires clients to use Curve, defaults to None)
- "client_public_keys": location of the directory containing client Curve public key certificates (enables Curve encryption and authentication, requires clients to use Curve, defaults to None)
- "prune_frequency": frequency (in seconds) at which the broker prunes dead workers (defaults to 5 seconds)
- "prune_delta": delta (in seconds) that must pass since a worker last checked in with the broker before it is considered dead and is pruned (defaults to 10 seconds)
The "workers" section contains the following options:
- "task_socket_reconnect": amount of time (in milliseconds) the task socket waits before attempting to reconnect in the event of a TCP disconnection; additional jitter is applied (defaults to 100 ms plus jitter)
- "task_socket_reconnect_max": maximum amount of time (in milliseconds) the task socket waits before attempting to reconnect in the event of a TCP disconnection; additional jitter is applied (defaults to 4000 ms plus jitter)
- "poller_timeout": amount of time (in milliseconds) that workers poll for file tasks (defaults to 1000 milliseconds)
- "file_max": number of files a worker will process before shutting down (defaults to 10000)
- "time_to_live": amount of time (in minutes) that a worker will run before shutting down (defaults to 30 minutes)
- "heartbeat_frequency": frequency (in seconds) at which a worker sends a heartbeat to the broker if it has not received any file tasks (defaults to 10 seconds)
- "log_directory": location where worker scan results are logged (defaults to /var/log/strelka/)
- "log_field_case": field case ("camel" or "snake") of the scan result log file data (defaults to camel)
- "log_bundle_events": boolean that determines if scan results should be bundled in a single event as an array or split into multiple events (defaults to True)
The "logrotate" section contains the following options:
- "directory": directory to run log rotation on (defaults to /var/log/strelka/)
- "compression_delta": delta (in minutes) that must pass since a log file was last modified before it is compressed (defaults to 15 minutes)
- "deletion_delta": delta (in minutes) that must pass since a compressed log file was last modified before it is deleted (defaults to 360 minutes / 6 hours)
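Several of the broker, worker, and logrotate options above describe simple time- and count-based rules. The functions below are an illustrative reading of those option descriptions. They use hypothetical names, take timestamps in seconds (such as `time.time()` values), and are not taken from the Strelka source. The .gz suffix for compressed logs is an assumption.

```python
from typing import Dict, Optional


def prune_dead_workers(last_seen: Dict[str, float], now: float,
                       prune_delta: float = 10.0) -> Dict[str, float]:
    """Keep workers that checked in less than prune_delta seconds ago (see "prune_delta")."""
    return {worker: seen for worker, seen in last_seen.items() if now - seen < prune_delta}


def worker_should_exit(files_processed: int, started: float, now: float,
                       file_max: int = 10000, time_to_live: float = 30.0) -> bool:
    """Recycle a worker after file_max files or time_to_live minutes, whichever comes first."""
    return files_processed >= file_max or (now - started) >= time_to_live * 60


def logrotate_action(path: str, modified: float, now: float,
                     compression_delta: float = 15.0,
                     deletion_delta: float = 360.0) -> Optional[str]:
    """Return "compress", "delete", or None for one log file; deltas are in minutes."""
    age_minutes = (now - modified) / 60
    if path.endswith(".gz"):  # assumption: compressed logs carry a .gz suffix
        return "delete" if age_minutes >= deletion_delta else None
    return "compress" if age_minutes >= compression_delta else None
```

With the defaults, a worker last seen 20 seconds ago is pruned. An uncompressed log untouched for 16 minutes is due for compression.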
Remote Configuration
The remote configuration contains one sub-section: remote.
The "remote" section controls how workers retrieve files from remote file stores. Google Cloud Storage, Amazon S3, OpenStack Swift, and HTTP file stores are supported. All options in this section are optionally read from environment variables if they are "null". The configuration options are:
- "remote_timeout": amount of time (in seconds) to wait before timing out individual file retrieval
- "remote_retries": number of times individual file retrieval will be re-attempted in the event of a timeout
- "google_application_credentials": path to the Google Cloud Storage JSON credentials file
- "aws_access_key_id": AWS access key ID
- "aws_secret_access_key": AWS secret access key
- "aws_default_region": default AWS region
- "st_auth_version": OpenStack authentication version (defaults to 3)
- "os_auth_url": OpenStack Keystone authentication URL
- "os_username": OpenStack username
- "os_password": OpenStack password
- "os_cert": OpenStack Keystone certificate
- "os_cacert": OpenStack Keystone CA Certificate
- "os_user_domain_name": OpenStack user domain
- "os_project_name": OpenStack project name
- "os_project_domain_name": OpenStack project domain
- "http_basic_user": HTTP Basic authentication username
- "http_basic_pass": HTTP Basic authentication password
- "http_verify": path to the CA bundle (file or directory) used for SSL verification (defaults to False, no verification)
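The "null means read from the environment" rule and the timeout/retry options can be sketched as follows. The upper-cased environment variable names (for example AWS_ACCESS_KEY_ID for aws_access_key_id) are an assumption, as are the helper names. fetch stands in for a real storage client and is assumed to raise TimeoutError once remote_timeout elapses. A real client library's own timeout exception would need to be translated.

```python
import os
from typing import Any, Callable, Dict, TypeVar

T = TypeVar("T")


def resolve_remote_options(options: Dict[str, Any]) -> Dict[str, Any]:
    """Fill null (None) options from environment variables named after the upper-cased key."""
    resolved = dict(options)
    for key, value in options.items():
        if value is None:
            # Assumption: aws_default_region -> AWS_DEFAULT_REGION, and so on.
            resolved[key] = os.environ.get(key.upper())
    return resolved


def retrieve_with_retries(fetch: Callable[[], T], remote_retries: int) -> T:
    """Call fetch(), re-attempting up to remote_retries more times after a timeout."""
    attempts_left = remote_retries
    while True:
        try:
            return fetch()
        except TimeoutError:
            if attempts_left <= 0:
                raise
            attempts_left -= 1
```

With remote_retries set to 2, a retrieval is attempted at most three times before the timeout is surfaced to the caller.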
Scan Configuration
The scan configuration contains two sub-sections: distribution and scanners.
The "distribution" section controls how files are distributed through the system. The configuration options are:
- "close_timeout": amount of time (in seconds) that a scanner can spend closing itself (defaults to 30 seconds)
- "distribution_timeout": amount of time (in seconds) that a single file can be distributed to all scanners (defaults to 1800 seconds / 30 minutes)
- "scanner_timeout": amount of time (in seconds) that a scanner can spend scanning a file (defaults to 600 seconds / 10 minutes, can be overridden per-scanner)
- "maximum_depth": maximum depth to which child files will be processed by scanners
- "taste_mime_db": location of the MIME database used to taste files (defaults to None, the system default)
- "taste_yara_rules": location of the directory of YARA files that contains rules used to taste files (defaults to etc/strelka/taste/)
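To make "maximum_depth" concrete, here is a minimal Python sketch of depth-limited recursion over extracted child files. `FileNode` and `distribute` are hypothetical names, and the sketch assumes the submitted file sits at depth 0; Strelka's own depth counting may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FileNode:
    """Hypothetical stand-in for a file and the child files extracted from it."""
    name: str
    children: List["FileNode"] = field(default_factory=list)


def distribute(node: FileNode, maximum_depth: int, depth: int = 0,
               scanned: Optional[List[str]] = None) -> List[str]:
    """Scan `node`, then recurse into its children until `maximum_depth` is exceeded."""
    if scanned is None:
        scanned = []
    if depth > maximum_depth:
        return scanned  # child files past the limit are not processed
    scanned.append(node.name)  # placeholder for running the assigned scanners
    for child in node.children:
        distribute(child, maximum_depth, depth + 1, scanned)
    return scanned


archive = FileNode("outer.zip", [FileNode("inner.zip", [FileNode("payload.exe")])])
print(distribute(archive, maximum_depth=1))  # ['outer.zip', 'inner.zip']
```

A depth limit like this keeps deeply nested archives from consuming the whole distribution_timeout.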
The "scanners" section controls which scanners are assigned to each file. It is a dictionary where each key is a scanner name (e.g. ScanZip) and the value is a list of dictionaries containing values for mappings, scanner priority, and scanner options.
Assignment occurs through a system of positive and negative matches: any negative match causes the scanner to skip assignment, and at least one positive match causes the scanner to be assigned. A wildcard identifier (*) is used to assign scanners to all flavors. See File Distribution, Scanners, Flavors, and Tasting for more details on flavors.
Below is a sample configuration that runs the scanner "ScanHeader" on all files and the scanner "ScanRar" on files that match a YARA rule named "rar_file".
scanners:
  'ScanHeader':
    - positive:
        flavors:
          - '*'
      priority: 5
      options:
        length: 50
  'ScanRar':
    - positive:
        flavors:
          - 'rar_file'
      priority: 5
      options:
        limit: 1000
Below is a sample configuration that shows how RAR files can be matched against a YARA rule (rar_file), a MIME type (application/x-rar), and a filename (any that end with .rar).
scanners:
  'ScanRar':
    - positive:
        flavors:
          - 'application/x-rar'
          - 'rar_file'
        filename: '\.rar$'
      priority: 5
      options:
        limit: 1000
Below is a sample configuration that shows how RAR files can be positively matched against a YARA rule (rar_file) and a MIME type (application/x-rar), but only if they are not negatively matched against a filename (\.rar$). This configuration would cause ScanRar to be assigned only to RAR files that do not have the extension ".rar".
scanners:
  'ScanRar':
    - negative:
        filename: '\.rar$'
      positive:
        flavors:
          - 'application/x-rar'
          - 'rar_file'
      priority: 5
      options:
        limit: 1000
Below is a sample configuration that shows how a single scanner can apply different options depending on the mapping.
scanners:
  'ScanX509':
    - positive:
        flavors:
          - 'x509_der_file'
      priority: 5
      options:
        type: 'der'
    - positive:
        flavors:
          - 'x509_pem_file'
      priority: 5
      options:
        type: 'pem'
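The positive/negative assignment rules and per-mapping options can be sketched in Python, with the YAML samples above expressed as dictionaries (using the negative-filename variant of ScanRar). This is an illustrative reimplementation, not Strelka's code. It handles only `flavors` and `filename`, applies negatives per mapping, and assumes that when several mappings match, the first one supplies the priority and options.

```python
import re
from typing import Any, Dict, List, Set

# The sample configurations above, expressed as Python dictionaries.
SCANNERS: Dict[str, List[Dict[str, Any]]] = {
    "ScanHeader": [
        {"positive": {"flavors": ["*"]}, "priority": 5, "options": {"length": 50}},
    ],
    "ScanRar": [
        {
            "negative": {"filename": r"\.rar$"},
            "positive": {"flavors": ["application/x-rar", "rar_file"]},
            "priority": 5,
            "options": {"limit": 1000},
        },
    ],
    "ScanX509": [
        {"positive": {"flavors": ["x509_der_file"]}, "priority": 5, "options": {"type": "der"}},
        {"positive": {"flavors": ["x509_pem_file"]}, "priority": 5, "options": {"type": "pem"}},
    ],
}


def matches(criteria: Dict[str, Any], flavors: Set[str], filename: str) -> bool:
    """True if any listed flavor (or the '*' wildcard) or the filename regex matches."""
    wanted = set(criteria.get("flavors", []))
    if "*" in wanted or wanted & flavors:
        return True
    pattern = criteria.get("filename")
    return bool(pattern and re.search(pattern, filename))


def assign(scanners: Dict[str, List[Dict[str, Any]]], flavors: Set[str],
           filename: str) -> Dict[str, Dict[str, Any]]:
    """Return {scanner_name: {"priority": ..., "options": ...}} for every assigned scanner."""
    assigned: Dict[str, Dict[str, Any]] = {}
    for name, mappings in scanners.items():
        for mapping in mappings:
            if matches(mapping.get("negative", {}), flavors, filename):
                continue  # a negative match skips this mapping
            if matches(mapping.get("positive", {}), flavors, filename):
                # Assumption: the first matching mapping supplies priority and options.
                assigned[name] = {"priority": mapping.get("priority"),
                                  "options": mapping.get("options", {})}
                break
    return assigned


print(sorted(assign(SCANNERS, {"rar_file"}, "archive.bin")))  # ['ScanHeader', 'ScanRar']
print(sorted(assign(SCANNERS, {"rar_file"}, "archive.rar")))  # ['ScanHeader']
print(assign(SCANNERS, {"x509_pem_file"}, "cert.pem")["ScanX509"]["options"])  # {'type': 'pem'}
```

Checking negatives before positives is what lets a single filename rule veto an otherwise matching flavor.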
Python Logging Configuration (strelka.py)
strelka.py uses an ini file (etc/strelka/pylogging.ini) to manage cluster-level statistics and information output by the Python logger. By default, this configuration file will log information to stdout and disable logging for packages imported by scanners.
DirStream Configuration (strelka_dirstream.py)
Strelka's dirstream configuration file is stored in etc/dirstream/dirstream.yml and contains two sub-sections: processes and workers.
The "processes" section controls the processes launched by the utility. The configuration options are:
- "shutdown_timeout": amount of time (in seconds) that will elapse before the utility forcibly kills child processes after they have received a shutdown command (defaults to 10 seconds)
The "workers" section controls the directory and network settings for each worker that sends files to the cluster. The configuration options are:
- "directory": directory that files are sent from (defaults to None)
- "source": application that writes files to the directory, used to control metadata parsing functionality (defaults to None)
- "meta_separator": unique string used to separate pieces of metadata in a filename, used to parse metadata and send it along with the file to the cluster (defaults to "S^E^P")
- "file_mtime_delta": delta (in seconds) that must pass since a file was last modified before it is sent to the cluster (defaults to 5 seconds)
- "delete_files": boolean that determines if files should be deleted after they are sent to the cluster (defaults to False)
- "broker": network address and network port of the broker (defaults to "127.0.0.1:5558")
- "timeout": amount of time (in seconds) to wait for a file to be successfully sent to the broker (defaults to 10)
- "use_green": boolean that determines if PyZMQ green should be used (this can increase performance at the risk of message loss, defaults to True)
- "broker_public_key": location of the broker Curve public key certificate (enables Curve encryption, must be used if the broker has Curve enabled)
- "client_secret_key": location of the client Curve secret key certificate (enables Curve encryption, must be used if the broker has Curve enabled)
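Two of the worker options above lend themselves to a quick Python illustration: splitting a filename on "meta_separator" and holding a file back until "file_mtime_delta" seconds have passed since its last modification. The function names and the two-field example filename are hypothetical; which metadata fields appear, and in what order, depends on the "source" application.

```python
import os
import time
from typing import List, Optional

META_SEPARATOR = "S^E^P"  # default value of "meta_separator"


def parse_filename_metadata(filename: str, separator: str = META_SEPARATOR) -> List[str]:
    """Split a filename into the metadata fields embedded by the writing application."""
    return os.path.basename(filename).split(separator)


def ready_to_send(path: str, file_mtime_delta: float = 5.0, now: Optional[float] = None) -> bool:
    """Return True once `path` has not been modified for at least `file_mtime_delta` seconds."""
    current = time.time() if now is None else now
    return current - os.path.getmtime(path) >= file_mtime_delta


# Hypothetical filename carrying two metadata fields.
print(parse_filename_metadata("/extract/uid123S^E^Psmtp"))  # ['uid123', 'smtp']
```

The mtime check guards against sending a file that the writing application is still appending to.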
Bro Configuration
Strelka's Bro configuration file is stored in etc/bro/extract-strelka.bro and includes variables that can be redefined at Bro runtime. These variables are:
- "mime_table": table of strings (Bro source) mapped to a set of strings (Bro mime_type); this variable defines which file MIME types Bro extracts and is configurable based on where Bro identified the file (e.g. extract application/x-dosexec files from SMTP, but not SMB or FTP)
- "filename_re": regex pattern that can extract files based on Bro filename
- "unknown_mime_source": set of strings (Bro source) that determines if files of an unknown MIME type should be extracted based on where Bro identified the file (e.g. extract unknown files from SMTP, but not SMB or FTP)
- "meta_separator": string used in extracted filenames to separate embedded Bro metadata; this must match the equivalent value in etc/dirstream/dirstream.yml
- "directory_count_interval": interval used to schedule how often the script checks the file count in the extraction directory
- "directory_count_threshold": int that is used as a trigger to temporarily disable file extraction if the file count in the extraction directory reaches the threshold
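The way these variables combine can be modelled in a few lines of Python. This is neither Bro script nor the actual logic of extract-strelka.bro. It is a sketch that assumes a file is extracted if its MIME type is listed for its source, if its MIME type is unknown and its source is in "unknown_mime_source", or if its name matches "filename_re", and that nothing is extracted while the directory file count is at or above the threshold. The table contents and the regex are illustrative values based on the SMTP/SMB/FTP examples above.

```python
import re
from typing import Dict, Optional, Set

# Illustrative values only, mirroring the SMTP/SMB/FTP examples in the text.
MIME_TABLE: Dict[str, Set[str]] = {"SMTP": {"application/x-dosexec"}}
UNKNOWN_MIME_SOURCE: Set[str] = {"SMTP"}
FILENAME_RE = re.compile(r"\.(exe|dll)$", re.IGNORECASE)  # hypothetical pattern


def should_extract(source: str, mime_type: Optional[str], filename: Optional[str],
                   file_count: int, directory_count_threshold: int) -> bool:
    """Decide whether a file seen by Bro would be written to the extraction directory."""
    if file_count >= directory_count_threshold:
        return False  # extraction is temporarily disabled until the directory drains
    if mime_type is None:
        if source in UNKNOWN_MIME_SOURCE:
            return True
    elif mime_type in MIME_TABLE.get(source, set()):
        return True
    return bool(filename and FILENAME_RE.search(filename))


print(should_extract("SMTP", "application/x-dosexec", None, 0, 1000))  # True
print(should_extract("SMB", "application/x-dosexec", None, 0, 1000))   # False
```

The threshold check comes first because it acts as back-pressure: when dirstream falls behind, extraction pauses regardless of file type.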
Encryption and Authentication
Strelka has built-in, optional encryption and authentication for client connections provided by CurveZMQ.
CurveZMQ
CurveZMQ (Curve) is ZMQ's encryption and authentication protocol.
Using Curve
Strelka uses Curve to encrypt and authenticate connections between clients and brokers. By default, Strelka's Curve support is set up to enable encryption but not authentication.
To enable Curve encryption, the broker must be loaded with a private key; any clients connecting to the broker must have the broker's public key to connect successfully.
To enable Curve encryption and authentication, the broker must be loaded with a private key and a directory of client public keys; any clients connecting to the broker must have the broker's public key and have their client key loaded on the broker to connect successfully.
The generate_curve_certificates.py utility can be used to create client and broker certificates.
Clusters
The following are recommendations and considerations to keep in mind when deploying clusters.
General Recommendations
The following recommendations apply to all clusters:
- Do not run workers on the same server as a broker
  - This puts the health of the entire cluster at risk if the server becomes over-utilized
- Do not over-allocate workers to CPUs
  - 1 worker per CPU
- Allocate at least 1GB RAM per worker
  - If workers do not have enough RAM, there will be excessive memory errors
  - Big files (especially compressed files) require more RAM
  - In large clusters, diminishing returns begin above 4GB RAM per worker
- Allocate as much RAM as reasonable to the broker
  - ZMQ messages are stored entirely in memory; in large deployments with many clients, the broker may use a lot of RAM if the workers cannot keep up with the number of file tasks
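The per-worker rules of thumb above reduce to simple arithmetic for a dedicated worker server (the broker runs elsewhere). The Python helper below is hypothetical. It encodes only one worker per CPU, the 1GB minimum, and the 4GB point of diminishing returns, and it does not reserve memory for the operating system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkerPlan:
    workers: int
    ram_per_worker_gb: float
    warnings: List[str] = field(default_factory=list)


def plan_worker_server(cpus: int, ram_gb: float) -> WorkerPlan:
    """Apply the general recommendations to one worker-only server."""
    if cpus < 1:
        raise ValueError("a worker server needs at least one CPU")
    plan = WorkerPlan(workers=cpus, ram_per_worker_gb=ram_gb / cpus)  # 1 worker per CPU
    if plan.ram_per_worker_gb < 1:
        plan.warnings.append("under 1GB RAM per worker: expect excessive memory errors")
    elif plan.ram_per_worker_gb > 4:
        plan.warnings.append("over 4GB RAM per worker: diminishing returns in large clusters")
    return plan


print(plan_worker_server(cpus=16, ram_gb=32))
```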
Sizing Considerations
Multiple variables should be considered when determining the appropriate size for a cluster:
- Number of file requests per second
- Type of file requests
  - Remote file requests take longer to process than direct file requests
- Diversity of files requested
  - Binary files take longer to scan than text files
- Number of YARA rules deployed
  - Scanning a file with 50,000 rules takes longer than scanning a file with 50 rules
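One way to turn these variables into a first estimate, not described by Strelka itself, is Little's law: the number of busy workers equals the request rate multiplied by the mean time a file spends being scanned. Each consideration above (remote vs. direct requests, file diversity, YARA rule count) shows up as a change in that mean time. The mix, timings, and 25% headroom below are made-up placeholders to be replaced with measurements.

```python
import math
from typing import Dict, Tuple


def mean_scan_seconds(mix: Dict[str, Tuple[float, float]]) -> float:
    """Weighted mean scan time from {file_type: (fraction_of_requests, seconds_per_file)}."""
    total_fraction = sum(fraction for fraction, _ in mix.values())
    return sum(fraction * seconds for fraction, seconds in mix.values()) / total_fraction


def estimate_workers(requests_per_second: float, mean_seconds: float, headroom: float = 1.25) -> int:
    """Little's law (L = lambda * W) with extra headroom for bursts, rounded up."""
    return math.ceil(requests_per_second * mean_seconds * headroom)


# Placeholder workload: 25% binaries at 1.5 s each, 75% text files at 0.25 s each.
mix = {"binary": (0.25, 1.5), "text": (0.75, 0.25)}
w = mean_scan_seconds(mix)
print(w, estimate_workers(100, w))  # 0.5625 71
```

With one worker per CPU, the result also approximates the number of worker CPUs needed.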
Docker Considerations
Below is a list of considerations to keep in mind when running a cluster with Docker containers:
- Share volumes, not files, with the container
  - Strelka's workers will read configuration files and YARA rules files when they start up; sharing volumes with the container ensures that updated copies of these files on the localhost are reflected accurately inside the container without needing to restart the container
- Increase stop-timeout
  - By default, Docker will forcibly kill a container if it has not stopped after 10 seconds; this value should be increased to greater than the shutdown_timeout value in etc/strelka/strelka.yml
- Increase shm-size
  - By default, Docker limits a container's shm size to 64MB; this can cause errors with Strelka scanners that use tempfile
- Set logging options
  - By default, Docker has no log limit for logs output by a container
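The four Docker adjustments map onto standard `docker run` flags (`--volume`, `--stop-timeout`, `--shm-size`, and `--log-opt` with the json-file log driver). The Python helper below only assembles such a command. The image name, the host and container paths, and the size values are placeholders, and the 10-second margin over shutdown_timeout is an arbitrary choice.

```python
import shlex
from typing import List


def docker_run_args(image: str, etc_dir: str, shutdown_timeout: int,
                    shm_size: str = "512m", log_max_size: str = "10m",
                    log_max_file: int = 3) -> List[str]:
    """Build a `docker run` command reflecting the Docker considerations."""
    return [
        "docker", "run", "--detach",
        # Share a directory (volume) rather than single files so host edits show up.
        "--volume", f"{etc_dir}:/etc/strelka:ro",
        # Give the container longer than shutdown_timeout before Docker kills it.
        "--stop-timeout", str(shutdown_timeout + 10),
        # Raise the 64MB default shm size (the text links this to scanners using tempfile).
        "--shm-size", shm_size,
        # Cap container log growth.
        "--log-driver", "json-file",
        "--log-opt", f"max-size={log_max_size}",
        "--log-opt", f"max-file={log_max_file}",
        image,
    ]


print(shlex.join(docker_run_args("example/strelka-worker", "/opt/strelka/etc", shutdown_timeout=30)))
```

Deriving the stop timeout from shutdown_timeout keeps the two values from drifting apart when either is changed.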
Management
Due to its distributed design, we recommend using container orchestration (e.g. Kubernetes) or configuration management/provisioning (e.g. Ansible, SaltStack, etc.) systems for managing clusters.