IP Bulk Reporter
Instead of reporting IPs individually, you may compile a CSV of reports. This helps reduce bandwidth on both sides. Note: The abuse confidence score of an IP reported this way is not immediately calculated.
The CSV file must be under 2 MB and less than or equal to 10,000 lines, including the headings.
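As a rough pre-flight check, the two limits above can be tested locally before uploading. This is a minimal sketch, not part of the provided tooling: `check_bulk_file` is a hypothetical helper, "2 MB" is read conservatively as 2,000,000 bytes, and physical lines are counted, so a comment containing a line break counts as more than one line.

```python
import os

MAX_BYTES = 2 * 1000 * 1000  # assumption: "2 MB" read conservatively as 2,000,000 bytes
MAX_LINES = 10_000           # includes the heading line


def check_bulk_file(path):
    """Return a list of problems found; an empty list means both limits are met."""
    problems = []
    size = os.path.getsize(path)
    if size >= MAX_BYTES:
        problems.append(f"file is {size} bytes; must be under {MAX_BYTES}")
    # Count physical lines in binary mode so encoding issues cannot interfere.
    with open(path, "rb") as handle:
        line_count = sum(1 for _ in handle)
    if line_count > MAX_LINES:
        problems.append(f"file has {line_count} lines; limit is {MAX_LINES}")
    return problems
```

Calling `check_bulk_file("reports.csv")` before the upload step gives a quick local answer without spending a request.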
CSV Format
Headings are required and may appear in any order.
IP: A valid IPv4 or IPv6 address.
Categories: At least one category ID. Comma separated for multiple categories. See: Report Categories
ReportDate: Date and time of the attack or earliest observance of the attack. Any format that strtotime() can process is permitted. However, we strongly recommend a timezoned format such as ISO 8601, e.g. 2017-09-08T10:00:37-04:00. A time lacking a timezone is assumed to be in UTC.
Comment: A description of the attack. Truncated after 1,024 characters (bytes).
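To illustrate the recommended ReportDate format, here is a small sketch that renders a Python datetime as ISO 8601 with an explicit offset. The helper name `to_report_date` is hypothetical; it treats a naive datetime as UTC only to mirror the rule stated above.

```python
from datetime import datetime, timedelta, timezone


def to_report_date(moment):
    """Format a datetime as ISO 8601 with an explicit UTC offset."""
    if moment.tzinfo is None:
        # Mirror the documented rule: a time without a timezone is taken as UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    return moment.isoformat(timespec="seconds")


# A fixed -04:00 offset, used here only to reproduce the example timestamp.
minus_four = timezone(timedelta(hours=-4))
print(to_report_date(datetime(2017, 9, 8, 10, 0, 37, tzinfo=minus_four)))
# prints 2017-09-08T10:00:37-04:00
print(to_report_date(datetime(2017, 9, 8, 14, 0, 37)))
# prints 2017-09-08T14:00:37+00:00
```

Both printed values describe the same instant; writing the offset explicitly avoids relying on the UTC default.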
Here is a sample of a valid bulk IP file:
IP,Categories,ReportDate,Comment
89.205.125.160,"18,22",2018-12-18T10:00:37-04:00,"Failed password for invalid user odoo from 89.205.125.160 port 39121 ssh2"
123.183.209.136,"18,22",2018-12-18T11:25:11-04:00,"Did not receive identification string from 123.183.209.136 port 57192"
197.156.104.113,"14,15,18,11,10,21",2018-12-18T16:10:58+04:00,"[SMB remote code execution attempt: port tcp/445] in blocklist.de:'listed [pop3]' in SpamCop:'listed' in sorbs:'listed [web], [spam]' in Unsubscore:'listed' *(RWIN=8192)(04:10)"
The Comment must be enclosed in double quotes (") to include commas (,) and newline separators (\n, \r, \r\n). A backslash (\) is needed to escape a double quote inside the enclosure. The backslash does not escape itself.
Categories must be enclosed in double quotes if there is more than one.
Every data field must be filled with a valid value.
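The quoting rules above can be sketched as a small row builder. This is an illustration under assumptions, not the service's own validator: `format_row` is a hypothetical helper, it trims the comment to 1,024 bytes locally, and it drops trailing backslashes as a defensive choice, since a backslash directly before the closing quote could be read as an escape.

```python
import ipaddress

MAX_COMMENT_BYTES = 1024


def format_row(ip, categories, report_date, comment):
    """Build one CSV line, raising ValueError if a field is empty or invalid."""
    ipaddress.ip_address(ip)  # raises ValueError for anything but IPv4/IPv6
    if not categories:
        raise ValueError("at least one category ID is required")
    if not report_date or not comment:
        raise ValueError("every data field must be filled")
    cats = ",".join(str(int(c)) for c in categories)
    # Enclose the categories only when there is more than one.
    if len(categories) > 1:
        cats = '"' + cats + '"'
    # Trim to 1,024 bytes, dropping any partial multi-byte character.
    trimmed = comment.encode("utf-8")[:MAX_COMMENT_BYTES].decode("utf-8", "ignore")
    # Defensive assumption: avoid a backslash right before the closing quote.
    trimmed = trimmed.rstrip("\\")
    # Escape only the double quote; the backslash does not escape itself.
    quoted = '"' + trimmed.replace('"', '\\"') + '"'
    return ",".join([ip, cats, report_date, quoted])
```

For example, `format_row("89.205.125.160", [18, 22], "2018-12-18T10:00:37-04:00", "Failed password, ssh2")` yields a line in the same shape as the sample file above.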
Collecting the Data
Attacks can easily be harvested from /var/log/secure (RedHat/CentOS) or /var/log/auth.log (Debian/Ubuntu). Take a gander at a sample Python script we provide. Run the script with your log file as the input and it will generate a submittable CSV file.
e.g.
$ ./parse_logs.py secure.log > reports.csv && curl https://api.abuseipdb.com/api/v2/bulk-report -F csv=@reports.csv -H "Key: YOUR_KEY" > output.json
Note: You'll need the pytz module installed. You can install it with pip3 install pytz
If successful, the JSON response lists which reports were accepted and which were rejected. Pipe the output into jq if you'd like to peruse the response.
$ jq . output.json
parse_logs.py
#!/usr/bin/env python3
"""Parse an auth log and write a CSV of attack reports for bulk submission."""
import sys
import argparse
import re
import csv
from datetime import datetime

import pytz


def main(arguments):
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument('infile', help="Input file", type=argparse.FileType('r'))
    parser.add_argument('-o', '--outfile', help="Output file",
                        default=sys.stdout, type=argparse.FileType('w'))
    args = parser.parse_args(arguments)

    # Define field names.
    fieldnames = ['IP', 'Categories', 'Comment', 'ReportDate']

    # Begin CSV output.
    writer = csv.DictWriter(args.outfile, fieldnames=fieldnames)
    writer.writeheader()

    # !! Match this format to your system's format.
    timestamp = r"([a-zA-Z]+\s+[0-9]+ [0-9]+:[0-9]+:[0-9]+)"
    ipv4 = r"([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})"
    comment = "(Invalid user [a-zA-Z0-9]+ from " + ipv4 + " port [0-9]+)"
    # The regex of the line we're looking for, built up from component regexps.
    combined_re = re.compile(timestamp + " .* " + comment)

    # !! Set this to your system timezone.
    my_tz = pytz.timezone('America/New_York')

    # Addresses already reported, so each IP appears only once.
    seen_addresses = set()

    for line in args.infile:
        # Run the regexp.
        match = combined_re.search(line)
        # Skip lines that are not in the format we're looking for.
        if not match:
            continue
        # Groups in order: timestamp, full comment, IP address inside the comment.
        log_time, log_comment, address = match.groups()

        # Remove duplicate addresses from the report.
        if address in seen_addresses:
            continue
        seen_addresses.add(address)

        ### !!! You may need to update this. ###
        # Parse log datetime to Python datetime object so we can update the timezone.
        # The format string should match your log files. Here we use the default in Debian/Redhat distros.
        attack_datetime = datetime.strptime(log_time, '%b %d %H:%M:%S')
        # Assume year is the current year.
        attack_datetime = attack_datetime.replace(year=datetime.now().year)
        # Attach the timezone. pytz zones must be applied with localize();
        # passing one to replace(tzinfo=...) produces an incorrect offset.
        attack_datetime = my_tz.localize(attack_datetime)
        # Format to ISO 8601 to make it universal and portable.
        attack_datetime_iso = attack_datetime.isoformat()

        # We'll add the categories column statically at this step.
        # Output as a CSV row.
        writer.writerow({
            'IP': address,
            'Categories': "18,22",
            'Comment': log_comment,
            'ReportDate': attack_datetime_iso
        })


if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]))