IP Bulk Reporter

Instead of reporting IPs individually, you may compile your reports into a single CSV file. This helps reduce bandwidth on both sides. Note: The abuse confidence score of an IP reported this way is not immediately recalculated.

The CSV file must be under 2 MB and less than or equal to 10,000 lines, including the headings.
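A quick pre-flight check of both limits can be scripted before uploading. This is a minimal sketch; the helper name check_bulk_csv is our own, not part of any API:

```python
import os

MAX_BYTES = 2 * 1024 * 1024   # 2 MB
MAX_LINES = 10_000            # including the heading row

def check_bulk_csv(path):
    """Return True if the file fits within the bulk-report limits."""
    if os.path.getsize(path) > MAX_BYTES:
        return False
    with open(path) as f:
        # Count lines, heading included.
        return sum(1 for _ in f) <= MAX_LINES
```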

CSV Format: The following headings are required and may appear in any order.

  • IP — A valid IPv4 or IPv6 IP address.
  • Categories — At least one category ID. Comma separated for multiple categories. See: Report Categories
  • ReportDate — Date and time of the attack, or the earliest observation of the attack. Any format that strtotime() can process is permitted. However, we strongly recommend a timezoned format such as ISO 8601, e.g. 2017-09-08T10:00:37-04:00. A time lacking a timezone is assumed to be in UTC.
  • Comment — A description of the attack. Truncated after 1,024 characters (bytes).
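For example, an ISO 8601 ReportDate with an explicit offset can be produced from the standard library alone. The fixed -04:00 offset here is purely illustrative; a real script should use the host's actual timezone:

```python
from datetime import datetime, timezone, timedelta

# Fixed UTC-4 offset, for illustration only.
edt = timezone(timedelta(hours=-4))
report_date = datetime(2017, 9, 8, 10, 0, 37, tzinfo=edt).isoformat()
# '2017-09-08T10:00:37-04:00'
```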

Here is a sample of a valid bulk IP file:

IP,Categories,ReportDate,Comment
89.205.125.160,"18,22",2018-12-18T10:00:37-04:00,"Failed password for invalid user odoo from 89.205.125.160 port 39121 ssh2"
123.183.209.136,"18,22",2018-12-18T11:25:11-04:00,"Did not receive identification string from 123.183.209.136 port 57192"
197.156.104.113,"14,15,18,11,10,21",2018-12-18T16:10:58+04:00,"[SMB remote code execution attempt: port tcp/445]
in blocklist.de:'listed [pop3]'
in SpamCop:'listed'
in sorbs:'listed [web], [spam]'
in Unsubscore:'listed'
*(RWIN=8192)(04:10)"

The Comment must be enclosed in double quotes (") to include commas (,) and newline separators (\n, \r, \r\n). A backslash (\) is needed to escape the enclosure. It does not escape itself.

The Categories field must likewise be enclosed in double quotes if it contains more than one category.

Every data field must be filled with a valid value.
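Python's csv module doubles quotes by default, which does not match the backslash-escaping described above. One way to produce conforming rows is to disable doublequote and set escapechar, a sketch of which is:

```python
import csv
import io

buf = io.StringIO()
# doublequote=False + escapechar='\\' makes the writer escape the
# enclosing quote with a backslash, per the bulk-report format.
writer = csv.writer(buf, doublequote=False, escapechar='\\')
writer.writerow(["1.2.3.4", "18,22",
                 "2018-12-18T10:00:37-04:00",
                 'Failed password, invalid user "odoo"'])
print(buf.getvalue())
```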

Max Filesize: 2 MB

Please abide by our reporting policy. You must be signed in to use the bulk reporter.

Collecting the Data

Attacks can easily be harvested from /var/log/secure (RedHat/CentOS) or /var/log/auth.log (Debian/Ubuntu). Take a look at the sample Python script we provide below. Run the script with your log file as the input and it will generate a submittable CSV file.

e.g.

$ ./parse_logs.py secure.log > reports.csv && curl https://api.abuseipdb.com/api/v2/bulk-report -F [email protected] -H "Key: YOUR_KEY" > output.json

Note: You'll need the pytz module installed. You can install it with pip3 install pytz
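If you'd rather stay in Python than shell out to curl, the same upload can be sketched with the third-party requests library (assumed installed; the function name is our own, and the 'csv' field name mirrors the -F [email protected] flag above):

```python
import requests

def submit_bulk_report(csv_path, api_key):
    """Upload a CSV of reports and return the decoded JSON response."""
    with open(csv_path, 'rb') as f:
        resp = requests.post(
            'https://api.abuseipdb.com/api/v2/bulk-report',
            headers={'Key': api_key, 'Accept': 'application/json'},
            files={'csv': f},
        )
    resp.raise_for_status()
    return resp.json()
```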

If successful, the JSON response lists which reports were accepted and which were rejected. Pipe the output into jq if you'd like to peruse the response.
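As a quick sanity check without jq, the response can also be summarized in Python. The savedReports and invalidReports key names below are assumptions about the response shape; inspect your own output.json for the actual keys:

```python
import json

def summarize(path='output.json'):
    """Print accepted/rejected counts from a bulk-report response.

    Assumes data.savedReports (int) and data.invalidReports (list).
    """
    with open(path) as f:
        data = json.load(f).get('data', {})
    saved = data.get('savedReports', 0)
    rejected = data.get('invalidReports', [])
    print(f"accepted: {saved}, rejected: {len(rejected)}")
    for bad in rejected:
        print(f"  row {bad.get('rowNumber')}: {bad.get('error')}")
    return saved, rejected
```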

$ jq . output.json

parse_logs.py

#!/usr/bin/env python3

import sys
import argparse
import re
import csv
from datetime import datetime
import pytz

def main(arguments):

    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument('infile', help="Input file", type=argparse.FileType('r'))
    parser.add_argument('-o', '--outfile', help="Output file",
                        default=sys.stdout, type=argparse.FileType('w'))

    args = parser.parse_args(arguments)

    # Define field names.
    fieldnames = ['IP', 'Categories', 'Comment', 'ReportDate']
    # Begin CSV output.
    writer = csv.DictWriter(args.outfile, fieldnames=fieldnames)
    writer.writeheader()

    # Initialize an empty list to hold addresses already reported.
    ipv4_addresses = list()

    for line in args.infile:
        # !! Match this format to your system's log format.
        timestamp = r"([a-zA-Z]+\s+[0-9]+ [0-9]+:[0-9]+:[0-9]+)"
        ipv4 = r"([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})"
        comment = r"(Invalid user [a-zA-Z0-9]+ from " + ipv4 + r" port [0-9]+)"

        # The regex of the line we're looking for, built up from component regexps.
        combined_re = timestamp + " .* " + comment

        # Run the regexp.
        matches = re.findall(combined_re, line)
        # If this line is in the format we're looking for,
        if matches:
            # Pull the tuple out of the list.
            matches_flat = matches[0]

            # Remove duplicate addresses from the report.
            if matches_flat[2] not in ipv4_addresses:
                ipv4_addresses.append(matches_flat[2])
            else:
                continue

            ### !!! You may need to update this. ###
            # Parse the log datetime into a Python datetime object so we can attach a timezone.
            # The format string must match your log files. Here we use the default on Debian/RedHat distros.
            attack_datetime = datetime.strptime(matches_flat[0], '%b %d %H:%M:%S')
            # Assume the year is the current year (syslog timestamps omit it).
            attack_datetime = attack_datetime.replace(year=datetime.now().year)
            # !! Set this to your system's timezone. pytz requires localize();
            # passing tzinfo= directly would attach an incorrect LMT offset.
            my_tz = pytz.timezone('America/New_York')
            attack_datetime = my_tz.localize(attack_datetime)

            # Format to ISO 8601 to make it universal and portable.
            attack_datetime_iso = attack_datetime.isoformat()

            # We'll add the categories column statically at this step.
            # Output as a CSV row.
            writer.writerow({
                'IP': matches_flat[2],
                'Categories': "18,22",
                'Comment': matches_flat[1],
                'ReportDate': attack_datetime_iso
            })

if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]))