Data Sets

A brief overview of the provided data sets.

Data is provided in CSV format. There are two files: one contains the measurement data, the other contains information about the sensors. We will soon release measurement data sets along with the expected results for each category. The data used for evaluation will be provided on a USB dongle at the competition in Montreal on April 15. It will be in the same format as the training data.

The data sets will be made available in our Downloads section.

Data Reference / File Format

Overview and description of all available data fields.

Sensor Meta Data File Format

The header line of the sensor meta data files is as follows:
serial,latitude,longitude,height,type

  • serial: the so-called "serial number" is a globally unique identifier for each sensor. It can be used to join sensor meta information with measurement data. It is a signed integer and can therefore be negative.
  • latitude/longitude/height: the three-dimensional location of the receiver. Latitude and longitude are in decimal degrees (WGS84), height is in meters. Note that this information is of varying/unknown accuracy since receiver locations are reported to the OpenSky Network in different ways. For sensors of type dump1090, these locations are entered by the users themselves. Most users rely on services such as Google Maps or their smartphones to determine the location of their antenna. This process is prone to errors, but the errors should usually be in the range of meters. Other users intentionally report wrong locations to obscure their exact location for privacy reasons. In this case, locations might be off by several kilometers. For sensors of types Radarcape or GRX1090, the locations should generally be more accurate since they are automatically reported by the integrated GPS receivers and the GPS antenna is usually located close to the ADS-B antenna.
  • type: the type of the receiver setup. The OpenSky Network currently supports sensors based on the open-source Mode S software decoder dump1090 and the commercial off-the-shelf devices Radarcape, SBS-3, and GRX1090. The device type also determines the measurement noise distribution and timestamp accuracy. However, note that these properties also depend on many other factors (e.g., temperature, indoor/outdoor deployment, ...).
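Since localization is usually done in Cartesian coordinates, the following sketch converts a sensor location from WGS84 latitude/longitude/height to Earth-centered, Earth-fixed (ECEF) coordinates using the standard closed-form expressions. It assumes that height is a height above the WGS84 ellipsoid, which the description above does not state explicitly; the function name wgs84_to_ecef is made up for this example.

```python
import math

# WGS84 ellipsoid constants.
WGS84_A = 6378137.0                 # semi-major axis in meters
WGS84_F = 1.0 / 298.257223563       # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)  # first eccentricity squared

def wgs84_to_ecef(lat_deg, lon_deg, height_m):
    """Convert WGS84 geodetic coordinates to ECEF (x, y, z) in meters.

    Assumption: height_m is measured above the WGS84 ellipsoid.
    """
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Prime vertical radius of curvature at this latitude.
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + height_m) * math.cos(lat) * math.cos(lon)
    y = (n + height_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + height_m) * math.sin(lat)
    return x, y, z

# First sensor from the example data (serial 6).
print(wgs84_to_ecef(51.759014, -1.256688, 47.337597))
```

Keep in mind that user-entered locations can be off by meters or even kilometers, so the converted coordinates inherit that uncertainty.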

Example data:
6,51.759014,-1.256688,47.337597,Radarcape
10,48.266,10.052,510,dump1090
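As an illustration of the format, the sketch below reads the sensor meta data into a dictionary keyed by serial number, which is convenient for the join with the measurement data. It uses only the header and the two example rows shown above; the names SENSOR_CSV and load_sensors are hypothetical, and a real file would be read from disk instead of from a string.

```python
import csv
import io

# Header line plus the two example rows from the text above.
SENSOR_CSV = """serial,latitude,longitude,height,type
6,51.759014,-1.256688,47.337597,Radarcape
10,48.266,10.052,510,dump1090
"""

def load_sensors(text):
    """Return a dict mapping serial (int, may be negative) to typed fields."""
    sensors = {}
    for row in csv.DictReader(io.StringIO(text)):
        sensors[int(row["serial"])] = {
            "latitude": float(row["latitude"]),    # decimal degrees (WGS84)
            "longitude": float(row["longitude"]),  # decimal degrees (WGS84)
            "height": float(row["height"]),        # meters
            "type": row["type"],                   # e.g. dump1090, Radarcape
        }
    return sensors

sensors = load_sensors(SENSOR_CSV)
print(sensors[6]["type"], sensors[10]["height"])
```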

Measurement Data File Format

The header line of the measurement data files is as follows:
id,timeAtServer,aircraft,latitude,longitude,baroAltitude,geoAltitude,numMeasurements,measurements

  • id: a unique identifier for each received transponder signal. This identifier will be used to join the result file (output of your localization algorithm) with the measurement data.
  • timeAtServer: a timestamp (double) in seconds indicating when the information was received by OpenSky's ingestion server. In each 1h data set, this timestamp starts at 0 and counts up to 3600 seconds. It has roughly millisecond accuracy. Note that this timestamp was taken after the signal had incurred propagation delay, processing delay, and Internet delay, so it is not the time of transmission.
  • aircraft: an aircraft identifier which is unique for each aircraft within each data set. This identifier allows grouping the measurements that belong to signals from the same transponder. Note that aircraft 123 in data set X is not the same transponder as aircraft 123 in data set Y. This means that data sets cannot be combined.
  • latitude/longitude/geoAltitude: the location that was reported by the transmitting aircraft. As in the sensor meta data, latitude and longitude are provided in decimal degrees (WGS84), geoAltitude in meters. The accuracy of this location information is generally unknown. However, most aircraft should report their locations with decent accuracy (tens of meters). It is worth noting that some aircraft send location information derived from their inertial system rather than their GPS sensor. In this case, the location information is subject to drift and might be wrong by several hundred meters. This case is rather rare, but it should nevertheless be considered when cleansing the data.
  • baroAltitude: the barometric altitude reported by the aircraft in meters. Note that the barometric altitude is weather dependent and might differ from the geometric altitude by several hundred meters. However, the difference might be learned from "known" aircraft and this information could then be used to estimate the geometric altitude or estimate bounds.
  • numMeasurements: the number of measurements, i.e., the number of sensors which received this particular transponder signal.
  • measurements: the measurement data for each receiver which received this particular transponder signal. The measurement data is provided as a JSON array string containing triples. Each triple contains the serial number of the sensor which reported the measurement data, the timestamp for the time of arrival of the signal at the receiver (in nanoseconds), and the received signal strength indicator (RSSI). The exact definition of the RSSI depends on the type of receiver, but it is usually provided in dB. The properties of the nanosecond timestamp also strongly depend on the type of receiver. For dump1090 receivers, these timestamps are generally unsynchronized and typically have a 12 MHz resolution. Unsynchronized means in particular that these timestamps are subject to (sometimes heavy) drift. For Radarcape and GRX1090, these timestamps are usually GPS synchronized and have a resolution of about 40-60 MHz. GPS synchronized means that they are constantly resynchronized to compensate for clock drift.
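To get a feeling for what these timestamp resolutions mean for localization, the following short calculation converts a clock rate into the duration of one tick and the distance a radio signal travels in that time. It is a rough sketch that reads "12 MHz resolution" as a 12 MHz tick rate and uses the vacuum speed of light; both are simplifying assumptions, and the function name tick_to_range is made up for this example.

```python
SPEED_OF_LIGHT = 299_792_458.0  # meters per second (vacuum)

def tick_to_range(clock_hz):
    """Return (tick duration in ns, distance in m covered during one tick)."""
    tick_s = 1.0 / clock_hz
    return tick_s * 1e9, tick_s * SPEED_OF_LIGHT

for mhz in (12, 40, 60):
    tick_ns, range_m = tick_to_range(mhz * 1e6)
    print(f"{mhz} MHz: one tick = {tick_ns:.2f} ns, about {range_m:.2f} m")
```

Under these assumptions, one tick of a 12 MHz clock corresponds to roughly 25 m of signal travel, while 40-60 MHz clocks quantize to roughly 5 to 7.5 m. Clock drift and measurement noise add to this quantization error.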

Example data:
1600370,1162.80599999428,35,50.3303708868512,4.58204114759291,7879.08,8191.5,3,"[[436,1163789621843,96],[394,1163789735531,62],[275,1202505203916.67,180]]"
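The sketch below shows one possible way to parse a measurement row, including the JSON array in the measurements column, and to compute the difference between the arrival timestamps of two sensors. It uses only the header and the example row shown above. The helper names (parse_row, tdoa_ns) are hypothetical, and treating empty fields as None is an assumption about how missing values appear in the CSV.

```python
import csv
import io
import json

# Header line plus the example row from the text above.
MEASUREMENT_CSV = '''id,timeAtServer,aircraft,latitude,longitude,baroAltitude,geoAltitude,numMeasurements,measurements
1600370,1162.80599999428,35,50.3303708868512,4.58204114759291,7879.08,8191.5,3,"[[436,1163789621843,96],[394,1163789735531,62],[275,1202505203916.67,180]]"
'''

def opt_float(value):
    """Convert a CSV field to float; empty fields become None (assumption)."""
    return float(value) if value else None

def parse_row(row):
    """Turn one csv.DictReader row into typed Python values."""
    return {
        "id": int(row["id"]),
        "timeAtServer": float(row["timeAtServer"]),
        "aircraft": int(row["aircraft"]),
        "latitude": opt_float(row["latitude"]),
        "longitude": opt_float(row["longitude"]),
        "baroAltitude": opt_float(row["baroAltitude"]),
        "geoAltitude": opt_float(row["geoAltitude"]),
        "numMeasurements": int(row["numMeasurements"]),
        # Each triple: (sensor serial, time of arrival in ns, RSSI).
        "measurements": [
            (int(serial), float(toa_ns), float(rssi))
            for serial, toa_ns, rssi in json.loads(row["measurements"])
        ],
    }

def tdoa_ns(measurements, serial_a, serial_b):
    """Arrival time at sensor b minus arrival time at sensor a, in ns."""
    toa = {serial: t for serial, t, _ in measurements}
    return toa[serial_b] - toa[serial_a]

rows = [parse_row(r) for r in csv.DictReader(io.StringIO(MEASUREMENT_CSV))]
first = rows[0]
print(first["numMeasurements"], tdoa_ns(first["measurements"], 436, 394))
```

Such a raw difference is only meaningful when both sensors have synchronized clocks. In the example row, the timestamp of sensor 275 differs from the other two by tens of seconds, so it cannot be compared directly without first correcting for the clock offset.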

Additional Notes

  • For some aircraft, the fields latitude, longitude, and geoAltitude are empty. These aircraft positions should be determined by your localization algorithm. The fact that most aircraft have positions in the data sets reflects reality, where most aircraft transmit their location via ADS-B and only a few lack this capability. The position data of the other aircraft is included so that competitors have additional data recorded over the same period of time with known transmitter locations. These data can be used, for example, to synchronize clocks or to learn the difference between geometric and barometric altitudes.
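One simple way to use the aircraft with known positions is sketched below: it takes the median difference between geometric and barometric altitude over all rows where both are present and applies it to rows where geoAltitude is missing. A single constant offset is a crude assumption made for this example, since the true difference depends on the weather and may vary over space and time. The first row uses the altitudes from the example data above; the second row is invented for illustration, and the function name is hypothetical.

```python
from statistics import median

rows = [
    # Altitudes taken from the example measurement row (position known).
    {"id": 1600370, "baroAltitude": 7879.08, "geoAltitude": 8191.5},
    # Hypothetical row invented for illustration (geoAltitude missing).
    {"id": -1, "baroAltitude": 8000.0, "geoAltitude": None},
]

def estimate_geo_altitudes(rows):
    """Estimate geoAltitude for rows lacking it, using a median offset.

    Returns a dict mapping row id to the estimated geometric altitude in
    meters. Returns an empty dict if no row has both altitudes.
    """
    known = [
        r for r in rows
        if r["geoAltitude"] is not None and r["baroAltitude"] is not None
    ]
    if not known:
        return {}
    offset = median(r["geoAltitude"] - r["baroAltitude"] for r in known)
    return {
        r["id"]: r["baroAltitude"] + offset
        for r in rows
        if r["geoAltitude"] is None and r["baroAltitude"] is not None
    }

estimates = estimate_geo_altitudes(rows)
print(estimates)
```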

Additional Material

Tutorials & Examples

Here are some links you might find useful or inspirational:

Let's get started!

A brief introduction on how to use and analyze the data.

The following quick guide is an example of how to load, process, and visualize the data. The example is written in GNU R.
