This is a quick introduction to the OpenSky data set for the aircraft localization competition. The goal of the competition is to find the locations of some aircraft based on time difference of arrival and signal strength measurements provided by a network of sensors on the ground.

The Data Set

Measurements

Let’s begin with the data set.

The measurements CSV files are structured like this:

id timeAtServer aircraft latitude longitude baroAltitude geoAltitude numMeasurements measurements
1439066 929.714 179 46.55903 14.060371 11277.60 11506.20 2 [[114,930672432312,116],[131,930672970687,74]]
2446659 1560.153 1640 42.95801 4.661205 10660.38 10904.22 2 [[132,1561125919656,56],[432,1566991243166.67,65]]

Column descriptions

  • id: a unique ID for each transponder transmission. This ID can be used to refer to specific measurements in the results file.
  • timeAtServer: timestamp denoting the time when the measurement arrived at OpenSky’s server. The unit is seconds, and each data set starts at timeAtServer=0.
  • aircraft: a randomized ID of the aircraft which sent the position report.
  • latitude: latitude reported by the aircraft in decimal degrees. This column is null for those positions which should be determined by the localization algorithm.
  • longitude: longitude reported by the aircraft in decimal degrees. This column is null for those positions which should be determined by the localization algorithm.
  • baroAltitude: barometric altitude reported by the aircraft in meters.
  • geoAltitude: geometric (GPS) height reported by the aircraft in meters. This column is null for those positions which should be determined by the localization algorithm.
  • numMeasurements: redundant field indicating the number of sensors which recorded the position report.
  • measurements: JSON array of triples [serial, timestamp, signalstrength].
    • serial: unique sensor ID which can be matched with the sensor information table (sensors data table below).
    • timestamp: precise timestamp for the detection of the position report at the sensor in nanoseconds.
    • signalstrength: indicator of the strength of the report’s signal at the sensor (often in dB).
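The column layout above can be turned into a small parser. The following is only a minimal sketch: it assumes the real files are comma-separated with a quoted measurements column (the sample rows above are shown whitespace-separated), and the names SAMPLE and parse_rows are made up for this illustration.

```python
import csv
import io
import json

# The two sample rows from above, rewritten as comma-separated text.
# Assumption: the real files quote the measurements column like this.
SAMPLE = '''id,timeAtServer,aircraft,latitude,longitude,baroAltitude,geoAltitude,numMeasurements,measurements
1439066,929.714,179,46.55903,14.060371,11277.60,11506.20,2,"[[114,930672432312,116],[131,930672970687,74]]"
2446659,1560.153,1640,42.95801,4.661205,10660.38,10904.22,2,"[[132,1561125919656,56],[432,1566991243166.67,65]]"
'''

def parse_rows(text):
    """Yield one dict per row, with the measurements column decoded."""
    for row in csv.DictReader(io.StringIO(text)):
        row["timeAtServer"] = float(row["timeAtServer"])
        # Each triple is [serial, timestamp in ns, signal strength].
        row["measurements"] = [
            {"serial": int(s), "timestamp_ns": float(t), "signalstrength": float(r)}
            for s, t, r in json.loads(row["measurements"])
        ]
        yield row

rows = list(parse_rows(SAMPLE))
first = rows[0]["measurements"]
# Difference between the two sensor timestamps of the first report, in ns.
print(first[1]["timestamp_ns"] - first[0]["timestamp_ns"])
```

Timestamps are read as floats because some of them carry a fractional part, as in the second sample row. The position columns are left as strings here, since how missing values are encoded in the real files is not covered by this sketch.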

Sensor information

You will also need additional information about the sensors during the competition.

How is the sensor information structured?

serial latitude longitude height type
178 58.24373 -6.351554 20 dump1090
417 47.37750 8.236600 417 dump1090

Column descriptions

  • serial: unique sensor ID which can be used to join the sensor information with the measurements data.
  • latitude: latitude of the sensor in decimal degrees. It has been reported either by the sensor hardware or manually by the sensor operator.
  • longitude: longitude of the sensor in decimal degrees. It has been reported either by the sensor hardware or manually by the sensor operator.
  • height: height of the sensor in meters. It has been reported either by the sensor hardware or manually by the sensor operator.
  • type: type of the sensor hardware that was used to record the measurements.
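Joining the two tables on serial could look roughly like the sketch below. The sensor rows are the two examples from above; the helper name attach_sensor_info and the example triples are hypothetical and only serve the illustration.

```python
# Sensor rows from the example above, keyed by serial.
sensors = {
    178: {"latitude": 58.24373, "longitude": -6.351554, "height": 20.0, "type": "dump1090"},
    417: {"latitude": 47.37750, "longitude": 8.236600, "height": 417.0, "type": "dump1090"},
}

def attach_sensor_info(measurements, sensors):
    """Pair each [serial, timestamp, signalstrength] triple with its sensor row.

    Triples whose serial is missing from the sensor table are skipped.
    """
    joined = []
    for serial, timestamp_ns, signalstrength in measurements:
        info = sensors.get(serial)
        if info is None:
            continue
        joined.append({"serial": serial, "timestamp_ns": timestamp_ns,
                       "signalstrength": signalstrength, **info})
    return joined

# Hypothetical triples for illustration; serial 999 is deliberately unknown.
example = [[178, 1000000000, 50], [417, 1000002000, 70], [999, 1000004000, 30]]
joined = attach_sensor_info(example, sensors)
print([j["serial"] for j in joined])
```

Skipping unknown serials silently is a design choice of this sketch; depending on the analysis, raising an error or counting the misses may be more appropriate.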

Example Flight

As mentioned above, you will have to look at the timestamps (or rather the differences of timestamps) and the signal strength measurements to find the locations of the unknown aircraft. So let’s have a look at the measurement data for the specific flight of aircraft number 6 to see how it all fits together.

I will start by plotting a map with the flight and the sensor positions.

Interesting! So apparently this flight was over Germany at the beginning of the data set and was just about to land at Manchester airport when the recording stopped an hour later. The red dots are the positions of the sensors located near the flight path.

So what does the measurement data of this flight actually look like?

Timestamp measurements

It’s not very surprising that the nanosecond timestamp counts from about 0 to about 3600e9 over a duration of 3600 seconds. However, this plot already shows some of the problems we’ll face during the competition: the timestamps of some sensors are broken (here: green)! Maybe the difference between timestamps of different sensors tells us something about the location of the aircraft?
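One way such differences could be computed for a single position report is sketched below. The function name pairwise_differences is hypothetical, and the sign convention (lower serial minus higher serial) is an arbitrary choice made for this sketch.

```python
from itertools import combinations

def pairwise_differences(measurements):
    """Return {(serial_a, serial_b): t_a - t_b in ns} for serial_a < serial_b."""
    by_serial = sorted((int(s), t) for s, t, _ in measurements)
    return {(a, b): ta - tb for (a, ta), (b, tb) in combinations(by_serial, 2)}

# Triples from the first sample row of the measurements table.
diffs = pairwise_differences([[131, 930672970687, 74], [114, 930672432312, 116]])
print(diffs)
```

Collecting these values per sensor pair over all reports of one aircraft gives one time series per pair, which is the kind of data the following plots are about.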

Interesting, there are patterns! The sensor with the scrambled timestamps leads to those two branches leading away from the abscissa when combined with the timestamps of other sensors. Let’s have a look at the timestamp differences of a “good” pair of sensors:

That looks much more interesting! Something seems to happen here. Spoiler alert: the time difference correlates with the distance difference! Let’s check this.
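Such a check can be sketched as follows: convert positions to Earth-centered Cartesian coordinates, compute the distance from the aircraft to each sensor, and divide the distance difference by the speed of light. This is only a sketch under simplifying assumptions: all heights are treated as heights above the WGS84 ellipsoid, the signal is assumed to travel in a straight line at the vacuum speed of light, and the function names are made up for this illustration.

```python
import math

C = 299_792_458.0                   # speed of light in m/s
WGS84_A = 6_378_137.0               # WGS84 semi-major axis in m
WGS84_F = 1 / 298.257223563         # WGS84 flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)  # first eccentricity squared

def to_ecef(lat_deg, lon_deg, height_m):
    """Convert geodetic coordinates to Earth-centered Cartesian (x, y, z) in m."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = WGS84_A / math.sqrt(1 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + height_m) * math.cos(lat) * math.cos(lon)
    y = (n + height_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - WGS84_E2) + height_m) * math.sin(lat)
    return x, y, z

def distance_m(p, q):
    """Straight-line distance between two (lat, lon, height) points in m."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(to_ecef(*p), to_ecef(*q))))

def expected_tdoa_ns(aircraft, sensor_a, sensor_b):
    """Expected t_a - t_b in ns: positive when the aircraft is farther from sensor_a."""
    return (distance_m(aircraft, sensor_a) - distance_m(aircraft, sensor_b)) / C * 1e9

# Positions reused from the sample rows above purely for the arithmetic;
# this aircraft was not actually received by these two sensors.
aircraft = (46.55903, 14.060371, 11506.20)
sensor_178 = (58.24373, -6.351554, 20.0)
sensor_417 = (47.37750, 8.236600, 417.0)
print(expected_tdoa_ns(aircraft, sensor_178, sensor_417))
```

Comparing this expected value with the measured timestamp difference of the same sensor pair, report by report, is what the statement about the correlation amounts to.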

Huh, what a coincidence! So in the beginning, the aircraft was slightly closer to sensor 168. At about t=2750 seconds, the aircraft turned towards sensor 191 and the distance to sensor 168 exceeded that to sensor 191. As a result, the difference became positive. What does this look like on a map?

Signal strength measurements

The signal strength values of all sensors which recorded position reports of aircraft number 6 are:

Nice, modern art! And there are clear patterns visible. Spoiler alert: the signal strength correlates with the distance (and many other things). Let’s have a look at the signal strength measurements from one of the sensors of our favorite pair:

Much noisier, but the correlation is clearly visible!
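To put a number on such a correlation, one could compute a Pearson coefficient between signal strength and the logarithm of the distance. The sketch below uses synthetic values only, not competition data: it assumes a free-space-like drop of 20 dB per tenfold increase in distance, an arbitrary offset of 100, and Gaussian noise, all of which are assumptions made for this illustration.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Synthetic illustration only: distances from 10 km to 305 km.
rng = random.Random(0)
distances_m = [10_000 + 5_000 * i for i in range(60)]
# Assumed model: 20 dB drop per decade of distance plus noise (sigma = 2).
signal = [100 - 20 * math.log10(d) + rng.gauss(0, 2) for d in distances_m]

r = pearson([math.log10(d) for d in distances_m], signal)
print(r)
```

With real measurements the relation will be much looser than in this synthetic case, since the signal strength depends on many other things besides distance.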

Conclusion

So it seems that the measurement data can be used to infer information on the aircraft positions. But beware: the above example was cherry-picked! There is a lot of noise in the data, people might lie about the exact locations of their sensors, most clocks will drift, and many other problems are looming on the horizon. You might have to ask the other aircraft for help!

Can you find the missing aircraft positions?


Appendix (Helper Functions)

Here’s the code of some of the little helpers I used above.