Random data sampling

To specify random data sampling, set the Sampling type option on the Editor Options (2 of 8) panel to 3. Random sampling.

These points describe the behavior of the FM/Db2 editor when using random data sampling:
  • Data sampling applies to browse, view, and edit.
  • When using data sampling, the editor always loads all sampled rows into memory. Therefore large table support is NOT available when sampling data.
  • The options that are pertinent to random sampling are:
    • Row count on the function entry panel.
    • Start position on the function entry panel.
    • Sampling limit on the Editor Options (2 of 8) panel.
    • Sampling frequency on the Editor Options (2 of 8) panel.
    • Sampling seed on the Editor Options (2 of 8) panel.
Table 1 describes the behavior of the FM/Db2 editor for various options related to random data sampling.
Table 1. Behavior of FM/Db2 editor for options related to random data sampling

Row count | Start position | Sampling limit | Sampling frequency | Sampling seed | Behavior (see Note 1)
0 | 1 | 0 | 0.fff | 0 | Rows are fetched, starting at row 1 and continuing until the end of the result table.
0 | 1 | bbb | 0.fff | 0 | Rows are fetched, starting at row 1 and continuing until the end of the result table, or until bbb rows have been added to the editor.
rrr | 1 | 0 | 0.fff | 0 | rrr rows are fetched, starting at the first row. Some subset (approximately 0.fff x rrr) of these rows is added to the sample data set.
rrr | 1 | bbb | 0.fff | 0 | At most rrr rows are fetched, starting at the first row. Some subset (approximately 0.fff x rrr) of these rows, but at most bbb rows, is added to the sample data set.
0 | sss | 0 | 0.fff | 0 | Rows are fetched, starting at row sss and continuing until the end of the result table.
0 | sss | bbb | 0.fff | 0 | Rows are fetched, starting at row sss and continuing until the end of the result table, or until bbb rows have been added to the editor.
rrr | sss | 0 | 0.fff | 0 | rrr rows are fetched, starting at row sss. Some subset (approximately 0.fff x rrr) of these rows is added to the sample data set.
rrr | sss | bbb | 0.fff | 0 | At most rrr rows are fetched, starting at row sss. Some subset (approximately 0.fff x rrr) of these rows, but at most bbb rows, is added to the sample data set.
0 | 1 | 0 | 0.fff | yyy | Rows are fetched, starting at row 1 and continuing until the end of the result table. The random number generator starts with seed yyy.
0 | 1 | bbb | 0.fff | yyy | Rows are fetched, starting at row 1 and continuing until the end of the result table, or until bbb rows have been added to the editor. The random number generator starts with seed yyy.
rrr | 1 | 0 | 0.fff | yyy | rrr rows are fetched, starting at the first row. Some subset (approximately 0.fff x rrr) of these rows is added to the sample data set. The random number generator starts with seed yyy.
rrr | 1 | bbb | 0.fff | yyy | At most rrr rows are fetched, starting at the first row. Some subset (approximately 0.fff x rrr) of these rows, but at most bbb rows, is added to the sample data set. The random number generator starts with seed yyy.
0 | sss | 0 | 0.fff | yyy | Rows are fetched, starting at row sss and continuing until the end of the result table. The random number generator starts with seed yyy.
0 | sss | bbb | 0.fff | yyy | Rows are fetched, starting at row sss and continuing until the end of the result table, or until bbb rows have been added to the editor. The random number generator starts with seed yyy.
rrr | sss | 0 | 0.fff | yyy | rrr rows are fetched, starting at row sss. Some subset (approximately 0.fff x rrr) of these rows is added to the sample data set. The random number generator starts with seed yyy.
rrr | sss | bbb | 0.fff | yyy | At most rrr rows are fetched, starting at row sss. Some subset (approximately 0.fff x rrr) of these rows, but at most bbb rows, is added to the sample data set. The random number generator starts with seed yyy.
Note:
  1. Sampling continues until one of the following conditions is met:
    • Any non-zero Sampling limit is reached.
    • Any non-zero Row count (fetch) limit is reached.
    • The end of the result table is reached.

    The Sampling limit sets an upper bound on the number of rows loaded into the editor, that is, the number of rows in an editor session. To reach this many rows, approximately Sampling limit / Sampling frequency rows must be fetched. By contrast, the Row count limit sets an upper bound on the number of rows that are fetched from the object. The number of rows that are sampled is approximately Row count x Sampling frequency.

    With "small" sampling frequencies, specifying a low Row count limit may result in no rows being sampled. For example, with a Row count limit of 60, a Sampling limit of 2000, and a Sampling frequency of 0.01, there is a reasonable chance that no rows are sampled.

  2. A Sampling frequency of 0.fff results in, on average and for large numbers of rows sampled, 0.fff x 100% of rows being sampled. Therefore a frequency of 0.1 results in 10% of rows being sampled. For small frequencies, many rows need to be processed to find each matching row. For any particular random sample of data there is no guarantee that the number of rows in the sample will exactly reflect the sampling frequency.

    When a Sampling seed value of 0 is specified, FM/Db2 uses the fractional part of the seconds value derived from the current system clock to initialize the random number generator. This value is accurate to the microsecond; therefore each such seed has a value in the range 0–999999 inclusive. It is unlikely that two random samples generated with a Sampling seed of 0 will be identical.

    When the same user-specified (non-zero) Sampling seed is used with the same table and otherwise identical conditions, the data samples produced are identical.
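
The interaction of the options described in the notes can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the algorithm FM/Db2 uses: it assumes each fetched row is kept independently with probability equal to the Sampling frequency, and it stands in for the clock-derived seed by letting Python choose one. The names sample_rows, expected_fetches, and chance_of_empty_sample are hypothetical and exist only for this illustration.

```python
import random
from itertools import islice


def sample_rows(rows, row_count=0, start_position=1, sampling_limit=0,
                sampling_frequency=1.0, sampling_seed=0):
    """Illustrative model of random sampling; not FM/Db2's actual algorithm."""
    # Assumption: a seed of 0 means "pick a seed automatically". Here Python
    # chooses a nondeterministic seed in place of the system clock value.
    rng = random.Random(sampling_seed if sampling_seed != 0 else None)

    # Start at the 1-based start position. A non-zero row_count caps the
    # number of rows fetched; 0 means "fetch to the end of the result table".
    first = start_position - 1
    stop = None if row_count == 0 else first + row_count

    sample = []
    fetched = 0
    for row in islice(rows, first, stop):
        fetched += 1
        # Assumption: each fetched row is kept independently with
        # probability sampling_frequency.
        if rng.random() < sampling_frequency:
            sample.append(row)
            # A non-zero sampling_limit caps the rows added to the sample.
            if sampling_limit and len(sample) >= sampling_limit:
                break
    return sample, fetched


def expected_fetches(sampling_limit, sampling_frequency):
    """Approximate number of fetches needed to fill the sampling limit."""
    return sampling_limit / sampling_frequency


def chance_of_empty_sample(row_count, sampling_frequency):
    """Probability that none of row_count fetched rows is sampled,
    under the independence assumption above."""
    return (1.0 - sampling_frequency) ** row_count
```

Under these assumptions, filling a Sampling limit of 2000 at a Sampling frequency of 0.01 takes roughly expected_fetches(2000, 0.01) = 200000 fetched rows, while a Row count limit of 60 at the same frequency leaves a chance of about 0.55 that no rows are sampled at all (chance_of_empty_sample(60, 0.01)). Calling sample_rows twice with the same non-zero seed and the same input returns the same sample, which mirrors the reproducibility described for user-specified seeds.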
