UDC 528.8
ENTROPY-BASED NOISE REDUCTION IN HYPERSPECTRAL IMAGES
Ben Gorte
Delft University of Technology, Faculty of Civil Engineering and Geoscience, Department of Geoscience and Remote Sensing, the Netherlands, 2600 GA Delft, Stevinweg 1, P. O. Box 5048, Dr. ir., Assistant Professor in Optical and Laser Remote Sensing, tel. +31 15 2781373, e-mail: [email protected]
Enayat Hosseini Aria
Delft University of Technology, Faculty of Civil Engineering and Geoscience, Department of Geoscience and Remote Sensing, the Netherlands, 2600 GA Delft, Stevinweg 1, P. O. Box 5048, Ph. D. candidate, tel. +31 15 2781373, e-mail: [email protected]
Massimo Menenti
Delft University of Technology, Faculty of Civil Engineering and Geoscience, Department of Geoscience and Remote Sensing, the Netherlands, 2600 GA Delft, Stevinweg 1, P. O. Box 5048, Dr., Professor in Optical and Laser Remote Sensing, tel./fax: +31 15 278 4244/3711, e-mail: [email protected]
The goal of dimensionality reduction of hyperspectral images is to reduce the number of spectral channels, either by selection or by combination, without losing too much of the image information. The paper investigates the use of entropy as a measure of information content, and addresses the associated difficulties in high-dimensional feature spaces. A nearest neighbor solution is studied with an adapted measure for neighborhood volumes. It appears this also provides a straightforward method to reduce image noise in the spectral domain, which in turn reduces spatial noise, while preserving spatial details and image sharpness.
Key words: hyperspectral imagery, entropy, dimensionality reduction, neighbor search, noise reduction.
INTRODUCTION
Hyperspectral images are an important class of remote sensing data products. They are recorded by instruments called imaging spectrometers, where the imaging characteristic relates to the coverage of an object (for example a region on the earth surface) with a two-dimensional set of measurements. Each measurement produces a one-dimensional series of data: the radiances emitted or reflected from that region at different spectral wavelengths (hence: spectrometer). This gives three-dimensional data volumes, with two spatial dimensions corresponding to terrain coordinates in the scene, and one spectral dimension corresponding to wavelengths in the electromagnetic spectrum.
Along all dimensions the data are sampled: the spatial resp. spectral distances (resolutions) between successive samples lead to pixels resp. spectral channels in the data. In remote sensing, using airborne and spaceborne instruments, spatial resolutions may be anywhere between a few centimeters and hundreds of meters. Spectral resolutions are somewhat more uniform: the spectral ranges considered are usually the visible (400-700 nm), near-infrared (700-1200 nm) and/or short-wave infrared (1200-2500 nm) parts of the spectrum, with channels (typically) 5-50 nm wide.
Several satellite missions are equipped with imaging spectrometers, such as MODIS (36 spectral channels), CHRIS (18 spectral channels) and HYPERION (220 spectral channels). Among airborne imaging spectrometers, AVIRIS (224 spectral channels) is a well-known system; it is used in the examples given in this paper.
Figure 1: Five spectra at arbitrary pixels in a hyperspectral image with 185 spectral channels between 400 and 2500 nanometers.
The set of measurements in the range of spectral channels, representing the spectrum that is measured at one spot in the scene and is stored at a pixel of the image, is called the feature vector of that pixel. The vector space is the M-dimensional feature space (with M the number of spectral channels). Each point in the feature space is a (measured) spectrum, which may also be represented as a spectral curve (Figure 1). When points in the feature space are near (perhaps in one cluster), the corresponding spectral curves look similar.
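To make the later computations concrete, it helps to view the image as such a list of feature vectors. The code sketches in this paper use Python/NumPy; they are our own illustrations under stated assumptions, not code from the original study. Rearranging a hypothetical data cube into an N × M feature matrix:

```python
import numpy as np

# Hypothetical hyperspectral cube: rows x cols pixels, M spectral channels.
rows, cols, M = 145, 145, 185
cube = np.random.randint(0, 4096, size=(rows, cols, M))  # stand-in values

# Feature matrix: one M-dimensional feature vector (spectrum) per pixel.
X = cube.reshape(-1, M)   # shape (N, M), with N = rows * cols pixels
spectrum = X[0]           # the spectral curve of one pixel (cf. Figure 1)
```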
The noise reduction method presented in this paper fits in an effort to reduce the dimensionality of hyperspectral image data. The goal of such reduction may be, for example, to reduce the data volume, or to make the data suitable for certain types of statistical analysis, including classification, where high dimensionality can become problematic (e.g. during the estimation of distribution parameters). In this context the notion of entropy, as a measure of information content, plays an important role. The k-nearest-neighbor method of estimating feature vector probabilities in high-dimensional feature spaces provides an elegant method for noise reduction, which at the same time reduces entropy and provides a measure of information loss during dimensionality reduction.
DIMENSIONALITY REDUCTION
A general goal of dimensionality reduction is to minimize data volume while maintaining information content. This will only be successful if there is a degree of redundancy in the input data. It requires relevant data to be separated from redundant data, after which the latter may be omitted.
A straightforward method for dimensionality reduction is channel selection, where as many redundant channels as possible are removed, under suitable criteria for 'redundant' and for 'as possible'. Other dimensionality reduction schemes use transformations (rotations) of the feature space, the most common being the Principal Component Transform.
As an alternative to channel selection, band formation is considered in our study [Hosseini Aria et al., 2012]. From a large number of narrow channels (each occupying a small range of spectral wavelengths) a smaller number of wider bands is formed, by averaging the values at every pixel (a minimal sketch of this averaging step follows the list below). Assuming that each narrow channel is used in exactly one wide band, the problem can be formulated as finding the wavelengths at which the boundaries of the new bands are placed; the method can be called spectral region splitting. The success in terms of dimensionality reduction vs. information loss depends on the number of new bands and their locations in the spectrum. The challenge is to find optimal locations for a given number of bands, or to find the minimum number of bands, and their locations, that satisfies a given value of some criterion. As criteria for a set of bands being optimal we consider:
• Representation accuracy measures how similar the original (narrow) channels are to the averages used in the bands.
• Class separability addresses whether dimensionality reduction reduces the capability to separate classes from each other in supervised land use/land cover classification.
• Information content: By quantifying the amount of information in a dataset, one can search for a configuration of spectral bands where the loss of information is minimal.
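The band-formation step referenced above is, in itself, a simple averaging operation once the boundaries are known; finding optimal boundaries is the actual search problem. A minimal sketch, in which the boundary values are arbitrary placeholders, not results from the paper:

```python
import numpy as np

def form_bands(X, boundaries):
    """Average narrow channels into wider bands.
    X: (N, M) feature matrix; boundaries: channel indices that split the
    spectrum, e.g. [0, 40, 95, M] yields three bands."""
    return np.stack(
        [X[:, lo:hi].mean(axis=1)                 # average channels lo..hi-1
         for lo, hi in zip(boundaries[:-1], boundaries[1:])],
        axis=1,
    )

# e.g. reduce M = 185 channels to 3 bands (boundary choice is arbitrary here)
# X3 = form_bands(X, [0, 40, 95, 185])
```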
In all cases a single set of spectral bands is assumed to be used for an entire image, and one of the concerns is how (the reduction in) representation accuracy, class separability and/or information content is spatially distributed over the image. For images recorded at different types of scenes (for example urban, agricultural, forested, natural, marine, etc.) different sets of spectral bands may be optimal. Therefore we consider dimensionality reduction by spectral region splitting to be part of the analysis of an image.
ENTROPY
The remainder of the paper is restricted to information content measurement, notably to entropy. Entropy is a notion from information and communication science. It attempts to express the amount of information contained in a particular message (among all the possible messages in a given context) as the (theoretical) minimum number of data elements needed to precisely transmit that message from a sender to a receiver¹. In remote sensing terms this translates into the minimum data volume needed to represent (store, transmit) the information in an image. It provides a theoretical minimum for the data size to which images could be reduced by an ideal lossless image compression method. Obviously, the information content of an image depends on the image size, and therefore it is customary to normalize entropy into bits per pixel.
Entropy is a powerful, yet quite simple, concept. Considering that information can be subdivided into pieces called packets, a set Q exists containing all possible packets p. Associated with each packet is the probability P(p) that an arbitrary piece of an arbitrary, but valid, message is that packet. The sum Σ_{p∈Q} P(p) over all p in Q therefore equals 1.
The amount of information in a packet, measured as the data size within a message needed to encode that packet efficiently, equals minus the logarithm of its probability. One million packets, all equally probable with P(p) = 1/1000000, can be encoded in 6 digits from 000000 to 999999, and log₁₀(1/1000000) = -6. When probabilities differ, it is efficient to use short strings for frequent packets, at the expense of having to use longer strings for rarer ones. The length of each packet depends on its probability: the number of decimal digits needed to represent each p ∈ Q most efficiently equals -log₁₀ P(p). Instead of decimal digits, we can express message lengths in bits by using -log₂ P(p) instead of -log₁₀ P(p). The notion of entropy agrees with the intuition that common, predictable pieces of information are those with high probabilities, and that these are less informative than unusual, surprising ones.
When the number of samples (of packets) in a dataset D is much larger than the total number of possible packets in Q, the probabilities P(p) for all p in Q can be estimated from the dataset by counting the frequencies of occurrence. If certain packets p ∈ Q with very low probabilities do not occur in D at all, they will get an estimated probability of P(p) = 0, which is close enough to the truth. The entropy H can now be computed over the entire space Q as the weighted sum of the logarithms of the probabilities, where those probabilities are also used as weight factors:
H(Q) = -Σ_{p∈Q} P(p) · log₂ P(p)    (1)
¹ They know and understand each other very well. They are allowed to design a strategy, and to make any agreement on how to communicate efficiently. They are not supposed to reduce the set of possible messages, however, and if a strategy is tailored to a message, the strategy has to be included in the message.
When the probabilities are already known beforehand, the entropy in a dataset D can be computed as the sum of the logarithms of the probabilities of the packets in D. Now weighting is not necessary, since packets with higher probability occur more often. It is necessary, however, to divide by the sample size #D:
H(D) = -(Σ_{p∈D} log₂ P(p)) / #D    (2)
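A minimal sketch of both estimators on a toy packet alphabet (the alphabet, probabilities and sample size are illustrative assumptions, not from the paper):

```python
import numpy as np

# Toy alphabet of packets and a dataset D drawn from it (illustrative only).
rng = np.random.default_rng(0)
Q = np.arange(4)                          # possible packets
P = np.array([0.5, 0.25, 0.125, 0.125])   # known packet probabilities
D = rng.choice(Q, size=100_000, p=P)      # observed dataset

# Estimator (1): frequencies in D approximate P(p); sum over the alphabet Q.
counts = np.bincount(D, minlength=len(Q))
P_hat = counts / counts.sum()
nonzero = P_hat > 0                       # treat 0 * log 0 as 0
H1 = -np.sum(P_hat[nonzero] * np.log2(P_hat[nonzero]))

# Estimator (2): probabilities known beforehand; average -log2 P over D.
H2 = -np.mean(np.log2(P[D]))

print(H1, H2)   # both close to the true entropy of 1.75 bits per packet
```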
ESTIMATING ENTROPY IN HYPERSPECTRAL IMAGES
A pixel in a hyperspectral image has a feature vector containing a number of measurements, each having a value from a certain range. The range defines the radiometric resolution and translates into a bit count. The sum of the bit counts over the channels provides an upper bound for the number of bits required to store the information in a pixel. This number of bits would indeed be needed if any combination of spectral values occurred randomly in the image. In reality, given the small widths of spectral channels in hyperspectral sensors, spectral curves are usually smooth, or at least piecewise smooth (having only a few jumps); this greatly reduces the number of possible feature vectors, and makes spectral region splitting a promising effort in the first place. Furthermore, spectral curves are consequences of physical properties (absorption, reflection, scattering) of materials at and above the earth surface, and many of the spectra that would form valid feature vectors do not result from any material (or combination of materials). Finally, entropy takes into account the frequencies at which messages are expected to occur; by using compact encodings for frequent messages, further efficiency can be gained.
Despite the above, the number of physically possible hyperspectral feature vectors is still much larger than the number of different feature vectors that actually occur in any given image, which is bounded by the number of pixels.
Because Q is too large, (1) cannot be used to compute the entropy in an image. Method (2) is only useful when the probabilities of the feature vectors (information packets) in the image are already known. It is not possible to compute these from the frequencies at which they occur in the image directly, because the number of feature vectors that might occur is much larger than the number that does occur. Therefore, counting the latter does not lead to correct probability estimation.
NEAREST NEIGHBOR ESTIMATION
To estimate high-dimensional feature vector probabilities we investigate nearest neighbor estimation, which was successfully applied in multispectral feature spaces during previous work [Gorte and Stein, 1998]. We describe the overall method first and provide details later.
For any p that is present in the feature space of an image, we find the k nearest neighbors, which are also actually present. In each spectral channel i we take the minimum and maximum values min_i and max_i among the k neighbors and compute a size s_i = max_i - min_i + 1. Adding 1 prevents the size from being 0 in case min_i and max_i happen to be equal. By multiplying all sizes over the dimensions i we obtain the volume V_B = Π_i s_i of the smallest rectangular M-dimensional hyper-block B that contains k image feature vectors: the p we were searching for, plus its k-1 nearest neighbors. When we denote the total number of pixels in the image as N, the probability for an arbitrary image pixel to have its feature vector (other than p) inside B equals:
P_B = (k-1) / N
Assuming V_B is small (because k << N), the probability density inside B can be considered homogeneous, and the probability of finding a particular vector inside B, such as p, equals:
P(p) = P_B / V_B = (k-1) / (N · V_B) = (k-1) / (N Π_i s_i)
For entropy computation according to (2) the base-2 logarithm of P(p) is needed:
log₂ P(p) = log₂(k-1) - log₂ N - Σ_i log₂ s_i
Substituting this into (2), the entropy of a hyperspectral image I of N pixels with feature vectors p (in which duplicates may occur) equals:
H(I) = -(Σ_{p∈I} log₂ P(p)) / N = (Σ_{p∈I} (log₂ N + Σ_i log₂ s_i - log₂(k-1))) / N
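A minimal sketch of this hyper-block estimator, assuming integer-valued spectral data in an (N, M) feature matrix X, and using SciPy's exact k-d tree where the paper uses the approximate ANN search discussed in the remarks below:

```python
import numpy as np
from scipy.spatial import cKDTree  # exact k-NN; ANN would be faster here

def hyperblock_entropy(X, k=11):
    """Entropy (bits per pixel) of the feature vectors in X (N pixels x M
    channels), estimated from k-nearest-neighbor hyper-block volumes."""
    N = len(X)
    tree = cKDTree(X)
    _, idx = tree.query(X, k=k)        # k nearest neighbors (self included)
    neigh = X[idx]                     # shape (N, k, M)
    sizes = neigh.max(axis=1) - neigh.min(axis=1) + 1     # s_i per channel
    # log2 P(p) = log2(k-1) - log2(N) - sum_i log2(s_i)
    log2_P = np.log2(k - 1) - np.log2(N) - np.log2(sizes).sum(axis=1)
    return -log2_P.mean()              # H(I), following equation (2)
```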
The following remarks apply:
- k is a (small) positive integer - we used k = 11 in the experiments.
- Nearest refers to distance in the feature space; the nearest neighbors may be far apart in the image (in the scene).
- Distances in the feature space can be measured as Euclidean distances, but the literature also suggests other metrics, such as city-block distance or fractional distance [Kybic and Vnucko, 2012].
- Distance computations and neighbor searches in high-dimensional feature spaces are controversial because of the so-called curse of dimensionality, which leads to the observation that the distance between near points becomes very similar to the distance between far points. However, the differences between the volumes of hyper-blocks spanned around near vs. far point pairs remain significant, because volumes increase with the M-th power of the sizes.
- Another "curse" of dimensionality is that neighbor searches require brute force: they are slow and hard to optimize when exact results are wanted. By accepting approximate neighbors, as, for example, provided by the ANN-software package [Arya and Mount, 1998], significant performance increase is obtained at the expense of little additional inaccuracy.
Nearest neighbor approaches to the high-dimensional entropy problem have been proposed earlier.
[Kybic and Vnucko, 2012] describe optimisations to a 1-NN method, known as the Kozachenko-Leonenko (KL) estimator [Kozachenko and Leonenko, 1987]. [Kybic, 2007] extends from 1-NN to k-NN; a major difference between his approach and ours is that he uses a single distance to construct an M-dimensional hypersphere, where we use the minima and maxima in each channel to compute a hyper-block with different sizes along the dimensions.
NOISE REDUCTION
As a bonus of the above-described probability estimation method, we obtain a very effective method for the suppression of random variations of values in the spatial/spectral domain. In the spatial domain, these variations materialize as a noisy appearance of individual channels: variation that occurs at single pixels, cannot be explained by actual heterogeneity in the terrain, and is not observed by any other observation technique. In the spectral domain, the variations appear as pixel spectra that are less "smooth" than, for instance, those measured by field spectrometers (see also Figure 1). We consider these variations to be noise.
The proposed suppression is based on the neighbor search introduced above: for each p, find the k neighbors and construct the box B, defined by the minima and maxima per channel, and finally replace p by the center c of box B; c is a new vector in which each element (channel) lies halfway between the minimum and the maximum of that dimension of B. After doing this for the entire image, most feature vectors will have changed; therefore the boxes, as well as their centers, change too. This gives rise to an iterative process. After a number of iterations, the nearest neighbor search can be re-computed and a new series of iterations performed; this defines a higher iteration level. These iterations will not converge quickly, because feature vectors at the outside of the feature space will continue to "move inward" further and further. However, for most feature vectors the changes become very small after a while, and this is when the process can be terminated.
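A minimal sketch of this iterative replacement, with the iteration counts of the experiment below as defaults; the function and parameter names are ours:

```python
import numpy as np
from scipy.spatial import cKDTree

def denoise_spectra(X, k=11, levels=3, sub_iters=15):
    """Replace each feature vector by the center of the hyper-block spanned
    by its k nearest neighbors; iterate, periodically re-running the search."""
    X = np.asarray(X, dtype=float)
    for _ in range(levels):              # higher iteration level: new search
        _, idx = cKDTree(X).query(X, k=k)
        for _ in range(sub_iters):       # neighbor sets fixed, values move
            neigh = X[idx]               # shape (N, k, M)
            X = 0.5 * (neigh.min(axis=1) + neigh.max(axis=1))  # box centers
    return X
```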
The results after three iterations of the nearest neighbor search, each with 15 sub-iterations of replacing spectra by the centers of nearest-neighbor boxes, as described above, are shown in Figure 2. The image is the well-known AVIRIS sample image recorded over the Indian Pines agricultural scene in 1992. The image consists of 224 spectral channels between 370 and 2500 nanometers, from which 39 channels, mostly in the water-absorption parts of the spectrum, had been removed prior to noise reduction.
The spectra of the five pixels shown in Figure 1 (before noise reduction) are transformed by this operation into those in Figure 3.
Figure 2: Original (left) and noise-reduced (right) color composites of a hyperspectral AVIRIS image. Above: channels (177, 77, 17) mapped to RGB; below: channels (42, 22, 10).
Figure 3: Noise-reduced spectra (cf. Figure 1)
CONCLUSION
In the context of dimensionality reduction of hyperspectral imagery we described a k-nearest-neighbor based approach for entropy computation, from which a novel method for noise reduction was derived. The method performs spectral smoothing, which is also clearly visible in the spatial domain. So far, the method has only been applied to AVIRIS imagery; its effect on data from other sensors, with different signal-to-noise ratios, has to be further investigated.
REFERENCES
Arya, S., D. M. Mount, N. S. Netanyahu, R. Silverman and A. Wu. An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. Journal of the ACM 45(6): 891-923 (1998).
Gorte, B. G. H. and A. Stein. Bayesian classification and class area estimation of satellite images using stratification. IEEE Transactions on Geoscience and Remote Sensing 36(3): 803-812 (1998).
Hosseini Aria, S. E., M. Menenti and B. Gorte. Spectral discrimination based on the optimal informative parts of the spectrum. Proc. SPIE 8537, Image and Signal Processing for Remote Sensing XVIII, 853709 (2012).
Kozachenko, L. F. and N. N. Leonenko. On statistical estimation of entropy of a random vector. Problems of Information Transmission 23(2) (1987).
Kybic, J. and I. Vnucko. Approximate all nearest neighbor search for high dimensional entropy estimation for image registration. Signal Processing 92: 1302-1316 (2012).
Kybic, J. High-dimensional entropy estimation for finite accuracy data: R-NN entropy estimator. Information Processing in Medical Imaging 20: 569-580 (2007).
© Ben Gorte, Enayat Hosseini Aria, Massimo Menenti, 2016