In what ways does the discovery of 1.5 million previously uncatalogued infrared cosmic objects by a high school student's algorithm challenge existing astronomical survey methodologies and data-processing pipelines?
In the realm of modern astrophysics, the detection and classification of variable celestial phenomena rely heavily on the robustness of automated data-processing pipelines. However, an 18-year-old high school student, Matteo Paz, fundamentally disrupted the established methodological consensus by developing a novel artificial intelligence algorithm known as VARnet, which identified 1.5 million previously uncatalogued infrared cosmic objects. Operating under the mentorship of IPAC astronomer Davy Kirkpatrick at the Caltech Planet Finder Academy, Paz analyzed nearly 200 billion rows of data, comprising roughly 200 terabytes, collected over 10.5 years by NASA's Near-Earth Object Wide-field Infrared Survey Explorer (NEOWISE) satellite. Out of 450 million celestial objects evaluated, the VARnet model flagged 1.9 million as variable sources, with 1.5 million showing no prior record in existing astronomical catalogs.
This unprecedented discovery of supermassive black holes, quasars, binary stars, supernovae, and pulsing variable stars earned Paz a $250,000 prize in the Regeneron Science Talent Search and resulted in a peer-reviewed, single-author paper published in The Astronomical Journal. More critically, the success of the VARnet architecture exposes profound structural, computational, and epistemological limitations in how existing astronomical survey methodologies approach faint, irregularly sampled time-domain signals.
A fundamental challenge exposed by the discovery of these 1.5 million objects is the astronomical community's prevailing reliance on image co-addition to achieve deep-sky sensitivity, an approach that systematically destroys high-frequency time-domain information. Following the conclusion of primary mission phases, standard pipelines routinely aggregate archival images to suppress background noise. For instance, the unWISE Catalog was constructed by co-adding all publicly available 3–5 micron WISE imaging, including NEOWISE-Reactivation data, increasing the total exposure time by a factor of 5 relative to the AllWISE catalog. By simultaneously fitting thousands of crowded sources, unWISE detected objects roughly 0.7 magnitudes fainter than AllWISE at a 5σ confidence level, doubling the number of galaxies detected between redshifts 0 and 1, tripling the number between redshifts 1 and 2, and ultimately cataloging over half a billion galaxies.
While co-addition effectively recovers faint infrared sources missed by earlier pipelines due to sensitivity limits and source blending, it eliminates the temporal dimension entirely. Compromise projects such as "unTimely" were developed to perform co-additions only on temporally close exposures, attempting to improve data quality without completely flattening the timeline. However, this methodology still degrades or entirely erases transient events characterized by very short observation windows.
Because NEOWISE scanned in great circles centered on the Sun, its cadence was inherently irregular; standard differential detection could not systematically capture variable objects that flashed once and faded, or those whose luminosity drifted gradually over multi-year timescales. As a result, the time-domain data sat unread in institutional archives for over a decade, buried beneath the sheer volume of single-exposure detections. VARnet circumvented this structural blind spot by directly analyzing the raw NEOWISE single-exposure database, processing the individual photometric apparitions without the destructive smoothing inherent to co-added image stacks.
The inability of professional astronomers to sift through the 200 billion rows of NEOWISE data was not merely a matter of human labor constraints; it reflected a hard mathematical limitation of standard astronomical period-finding algorithms. Historically, astronomers employ the Lomb-Scargle periodogram to detect and characterize periodicity in unevenly sampled time-series data, effectively evaluating a Fourier-like power spectrum estimator to locate oscillating signals.
However, the naive implementation of the standard Lomb-Scargle formula calculates periodograms with a computational cost that scales as O(NM), essentially O(N²), where N is the number of data points and M is the number of evaluated frequencies. Optimized variations exist, such as the Press & Rybicki algorithm, which uses inverse interpolation and a fast Fourier transform to achieve O(N log M) scaling, but the approach remains exceptionally resource-intensive at gigascale or terascale. A brute-force grid search across wide frequency ranges requires highly granular step spacing to avoid missing narrow periodogram peaks; for instance, analyzing millions of sources at 400,000 to 450,000 trial frequencies each can demand trillions of computational seconds, a terminal bottleneck for large-scale databases.
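The O(NM) scaling is easy to see in a direct implementation: every trial frequency requires a fresh O(N) pass over the light curve. The following is a minimal numpy sketch of the classic Lomb-Scargle estimator, not any survey's production code:

```python
import numpy as np

def lomb_scargle(t, y, freqs):
    """Naive Lomb-Scargle periodogram: one O(N) sum per trial
    frequency, so the full scan costs O(N * M)."""
    y = y - y.mean()
    power = np.empty(len(freqs))
    for k, f in enumerate(freqs):
        w = 2.0 * np.pi * f
        # Phase offset tau makes the estimator invariant to time shifts.
        tau = np.arctan2(np.sum(np.sin(2 * w * t)),
                         np.sum(np.cos(2 * w * t))) / (2 * w)
        c = np.cos(w * (t - tau))
        s = np.sin(w * (t - tau))
        power[k] = 0.5 * ((y @ c) ** 2 / (c @ c) + (y @ s) ** 2 / (s @ s))
    return power
```

Run on 450 million sources with hundreds of thousands of trial frequencies each, the inner loop above is exactly the cost the text describes.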
Even parallelized, GPU-accelerated versions of Lomb-Scargle, which leverage NVIDIA architectures and specialized Nonuniform Fast Fourier Transform (NUFFT) libraries to achieve a 100-fold speedup on Intel CPUs and up to a 5,600-fold acceleration on A100 GPUs, still construct multiple independent trigonometric matrices for multi-term harmonic fitting. Furthermore, while the classic fast Fourier transform reduces the Discrete Fourier Transform (DFT), structured as a square matrix of rank N, to O(N log N) time, it produces variable-length feature vectors that are highly inconvenient for modern machine-learning classifier networks, which demand fixed-length prediction vectors.
To bypass this critical bottleneck, VARnet discards the Lomb-Scargle approach entirely. Instead, Paz engineered a novel mathematical methodology termed the Finite-Embedding Fourier Transform (FEFT). The FEFT modifies the original DFT by constructing a rectangular matrix of shape m × N, effectively creating a DFT-like mapping that projects from an N-dimensional Euclidean basis into a coarser basis in m dimensions. Paz defined this transformation using two vectors, u and v, noting that taking the element-wise natural logarithm of the original Vandermonde matrix yields the outer product of the two vectors. By setting z = −2iπ/N = ln(ω), establishing the restriction v ∈ R^N, and initializing the FEFT with u = {0, 1, …, N−1}, the matrix is transformed. In the finalized VARnet architecture, the samples hyperparameter was set to 700.
Because the algorithm does not alter the vector v, but rather constructs it via integer multiples of z up to the length of the input, the element-wise exponentiation and outer-product operations are exceptionally fast. The FEFT has a time complexity of O(samples × N), making it faster than a full fast Fourier transform while projecting irregular light curves directly into a vector space of definite dimension.
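The authoritative FEFT definition lives in Paz's paper; the numpy sketch below only illustrates the general construction the text describes, a rectangular DFT-like matrix built by exponentiating an outer product. The names m, u, v, and z follow the text, but the exact basis VARnet uses may differ:

```python
import numpy as np

def feft(y, m=700):
    """Sketch of a finite-embedding transform: map a length-N light
    curve onto a fixed m-dimensional spectrum.  Building the m x N
    matrix from the outer product of u and v and applying it costs
    O(m * N)."""
    N = len(y)
    z = -2j * np.pi / N             # z = ln(omega), omega the DFT root of unity
    u = np.arange(m)                # m fixed output frequency indices
    v = z * np.arange(N)            # integer multiples of z up to the input length
    M = np.exp(np.outer(u, v))      # element-wise exp recovers a Vandermonde-like form
    return M @ y                    # definite-size vector, whatever N was
```

Whatever the input length, the output has exactly m components, which is the property the downstream classifier needs; fixing m at 700 pins the network's input width across wildly different light-curve lengths.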
The structural advantages of the Finite-Embedding Fourier Transform cascade directly into VARnet's machine-learning architecture, exposing the inefficiencies of the standard neural networks commonly utilized in astronomy. Traditionally, when processing unevenly sampled multivariate time series, astronomers rely on Long Short-Term Memory (LSTM) models and other Recurrent Neural Networks (RNNs). Because observational intervals are irregular, these models require the injection of artificial "delta-time" features, variables indicating the time elapsed since the last observation, fed alongside flux and flux-error metrics so that the RNN can estimate temporal evolution.
However, NEOWISE single-exposure sequences are wildly heterogeneous, with input lengths ranging from tens to thousands of individual entries. When applied to such extreme variable-length sequences, recursive models like RNNs and LSTMs demonstrate significantly inferior efficacy compared to fixed-length embedding techniques. Similarly, modern attention-based models such as Transformers, while adept at identifying semantic context in some visual perception tasks, impose a severe speed penalty and are notoriously difficult to train on immense, unstructured photometric datasets.
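The delta-time trick that RNN-style pipelines depend on amounts to stacking a gap column next to the photometry. The function below is an illustrative construction, not any specific pipeline's actual feature builder:

```python
import numpy as np

def delta_time_features(t, flux, flux_err):
    """Build (delta-t, flux, flux_err) rows for an RNN-style classifier.
    delta-t is the gap since the previous observation; the first
    observation's gap is defined as 0."""
    dt = np.diff(t, prepend=t[0])
    return np.column_stack([dt, flux, flux_err])
```

Note that the output still has one row per observation, so two NEOWISE sources with 20 and 2,000 apparitions yield sequences of very different lengths, exactly the heterogeneity that penalizes recurrent models.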
By utilizing the FEFT, VARnet instantaneously outputs a vector of fixed size, which is seamlessly ingested by downstream convolutional and fully connected network layers. To further enhance detection sensitivity for objects that pulse unpredictably, Paz integrated wavelet decomposition, a mathematical technique that cleans and compresses noisy measurements while extracting underlying periodic patterns without erasing meaningful signal fluctuations.
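Wavelet denoising of this kind can be illustrated with a single-level Haar transform in plain numpy. The wavelet family, decomposition depth, and threshold used in VARnet are not specified here, so treat this as a schematic:

```python
import numpy as np

def haar_denoise(y, threshold):
    """One level of Haar wavelet decomposition: split the signal into a
    smooth approximation and a high-frequency detail band, zero out
    small detail coefficients (treated as noise), then reconstruct."""
    y = y[: len(y) // 2 * 2]                    # even length for pairing
    approx = (y[0::2] + y[1::2]) / np.sqrt(2)
    detail = (y[0::2] - y[1::2]) / np.sqrt(2)
    detail[np.abs(detail) < threshold] = 0.0    # hard-threshold small details
    out = np.empty_like(y)                      # inverse Haar transform
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out
```

Because only small detail coefficients are suppressed, a genuine sharp brightening leaves a large detail coefficient that survives the threshold, which is the sense in which the technique cleans noise without erasing meaningful fluctuations.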
The integration of FEFT, wavelet decomposition, and convolutional deep learning allows VARnet to leverage the parallel-processing capabilities of modern graphics processing units. Operating on a standard GPU equipped with 22 GB of VRAM, VARnet achieved a per-source runtime of under 53 μs, even on sequences of approximately 2,000 points per light curve. Rigorously evaluated, the model secured an F1 score of 0.91 on a four-class classification scheme over a validation dataset of real variable infrared sources, proving it both precise and sensitive to previously uncatalogued celestial bodies.
The sheer volume of new candidates discovered by VARnet, 1.5 million variables out of 1.9 million flagged sources, highlights the overly aggressive noise-filtering protocols embedded in historical data-processing pipelines.
In standard transient searches, the primary technique is Difference Image Analysis (DIA), in which a new "science image" is digitally subtracted from an older "template image" to isolate changing pixels. However, atmospheric turbulence, detector bad pixels, cosmic-ray strikes, and low-Earth-orbit satellites create massive volumes of artifacts, resulting in a "real vs. bogus" problem where difference imaging yields roughly 100 false positives for every genuine astrophysical transient.
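At its core, difference imaging is a subtraction and a threshold. The sketch below deliberately omits the PSF matching, background fitting, and astrometric alignment that real survey pipelines must perform first, which is precisely where most bogus detections originate:

```python
import numpy as np

def difference_detect(science, template, sigma, nsigma=5.0):
    """Toy Difference Image Analysis: subtract the template frame from
    the science frame and flag pixels whose absolute residual exceeds
    nsigma * sigma.  Any unmatched artifact (cosmic ray, bad pixel,
    satellite streak) also survives the subtraction and is flagged."""
    diff = science - template
    return np.abs(diff) > nsigma * sigma
```

In a production pipeline, every True pixel group in this mask becomes a candidate that a real-vs-bogus classifier must then adjudicate.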
To combat this, historical infrared surveys instituted deliberately conservative signal-to-noise ratio (SNR) cutoffs. The Two Micron All Sky Survey (2MASS) Point Source Catalog restricted valid detections to SNR ≈ 10, explicitly discarding faint sources near the detection limit to minimize false-positive rates. Similarly, the NEOWISE Scan/Frame pipeline set its detection threshold to an SNR of 3.0 in combined W1 and W2 detection images. Furthermore, the WISE Moving Object Pipeline System (WMOPS) actively utilized transient pixel metadata and adjusted SNR thresholds for reduced-χ² filtering specifically to reject stationary objects and prevent large numbers of spurious single-exposure detections from contaminating moving asteroid tracklets. By prioritizing the tracking of highly luminous near-Earth asteroids, the pipeline systematically ignored the subtle heat variations of faint, distant background variables.
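The quoted thresholds amount to a one-line mask over a detection table; the function and values below are illustrative rather than any survey's actual code:

```python
import numpy as np

def snr_mask(flux, flux_err, snr_min=3.0):
    """Conservative catalog-style filtering: keep only detections whose
    flux / flux_err ratio clears the survey threshold (e.g. roughly 10
    for the 2MASS PSC, 3.0 for combined NEOWISE W1+W2 images).  Faint
    real variables just under the cut are discarded along with noise."""
    return flux / flux_err >= snr_min
```

The cost of such a cut is invisible in the catalog itself: sources rejected here never reach any downstream variability analysis, which is why the discarded single-exposure rows held so many unrecognized variables.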
To reclaim these discarded targets from the single-exposure database, Paz confronted a dataset entirely lacking spatial structure, in which data rows describing the same astronomical source were scattered arbitrarily among noise generated by streaks, uncorrected cosmic rays, and galactic nebulosity. He implemented the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, which groups data points by assessing reachability and spatial density without requiring prior assumptions about the number of clusters or their geometric shapes.
Paz precisely calibrated the DBSCAN hyperparameters to the telescope's error profile and observational cadence. Recognizing that points separated by up to 0.85 degrees could still belong to the same source due to point-spread-function fitting errors, the search radius (ε) was set to 0.85 degrees. Furthermore, because NEOWISE collected between 12 and 16 apparitions roughly every six months, the minimum-points threshold (minPts) was set to 12, ensuring that the majority of detections within an observational epoch were classified as core points. This configuration successfully recovered legitimate border points and extracted unified light curves from complex, noisy spatial distributions where competing algorithms, such as OPTICS and HDBSCAN, failed by clustering on abnormally dense regions of nebulosity.
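With scikit-learn's off-the-shelf DBSCAN, the configuration described above is direct. The ε and minPts values below mirror the figures quoted in the text, while the mock (ra, dec) positions are synthetic stand-ins for NEOWISE detection rows, not real data:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(42)
# 40 detections of one source, scattered by PSF-fitting error (hypothetical values)
source = rng.normal([150.0, 2.0], 0.1, size=(40, 2))
# 15 sparse artifacts (streaks, cosmic rays) spread over a wide field
noise = rng.uniform([140.0, -5.0], [160.0, 10.0], size=(15, 2))
positions = np.vstack([source, noise])

# eps = 0.85 deg search radius, min_samples = 12 detections per epoch
labels = DBSCAN(eps=0.85, min_samples=12).fit_predict(positions)
# Label -1 marks noise; non-negative labels are recovered sources whose
# member rows can then be assembled into a single light curve.
```

Because the sparse artifacts never accumulate 12 neighbors within 0.85 degrees, they are labeled noise, while the dense knot of apparitions is recovered as one cluster without any prior assumption about how many sources the field contains.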
The public release of this catalog in late 2025 immediately redirected observational priorities across the astronomical community, driving active targeting campaigns at major research facilities, including the Vera C. Rubin Observatory.
For the Rubin Observatory's upcoming Legacy Survey of Space and Time (LSST), which will sweep the southern sky with a 3.2-gigapixel camera and produce up to 20 terabytes of data every night, the VARnet methodology is a highly relevant technological breakthrough. A fundamental challenge for the LSST is the "kiloscale" problem: generating actionable real-time alert streams from multiband, multi-epoch photometry. LSST objects will yield multivariate time series comprising roughly 100 asynchronously and irregularly sampled measurements per year, growing to approximately 1,000 measurements per object by Data Release 11.
The current open-source, CPU-based data-processing pipelines used by the astronomy community can require up to 10 minutes to process images acquired every 40 seconds, a massive bottleneck for live data-steering and global alert-broker networks. Furthermore, generating inferences on partially populated, irregularly cadenced light curves within current alert-production frameworks remains mechanically restricted; baseline DIAObject alert records currently lack comprehensive time-series features. By demonstrating that raw, un-coadded time-domain signals can be processed with extraordinary precision using finite-embedding transformations at a rate of 53 μs per object, Matteo Paz's VARnet algorithm establishes a new computational standard for mapping the transient universe.