What advancements in clinical neurosciences need to occur in the next 10 years?
In 2013, Henry Markram laid out his “seven challenges for neuroscience”, the very first of which was for neuroscience to become a “big science”: to adopt the big data methods increasingly seen in other fields. Five of the remaining six challenges explicitly necessitated big data in some form -- from multi-scale data collection and curation, to the accurate modelling and simulation of brain states in health and disease. The same year saw the launch of Markram’s €1-billion brain-child: the Human Brain Project (HBP) -- a flagship neuroscience research consortium that demonstrated the optimism of the time.
This optimism was well-placed. Big data had already transformed the fields of astronomy, particle physics and genomics, and this progress had started to spill over into healthcare and neuroscience. Within a few years either side of the launch of HBP, methods had been developed for the automated classification of neuron morphology (leading more recently to pipelines for big data approaches to morphological analysis), large-scale neuroimaging datasets had been made publicly available, and 1054 neuron lines in Drosophila had been mapped and optogenetically activated in a landmark paper involving over 37,000 animals.
Accordingly, in 2019, “big data” was added as a Medical Subject Heading (MeSH), solidifying its place in biomedical research settings. Over the past decade, the number of neuroscience papers involving big data or machine learning has risen supralinearly -- both in absolute numbers and as a proportion of all neuroscience papers (see Figure 1). In this paper, I propose that the drive of big data will stimulate significant change in neuroscience over the next decade -- changes that need to occur if neuroscience is to maximise the potential big data holds.
As big data have begun to revolutionise healthcare and neuroscience, a number of definitions have been proposed. For instance, Baro et al. suggest a threshold on data volume expressed in terms of log(n × p), where n is the number of subjects and p is the number of variables. This definition has been criticised as too restrictive, with others recognising alternative meanings of “big” in big data -- including data with large variety, velocity, value, and veracity. Ultimately, the definition adopted in the MeSH database was a generalist one: “Extremely large amounts of data which require rapid and often complex computational analyses to reveal patterns, trends, and associations, relating to various facets of human and non-human entities.”
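A volume-based definition of this kind is simple to compute. The sketch below is illustrative only: it assumes a base-10 logarithm and a cut-off of 7, both of which should be checked against Baro et al. before use, and the function names are hypothetical.

```python
import math


def log_volume(n_subjects: int, n_variables: int) -> float:
    """Return log10(n * p), a simple measure of dataset volume."""
    if n_subjects <= 0 or n_variables <= 0:
        raise ValueError("n_subjects and n_variables must be positive")
    # log10(n * p) = log10(n) + log10(p); summing avoids forming a huge product.
    return math.log10(n_subjects) + math.log10(n_variables)


def is_big_by_volume(n_subjects: int, n_variables: int,
                     threshold: float = 7.0) -> bool:
    """Apply a volume cut-off. The default of 7 is an assumption, not a given."""
    return log_volume(n_subjects, n_variables) >= threshold


# A 1,000-subject study with 100 variables falls below the assumed cut-off,
# whereas 100,000 subjects with 1,000 variables each exceeds it.
print(is_big_by_volume(1_000, 100))
print(is_big_by_volume(100_000, 1_000))
```

A measure like this captures volume alone, which is precisely why it has been criticised: it says nothing about variety, velocity, value, or veracity.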
No matter how one slices it -- and which definition one uses -- data in neuroscience are big. The data from one recent study occupied 30GB of storage per participant (6TB in total).[13,14] Human brain imaging within the UK Biobank project has already resulted in a dataset of multimodal scans (including structural, functional and diffusion MRI) from over 5000 participants, with the ultimate target set at a sample size of 100,000. Healthcare and clinical neuroscience generate huge, complex, and often unstructured datasets containing patient records, histories, blood tests, and imaging, amongst other modalities. Small, heterogeneous datasets -- such as single-unit recordings in animal models from various areas of the brain -- could also in principle be combined to form big datasets from which improved knowledge can be drawn. Further, as the relentless advancement of technology continues, even these “small” datasets become far larger and more cumbersome: over the last few decades we have moved from recording a single neuron at a time to being able to record many hundreds of neurons simultaneously using multiple electrode arrays or large-channel silicon probes.
These advancements highlight a number of contentious issues and potential pitfalls for any given study that have recently begun to be discussed in the literature: those of data governance,[18,19] ethics,[20,21] sharing,[15,22,23] and the challenges of appropriate and rigorous analysis.[21,24] The modern neuroscientist must navigate these pitfalls, although precious few are explicitly addressed in neuroscience teaching at the undergraduate or postgraduate level. The future of neuroscience entails the development of novel solutions and practices that directly address the complexities of modern data.
Changing practices in neuroscience
The oft-quoted potential of big data to empower exploratory science and accelerate discovery necessitates the appropriate and meticulous design and implementation of analysis pipelines. Analysis and experimental design are intimately related. Analyses must take into account the limitations that the experimental setup imposes on the data; experimental design should consider which statistical approaches are valid to address the question in hand, and what quantity and quality of data are needed for adequate statistical power. The tasks of big data analysis therefore fall within the purview of the neuroscientist.
In particular, Bzdok & Yeo identify a number of unique statistical and analytic difficulties that emerge as dataset volume increases. These include the limited expressive capacity of traditional parametric models (such as the Gaussian distribution), the tendency for p values to shrink as sample size increases, the increased complexity in the design and interpretation of non-parametric or Bayesian approaches, and the dramatically increased computational cost of fitting non-parametric or Bayesian models, which may not scale well in high-dimensional spaces.
Neuroscientists who address these difficulties must walk a line between thinking neuroscientifically and thinking statistically; they must develop competence in varied aspects of computer science and programming; they must become comfortable with the new ways neuroscience is done. As a field, neuroscience must become integrated with the new methods it has adopted. I suggest this is especially true of clinical neuroscience, in which the potential exists for the generation of large, complex datasets from patients (often with long histories of chronic disease) that include various sources of information -- from brain scans to clinical records to blood test results to surveys and follow-up data.
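The tendency for p values to shrink with sample size can be seen with a deliberately simple calculation. The sketch below is not taken from Bzdok & Yeo: it assumes a two-sample z-test with equal group sizes and unit variance, and a fixed, practically negligible standardised mean difference of 0.02 (a hypothetical value chosen for illustration).

```python
import math


def two_sample_z_p_value(effect_size: float, n_per_group: int) -> float:
    """Two-sided p value for a two-sample z-test.

    Assumes equal group sizes, unit variance in both groups, and an observed
    standardised mean difference equal to effect_size.
    """
    # Standard error of the difference in means is sqrt(2 / n).
    z = abs(effect_size) * math.sqrt(n_per_group / 2.0)
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


# The same tiny effect moves from "non-significant" to "highly significant"
# purely because the sample grows.
for n in (100, 1_000, 10_000, 100_000):
    p = two_sample_z_p_value(0.02, n)
    print(f"n per group = {n:>7}: p = {p:.3g}")
```

The effect itself never changes in this calculation, only the sample size does, which is why significance alone becomes a poor guide to importance in very large datasets and effect sizes need to be reported alongside p values.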
Potential of big data approaches in clinical neuroscience
One illustrative example of the complexities that neuroscience must face in the near future -- and of the potential that solving them holds -- is the analysis of large populations of brain scans, which necessitates novel analytic techniques that are robust to variability between subjects and scanners. Novel methods for intersubject alignment of MRI scans and machine learning-based parcellation of human cortex delineated 180 distinct areas per hemisphere, of which 97 were newly described. This finding epitomises the power of big data: it enables neuroscientists to extract robust and statistically significant signals, fuelling discoveries that would otherwise be lost in experimental noise.
Similarly, genome-wide association data from over a million participants highlighted the distinctness of a number of neurological disorders (including Alzheimer’s disease, epilepsy and migraine), which showed few significant genetic correlations with one another. Big data genomic analyses like these can help to provide robust evidence for diagnostic practice, or to guide changes in practice. In psychiatry, for instance, different disorders showed significant genetic overlap, indicating shared underlying physiology and -- according to the authors -- that current diagnostic boundaries may not reflect distinct disease processes. Reliable clustering methods, many of which require large datasets, may be a means to capture some of the remaining variability in psychiatric patient phenotypes, guiding updates in psychiatric nosology.
Further applications of machine learning in clinical neuroscience include the use of brain imaging to diagnose neurological and psychiatric disorders, with models having been successfully trained to diagnose schizophrenia from structural and resting-state functional MRI,[28,29] major depressive disorder from resting-state functional MRI, and intracranial haemorrhage and stroke from CT images. Machine learning approaches in clinical neuroscience and neurology are likely to become more commonplace as big data penetrates the field, as these methods can handle large and sparse datasets, and perform best when presented with large quantities of training input.
Neuroscience in the next decade
These challenges situate neuroscience at an exciting but complex juncture: the bottleneck is no longer how to collect sufficient data, but what to do with it. If neuroscience is to make the most of the big datasets that will increasingly become available, its practices must change. Some of the changes likely to be needed over the next ten years are outlined below.
Firstly, big data are likely to feature in a growing share of published papers. If the trends seen in Figure 1 continue, and as big datasets become easier to collect, store and curate, clinical neuroscience will be increasingly able to make use of big data to generate knowledge and guide clinical practice. This will require cultural shifts towards data sharing between centres and ethical data mining from patient records, and will lead to large, open-access datasets which have the potential to spur large outputs of discovery science.
Secondly, statistical and analytic methods will be added into neuroscientists’ repertoires and become a central part of day-to-day activity. As pipelines become more complicated, and more sophisticated techniques are required to extract and interpret information from big datasets, neuroscience will likely see a shift towards the use of a more mathematical and statistical framework for understanding data and, ultimately, the brain. Amongst other things, changes should be seen in how experiments are designed -- taking this beyond the approach undertaken by most labs currently -- so modern neuroscience becomes robust to the complexities of big data and integrates the statistical approaches expected from a big science. In doing this, neuroscientists will become data scientists of a sort: comfortable with programming, modelling, and non-frequentist (Bayesian) statistics.
Additional methods increasingly employed by neuroscientists, and which are thought to hold considerable promise, are those of machine learning. Artificial intelligence approaches are capable of learning complex input-output relationships, are robust to variance introduced across individuals and cohorts (for instance, when employed in the diagnosis of Alzheimer’s disease) and, importantly, are amenable to large datasets in ways that conventional analyses often are not. Machine learning approaches are likely to become more ubiquitous in the neurosciences over the next ten years, as data become increasingly complex and mathematical aptitude amongst neuroscientists grows.
Thirdly, neuroscience is likely to take significant steps towards standardisation. If big data are to bridge disparate methods and scales in order to generate an integrated understanding, this new knowledge must be both inter-species and cross-modal -- from single-unit recording, patch clamping, field potentials, calcium imaging, and other approaches that operate at small spatial scales; to neuroimaging, transcranial magnetic stimulation, EEG/MEG, and ultimately behaviour. Consistent collection, labelling, and maintenance of data will make data sharing easier and allow data to be combined into big data repositories for further analysis by individual labs.
Finally, I hope, conceptual clarity and theoretical constructs will emerge that unify the vast quantity of data that are generated by neuroscience -- but this can happen only after the technical issues described above have been addressed and neuroscience becomes the “big science” Markram describes.
To sort out these problems is to carve a new form of neuroscience: it is to drop the traditionalist methods and concepts of old, and embrace the multidisciplinarity we see in today’s neuroscience labs. Tomorrow’s neuroscientist -- and by extension, tomorrow’s basic and clinical neuroscience -- will be part computer scientist: someone with statistical prowess, programming competence, and the confidence to lay down new pipelines and explore new data horizons. This decade begins with a plethora of unstructured data, and -- if we are successful -- will end with a structured understanding of the brain at a scale we have never known before.
Figure 1: The number of neuroscience papers involving big data or machine learning has grown supralinearly over the last decade. Data were extracted from PubMed (NIH) using the searches ‘“neuroscience” & “big data”’ (black line) and ‘“neuroscience” & “machine learning”’ (grey line) respectively. A: Number of papers as an absolute count. B: Number of papers as a percentage of all neuroscience papers indexed for that year (identified through a search for “neuroscience”).
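The percentages in panel B follow from dividing each year's topic-specific count by the count of all neuroscience papers indexed that year. A minimal sketch of that calculation is given below; the function name is hypothetical and the counts are placeholders for illustration only, not the data behind Figure 1.

```python
def percentage_by_year(topic_counts, total_counts):
    """Return {year: topic papers as a percentage of all papers that year}."""
    percentages = {}
    for year, total in sorted(total_counts.items()):
        if total <= 0:
            continue  # skip years with no indexed papers to avoid dividing by zero
        percentages[year] = 100.0 * topic_counts.get(year, 0) / total
    return percentages


# Placeholder counts for illustration only; these are NOT the values in Figure 1.
example_topic = {2010: 5, 2015: 40, 2020: 200}
example_total = {2010: 10_000, 2015: 16_000, 2020: 20_000}

for year, pct in percentage_by_year(example_topic, example_total).items():
    print(f"{year}: {pct:.2f}% of neuroscience papers")
```

Normalising in this way separates growth in a topic from growth in the literature as a whole, which is why both panels are shown.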
(1) Markram H. Seven challenges for neuroscience. Functional Neurology. 2013;28(3):145–151. https://doi.org/10.11138/FNeur/2013.28.3.144
(2) Kandel ER, Markram H, Matthews PM, Yuste R, Koch C. Neuroscience thinks big (and collaboratively). Nature Reviews Neuroscience. 2013;14(9):659–664. https://doi.org/10.1038/nrn3578
(3) Zhang Y, Zhao Y. Astronomy in the Big Data Era. Data Science Journal. 2015;14:11. http://doi.org/10.5334/dsj-2015-011
(4) Gutsche O, Cremonesi M, Elmer P, Jayatilaka B, Kowalkowski J, Pivarski J, Sehrish S, Mantilla Suarez C, Svyatkovskiy A, Tran N. Big Data in HEP: A comprehensive use case study. Journal of Physics: Conference Series. 2017;898:072012.
(5) Stephens ZD, Lee SY, Faghri F, Campbell RH, Zhai C, Efron MJ, Iyer R, Schatz MC, Sinha S, Robinson GE. Big Data: Astronomical or Genomical? PLoS Biology. 2015;13(7):e1002195.
(6) Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nature Medicine. 2019;25:44–56. https://doi.org/10.1038/s41591-018-0300-7
(7) Vasques X, Vanel L, Villette G, Cif L. Morphological Neuron Classification Using Machine Learning. Frontiers in Neuroanatomy. 2016;10:102. https://doi.org/10.3389/fnana.2016.00102
(8) Aghili M, Fang R. Mining Big Neuron Morphological Data. Computational Intelligence and Neuroscience. 2018;8234734. https://doi.org/10.1155/2018/8234734
(9) Miller KL, Alfaro-Almagro F, Bangerter NK, Thomas DL, Yacoub E, Xu J, Bartsch AJ, Jbabdi S, Sotiropoulos SN, Andersson JL, Griffanti L, Douaud G, Okell TW, Weale P, Dragonu I, Garratt S, Hudson S, Collins R, Jenkinson M, Matthews PM, … Smith SM. Multimodal population brain imaging in the UK Biobank prospective epidemiological study. Nature Neuroscience. 2016;19(11):1523–1536. https://doi.org/10.1038/nn.4393
(10) Vogelstein JT, Park Y, Ohyama T, Kerr RA, Truman JW, Priebe CE, Zlatic M. Discovery of brainwide neural-behavioral maps via multiscale unsupervised structure learning. Science. 2014;344(6182):386–392. https://doi.org/10.1126/science.1250298
(11) Baro E, Degoul S, Beuscart R, Chazard E. Toward a Literature-Driven Definition of Big Data in Healthcare. BioMed Research International. 2015;639021. https://doi.org/10.1155/2015/639021
(12) Glezerson BA, Flexman AM. The Promise and Perils of Big Data in the Clinical Neurosciences. Journal of Neurosurgical Anesthesiology. 2020;32(1):1-3. https://doi.org/10.1097/ANA.0000000000000659
(13) Landhuis E. Neuroscience: Big brain, big data. Nature. 2017;541(7638):559–561. https://doi.org/10.1038/541559a
(14) Glasser MF, Coalson TS, Robinson EC, Hacker CD, Harwell J, Yacoub E, Ugurbil K, Andersson J, Beckmann CF, Jenkinson M, Smith SM, Van Essen DC. A multi-modal parcellation of human cerebral cortex. Nature. 2016;536(7615):171–178. https://doi.org/10.1038/nature18933
(15) Ferguson AR, Nielson JL, Cragin MH, Bandrowski AE, Martone ME. Big data from small data: data-sharing in the 'long tail' of neuroscience. Nature Neuroscience. 2014;17(11):1442–1447. https://doi.org/10.1038/nn.3838
(16) Miccoli B, Lopez CM, Goikoetxea E, Putzeys J, Sekeri M, Krylychkina O, Chang SW, Firrincieli A, Andrei A, Reumers V, Braeken D. High-Density Electrical Recording and Impedance Imaging With a Multi-Modal CMOS Multi-Electrode Array Chip. Frontiers in Neuroscience. 2019;13:641. https://doi.org/10.3389/fnins.2019.00641
(17) Jun JJ, Steinmetz NA, Siegle JH, Denman DJ, Bauza M, Barbarits B, Lee AK, Anastassiou CA, Andrei A, Aydın Ç, Barbic M, Blanche TJ, Bonin V, Couto J, Dutta B, Gratiy SL, Gutnisky DA, Häusser M, Karsh B, Ledochowitsch P, … Harris TD. Fully integrated silicon probes for high-density recording of neural activity. Nature. 2017;551(7679):232–236. https://doi.org/10.1038/nature24636
(18) Fothergill BT, Knight W, Stahl BC, Ulnicane I. Responsible Data Governance of Neuroscience Big Data. Frontiers in Neuroinformatics. 2019;13:28. https://doi.org/10.3389/fninf.2019.00028
(19) Lefaivre S, Behan B, Vaccarino A, Evans K, Dharsee M, Gee T, Dafnas C, Mikkelsen T, Theriault E. Big Data Needs Big Governance: Best Practices From Brain-CODE, the Ontario-Brain Institute's Neuroinformatics Platform. Frontiers in Genetics. 2019;10:191. https://doi.org/10.3389/fgene.2019.00191
(20) Kellmeyer P. Big Brain Data: On the Responsible Use of Brain Data from Clinical and Consumer-Directed Neurotechnological Devices. Neuroethics. 2018. https://doi.org/10.1007/s12152-018-9371-x
(21) Ienca M, Ignatiadis K. Artificial Intelligence in Clinical Neuroscience: Methodological and Ethical Challenges. AJOB Neuroscience. 2020;11(2):77–87. https://doi.org/10.1080/21507740.2020.1740352
(22) Ascoli GA. Sharing Neuron Data: Carrots, Sticks, and Digital Records. PLoS Biology. 2015;13(10):e1002275. https://doi.org/10.1371/journal.pbio.1002275
(23) Ascoli GA, Maraver P, Nanda S, Polavaram S, Armañanzas R. Win-win data sharing in neuroscience. Nature Methods. 2017;14(2):112–116. https://doi.org/10.1038/nmeth.4152
(24) Bzdok D, Yeo B. Inference in the age of big data: Future perspectives on neuroscience. NeuroImage. 2017;155:549–564. https://doi.org/10.1016/j.neuroimage.2017.04.061
(25) Toga AW, Foster I, Kesselman C, Madduri R, Chard K, Deutsch EW, Price ND, Glusman G, Heavner BD, Dinov ID, Ames J, Van Horn J, Kramer R, Hood L. Big biomedical data as the key resource for discovery science. Journal of the American Medical Informatics Association: JAMIA. 2015;22(6):1126–1131. https://doi.org/10.1093/jamia/ocv077
(26) Brainstorm Consortium, Anttila V, Bulik-Sullivan B, Finucane HK, Walters RK, Bras J, Duncan L, Escott-Price V, Falcone GJ, Gormley P, Malik R, Patsopoulos NA, Ripke S, Wei Z, Yu D, Lee PH, Turley P, Grenier-Boley B, Chouraki V, Kamatani Y, … Murray R. Analysis of shared heritability in common disorders of the brain. Science. 2018;360(6395):eaap8757. https://doi.org/10.1126/science.aap8757
(27) Pillow J, Sahani M. Editorial overview: Machine learning, big data, and neuroscience. Current Opinion in Neurobiology. 2019;55:iii–iv. https://doi.org/10.1016/j.conb.2019.05.002
(28) Plis SM, Hjelm DR, Salakhutdinov R, Allen EA, Bockholt HJ, Long JD, Johnson HJ, Paulsen JS, Turner JA, Calhoun VD. Deep learning for neuroimaging: a validation study. Frontiers in Neuroscience. 2014;8:229. https://doi.org/10.3389/fnins.2014.00229
(29) Kim J, Calhoun VD, Shim E, Lee JH. Deep neural network with weight sparsity control and pre-training extracts hierarchical features and enhances classification performance: Evidence from whole-brain resting-state functional connectivity patterns of schizophrenia. NeuroImage. 2016;124(Pt A):127–146. https://doi.org/10.1016/j.neuroimage.2015.05.018
(30) Guo H, Cheng C, Cao X, Xiang J, Chen J, Zhang K. Resting-state functional connectivity abnormalities in first-onset unmedicated depression. Neural Regeneration Research. 2014;9:153-63. https://doi.org/10.4103/1673-5374.125344
(31) Titano JJ, Badgeley M, Schefflein J, Pain M, Su A, Cai M, Swinburne N, Zech J, Kim J, Bederson J, Mocco J, Drayer B, Lehar J, Cho S, Costa A, Oermann EK. Automated deep-neural-network surveillance of cranial images for acute neurologic events. Nature Medicine. 2018;24(9):1337–1341. https://doi.org/10.1038/s41591-018-0147-y
(32) Vu MT, Adalı T, Ba D, Buzsáki G, Carlson D, Heller K, Liston C, Rudin C, Sohal VS, Widge AS, Mayberg HS, Sapiro G, Dzirasa K. A Shared Vision for Machine Learning in Neuroscience. Journal of Neuroscience. 2018;38(7):1601–1607. https://doi.org/10.1523/JNEUROSCI.0508-17.2018
(33) Lebedev AV, Westman E, Van Westen GJ, Kramberger MG, Lundervold A, Aarsland D, Soininen H, Kłoszewska I, Mecocci P, Tsolaki M, Vellas B, Lovestone S, Simmons A, Alzheimer's Disease Neuroimaging Initiative and the AddNeuroMed consortium. Random Forest ensembles for detection and prediction of Alzheimer's disease with a good between-cohort robustness. NeuroImage: Clinical. 2014;6:115–125. https://doi.org/10.1016/j.nicl.2014.08.023
(34) Sejnowski TJ, Churchland PS, Movshon JA. Putting big data to good use in neuroscience. Nature Neuroscience. 2014;17(11):1440–1441. https://doi.org/10.1038/nn.3839