ResearchHub | Open Science Community

Provable data possession at untrusted stores

Giuseppe Ateniese et al., Oct 28, 2007

We introduce a model for provable data possession (PDP) that allows a client that has stored data at an untrusted server to verify that the server possesses the original data without retrieving it. The model generates probabilistic proofs of possession by sampling random sets of blocks from the server, which drastically reduces I/O costs. The client maintains a constant amount of metadata to verify the proof. The challenge/response protocol transmits a small, constant amount of data, which minimizes network communication. Thus, the PDP model for remote data checking supports large data sets in widely distributed storage systems.
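The sampling idea in this abstract can be illustrated with a short calculation. The sketch below is not the authors' scheme or code; it only shows, under the assumption that the client challenges blocks chosen uniformly at random without replacement, how likely a spot check is to hit at least one missing or corrupted block. The function names and the example numbers (10,000 blocks, 1% corrupted) are hypothetical and chosen for illustration.

```python
from math import comb


def detection_probability(n_blocks: int, n_corrupt: int, n_sampled: int) -> float:
    """Probability that a random sample contains at least one corrupt block.

    Assumes uniform sampling without replacement (hypergeometric model).
    """
    if not 0 <= n_corrupt <= n_blocks:
        raise ValueError("n_corrupt must be between 0 and n_blocks")
    if not 0 <= n_sampled <= n_blocks:
        raise ValueError("n_sampled must be between 0 and n_blocks")
    # P(miss) = C(n - t, c) / C(n, c): every sampled block is an intact one.
    miss = comb(n_blocks - n_corrupt, n_sampled) / comb(n_blocks, n_sampled)
    return 1.0 - miss


def samples_needed(n_blocks: int, n_corrupt: int, target: float) -> int:
    """Smallest sample size whose detection probability reaches `target`."""
    for c in range(n_blocks + 1):
        if detection_probability(n_blocks, n_corrupt, c) >= target:
            return c
    raise ValueError("target cannot be reached (no corrupt blocks?)")


if __name__ == "__main__":
    # Illustrative numbers only: 10,000 blocks, 100 of them corrupted.
    p = detection_probability(10_000, 100, 460)
    print(f"detection probability with 460 samples: {p:.4f}")
    print("samples for 99% detection:", samples_needed(10_000, 100, 0.99))
```

The point of the sketch is that the number of challenged blocks needed for a given confidence depends on the fraction of damaged blocks rather than on the total file size, which is why sampling keeps server I/O low for large files.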

Philosophy

Artificial Intelligence

Saturated Reconstruction of a Volume of Neocortex

Narayanan Kasthuri et al., Jul 1, 2015

We describe automated technologies to probe the structure of neural tissue at nanometer resolution and use them to generate a saturated reconstruction of a sub-volume of mouse neocortex in which all cellular objects (axons, dendrites, and glia) and many sub-cellular components (synapses, synaptic vesicles, spines, spine apparati, postsynaptic densities, and mitochondria) are rendered and itemized in a database. We explore these data to study physical properties of brain tissue. For example, by tracing the trajectories of all excitatory axons and noting their juxtapositions, both synaptic and non-synaptic, with every dendritic spine we refute the idea that physical proximity is sufficient to predict synaptic connectivity (the so-called Peters’ rule). This online minable database provides general access to the intrinsic complexity of the neocortex and enables further data-driven inquiries.

Biochemistry

Biophysics

A public turbulence database cluster and applications to study Lagrangian evolution of velocity increments in turbulence

Yi Li et al., Jan 1, 2008

A public database system archiving a direct numerical simulation (DNS) data set of isotropic, forced turbulence is described in this paper. The data set consists of the DNS output on 1024^3 spatial points and 1024 time samples spanning about one large-scale turnover time. This complete 1024^4 spacetime history of turbulence is accessible to users remotely through an interface that is based on the Web-services model. Users may write and execute analysis programs on their host computers, while the programs make subroutine-like calls that request desired parts of the data over the network. The users are thus able to perform numerical experiments by accessing the 27 terabytes (TB) of DNS data using regular platforms such as laptops. The architecture of the database is explained, as are some of the locally defined functions, such as differentiation and interpolation. Test calculations are performed to illustrate the usage of the system and to verify the accuracy of the methods. The database is then used to analyze a dynamical model for small-scale intermittency in turbulence. Specifically, the dynamical effects of pressure and viscous terms on the Lagrangian evolution of velocity increments are evaluated using conditional averages calculated from the DNS data in the database. It is shown that these effects differ considerably among themselves and thus require different modeling strategies in Lagrangian models of velocity increments and intermittency.

Artificial Intelligence

Atmospheric Science

MR-PDP: Multiple-Replica Provable Data Possession

Reza Curtmola et al., Jun 1, 2008

Many storage systems rely on replication to increase the availability and durability of data on untrusted storage systems. At present, such storage systems provide no strong evidence that multiple copies of the data are actually stored. Storage servers can collude to make it look like they are storing many copies of the data, whereas in reality they only store a single copy. We address this shortcoming through multiple-replica provable data possession (MR-PDP): A provably-secure scheme that allows a client that stores t replicas of a file in a storage system to verify through a challenge-response protocol that (1) each unique replica can be produced at the time of the challenge and that (2) the storage system uses t times the storage required to store a single replica. MR-PDP extends previous work on data possession proofs for a single copy of a file in a client/server storage system (Ateniese et al., 2007). Using MR-PDP to store t replicas is computationally much more efficient than using a single-replica PDP scheme to store t separate, unrelated files (e.g., by encrypting each file separately prior to storing it). Another advantage of MR-PDP is that it can generate further replicas on demand, at little expense, when some of the existing replicas fail.

Artificial Intelligence

Information Systems

Remote data checking using provable data possession

Giuseppe Ateniese et al., May 1, 2011

We introduce a model for provable data possession (PDP) that can be used for remote data checking: A client that has stored data at an untrusted server can verify that the server possesses the original data without retrieving it. The model generates probabilistic proofs of possession by sampling random sets of blocks from the server, which drastically reduces I/O costs. The client maintains a constant amount of metadata to verify the proof. The challenge/response protocol transmits a small, constant amount of data, which minimizes network communication. Thus, the PDP model for remote data checking is lightweight and supports large data sets in distributed storage systems. The model is also robust in that it incorporates mechanisms for mitigating arbitrary amounts of data corruption. We present two provably-secure PDP schemes that are more efficient than previous solutions. In particular, the overhead at the server is low (or even constant), as opposed to linear in the size of the data. We then propose a generic transformation that adds robustness to any remote data checking scheme based on spot checking. Experiments using our implementation verify the practicality of PDP and reveal that the performance of PDP is bounded by disk I/O and not by cryptographic computation. Finally, we conduct an in-depth experimental evaluation to study the tradeoffs in performance, security, and space overheads when adding robustness to a remote data checking scheme.

Artificial Intelligence

Biochemistry

A low-resource reliable pipeline to democratize multi-modal connectome estimation and analysis

Jaewon Chung et al., Nov 3, 2021

Connectomics, the study of brain networks, provides a unique and valuable opportunity to study the brain. Research in human connectomics, leveraging functional and diffusion Magnetic Resonance Imaging (MRI), is a resource-intensive practice. Typical analysis routines require significant computational capabilities and subject matter expertise. Establishing a pipeline that is low-resource, easy to use, and off-the-shelf (can be applied across multifarious datasets without parameter tuning to reliably estimate plausible connectomes) would significantly lower the barrier to entry into connectomics, thereby democratizing the field by empowering a more diverse and inclusive community of connectomists. We therefore introduce ‘MRI to Graphs’ (m2g). To illustrate its properties, we used m2g to process MRI data from 35 different studies (≈ 6,000 scans) from 15 sites without any manual intervention or parameter tuning. Every single scan yielded an estimated connectome that adhered to established properties, such as stronger ipsilateral than contralateral connections in structural connectomes, and stronger homotopic than heterotopic correlations in functional connectomes. Moreover, the connectomes estimated by m2g are more similar within individuals than between them, suggesting that m2g preserves biological variability. m2g is portable, and can run on a single CPU with 16 GB of RAM in less than a couple of hours, or be deployed on the cloud using its Docker container. All code is available at https://github.com/neurodata/m2g and documentation is available at docs.neurodata.io/m2g.

Artificial Intelligence

Cognitive Neuroscience

A High-Throughput Pipeline Identifies Robust Connectomes But Troublesome Variability

Gregory Kiar et al., Sep 14, 2017

Modern scientific discovery depends on collecting large heterogeneous datasets with many sources of variability, and applying domain-specific pipelines from which one can draw insight or clinical utility. For example, macroscale connectomics studies require complex pipelines to process raw functional or diffusion data and estimate connectomes. Individual studies tend to customize pipelines to their needs, raising concerns about their reproducibility, and adding to a longer list of factors that may differ across studies (including sampling, experimental design, and data acquisition protocols), resulting in failures to replicate. Mitigating these issues requires multi-study datasets and the development of pipelines that can be applied across them. We developed NeuroData's MRI to Graphs (NDMG) pipeline using several functional and diffusion studies, including the Consortium for Reliability and Reproducibility, to estimate connectomes. Without any manual intervention or parameter tuning, NDMG ran on 25 different studies (~6,000 scans) from 15 sites, with each scan resulting in a biologically plausible connectome (as assessed by multiple quality assurance metrics at each processing stage). For each study, the connectomes from NDMG are more similar within than across individuals, indicating that NDMG is preserving biological variability. Moreover, the connectomes exhibit near perfect consistency for certain connectional properties across every scan, individual, study, site, and modality; these include stronger ipsilateral than contralateral connections and stronger homotopic than heterotopic connections. Yet, the magnitude of the differences varied across individuals and studies - much more so when pooling data across sites, even after controlling for study, site, and basic demographic variables (i.e., age, sex, and ethnicity). 
This indicates that other experimental variables (possibly those not measured or reported) are contributing to this variability, which if not accounted for can limit the value of aggregate datasets, as well as expectations regarding the accuracy of findings and likelihood of replication. We, therefore, provide a set of principles to guide the development of pipelines capable of pooling data across studies while maintaining biological variability and minimizing measurement error. This open science approach provides us with an opportunity to understand and eventually mitigate spurious results for both past and future studies.

Artificial Intelligence

Cognitive Neuroscience

Whole-Brain Serial-Section Electron Microscopy In Larval Zebrafish

David Hildebrand et al., May 7, 2017

Investigating the dense meshwork of wires and synapses that form neuronal circuits is possible with the high resolution of serial-section electron microscopy (ssEM) [1]. However, the imaging scale required to comprehensively reconstruct axons and dendrites is more than 10 orders of magnitude smaller than the spatial extents occupied by networks of interconnected neurons [2], some of which span nearly the entire brain. The difficulties in generating and handling data for relatively large volumes at nanoscale resolution have thus restricted all studies in vertebrates to neuron fragments, thereby hindering investigations of complete circuits. These efforts were transformed by recent advances in computing, sample handling, and imaging techniques [1], but examining entire brains at high resolution remains a challenge. Here we present ssEM data for a complete 5.5 days post-fertilisation larval zebrafish brain. Our approach utilizes multiple rounds of targeted imaging at different scales to reduce acquisition time and data management. The resulting dataset can be analysed to reconstruct neuronal processes, allowing us to, for example, survey all the myelinated axons (the projectome). Further, our reconstructions enabled us to investigate the precise projections of neurons and their contralateral counterparts. In particular, we observed that myelinated axons of reticulospinal and lateral line afferent neurons exhibit remarkable bilateral symmetry. Additionally, we found that fasciculated reticulospinal axons maintain the same neighbour relations throughout the extent of their projections. Furthermore, we use the dataset to set the stage for whole-brain comparisons of structure and function by co-registering functional reference atlases and in vivo two-photon fluorescence microscopy data from the same specimen. We provide the complete dataset and reconstructions as an open-access resource for neurobiologists and others interested in the ultrastructure of the larval zebrafish.

Artificial Intelligence

Biochemistry

Edge-Parallel Graph Encoder Embedding

Ariel Lubonja et al., May 27, 2024

Artificial Intelligence

Theoretical Computer Science

T-Rex (Tree-Rectangles): Reformulating Decision Tree Traversal as Hyperrectangle Enclosure

Meghana Madhyastha et al., May 13, 2024

Artificial Intelligence

Computer Science
