On Wednesday 1 May I attended a vote of the University of Virginia Faculty Senate to approve the second phase of plans to establish the School of Data Science at the University of Virginia. At this midpoint I will review the process and circumstances for founding a school.
A university in the United States typically starts with a college of arts and sciences and supporting institutions, like a library and administrative system. The college of arts and sciences teaches liberal arts, which include all the academic subjects which the incoming students had available at the secondary school from which they just graduated. Various people must have theories of education about how much liberal arts background graduates ought to have. I suppose that I expect all students with an undergraduate degree to be able to do basic science research, think critically, appreciate art, compose an essay, do literature review with library resources, participate in group project management, socialize with educated peers, build life skills, and, least importantly, gain some introductory specialized knowledge related to their chosen academic major.
If a university grows, then it might add other schools for applied sciences or professions. Common types of additional schools are for medicine, law, engineering, business, accounting, architecture, nursing, education, agriculture, public health, social work, media, or computer science. Reasons for starting an additional school include a need for special capacity in that field of study, the existence of a culture of separate professionalism in that field, availability of targeted funding for that field, and a need for separate administration to provide education and conduct research in that field.
Right now at the University of Virginia we are establishing a school of data science, which is new and different. “Data science” is a concept which we and everyone else are still defining, but so far we have treated it as a convergence of the academic disciplines of computer science, statistics, systems engineering, communication usually in the form of data visualization, and anthropology usually in the form of data ethics. Our research output has been varied, with projects for industries including biomedicine, finance, business administration, logistics, library science, media studies, literature, physics, and for the military. Running through all these applications there is also a concept of pure or theoretical data science technique, which irrespective of subject matter could find trends in weird, seemingly meaningless collections of data.
To establish a new school at a university the process is to seek approval from the “faculty senate”, which is the body of all university faculty. In the senate every faculty member gets a vote. The senate meets routinely. Typical business of the senate includes reviewing the budget for shared university resources among all schools, the introduction of new degrees within any school, and entertaining weird demands from any curmudgeonly faculty hecklers. After doing so much community organizing for the global Wikipedia movement it is interesting for me to observe how focused Wikipedia’s community is on digital communication versus how this university’s older tradition operates with pre-digital processes.
For our school of data science, we planned a set of three phases of votes within the faculty senate. The idea behind dividing the school’s founding into phases was that negotiating the entirety of the institution’s founding would be too complicated and unlikely to pass a senate vote. Instead the establishment would happen in parts. A phase would include us drafting documentation to present some ideas, then a series of public discussions about that documentation, then a vote on whether to pass that documentation and move to the next phase.
The first phase was to seek senate support for the idea of establishing a school of data science at all. This phase defined our idea of data science, separated the concept from existing education and research at the university, described the professional landscape of employers and industrial sectors who would hire graduates and fund research, shared a vision, defined a mission within that vision, and gave an overview of the work of the existing Data Science Institute founded about 5 years ago in 2013 from which the school would grow. Missing from this discussion was the unusual circumstance under which we proposed to found this school at this time: the donation to establish the school, which was somewhere between rumor and reality. The situation about money is odd in that social decorum requires us to write out and debate a background narrative and ideology but not mention that there is an invitation from a donor.
The situation with the donor was that no one should talk about the donation until and unless some initial negotiation confirmed that the university would accept the donation to found a school of data science. The social process of accepting the donation weirdly required slighting various communities. The donor’s objective was to give money to found a school. To found a new school at the University of Virginia, the process is to get permission from the “Board of Visitors”, which is the board of trustees of the entire University of Virginia; the faculty senate; the board of trustees of the Data Science Institute itself; and His Excellency the Governor of Virginia. Seemingly not necessary by rule but necessary in practice is the support of the president of the university, who brokers with the Board of Visitors and His Excellency.
From one perspective the faculty senate holds the majority investment of power. From another perspective, because the senate is thousands of people and diverse, it has communication and coordination difficulty. The governor is one person and the Board of Visitors is a small group, so Phil Bourne, the proposed director of the School of Data Science, and Jim Ryan, the university president, did initial negotiation to get their approval. I imagine the negotiation went something like, “Can we accept $120 million from this guy?”, and they said “Sure, why not.” I presume that if the donor wanted to establish a petting zoo at the university they would have approved, just so long as the proposal was a net positive and lacked major drawbacks. It seems a bit silly to me to require the governor to grant a politician’s opinion for university management, and for all its relative oddness as compared to the governance of other universities, the Board of Visitors seems unlikely to have strong opinions on this matter, not that I know what they do in private. Whatever the case, these old ties do give character and tradition to this ancient American institution, so I suppose we should grasp as much historicity as we can muster.
The social offense we committed was that technically we should have sought permission from the senate first, but it would have been awkward to tell the senate that we wanted to start a school before we confirmed that we even had a donation. Suppose the donor had backed out – everyone would have been embarrassed, and by telling thousands of people before confirming the donation we would have solicited tens of thousands of collective labor hours from so many people thinking about and discussing the possibility. Getting permission from the small power holders first meant that when we went to the senate, they could be sure that the donation was certain.
While every university might see a changing professional landscape that calls for establishing a school of data science, the cost of being a founder and first mover is high, because everyone who delays can enter the field without losing money on the basic research and administrative chaos, simply copying what someone else did first. The donor in this case wants to speed the progress of the world along with data science; one school gets established now, then national and global conversation changes, and schools of data science founded later will necessarily respond to the precedent we set. In discussion we came up with “five pillars” of data science education – ethics, data acquisition, data engineering, data analytics, and dissemination. As a Wikipedian, I see myself bringing to the program dissemination through the popular Wikipedia / Wikidata platform, a ready-made set of ethics in openness and diversity, and a nifty existing amateur community of practice in acquisition, engineering, and analytics through the routine activity of the Wikimedia community of editors.
I had wondered what it is that Phil Bourne, the director of our institute and dean apparent of the school, can do that is special. He seems modest and unimposing. He asks simple questions and listens more than he talks. He delegates as much as possible and while he likes to hear updates, instead of guiding anyone he is more likely to lightly change the direction of what someone is doing in a way that seems arbitrary but then matters later. I might not be able to detect all of his expertise, but after being around for a few months I had already come to appreciate the way that he does documentation and reporting. My respect for him greatly increased when I came to realize just how fast he writes, how well organized his writing is on the first draft, and how concisely he communicates. I often write twice as much, then delete half, except in these blog posts which I publish with less review. I also sleep on my writing, and Phil does too and he seeks comment, but the way that Phil spins out good first drafts is a specialized writing skill set which I wish I had when I want to negotiate with the Wikipedia Senate.
The second phase of negotiation with the senate was to confirm some details about the school, including how many students it would admit, generalities of what kinds of educational programs it would have for which levels of students, how the School of Data Science will collaborate with other schools at the University of Virginia, a schedule of developmental milestones, some projections of funding amounts from various sources, anticipated investment in infrastructure, the plans for granting tenure and seating endowed chairs, and an organizational chart of the various administrative divisions in the school. In summary – this second phase of negotiation defines the logistics of establishing and operating the school. By this phase, the faculty senate has already agreed in phase one that if there were money, then there could be a school, and now they consider whether this sort of school is one they could approve.
Some characteristics of the proposed School of Data Science seemed new and different to me. The first difference is the establishment of “data science” as an independent discipline. We imagined that very soon, 18-year-olds would join an undergraduate program in data science with intent to get undergraduate degrees called “bachelor of data science” at age 22, then go into the workforce with that. I started thinking about this by imagining the educational sacrifices, as these students would not have as much proficiency in computer science as someone with a degree in that field, nor would they have as much proficiency in statistics, design, communication, or ethics as anyone with those specializations. We all spent a lot of time repeating ourselves in group meetings, talking about whether data science could really be an independent specialty, and whether people would really want it.
Another difference is the related concept of non-traditional tenure and the flat organization. Various people in conversation continually expressed worry that if we granted anyone tenure, then that person’s skills might become outdated more rapidly than is the case in other fields. Data science is something of a technical field depending on software tools in the consumer marketplace, and a person who becomes accustomed to using particular software products may continue to use them after they become obsolete. Software products are not the only issue, but they are a visible issue, as various products also encourage specific workflows and ideologies around themselves. Right now many of us are conscious of each other’s software product choices, and how choosing different products affects worldview and practice. A good example of this is the commercial statistics software SAS versus the free and open R. These are competing products which can address the same problems, but people who use one versus the other often arrive at different conclusions and different social contexts. Technical proficiency in one does not bring an ability to use the other, and users also tend to have different worldviews about online communication. We do not know what new products will come in the future, but technology changes quickly, and we are sure that we want the school to be able to adapt to new opportunities more quickly than an organization with no planning for technological change. Similarly, we all support, and Phil especially encourages, a flat hierarchy where proposals for change can come from the bottom, from community organization or crowdsourced conversation. This philosophy has been great for me as a Wikipedian because I love digital communication where more people participate.
Another weird part of the school plan is the idea of maximizing integration with other schools. The imagined future is that every field of study will incorporate some data science, and that data science will be part of the curriculum of every degree. In our conversations we do not speculate with science fiction and we hardly talk about future technology. I will only speak for my own self, but in my own head, I am imagining a future where more people collaborate with bots and automated assistants, and where having artificial intelligence complement all other fields is normal. Because of this, our school should begin research collaborations with all other schools. Adoption of new technology is a social challenge more than a limitation of technical infrastructure, so the sooner we can begin the social and ethical conversations in other schools, the sooner that other fields can take advantage of AI and data science opportunities when they become available.
The third phase of voting is in the future. I expect that it will go into the details of the broad planning from the second phase.